Introduction
Amazon Athena is an interactive query service for data stored in Amazon S3. It provides a SQL interface to the data in your S3 buckets. There is no need to load the data into a separate system: the data stays in S3, and Athena provides the query platform on top of it. Under the hood, it uses Presto.
What is Athena
Serverless platform for querying S3 data
Under the hood it uses Presto
Data stays in S3
Supports many formats like CSV, JSON, ORC, Parquet, Avro
Supports unstructured, semi-structured, and structured data
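To make the "SQL interface over S3" idea concrete, here is a minimal sketch of running a query from Python. It assumes a client object that behaves like the one returned by boto3.client("athena"); the function name run_query and its parameters are illustrative and not part of any AWS library.

```python
import time

# States in which Athena reports that a query has stopped running.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` through an Athena client and return result rows as lists of strings.

    `athena` is assumed to behave like boto3.client("athena").
    `output_location` is the S3 path where Athena writes query results.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until the query stops.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    # Only the first page of results is read; large result sets need pagination.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

With boto3 installed and AWS credentials configured, a call might look like run_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-bucket/athena-results/"), where the table, database, and bucket names are placeholders. For SELECT queries, the first returned row typically holds the column headers.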
AWS Glue + Amazon Athena
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. Amazon Athena metadata (tables and schemas) is stored in the AWS Glue Data Catalog.
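Because the table definitions live in Glue, the schema Athena uses can be read back from the catalog. The sketch below assumes a client that behaves like boto3.client("glue"); the helper name table_columns is hypothetical.

```python
def table_columns(glue, database, table):
    """Return (name, type) pairs for a table registered in the Glue Data Catalog.

    `glue` is assumed to behave like boto3.client("glue").
    """
    response = glue.get_table(DatabaseName=database, Name=table)
    definition = response["Table"]

    # Regular columns live under StorageDescriptor; partition columns are listed separately.
    columns = definition.get("StorageDescriptor", {}).get("Columns", [])
    partitions = definition.get("PartitionKeys", [])
    return [(col["Name"], col["Type"]) for col in columns + partitions]
```

Calling table_columns(boto3.client("glue"), "my_database", "my_table") with placeholder names would list the columns that Athena sees when it queries that table.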
Athena Cost
The Athena cost model is simple: because the service is serverless, you pay as you go, based on the data your queries scan in S3. Here is a summary:
$5 per TB scanned
Glue and S3 have their own charges
Use columnar formats such as ORC and Parquet for the best cost savings and performance
No charge for failed queries
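As a rough illustration of the per-TB pricing in the summary above, the following sketch estimates the cost of a query from the number of bytes it scans. The $5 rate comes from the list; treating a TB as 1024^4 bytes and the two example scan sizes are assumptions made for illustration, and the sketch ignores any rounding or per-query minimum that AWS may apply.

```python
PRICE_PER_TB_USD = 5.0      # rate from the summary above
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def estimate_query_cost(bytes_scanned):
    """Return the approximate cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: scanning a full 2 TB CSV dataset versus scanning
# 256 GB of the same data stored as Parquet, where only the needed columns are read.
csv_cost = estimate_query_cost(2 * BYTES_PER_TB)
parquet_cost = estimate_query_cost(256 * 1024 ** 3)
print(f"CSV scan:     ${csv_cost:.2f}")      # $10.00
print(f"Parquet scan: ${parquet_cost:.2f}")  # $1.25
```

The gap between the two figures is why columnar formats are recommended: a query that reads fewer bytes costs proportionally less.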
Conclusion
I hope this gave you an overview of Athena and how it can query data stored in S3 in different formats (CSV, Parquet, ORC). Please visit the official Amazon Athena site at https://aws.amazon.com/athena/ for more information.