Query S3 data using Amazon Athena
Amazon Athena is an interactive query service that provides a SQL interface to the data in your S3 buckets. You do not need to load the data into a separate store: the data stays in S3, and Athena provides the query platform on top of it. Under the hood it uses Presto.
What is Athena
- Serverless platform for querying S3 data
- Under the hood it uses Presto
- Data stays in S3
- Supports many formats like CSV, JSON, ORC, Parquet, Avro
- Supports unstructured, semi-structured, and structured data
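As a rough illustration of the points above, the sketch below submits a SQL query through a boto3 Athena client and waits for it to finish. The helper name `run_query`, the database name, and the S3 output location are assumptions made for this example and are not from the original post. The client is passed in as an argument so the function does not depend on any particular credentials setup.

```python
import time


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and return the first page of result rows."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended as {status['State']}: {reason}")

    # Only the first page is fetched here; for a SELECT, the first row
    # holds the column headers.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Example usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   athena = boto3.client("athena")
#   rows = run_query(athena, "SELECT * FROM my_table LIMIT 10",
#                    "my_database", "s3://my-bucket/athena-results/")
```

Larger result sets would need pagination, which this sketch leaves out.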
AWS Glue + Amazon Athena
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. Amazon Athena stores its metadata (tables and schemas) in the AWS Glue Data Catalog.
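To make the Glue relationship concrete: when Athena is backed by the Glue Data Catalog, running a CREATE EXTERNAL TABLE statement typically records the table definition in the catalog while the files themselves remain in S3. The sketch below only builds such a statement as a string. The function name, database, table, columns, and bucket are all hypothetical, and the default of Parquet is an assumption for the example.

```python
def build_create_table_ddl(database, table, columns, s3_location, file_format="PARQUET"):
    """Return a CREATE EXTERNAL TABLE statement for data that already sits in S3.

    columns is a list of (name, type) pairs, e.g. [("order_id", "bigint")].
    """
    column_lines = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        f")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical table over Parquet files under an example bucket prefix.
ddl = build_create_table_ddl(
    "sales_db",
    "orders",
    [("order_id", "bigint"), ("amount", "double")],
    "s3://example-bucket/orders/",
)
print(ddl)
```

The resulting statement can be pasted into the Athena query editor or submitted through the API. Text formats such as CSV or JSON need additional clauses (a row format or SerDe) that are not covered here.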
Athena's cost model is simple: it is serverless and pay-as-you-go, so you pay based on the amount of S3 data your queries scan. Here is the summary:
- $5 per TB scanned
- Glue and S3 have their own charges
- Use columnar formats such as ORC or Parquet for the best cost savings and performance
- No charge for failed queries
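The per-TB pricing above turns into a simple calculation. The following is a minimal sketch that uses the $5 per TB figure from the list; the function name is made up for this example, the binary definition of a terabyte is an assumption, and any per-query minimum or rounding that AWS applies is ignored.

```python
PRICE_PER_TB_USD = 5.0    # figure from the summary above
BYTES_PER_TB = 1024 ** 4  # assumption: 1 TB counted as 2**40 bytes


def estimate_query_cost(bytes_scanned):
    """Estimate the Athena charge in USD for a query that scanned bytes_scanned bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Scanning a full terabyte of CSV versus a columnar layout where the query
# only needs to read a tenth of the bytes (the 10% figure is illustrative).
full_scan = estimate_query_cost(1024 ** 4)
columnar_scan = estimate_query_cost(1024 ** 4 / 10)
print(f"full scan: ${full_scan:.2f}, columnar scan: ${columnar_scan:.2f}")
```

This is why columnar formats help with cost: a query that reads fewer bytes is billed for fewer bytes.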
I hope this gave you an overview of Athena and how it can query S3 data stored in different formats (CSV, Parquet, ORC). Please visit the official Amazon Athena site at https://aws.amazon.com/athena/ for more information.