Python User Defined Functions to Automate Athena Queries using Boto3

munugoti
2 min readAug 30, 2021

Towards the end of 2016, Amazon launched Athena . Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

If you wish to run queries to Athena programatically from e.g. python, you have a few options, for example

  1. PyAthenaJDBC
  2. JayDeBeApi
  3. boto3

I would be using boto3 in this article. The AWS SDK for Python (Boto3) provides a Python API for AWS infrastructure services.The SDK provides an object-oriented API as well as low-level access to AWS services.

If you can use boto3 , it gives you ability to supplement the AWS API calls with additional logic, such as filtering results with. It is also easier to chain API calls, such as making one call for a list of resources, then making follow-up calls to describe each resources in detail.

Execute Athena Queries using Python Boto3

In this article, I would be exaplaning about 2 user defined functions to work with athena.

Code can be found in git repo : https://github.com/munugotis/python_functions_for_athena.git

  1. execute_query_ddl: This function can be called if you want to execute a ddl query in Athena like creating a table or dropping a table. This function will wait till query is completed and will return if the query is succesfull or failed. This will help in case if you need to perform any other action which depends of the status of this query. This function will not return any data set. It just returns the status after query completion.

2. sql_select_athena: This function can be called if you want to execute a query in Athena and pass the result set to a data frame. This function will wait till query is completed and will return the data set in form a dataframe if the query is succesfull. This will help in case if you need to perform any other action which depends of the status of this query and if you need to pass the result set futher in the pipeline.

These user functions can be deployed in a Lambda, Glue ETL , EC2 etc.,

If you enjoyed these instructions, would you mind commenting below 👇🏻?
Also, if there’s something this did not help you with regarding querying Athena with boto3, I’d be happy to improve the article.

--

--

munugoti

Data Engineer | Microsoft Certified DevOps Engineer Expert | AWS Certified Associate Solutions Architect | Snowflake Certified