Athena Architecture

The Athena Partition Refresh function periodically refreshes Athena tables so that alerts and log data are searchable.

The default refresh interval is 10 minutes but can be configured by the user.


The Athena Partition Refresh function utilizes:


[Figure: StreamAlert Athena Refresh Partition Diagram]


Each time the Athena Partition Refresh Lambda function is invoked, it does the following:

  • Polls the SQS queue for the latest S3 event notifications (up to 100)
  • S3 event notifications contain context around any new object written to a data bucket (as configured below)
  • Builds a deduplicated set of S3 bucket IDs from the notifications
  • Queries Athena to verify the streamalert database exists
  • Refreshes the Athena tables for data in the relevant S3 buckets, as specified in the buckets setting below
  • Deletes messages off the queue once partitions are created
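
The deduplication step above can be sketched in Python. This is a minimal illustration, not StreamAlert's actual implementation: the names unique_buckets and make_body and the sample bucket names are hypothetical, and the sketch assumes each SQS message body is a raw S3 event notification in the standard Records[].s3.bucket.name layout.

```python
import json


def unique_buckets(message_bodies):
    """Collect the distinct bucket names referenced by S3 event notifications.

    message_bodies: iterable of JSON strings, one per SQS message
    (assumed layout, see the lead-in).
    """
    buckets = set()
    for body in message_bodies:
        notification = json.loads(body)
        # Messages without a "Records" list (for example S3 test events)
        # contribute nothing.
        for record in notification.get("Records", []):
            buckets.add(record["s3"]["bucket"]["name"])
    return buckets


def make_body(bucket, key):
    """Build a minimal sample notification body (hypothetical helper)."""
    return json.dumps(
        {"Records": [{"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}]}
    )


sample_bodies = [
    make_body("example.streamalerts", "alerts/a.json"),
    make_body("example.streamalerts", "alerts/b.json"),
    make_body("example.streamalert.data", "logs/c.json"),
    json.dumps({"Event": "s3:TestEvent"}),
]

print(sorted(unique_buckets(sample_bodies)))
```

Using a set means that many notifications for the same bucket lead to a single refresh of the corresponding table per invocation.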

Configure Lambda Settings

Open conf/lambda.json, and fill in the following options:

Key | Required | Default | Description
enabled | Yes | true | Enables or disables the Athena Partition Refresh Lambda function
enable_custom_metrics | No | false | Enables or disables logging of metrics for the Athena Partition Refresh Lambda function
log_level | No | info | The log level for the Lambda function, either info or debug. Debug helps diagnose errors with polling SQS or sending Athena queries.
memory | No | 128 | The amount of memory (in MB) allocated to the Lambda function
timeout | No | 60 | The maximum duration of the Lambda function (in seconds)
schedule_expression | No | rate(10 minutes) | The rate at which the Athena Partition Refresh Lambda function is invoked, as a CloudWatch schedule expression
buckets | Yes | {} | Key-value pairs of S3 buckets and associated Athena table names. By default, the alerts bucket will exist in each deployment.


  "athena_partition_refresh_config": {
    "log_level": "info",
    "memory": 128,
    "buckets": {
      "<prefix>.streamalerts": "alerts"
    },
    "...": "...",
    "timeout": 60
  }
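
As a rough illustration of how the documented defaults combine with user-supplied settings, the Python sketch below overlays a partial configuration on the defaults listed in the options table. The function name effective_config is hypothetical, and StreamAlert's own configuration loader may behave differently.

```python
import copy
import json

# Defaults copied from the options table above.
DEFAULTS = {
    "enabled": True,
    "enable_custom_metrics": False,
    "log_level": "info",
    "memory": 128,
    "timeout": 60,
    "schedule_expression": "rate(10 minutes)",
    "buckets": {},
}


def effective_config(user_config):
    """Overlay user-supplied settings on the documented defaults."""
    merged = copy.deepcopy(DEFAULTS)
    merged.update(user_config)
    if merged["log_level"] not in ("info", "debug"):
        raise ValueError("log_level must be 'info' or 'debug'")
    return merged


# A partial configuration; unspecified keys fall back to the defaults.
user_config = json.loads("""
{
  "log_level": "debug",
  "buckets": {"<prefix>.streamalerts": "alerts"}
}
""")

config = effective_config(user_config)
print(config["schedule_expression"])
print(config["buckets"])
```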


If any of the settings above are changed from the initialized defaults, the Lambda function will need to be deployed in order for them to take effect:

$ python deploy --function athena

Going forward, deploying with the flag --function all will redeploy this function along with the rule function and alert function.


To ensure the function is operating as expected, monitor the following SQS metrics for <prefix>_streamalert_athena_s3_notifications:

  • NumberOfMessagesReceived
  • NumberOfMessagesSent
  • NumberOfMessagesDeleted

All three of these metrics should have very close values.

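If NumberOfMessagesSent is much higher than the other two metrics, the queue is filling faster than the function drains it. In that case, increase the invocation rate by shortening the interval in schedule_expression, for example from rate(10 minutes) to rate(2 minutes).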
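
The comparison of the three metrics can be expressed as a small check. This is a sketch under stated assumptions: the function name queue_is_falling_behind is hypothetical, the 10% tolerance is an arbitrary illustrative threshold rather than a documented value, and the inputs are assumed to be sums of each metric over the same time window.

```python
def queue_is_falling_behind(sent, received, deleted, tolerance=0.10):
    """Return True when sent messages clearly outpace received/deleted ones.

    tolerance is an assumed threshold: the fraction of sent messages that
    may remain unprocessed before the queue is flagged.
    """
    if sent == 0:
        return False
    # The slower of the two downstream counts bounds what was fully handled.
    processed = min(received, deleted)
    return (sent - processed) / sent > tolerance


print(queue_is_falling_behind(1000, 990, 985))
print(queue_is_falling_behind(1000, 400, 400))
```

With the sample values, the first call reports a healthy queue and the second reports one that is falling behind.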

For high throughput production environments, an interval of 1 to 2 minutes is recommended.
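
A quick estimate shows why the interval matters. The sketch below assumes that the limit of up to 100 notifications per poll, mentioned in the invocation steps, is a hard cap per invocation. It handles only rate(...) expressions, not cron(...) expressions. The function name max_notifications_per_hour is hypothetical.

```python
import re

# Assumed per-invocation cap, taken from "up to 100" in the invocation steps.
MAX_NOTIFICATIONS_PER_INVOCATION = 100

# Minutes per unit for CloudWatch rate expressions.
UNIT_MINUTES = {
    "minute": 1, "minutes": 1,
    "hour": 60, "hours": 60,
    "day": 1440, "days": 1440,
}


def max_notifications_per_hour(schedule_expression):
    """Upper bound on notifications processed per hour for a rate() schedule."""
    match = re.fullmatch(r"rate\((\d+) (minutes?|hours?|days?)\)", schedule_expression)
    if match is None:
        raise ValueError("only rate(<value> <unit>) expressions are supported")
    interval_minutes = int(match.group(1)) * UNIT_MINUTES[match.group(2)]
    return MAX_NOTIFICATIONS_PER_INVOCATION * 60 / interval_minutes


for expression in ("rate(10 minutes)", "rate(2 minutes)", "rate(1 minute)"):
    print(expression, max_notifications_per_hour(expression))
```

Under these assumptions, the default schedule handles at most 600 notifications per hour, while a 1 minute interval raises the bound to 6000.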