The Context
My client's platform is a visual tool that lets anyone create enterprise-grade web applications with a few clicks, no technical expertise required. One of my coolest projects there was designing, architecting, and building the infrastructure that supports everything that happens behind the scenes after a user clicks the deploy application button. That topic deserves its own article, though! Here, I'll take a bite-sized chunk out of that project to highlight a neat feature I recently developed and deployed to AWS.
The Challenge
My client hosts hundreds of applications in the AWS cloud, and their usage varies widely. Every application runs 24/7 and incurs charges no matter how often users visit it. The client wanted to reduce costs with an automated mechanism that decides when to shut down an application after a sustained period of low user activity. Having previously architected and built the pipeline that automates provisioning of all resources for these applications, I was eager to add features that would make that design more cost-efficient while keeping it simple.
The Solution
To address this challenge, I implemented a system that automatically tracks the number of requests each application's server receives and shuts the application off after 24 hours with no user activity. AWS offers many services that can work together to implement an auto-hibernation feature. My strategy favors simplicity: CloudWatch metrics, a CloudWatch alarm, and an EventBridge rule work together to monitor application traffic, and a Lambda function stops idle applications running on Amazon Elastic Container Service (ECS) so they no longer rack up unnecessary charges.
First, each application's ECS task publishes a CloudWatch metric recording the number of requests it received from users, aggregated hourly. If that metric reports zero requests in every hourly period for 24 consecutive hours, the CloudWatch alarm transitions to the ALARM state.
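The metric and alarm in this step can be sketched in Python as the parameters that CloudWatch's PutMetricData and PutMetricAlarm operations accept through boto3. This is a minimal sketch, not the client's actual configuration: the AppTraffic namespace, the RequestCount metric name, the ClusterName and ServiceName dimensions, and the hibernate- alarm name prefix are all hypothetical. The functions only build request parameters, so they can be inspected without AWS credentials.

```python
"""Sketch: request parameters for the hourly traffic metric and the idle alarm.

Hypothetical placeholders: namespace, metric name, dimension names, and the
alarm naming scheme are illustrative, not taken from a real deployment.
"""

NAMESPACE = "AppTraffic"       # hypothetical custom namespace
METRIC_NAME = "RequestCount"   # hypothetical metric name
IDLE_HOURS = 24                # consecutive idle hours before hibernating


def _dimensions(cluster: str, service: str) -> list:
    # Dimensions tie each datapoint to one ECS service, so each app gets its own alarm.
    return [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]


def metric_data_params(cluster: str, service: str, request_count: int) -> dict:
    """Kwargs for cloudwatch.put_metric_data(), sent by the task once per hour."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                "MetricName": METRIC_NAME,
                "Dimensions": _dimensions(cluster, service),
                "Value": float(request_count),
                "Unit": "Count",
            }
        ],
    }


def idle_alarm_params(cluster: str, service: str) -> dict:
    """Kwargs for cloudwatch.put_metric_alarm(): ALARM after 24 idle hours."""
    return {
        "AlarmName": f"hibernate-{cluster}-{service}",  # hypothetical naming scheme
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Dimensions": _dimensions(cluster, service),
        "Statistic": "Sum",
        "Period": 3600,                   # one-hour buckets
        "EvaluationPeriods": IDLE_HOURS,  # evaluate the last 24 buckets...
        "DatapointsToAlarm": IDLE_HOURS,  # ...and require all 24 to breach
        "Threshold": 0.0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # An hour with no published datapoint is treated as an idle hour.
        "TreatMissingData": "breaching",
    }
```

With credentials configured, the dictionaries can be passed straight to boto3, for example boto3.client("cloudwatch").put_metric_alarm(**idle_alarm_params("apps", "shop")). Treating missing data as breaching is a design choice: if the task stops publishing, that hour counts as idle instead of being ignored.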
When the alarm enters the ALARM state, CloudWatch sends a state change event to EventBridge, where a rule matches it and routes it to a Lambda function configured as the rule's target. The event's details include information such as the ECS service's name, which identifies the specific application that needs to be shut off.
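CloudWatch sends alarm state changes to the default EventBridge event bus with the source aws.cloudwatch and the detail type CloudWatch Alarm State Change, so wiring up this step mostly means writing an event pattern. The Python sketch below is one possible setup with boto3-style clients passed in; the hibernate- alarm name prefix, the rule name, and the target Id are hypothetical. It also grants EventBridge permission to invoke the function, which a Lambda target needs.

```python
"""Sketch: EventBridge rule that routes idle-alarm state changes to a Lambda.

Hypothetical: alarm names start with "hibernate-"; rule name and Lambda ARN
are placeholders supplied by the caller.
"""
import json

ALARM_PREFIX = "hibernate-"  # hypothetical alarm naming convention


def idle_alarm_event_pattern() -> dict:
    # Matches events CloudWatch puts on the default bus when an alarm changes
    # state, limited to our hibernation alarms entering the ALARM state.
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": [{"prefix": ALARM_PREFIX}],
            "state": {"value": ["ALARM"]},
        },
    }


def create_hibernation_rule(events_client, lambda_client,
                            rule_name: str, function_arn: str) -> str:
    """Create or update the rule, target the Lambda, and allow EventBridge to invoke it."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(idle_alarm_event_pattern()),
        State="ENABLED",
    )["RuleArn"]
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "hibernate-lambda", "Arn": function_arn}],
    )
    # Resource-based policy statement letting events.amazonaws.com invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

In practice, boto3.client("events") and boto3.client("lambda") would be passed in. Because the prefix filter lives in the pattern, one rule can serve every application's alarm.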
Finally, the EventBridge rule invokes the Lambda function, which extracts the ECS service name from the event's details and calls the ECS UpdateService operation to set the service's desired count to zero.
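A minimal sketch of such a handler in Python, assuming the alarm watches a metric with hypothetical ClusterName and ServiceName dimensions. CloudWatch copies a metric's dimensions into the alarm state change event under detail.configuration.metrics, which is where the sketch looks for the service name. The ecs_client parameter is an addition for testing; Lambda calls the handler with only event and context, so a boto3 ECS client is created by default.

```python
"""Sketch: Lambda handler that hibernates the ECS service named in an alarm event.

Hypothetical: the alarm's metric carries ClusterName and ServiceName dimensions.
"""


def service_from_alarm_event(event: dict) -> tuple:
    """Return (cluster, service) from a CloudWatch Alarm State Change event."""
    for metric in event["detail"]["configuration"]["metrics"]:
        dims = metric.get("metricStat", {}).get("metric", {}).get("dimensions", {})
        if "ServiceName" in dims:
            # ECS falls back to the cluster named "default" if none is given.
            return dims.get("ClusterName", "default"), dims["ServiceName"]
    raise ValueError("alarm event has no ServiceName dimension")


def handler(event, context, ecs_client=None):
    # Defensive check: the EventBridge rule should already filter to ALARM.
    if event.get("detail", {}).get("state", {}).get("value") != "ALARM":
        return {"hibernated": False}

    cluster, service = service_from_alarm_event(event)

    if ecs_client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes
        ecs_client = boto3.client("ecs")

    # Same operation as: aws ecs update-service --cluster C --service S --desired-count 0
    ecs_client.update_service(cluster=cluster, service=service, desiredCount=0)
    return {"hibernated": True, "cluster": cluster, "service": service}
```

The function's execution role needs the ecs:UpdateService permission. Scaling to zero keeps the service definition intact, so the application can be brought back later by raising the desired count again.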
The Results
My client now has infrastructure that adapts to changes in usage, and they began realizing cost savings as soon as the auto-hibernation feature was deployed. I also learned more about automating communication between distributed systems to optimize costs.
Next Steps
My solution could be taken a step further by implementing an auto-awaken feature. Right now, there is no graceful way to handle a user visiting a site that is in hibernation: the request fails and the user never receives the requested content, which is a poor experience. However, my client's existing infrastructure could be reused to build an auto-awaken feature.
One approach is to update the application's dedicated CloudFront viewer-response function to handle requests that fail because the application server is not running. When a request goes unanswered, a Lambda function could be invoked to raise the service's desired count above zero. Because CloudFront Functions cannot make network calls, that trigger would likely need to be a Lambda@Edge function, such as one attached to the origin-response event.
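The wake-up half of this idea could look like the Python sketch below. It is as speculative as the approach it illustrates. It assumes that a desired count of 1 is enough to serve traffic, that the caller already knows which cluster and service back the requested site, and that a Lambda@Edge origin-response function sees the origin's 502, 503, or 504 status. wake_service checks the current desired count with DescribeServices first, so later failed requests skip the update once the service is already waking.

```python
"""Sketch: wake a hibernating ECS service when its origin stops answering.

Hypothetical: a desired count of 1 suffices, and the caller supplies the
cluster/service behind the requested site.
"""

UNAVAILABLE = {"502", "503", "504"}  # assumed signals that the origin is down


def wake_service(ecs_client, cluster: str, service: str, count: int = 1) -> bool:
    """Scale the service up only if it is hibernating (desiredCount == 0)."""
    resp = ecs_client.describe_services(cluster=cluster, services=[service])
    services = resp.get("services", [])
    if not services or services[0]["desiredCount"] > 0:
        return False  # unknown service, or already awake/waking
    ecs_client.update_service(cluster=cluster, service=service, desiredCount=count)
    return True


def origin_response_handler(event, context, wake=None):
    """Lambda@Edge origin-response shape: trigger a wake-up, ask the client to retry."""
    response = event["Records"][0]["cf"]["response"]
    if response["status"] in UNAVAILABLE and wake is not None:
        wake()  # e.g. functools.partial(wake_service, ecs, "apps", "shop")
        response["status"] = "503"
        response["statusDescription"] = "Service Unavailable"
        response["headers"]["retry-after"] = [{"key": "Retry-After", "value": "60"}]
    return response
```

While the service starts, visitors get a 503 with a Retry-After header instead of a bare gateway error. Lambda@Edge functions do not support environment variables, so the cluster and service names would have to be derived from the request (for example, the Host header) or bundled with the function.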
A few other designs come to mind when reasoning about the next steps for this project, but no matter what the challenge is, my goals are to keep my solutions simple, automated, and cost-effective.