Sela. | Cloud Better.

Low-latency serverless control planes for hybrid-tenant SaaS developer platforms

The Challenge 

The Dittofi team describes their product as a visual platform that empowers product managers, citizen developers, and software engineers to build full-stack, enterprise-grade web apps collaboratively without coding. Applications built on the Dittofi platform are transformed into performant Go backends and React frontends by Dittofi’s code-generation engine.  

In the product’s earliest iteration, applications built with Dittofi were deployed to the same server that hosted Dittofi’s platform itself. Despite its powerful and expensive server, the platform became unstable and failed several times as the team approached their hundredth deployed application.  

In response, the Dittofi team paused new application deployments while they looked to quickly bring a new platform online. This new platform would need to scale horizontally as needed and be resilient to hardware failures. These resiliency and elasticity goals needed to be achieved while ensuring the platform remained performant and cost-effective.  

The Dittofi team engaged the Foresight Technologies team to help them achieve these goals.

In this blog post, we will discuss the hybrid-tenant architecture that our team at Foresight designed and implemented for Dittofi's hosting platform. We will focus on the underlying architecture of the Dittofi hosting platform where tenant workloads run, as well as the architecture for the hybrid-tenant control plane that enables the provisioning, de-provisioning and updating of tenants. We will pay special attention to the user experience during the development of low-code applications and the measures we put in place to enable a near-real-time code generation, building, and deployment cycle when Dittofi customers update their applications. 

Hybrid-Tenant Architecture 

Hybrid-tenant cloud architecture is a way of designing software applications that serve multiple tenants with different levels of isolation and resource allocation. It leverages a mixture of components that can be single-tenant or multi-tenant. Unlike single-tenancy or multi-tenancy architectures, where each tenant has either a dedicated or a shared instance of an application, hybrid-tenancy architectures offer more flexibility and cost savings.  

With an effective hybrid-tenant architecture in place, platforms can efficiently serve hundreds of low-code applications while maintaining a high level of isolation and cost-effective resource allocation for each tenant. 

At A High Level 

The following diagram represents a high-level view of the Dittofi hosting platform. Because this is a relatively technical solution, please refer to it as you read through the details ahead:

 

Single-Tenant Components 

When designing Dittofi’s hosting platform, we wanted to isolate tenant applications to prevent resource contention issues and enable per-application autoscaling. Each tenant application is therefore deployed to Dittofi's hosting platform with a set of logically, and sometimes physically, dedicated resources and components.

Infrastructure components that are unique per tenant application include: 

  • An ECS service that serves the tenant application’s dynamic backend
  • An S3 bucket for static assets
  • A CloudFront distribution with two origins, one for the static assets and one that serves dynamic content
  • Two CloudFront functions associated with the CloudFront distribution to manipulate request and response headers
  • Some DNS records
  • An optional ACM certificate for a tenant-specific domain name

Each tenant’s ECS service auto-scales independently, and each tenant application can be updated without impacting any other application. 

In addition to a set of unique infrastructure components, tenant applications are each provisioned with a tenant-specific database and database user, along with a user for Redis with a set of privileges that allow access to a tenant-specific key prefix. 
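As a rough illustration of the key-prefix restriction described above, the sketch below builds a Redis ACL rule string that limits a user to keys under a single tenant's prefix. The prefix convention (`<tenant_id>:`), the function names, and the choice of allowed command categories are assumptions made for this example, not details of Dittofi's actual configuration.

```python
def redis_access_string(tenant_id: str) -> str:
    """Build an ACL rule limiting a Redis user to one tenant's key prefix.

    The "<tenant_id>:" prefix convention is an assumption for illustration.
    """
    if not tenant_id or not tenant_id.replace("-", "").isalnum():
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    # "on" enables the user, "~<pattern>" restricts which keys it may touch,
    # and "+@all -@dangerous" allows every command category except the
    # ones Redis classifies as dangerous.
    return f"on ~{tenant_id}:* +@all -@dangerous"


def tenant_key(tenant_id: str, key: str) -> str:
    """Namespace an application key under the tenant's prefix."""
    return f"{tenant_id}:{key}"
```

A tenant application would then read and write only keys produced by something like `tenant_key`, and any attempt to touch another tenant's prefix would be rejected by Redis itself rather than by application code.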

Physical and logical components are all created using a serverless control plane that we describe in the section below entitled Serverless Control Plane. 

Shared Platform Infrastructure Components 

Aside from the single-tenant components that are unique per customer, we designed a set of shared platform infrastructure components that all tenant applications use. Sharing these components reduces the operational footprint and keeps costs down.

These components include: 

  1. An ECS cluster running on a fleet of autoscaling EC2 instances onto which tenant applications are bin-packed
  2. A serverless RDS cluster to which a logical database is added as each tenant application is created
  3. An ElastiCache Redis cluster that serves as a low-latency cache for tenant applications, each of which has access to its own key prefix
  4. An EFS volume for compiling and serving generated code
  5. A shared application load balancer pointing to a Caddy reverse proxy
  6. A Caddy reverse proxy

Using the Caddy reverse proxy enables us to make Dittofi hosting much cheaper by sharing a single application load balancer across hundreds of tenant applications despite ALB listener limits. The reverse proxy’s function is to translate tenant-specific headers injected by CloudFront to a tenant application’s unique SRV DNS record. After finding where the upstream service is running, Caddy forwards requests upstream.  
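To make the routing step concrete, here is a minimal Python sketch of the translation the reverse proxy performs: read a tenant header, derive the tenant's SRV record name, and choose an upstream from the SRV answers. The header name, the DNS namespace, and the selection rule (lowest priority, then highest weight) are all assumptions for illustration. In the real platform this is done by Caddy, not by custom Python, and the DNS lookup itself is left out here.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

TENANT_HEADER = "x-tenant-id"            # hypothetical header name
SRV_DOMAIN = "tenants.internal.example"  # hypothetical private DNS namespace


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def srv_name_for(headers: Mapping[str, str]) -> str:
    """Translate the tenant header into that tenant's SRV record name."""
    lowered = {name.lower(): value for name, value in headers.items()}
    tenant = lowered.get(TENANT_HEADER, "").strip().lower()
    if not tenant:
        raise KeyError(f"missing {TENANT_HEADER} header")
    return f"{tenant}.{SRV_DOMAIN}"


def pick_upstream(records: Sequence[SrvRecord]) -> str:
    """Pick host:port from SRV answers: lowest priority, then highest weight.

    This is a deterministic simplification; RFC 2782 describes a weighted
    random choice among records that share a priority.
    """
    if not records:
        raise LookupError("no SRV records returned for tenant")
    best = min(records, key=lambda r: (r.priority, -r.weight))
    return f"{best.target.rstrip('.')}:{best.port}"
```

Because the port comes from the SRV record, tenant containers can be placed on any instance and any host port in the shared cluster without the load balancer needing a rule per tenant.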

Serverless Control Plane 

When managing hundreds of tenants, having a single control plane that provides the operations necessary to manage the lifecycle of each tenant becomes incredibly important. We designed Dittofi’s control plane to run on AWS’s serverless offerings, so that its costs scale with usage as customers adopt the platform.

The serverless control plane comprises three Step Functions state machines in conjunction with a control plane frontend to enable tenant application management.  

The first state machine assembles the necessary infrastructure configuration based on the requester’s preferences. It subsequently creates or updates tenant infrastructure using Pulumi. Finally, it invokes another state machine. 

This second state machine generates code, compiles the code, and executes tenant-specific database migrations. It runs when tenants are created, but it is also used on its own to enable low-latency code deployments whenever a tenant application creator wants configuration changes reflected in their running application. More on this in the next section.

We built one final state machine to de-provision tenants. 
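The chaining of the provisioning state machine into the code generation state machine can be sketched as an Amazon States Language definition, built here as a Python dictionary. The state names, the ARNs, and the assumption that each step (including the Pulumi run) is a Lambda task are hypothetical. Only the three-step shape and the hand-off to a second state machine come from the description above.

```python
import json

# Hypothetical ARNs; real values would come from the deployment.
ASSEMBLE_CONFIG_FN = "arn:aws:lambda:us-east-1:111122223333:function:assemble-config"
PULUMI_UP_FN = "arn:aws:lambda:us-east-1:111122223333:function:pulumi-up"
CODEGEN_STATE_MACHINE = "arn:aws:states:us-east-1:111122223333:stateMachine:codegen"


def provisioning_definition() -> dict:
    """Return an ASL definition for the (assumed) provisioning workflow."""
    return {
        "Comment": "Assemble config, apply infrastructure, start code generation",
        "StartAt": "AssembleConfig",
        "States": {
            "AssembleConfig": {
                "Type": "Task",
                "Resource": ASSEMBLE_CONFIG_FN,
                "Next": "ApplyInfrastructure",
            },
            "ApplyInfrastructure": {
                "Type": "Task",
                "Resource": PULUMI_UP_FN,
                "Next": "StartCodeGeneration",
            },
            "StartCodeGeneration": {
                "Type": "Task",
                # Step Functions service integration that starts a nested
                # execution and waits for it to complete.
                "Resource": "arn:aws:states:::states:startExecution.sync:2",
                "Parameters": {
                    "StateMachineArn": CODEGEN_STATE_MACHINE,
                    "Input.$": "$",
                },
                "End": True,
            },
        },
    }


def state_order(definition: dict) -> list:
    """Walk the definition from StartAt, following each state's Next link."""
    order, name = [], definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


if __name__ == "__main__":
    print(json.dumps(provisioning_definition(), indent=2))
```

Keeping code generation in its own state machine is what lets it be invoked directly for application updates, without re-running infrastructure provisioning.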

The infrastructure provisioning state machine is illustrated in this image: 

 

The code generation state machine is illustrated in this image: 

 

Achieving Near-Real-Time Performance 

To achieve near-real-time performance during the code generation and deployment process, we needed to bypass the slow ECS service deployment process. For users to be able to build and test changes, the experience needed to be far snappier. 

The solution we arrived at was to prepare running containers ahead of time that are orchestrated by a tenant’s ECS service. These containers include platform dependencies, but the generated applications are externalized to a mounted EFS volume storing the compiled application binary. In the foreground of the running containers, we run an application reloader process we designed to poll the EFS file system for updates to the application binary every second. If an update is detected, the application reloader process will kill the old process, and replace it with a process that runs the detected new binary. 
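The reloader loop described above might look roughly like the following Python sketch. It is not the actual reloader: the class and function names are invented for illustration, change detection by modification time and size is an assumption, and the old process is stopped with a plain terminate-and-wait.

```python
import os
import subprocess
import time
from typing import Callable, Optional, Tuple

Signature = Tuple[int, int]  # (modification time in ns, size in bytes)


def binary_signature(path: str) -> Optional[Signature]:
    """Return a cheap fingerprint of the binary, or None if it is missing."""
    try:
        stat = os.stat(path)
    except FileNotFoundError:
        return None
    return (stat.st_mtime_ns, stat.st_size)


class Reloader:
    """Polls a binary on a shared volume and restarts it when it changes."""

    def __init__(self, binary_path: str,
                 launch: Optional[Callable[[str], object]] = None):
        self.binary_path = binary_path
        # The launcher is injectable so the loop can be exercised in tests.
        self.launch = launch or (lambda path: subprocess.Popen([path]))
        self.seen: Optional[Signature] = None
        self.process = None

    def check_once(self) -> bool:
        """Restart the application if the binary changed; report whether it did."""
        current = binary_signature(self.binary_path)
        if current is None or current == self.seen:
            return False
        if self.process is not None:
            self.process.terminate()  # stop the old application process
            self.process.wait()
        self.process = self.launch(self.binary_path)
        self.seen = current
        return True

    def run_forever(self, interval_seconds: float = 1.0) -> None:
        """Poll once per interval, matching the one-second cadence in the text."""
        while True:
            self.check_once()
            time.sleep(interval_seconds)
```

Because the reloader stays in the foreground, the container keeps running across application restarts, which is what avoids an ECS deployment for every change.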

But how does the application binary arrive on the EFS volume, you ask? Cue our control plane. A Lambda function in the control plane compiles code to the same tenant-specific location on the EFS volume that the application reloader polls for changes.
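One detail worth considering when a compiler and a poller share a file is that the poller should never observe a half-written binary. The sketch below shows one common way to handle that, by staging the file next to its destination and renaming it into place. The directory layout and function names are assumptions, and the source does not say whether Dittofi's control plane publishes binaries this way.

```python
import os
import shutil
import tempfile


def tenant_binary_path(efs_root: str, tenant_id: str) -> str:
    """Hypothetical layout: <efs_root>/tenants/<tenant_id>/app."""
    return os.path.join(efs_root, "tenants", tenant_id, "app")


def publish_binary(built_binary: str, destination: str) -> None:
    """Copy a freshly built binary into place without exposing partial writes."""
    dest_dir = os.path.dirname(destination)
    os.makedirs(dest_dir, exist_ok=True)
    # Stage in the destination directory so the final rename stays on the
    # same file system.
    fd, staging = tempfile.mkstemp(dir=dest_dir, prefix=".app-")
    os.close(fd)
    try:
        shutil.copyfile(built_binary, staging)
        os.chmod(staging, 0o755)
        # os.replace swaps the new file in as a single rename operation.
        os.replace(staging, destination)
    except BaseException:
        if os.path.exists(staging):
            os.remove(staging)
        raise
```

With this approach a poller sees either the old binary or the new one, never a file that is still being written.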

With this architecture, we are also able to health check the container runtime separately from the tenant application health check. This enables us to easily know whether applications have simply been misconfigured or the runtime is unhealthy.  
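The value of having two separate health checks is that their combination points to where a problem lies. A minimal sketch of that decision, with hypothetical names and categories:

```python
from enum import Enum


class Diagnosis(Enum):
    HEALTHY = "healthy"
    APP_MISCONFIGURED = "application misconfigured"
    RUNTIME_UNHEALTHY = "runtime unhealthy"


def diagnose(runtime_ok: bool, app_ok: bool) -> Diagnosis:
    """Combine the container runtime check with the tenant application check."""
    if not runtime_ok:
        # If the runtime itself is failing, the application result is not
        # meaningful, so the runtime takes precedence.
        return Diagnosis.RUNTIME_UNHEALTHY
    if not app_ok:
        # The runtime is fine but the generated application is not responding.
        return Diagnosis.APP_MISCONFIGURED
    return Diagnosis.HEALTHY
```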

Without needing to pull, stop, or start Docker containers, low-code application developers can quickly make changes to their applications and see those changes deployed in near-real-time.

Conclusion 

Ensuring systems are reliable, cost-effective, easy to operate, secure, and performant is what the AWS Well-Architected Framework is all about. The Dittofi hosting platform had tight constraints around each of these attributes and gave our team a chance to exercise our architectural muscles to arrive at a simple and stable solution built on top of AWS-managed services and open-source solutions. 

One key takeaway for readers is that when designing systems that serve multiple tenants, it is important to think about how to partition tenant data and resources to ensure security, reliability, and performance for each tenant. It is also important to understand that a larger infrastructure footprint costs more money and places additional operational load on your teams. Hybrid-tenant architectures are key to balancing operational footprint and cost against the security, reliability, and performance concerns of your systems.

A second key takeaway is not to get discouraged when managed services present constraints, as was the case when ECS service deployment was too slow to propagate low-code updates to live environments. With a little ingenuity, you can often inherit the benefits of these managed solutions while working around constraints. 

A final takeaway: Using managed services and keeping architecture as simple as possible are both essential to building maintainable solutions. AWS can reduce your operational footprint significantly when you use managed solutions. When you use serverless solutions, the amount of operational work left for your team is further reduced. Plus, you only pay for the resources you use. 

Foresight Technologies is an AWS Well-Architected Partner. Whether you're looking to implement a similar hybrid-tenant architecture for your solution, want us to help plan and implement your cloud architecture and roadmap, or have other AWS operational or development challenges, please reach out!