Owen Morris

Building, developing... and growing.

By day I'm the Ops Director for Doherty Associates, but I love diving into the tech and can't stop programming.

This is my outlet covering the things that interest me technically in Cloud, Data and Dev but are far too in-depth for corporate use!

Occasional references to being husband and Dad, very rarely runner.

Giving Pulumi a Spin

26/02/2023

I recently needed to spin up some multi-tenant Azure infrastructure for a proof of concept. It required similar but differing deployments, with frequently changing infrastructure components, offered on a self-service model. A goal was to have a central solution deploying to multiple tenants. This was an interesting design challenge!

My requirements were:

  • Create a standard set of infrastructure that didn't vary between deployments
  • Add multiple specialised resources that can vary between deployments with differing configurations
  • The deployment process should handle adding or removing the variable resources as they are added to, or disappear from, the desired state for a particular deployment.

When using infrastructure-as-code (IaC) techniques to build infrastructure, the deployment artefacts are usually kept in source control and deployed using continuous delivery (CD) pipelines. In these scenarios the infrastructure is relatively static and not deployed that frequently. In my scenario, the deployment could happen many times an hour during testing. In addition, the multi-tenant nature made the deployment hard to automate, as each deployment needed to target a different tenant ID. I needed a data-driven approach to generating the deployment artefacts.

flowchart LR
    A[Data Source]
    B[Generic Deployment Artefact]
    C[Per-Tenant Artefact]
    A --> C
    B --> C

I was struggling to think of a good way to do this using Azure DevOps and Bicep or ARM templates. Text templating (e.g. Liquid templates) seemed like a potential option, but a brittle one. The flow to the back end would be feasible as part of a deployment pipeline, but updating the data source would likely remain fairly manual.

flowchart LR
    A[tenantdata.csv]
    B[template.liquid]
    C[tenant1.json]
    D[tenant2.json]
    E[tenantn.json]
    A --> C
    B --> C
    B --> D
    B --> E
    A --> D
    A --> E

I wanted a simpler, more automated process. I'd previously had good success doing similar work using Farmer (an F# library that builds out ARM templates using F# computation expressions), but it does require teaching people F#.

I remembered a couple of articles I'd read recently about Pulumi and thought it might be a good fit because it uses code to define resources; this would let me vary the deployment based on incoming parameters.

Getting started with Pulumi and the build

I started by installing the CLI using the instructions, then set about building out my infrastructure as a class in C#, following the tutorial. A Pulumi deployment is a C# class that inherits from the Stack class and declares its resources in the constructor.
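A minimal skeleton of that shape (the class and resource names here are illustrative, not the ones from my project) looks something like this:

using Pulumi;
using Pulumi.AzureNative.Resources;

// A Pulumi program is a class deriving from Stack;
// resources are declared in the constructor.
class TenantStack : Stack
{
    public TenantStack()
    {
        var resourceGroup = new ResourceGroup("base-rg");
        // ... further resources go here ...
    }
}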

One of the great things about a programmatic deployment model is that you can shape the deployment from external inputs. I used this to build a stack that contained a consistently named set of base resources, plus a set of resources created from input data held in a separate data source. After building this out I had my target deployment. I could put the details of the variable resources into the data store and then run the CLI to create those resources on demand.
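The variable part of the stack then becomes a loop over whatever the data source says should exist. A sketch of the idea, assuming the Azure Native provider (the record shape and the LoadTenantResources helper are made up for illustration):

using System.Collections.Generic;
using Pulumi;
using Pulumi.AzureNative.Resources;
using Pulumi.AzureNative.Storage;
using Pulumi.AzureNative.Storage.Inputs;

// Hypothetical shape of a row in the external data source.
public record VariableResource(string Name);

class DataDrivenStack : Stack
{
    public DataDrivenStack()
    {
        // Base resources with consistent names, present in every deployment.
        var resourceGroup = new ResourceGroup("base-rg");

        // Variable resources, created from whatever the data source says should exist.
        foreach (var item in LoadTenantResources())
        {
            _ = new StorageAccount(item.Name, new StorageAccountArgs
            {
                ResourceGroupName = resourceGroup.Name,
                Sku = new SkuArgs { Name = SkuName.Standard_LRS },
                Kind = Kind.StorageV2,
            });
        }
        // Anything removed from the data source disappears on the next deployment,
        // because Pulumi diffs the declared resources against the stack's saved state.
    }

    // Stand-in for querying the real data store (Cosmos DB, a CSV, etc.).
    private static IEnumerable<VariableResource> LoadTenantResources() =>
        new[] { new VariableResource("tenantstorage1"), new VariableResource("tenantstorage2") };
}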

Deploy on demand

The next part of my build needed to be running the deployment on demand. My normal preference would be to run as much as possible from CI/CD, so I investigated using Azure DevOps to perform the build (perhaps initiated from a webhook), but because I wanted the deploy to be self-service, I decided against this approach. A CI/CD-initiated build can be slow to start due to the need to acquire a worker and deploy a container. There are ways around that with a self-hosted runner, but that would be quite expensive. I also ruled out Azure Functions, as it was possible that the deploy would not finish within the maximum function execution time (10 minutes).

One thing that I've done successfully in the past is to deploy a .NET worker service and decouple the API, written in Azure Functions, from the back-end service, communicating via a queue. This seemed like a promising approach. It has become much easier since .NET Core 3.1 shipped a worker service template; previously this needed a bit of self-assembly, which I'd done for other solutions. Microsoft also recently released the Azure Container Apps service, and deploying the worker there allows the code to run serverlessly, spinning up when a message enters the queue. This functionality is enabled by the KEDA scaling provided by Container Apps.
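A worker along those lines is essentially a BackgroundService that pulls from the queue. A minimal sketch, assuming the Azure.Storage.Queues client, with the actual deployment logic stubbed out:

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Microsoft.Extensions.Hosting;

// Queue-driven worker sketch; the deployment call itself is a placeholder.
public class DeploymentWorker : BackgroundService
{
    private readonly QueueClient _queue;

    public DeploymentWorker(QueueClient queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var response = await _queue.ReceiveMessageAsync(cancellationToken: stoppingToken);
            if (response.Value is null)
            {
                // Nothing queued; when idle, KEDA scales the container app back to zero.
                await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
                continue;
            }

            // Placeholder: look up the tenant's desired resources and run the Pulumi deployment.
            // await RunDeploymentAsync(response.Value.Body.ToString());

            await _queue.DeleteMessageAsync(response.Value.MessageId, response.Value.PopReceipt, stoppingToken);
        }
    }
}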

Using the Automation API

Switching to a worker meant running the Pulumi deployment from code rather than via the Pulumi CLI. Pulumi provides an "Automation API" to do this, which was relatively straightforward to get running. I generated an API token and accessed it using .NET configuration injected into my worker class. I followed the Inline Program example from the Automation API examples. Once this was in place, I integrated waiting for and pulling messages from an Azure storage queue into the worker, and used the messages together with queries against my data store to build out the deployment resources. Once done, I built a simple endpoint in my Azure Functions project to drop messages into the queue. I then built out a Dockerfile; to get the worker to run I had to install the Pulumi CLI as part of the Docker image. I tested this locally and then pushed the container app to Azure.
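The core of the Automation API usage looks roughly like the following sketch (project, stack, plugin version and config values are illustrative; the program delegate would contain the same kind of resource declarations as above, and the Pulumi access token is expected in the environment, e.g. PULUMI_ACCESS_TOKEN):

using System;
using Pulumi.Automation;

// Inline program: the deployment is defined as a delegate rather than a standalone project.
var program = PulumiFn.Create(() =>
{
    // Declare the base and per-tenant resources here, driven by the queue message and data store.
});

var args = new InlineProgramArgs("tenant-deployments", "tenant1", program);
var stack = await LocalWorkspace.CreateOrSelectStackAsync(args);

await stack.Workspace.InstallPluginAsync("azure-native", "2.31.0"); // version is illustrative
await stack.SetConfigAsync("azure-native:location", new ConfigValue("uksouth"));

// Run the deployment, streaming engine output to the console/logs.
var result = await stack.UpAsync(new UpOptions { OnStandardOutput = Console.WriteLine });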

The Dockerfile additions look like:

# any setup here

# Install curl so the Pulumi installer can be fetched.
RUN apt-get update && apt-get install -y \
    curl

# Install the Pulumi CLI and put it on the PATH so the Automation API can find it.
RUN curl -fsSL https://get.pulumi.com | sh
ENV PATH=/root/.pulumi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# your entrypoint here

The final architecture looks like this:

flowchart TB
    U[User]
    A[API]
    B[Storage Queue]
    C[Container]
    D[CosmosDB]
    AZ[Azure]
    U -- Submits Deployment --> A
    A -- Adds Message to Queue --> B
    B -- Pulls Message --> C
    C -- Queries data store --> D
    C -- Performs deployment --> AZ

I was pleased with how effective Pulumi was for creating this integration and look forward to using it again in the future. Using a background worker in conjunction with a Function App is a useful pattern for creating decoupled services, and Container Apps makes this pattern really easy to adopt. Both services allow 'scale to zero', so this type of application can be run very cost-effectively.

A serverless URL shortener - part 2

25/11/2022

Following on from my previous post, I had the cloud infrastructure for my serverless URL shortener POC spun up and functionally complete, running on Azure Static Web Apps, Azure Functions, storage and Front Door. The next step was to validate that I'd met the non-functional requirements around the desired request throughput, and to find out how cost-effective the solution is.

Load Testing

I wanted to use the relatively new Azure Load Testing service, which I hadn't used before; it runs Apache JMeter scripts in the cloud. So first, I needed a test script that was reasonably representative of the actual load. I decided to test an overall flow of 40 req/s, split as:

  • 95% to the URL mapping function, simulating a normal user getting good responses
  • 5% to the admin API, targeting a URL that wasn't present (this simulates bad responses that hit the database looking for a non-existent URL, and also stands in for admin use).

I switched off the cache headers in Static Web Apps so I could test the non-cached performance (i.e. every request hits the API and storage).

I then ran the JMeter test and looked at Log Analytics to see how many instances the function app scaled out to, which was 4 within a minute of the test starting. The majority of requests were in the 150-200ms range, but several took multiple seconds due to serverless cold starts.

Uncached instances

I ran the costs through the Azure pricing calculator. Based on the Static Web App, Functions and storage costs, the worst-case cost for the solution running at this load continuously would be:

Azure SKU Cost
Static Web App £7.77
Azure Functions £133.84
Bandwidth £17.27
Storage £3.15
Total £162.03

As you can see, compute is the largest component of the cost. Still, for 100 million hits per month, it isn't bad at all.

Caching all the things

I then changed the Static Web App config so that the endpoint that did the URL shortening was cached. In theory, this meant that 95% of the traffic would be cached by Azure Front Door. I reran the load test.
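For reference, this kind of caching rule lives in staticwebapp.config.json; a route entry along these lines (the /api/go/* path is illustrative, not my actual route) sets a Cache-Control header that the Front Door edge can honour:

{
  "routes": [
    {
      "route": "/api/go/*",
      "headers": {
        "Cache-Control": "public, max-age=3600"
      }
    }
  ]
}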

The cache hit ratio achieved by Front Door was about 70%, which surprised me, as I expected it to be higher. For this second run the average response time fell to 90-120ms for cached results and Functions scaled to a single instance, with the number of function executions tracking at around 30% of the overall request count.

Cached Instances

Cache Hit rate

The costs for this solution were roughly 50% of the original. However, as the majority of the Front Door cost is fixed, there would be an inflexion point at which the cost of compute (at a much lower requests-per-second rate) becomes cheaper than the cost of Front Door. This is why Azure architecture is more of an art than a science!

Azure SKU Cost
Front Door £32.31
Static Web App £7.77
Azure Functions £39.89
Bandwidth £1.73
Storage £10.30
Total £91.99

What did I learn?

This exercise was a great learning experience. It proved to me that serverless approaches are simpler in terms of developer experience and can be cost-effective. Caching responses up front made a massive difference. The best request is one you don't have to serve at all!

I was pleased with the choice of table storage in terms of speed and ease of use; if the data model and query needs can support it, it would probably be one of my first choices for data storage, along with Cosmos DB, reaching for Azure SQL only if I needed it. The Azure Load Testing service also proved effective.

On the negative side, cold starts can be a problem: using .NET functions, the response time could be up to 4 seconds, and even with a pre-warmed instance people will occasionally see slow responses as load scales up and down. There is some overhead in the Functions runtime, so I would be interested to see whether porting this to another language, or using a container app and a different framework, would reduce the response time.

Finally, I would look at whether another layer of caching at the application level would help, perhaps adding a Redis cache to store hot responses and hitting memory rather than storage, as sketched below. Perhaps another post will result!
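To give an idea of what that could look like (purely illustrative: StackExchange.Redis and Azure.Data.Tables are assumed, and the table, partition and property names are made up), a cache-aside lookup could sit in front of the table read:

using System;
using System.Threading.Tasks;
using Azure.Data.Tables;
using StackExchange.Redis;

// Illustrative cache-aside lookup: check Redis first, fall back to Table Storage.
public class UrlResolver
{
    private readonly IDatabase _cache;
    private readonly TableClient _table;

    public UrlResolver(IConnectionMultiplexer redis, TableClient table)
    {
        _cache = redis.GetDatabase();
        _table = table;
    }

    public async Task<string?> ResolveAsync(string shortCode)
    {
        // Hot path: serve from memory if this code has been seen recently.
        var cached = await _cache.StringGetAsync(shortCode);
        if (cached.HasValue)
            return cached.ToString();

        // Miss: read from Table Storage, then populate the cache with a TTL.
        var entity = await _table.GetEntityIfExistsAsync<TableEntity>("urls", shortCode);
        if (!entity.HasValue)
            return null;

        var target = entity.Value.GetString("Target");
        if (target is not null)
            await _cache.StringSetAsync(shortCode, target, TimeSpan.FromHours(1));
        return target;
    }
}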