The easiest way to implement Auto-scaling per HTTP request in your AKS cluster


There are many ways to implement auto-scaling in a Kubernetes cluster. The most common one is to configure an HPA (Horizontal Pod Autoscaler), which uses out-of-the-box resource metrics (CPU, RAM, and even GPU) and scales up or down according to a given threshold. But what if you want to scale per HTTP request? I spent too much time implementing an effective solution, so I decided to write an informative article that will guide you through the process.
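For contrast, this is what the common resource-metric approach looks like: a minimal HPA manifest that scales on average CPU utilization rather than on incoming traffic (the deployment name `my-app` is just a placeholder):

```yaml
# A classic CPU-based HPA: scales on utilization, not on HTTP requests.
# "my-app" is a hypothetical deployment name.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The rest of this article replaces the `metrics` part of this picture with a request count coming from Azure.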

Let’s start with the tools we are going to use:

  1. KEDA — a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed.

  2. Azure API Management (APIM) — the gateway that receives the clients' HTTP requests.

  3. Application Insights — Azure's application monitoring service, which collects the request telemetry from APIM.

  4. Log Analytics Workspace — the log store that KEDA will query to decide when to scale.

In this scenario I’m going to use both the API Management and Application Insights services (despite the fact they are optional), but keep in mind you can apply the same idea with only KEDA and a Log Analytics Workspace.

Let’s take a look at a diagram that shows how the architecture works:

Architecture scheme

Explanation: once a client hits the API Management, two processes begin in parallel:

  1. The request is processed and forwarded to the application (the upper section).

  2. The request telemetry is sent to Application Insights and from there to the Log Analytics Workspace (the lower section).

KEDA reads the data from the Log Analytics Workspace and scales the Kubernetes deployment based on a given query.

Armed with this knowledge, let’s dive straight into the details.


Assuming you already have the resources mentioned above, let’s point out the steps of the implementation:

  1. Install KEDA in your cluster.

  2. Send data from API Management to Application Insights.

  3. Send data from Application Insights to the Log Analytics Workspace.

  4. Configure KEDA to work with the Azure resources.

  5. Test the auto-scaling.

Now let’s start working!

1st step — Installing KEDA

You may use KEDA official deployment documentation, or just follow the steps below:

A. Install Helm on your machine.

B. Install KEDA (using helm) by running the following commands in your cluster:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda

By doing this, we installed KEDA along with its custom resource definitions (CRDs). Using them, we will create three k8s objects (YAML files), and those files will eventually produce a custom HPA that will be responsible for scaling our deployment (application):

  • Secret.yaml

  • TriggerAuthentication.yaml

  • ScaledObject.yaml

We will configure these three later, so don’t worry; for now, let’s proceed to step number two.

2nd step — Send data from APIM to Application insights

A. Go to your API management resource in Azure and choose one API.

B. Go to the ‘Settings’ tab and scroll down to the bottom.

C. If you already have Application Insights configured for this API Management, go to ‘Diagnostics Logs’ and enable Application Insights. Otherwise, click on ‘Manage’ and add it to your API Management resource.

After doing this, you should see something similar to this:

Screenshot from API Management

D. Go to your Application Insights and verify your data is there. You have two options to do this:

  • Go to the ‘Performance’ tab and see your APIs’ performance.

  • Go to the ‘Transaction search’ tab and look for your recent requests.

Screenshot from Application Insights

3rd step — Send data from Application insights to Log Analytics

A. Go to the ‘Properties’ tab in Application Insights.

B. In the ‘Workspace’ property, click on ‘Migrate to Workspace-based’.

C. Choose the Log Analytics Workspace under the relevant subscription:

D. Verify you see the data by running the following query in the Log Analytics Workspace (workspace-based Application Insights stores request telemetry in the AppRequests table):

AppRequests
|where TimeGenerated > ago(24h)
|limit 100

If you want to look for a specific operation, add this to your query:

|where Name == "POST /example-service/v1/abcd"

Example output for a random query:
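When you later pick a threshold for the scaler, it helps to know your actual traffic. A sketch of a query that charts the request rate per operation (the one-hour window and five-minute bin are arbitrary choices, adjust them to your case):

```kusto
AppRequests
| where TimeGenerated > ago(1h)
| summarize RequestCount = count() by Name, bin(TimeGenerated, 5m)
| render timechart
```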

Now that we have all the data in the Log Analytics Workspace, we can use KEDA to query this resource.

4th step — Configure KEDA to work with Azure resources

The scaler we chose is the Azure Log Analytics scaler (obviously). Since most Azure users already have this resource, we are going to get the most out of it.

Again, I’m referring you to the KEDA documentation about the Azure Log Analytics scaler in case more details are required.

This step is the most complicated one, so stay focused.

A. First, we’ll need to create a service principal for authentication purposes. If you already have one, just assign it the Contributor role on your Log Analytics Workspace. If not, you can simply run this command in PowerShell (or any shell with the Azure CLI installed):

az ad sp create-for-rbac --name ServicePrincipalName

Note: Save the password somewhere safe so you won’t lose it! It cannot be retrieved afterwards.

B. Preparing the Secret.yaml

  • In the secret file, you should insert your service-principal values
apiVersion: v1
kind: Secret
metadata:
  name: kedaloganalytics
  namespace: application-namespace
  labels:
    app: kedaloganalytics
type: Opaque
data:
  tenantId: "QVpVUkVfQURfVEVOQU5UX0lE" #Base64 encoded Azure Active Directory tenant id
  clientId: "U0VSVklDRV9QUklOQ0lQQUxfQ0xJRU5UX0lE" #Base64 encoded Application id from your Azure AD Application/service principal
  clientSecret: "U0VSVklDRV9QUklOQ0lQQUxfUEFTU1dPUkQ=" #Base64 encoded Password from your Azure AD Application/service principal
  workspaceId: "TE9HX0FOQUxZVElDU19XT1JLU1BBQ0VfSUQ=" #Base64 encoded Log Analytics workspace id
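Each value has to be Base64 encoded before it goes into the Secret's data section. A quick way to do this (the GUID below is just a placeholder):

```shell
# Encode one value for the Secret. The -n flag matters: a trailing
# newline would silently corrupt the credential.
echo -n "2f5ae96c-b71f-45e0-b814-1f32bb938a32" | base64
```

Alternatively, you can use `stringData:` instead of `data:` in the Secret and let Kubernetes do the encoding for you.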

C. Preparing TriggerAuthentication.yaml

  • This file is using the values from the Secret.yaml file.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-kedaloganalytics
  namespace: application-namespace
spec:
  secretTargetRef:
    - parameter: tenantId
      name: kedaloganalytics
      key: tenantId
    - parameter: clientId
      name: kedaloganalytics
      key: clientId
    - parameter: clientSecret
      name: kedaloganalytics
      key: clientSecret
    - parameter: workspaceId
      name: kedaloganalytics
      key: workspaceId

D. Preparing the ScaledObject.yaml

  • The placeholder values (deployment name, workspace id, and the query) should be changed to match your environment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kedaloganalytics-consumer-scaled-object
  namespace: application-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: deployment-name
  pollingInterval: 30
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: azure-log-analytics
      metadata:
        workspaceId: "81963c40-af2e-47cd-8e72-3002e08aa2af"
        query: |
          let AvgDuration = ago(5m);
          let ThresholdCoefficient = 0.8;
          AppRequests
          | where Name == "POST /example-service/v1/abcde"
          | where TimeGenerated > AvgDuration
          | summarize MetricValue = count()
          | project MetricValue, Threshold = MetricValue * ThresholdCoefficient
        threshold: "10"
      authenticationRef:
        name: trigger-auth-kedaloganalytics
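To understand what the threshold does: KEDA exposes the query result to an HPA, which (assuming the default average-value metric type) computes roughly desiredReplicas = ceil(metricValue / threshold). A quick sanity check of that formula in shell, with made-up numbers:

```shell
# Rough HPA math: ceil(metricValue / threshold) via integer arithmetic.
metric=45     # value returned by the KQL query's MetricValue column
threshold=10  # the threshold from the ScaledObject
echo $(( (metric + threshold - 1) / threshold ))   # → 5
```

So 45 requests in the polling window with a threshold of 10 would drive the deployment toward 5 replicas, capped by maxReplicaCount.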

E. Save the three YAML sections above in one file (separated by ‘---’ lines), and apply it to your cluster with the following commands:

cd <YourWorkingDirectory>
kubectl apply -f <yourfile>.yaml

Applying the ScaledObject creates a custom HPA based on your query; we will check that out in the next step.

5th step — Testing

A. First, let’s check the HPA by running the command below.

kubectl get hpa -n <your-application-namespace>

You should receive the following output:

If you have this resource in your cluster, it means that KEDA was deployed successfully.

B. Send some requests.

In order to verify that everything is really working, we need to send requests to our application, of course. (I recommend setting the threshold value relatively low so the scaling process starts immediately.)

A few seconds after the first request has been sent, you should see the TARGETS count start to increase.

C. Finally, we can run the following command:

kubectl get pods -n <your-application-namespace>

As you can see, new pods are being created.

Now you can test your auto-scaling and fine-tune the query.


As I said at the beginning, this guide was written because I couldn’t find anything similar on the internet, so I decided to make one myself.

Feel free to contact me for any questions.

Hope you found it useful!

About me

I’m 26 years old, from Israel, working as a DevOps Engineer at AU10TIX, and passionate about technology and computers.

LinkedIn profile

Private email
