There are many ways to implement auto-scaling in a Kubernetes cluster. The most common one is to configure HPA, which uses out of the box metrics from the cluster’s hardware, (CPU, RAM, and even GPU) then scale up/down according to a given threshold. But what if you want to scale per HTTP request? I spent too much time on implementing an effective solution, so I decided to make an informative article that will guide you through the process.
Let’s start with the tools we are going to use: