How to Implement Kubernetes Horizontal Pod Autoscaling for Scalable Applications

The ability to scale applications efficiently in response to varying load (e.g. an increase in website traffic) is crucial for maintaining performance and optimizing resource use. In this post, we are going to talk about Kubernetes autoscaling with the Horizontal Pod Autoscaler (HPA).

The Horizontal Pod Autoscaler (HPA) enables Kubernetes Clusters to automatically adjust the number of running Pods based on observed CPU utilization or other selected metrics. This capability not only ensures that applications can handle sudden traffic spikes without manual intervention but also reduces costs by scaling down during low-usage periods.

The HPA is not just about handling peak loads; it's about making sure that your resources are always right-sized, striking the perfect balance between performance and cost.

We will assume that you already know the basic concepts of Kubernetes. If you don’t, check out the “Kubernetes 101 - Introduction & Architecture” blog post.

In the following sections, we will:

  • Look at how Horizontal Pod Autoscaling works
  • Use it in a real example

Let’s dive in!

What is Horizontal Pod Autoscaling (HPA)?

Horizontal Pod Autoscaling enables a Kubernetes Cluster to dynamically adjust the number of Pods for a given workload, such as a Deployment. While vertical scaling assigns more resources (CPU or memory) to existing Pods, horizontal scaling adds or removes Pods. HPAs can be managed and queried with kubectl.

To effectively use HPA, you need to:

  • Decide on the metrics that will trigger Kubernetes scaling actions. These could include CPU or memory usage, among others. Options include using:
    • Default metrics from the Kubernetes Metrics Server.
    • Custom metrics, which must be exposed to Kubernetes through the Custom Metrics API or the External Metrics API, typically by a metrics adapter. For example, these could be the number of messages in a RabbitMQ queue, Prometheus metrics, etc. More on defining and using custom metrics can be found here.
  • Remember that HPA can be applied only to scalable objects, which excludes DaemonSets.
  • Ensure Pods have the necessary resource requests set, as HPA uses these to determine Kubernetes scaling actions. For more on resource requests, see the Kubernetes documentation.

[Image: Kubernetes Horizontal Pod Autoscaling example]
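
If a Deployment does not declare CPU requests yet, one way to add them is the kubectl set resources command. The sketch below wraps it in a small helper. The helper name set_cpu_request, the KUBECTL override variable, and the 100m value are illustrative assumptions, not part of the setup described in this post.

```shell
# Binary used to talk to the cluster. Override it (e.g. KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# set_cpu_request DEPLOYMENT NAMESPACE CPU
# Sets the CPU request on the containers of the given Deployment.
# HPA reports CPU utilization as a percentage of this request.
set_cpu_request() {
  "$KUBECTL" set resources deployment "$1" --namespace "$2" --requests="cpu=$3"
}
```

For example, set_cpu_request my-app default 100m would request a tenth of a CPU core per container. Here my-app is a placeholder name, not a workload from this post.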

How does Horizontal Pod Autoscaling work?

We can think of the work of an HPA as the following loop:

  • It periodically (every 15 seconds by default) queries the resource usage of its target object.
  • It calculates the average resource usage (for example, it averages the usage of all the Pods of a Deployment).
    • You can check here to learn more about the calculation logic.
  • Then, depending on the given threshold and the average usage, it calculates the desired number of Pods. This number could be smaller or larger than the current number of Pods. Kubernetes then applies this change to the target object to scale it up or down.

Now that we have a basic idea of Kubernetes Horizontal Pod Autoscaling, let’s try it ourselves.
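
Before moving on, the replica calculation described above can be sketched in a few lines of shell. Kubernetes documents the core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function name desired_replicas and its argument order are made up for this illustration. The sketch also ignores details of the real controller, such as its tolerance band (10% by default) and the handling of Pods that are not ready.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION MIN MAX
# Prints ceil(CURRENT_REPLICAS * CURRENT_UTILIZATION / TARGET_UTILIZATION),
# clamped to the range [MIN, MAX]. Utilizations are integer percentages.
# The body runs in a subshell (parentheses) so its variables do not leak.
desired_replicas() (
  current=$1 usage=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  desired=$(( (current * usage + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
)

desired_replicas 2 65 50 1 10   # 2 Pods at 65% with a 50% target: prints 3
desired_replicas 3 10 50 1 10   # 3 Pods at 10% with a 50% target: prints 1
```

The first call scales up because the average usage is above the target. The second scales down to the minimum because the load has dropped well below it.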

Demo: Kubernetes Horizontal Pod Autoscaler Example

In the demo, we will:

  • Create a Kubernetes Cluster on Minikube
  • Configure it and install the required services
  • Set up HPA and test it


We will use macOS. However, this only affects the installation of Minikube, and you can find a link for installing it on other operating systems in the relevant section.

You will need the following:

  • kubectl
    • Required to manage the Kubernetes Cluster
  • Homebrew (if on macOS)
    • Required to install Minikube
    • To see how to install it, check its website.

1. Creating a Kubernetes Cluster

First of all, we need to create a local demo Kubernetes Cluster to test Horizontal Pod Autoscaling (HPA). For this purpose, we should first install Minikube and then set up a Cluster.

1.1 Installing Minikube

The way you install Minikube depends on your operating system. Here, we will use macOS on the ARM64 architecture. For installation instructions for other operating systems, you can check the first section (Installation) here.

Here, we will use the Homebrew installer. To do so, execute this:

brew install minikube

With this, we can move on to setting up a Kubernetes Cluster.

1.2 Setting up a Kubernetes Cluster on Minikube

Execute this:

minikube start

This will take some time. After a while, you should expect an output like this:

[Image: Minikube start on terminal]

And that’s all! Now we have a Kubernetes Cluster running in Minikube.

This command also sets the default kubectl context to the newly created Cluster, so you do not have to configure kubectl to access it. However, if you want to access your other Clusters through kubectl, you need to run kubectl config use-context <context_name> to switch to the appropriate context before executing any other commands.

2. Configuring the Cluster

2.1 Installing Kubernetes Metrics Server

Kubernetes Metrics Server allows a Kubernetes Cluster to know about its resource usage. Since we will use the Pod metrics (CPU usage, to be more specific) to determine how and when to scale the Pods, we need to install it. To do so, we will use kubectl:

kubectl apply -f

This is the output you should expect:

[Image: Metrics server install terminal output]

Running this command applies the metrics server configuration YAML at the given URL and installs it in your cluster. After this, you can verify its status:

kubectl get deployment metrics-server -n kube-system

Here, we will see this:

[Image: Metrics server failed terminal output]

We see that even though it was deployed 3 minutes 43 seconds ago, metrics-server is still not ready. We can check its logs with:

kubectl logs deployment/metrics-server -n kube-system

2.2 Fixing the IP SANs Problem

The important part of the logs is this error:

E0306 17:19:41.746695       1 scraper.go:149] "Failed to scrape node" err="Get \"\": tls: failed to verify certificate: x509: cannot validate certificate for because it doesn't contain any IP SANs" node="minikube"

This tells us that metrics-server could not validate the kubelet’s serving certificate on the node, because the certificate does not contain any IP Subject Alternative Names (SANs). For more information about certificates in Kubernetes, you can check the docs. To handle this issue, we will follow the solution given in this blog post.

According to the blog post, we need to add the setting serverTLSBootstrap: true both to the ConfigMap named kubelet-config and to the kubelet configuration on every node of our cluster (we have only one node since we are using Minikube).

2.2.1 Editing the kubelet-config

To edit the ConfigMap, execute this:

kubectl edit configmap kubelet-config -n kube-system

This opens the ConfigMap in your terminal’s text editor. Add the line serverTLSBootstrap: true under the line kind: KubeletConfiguration. It should look like this:

    kind: KubeletConfiguration
    serverTLSBootstrap: true

Then save the file and exit.

2.2.2 Editing the Kubelet Config On the Kubernetes Node

First, we need to connect to the Minikube node:

minikube ssh

Then open the config file located at /var/lib/kubelet/config.yaml with your preferred text editor:

sudo vi /var/lib/kubelet/config.yaml

Here, do the same as in step 2.2.1. You should have a section like this:

    kind: KubeletConfiguration
    serverTLSBootstrap: true

Then restart the kubelet service:

sudo systemctl restart kubelet

Exit the SSH session and return to your host. Now restart the metrics-server:

kubectl rollout restart deployment metrics-server -n kube-system

Then we need to approve the pending certificate signing request (CSR). First, list the CSRs:

kubectl get csr

You will see the following. On the right, you can see that the last certificate is waiting for approval. Find its name (csr-t2pkd in this case):

[Image: terminal output for kubectl get csr]

Then approve it:

kubectl certificate approve <csr_name>

# In our case, the command is
# kubectl certificate approve csr-t2pkd
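
When several requests are pending, copying names by hand gets tedious. The following is a minimal sketch that extracts the names of pending CSRs from the tabular output of kubectl get csr. It assumes that the condition is the last column of each row, and the helper name pending_csr_names is made up for this example. Approving every pending request without inspecting it is only reasonable on a throwaway cluster such as this Minikube demo.

```shell
# pending_csr_names: reads `kubectl get csr` output on stdin and prints the
# first column (NAME) of every data row whose last column is exactly "Pending".
pending_csr_names() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage on a demo cluster (approves every pending CSR, one by one):
# kubectl get csr | pending_csr_names | while read -r name; do
#   kubectl certificate approve "$name"
# done
```

Rows that are already approved show a different condition, so the filter skips them.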

Restart the metrics-server again:

kubectl rollout restart deployment metrics-server -n kube-system

After a while, you can check the status of the metrics-server using the command:

kubectl get deployment metrics-server -n kube-system

If you see this, it means that the server works correctly.

[Image: metrics server success terminal output]

2.3 Installing the Demo App

Now we need a sample application to deploy on Kubernetes to test the scaling. For this purpose, we can deploy a basic Django app containing an endpoint (with the URL http://testserver-service:8200/computation?fib=5) that recursively calculates the Fibonacci number for the given query parameter (5 in this case). To do so, execute:

kubectl apply -f

This is the output you expect to see:

[Image: sample app download output terminal]

Let’s also check the status of the application:

kubectl get deployment testserver-deployment -n testserver

[Image: kubectl command terminal output]

As you can see, the deployment has successfully started up.

3. Horizontal Pod Autoscaling (HPA)

Now that we have our cluster and apps set up, we can finally move on to testing the Horizontal Pod Autoscaling (HPA).

3.1 Setting the HPA Up

For the thresholds, we will use the following:

  • CPU Percent: 50 (this means that we want our Pods to use, on average, at most 50% of their requested CPU. If this threshold is exceeded, HPA will add new Pods to our deployment)
  • Min Pods: 1
  • Max Pods: 10

Execute this:

kubectl autoscale deployment testserver-deployment -n testserver --cpu-percent=50 --min=1 --max=10

Note: It is important to set the namespace of the HPA the same as the namespace of the deployment that we want to scale.

You will see this:

[Image: autoscale apply result terminal output]

Let’s also check its status:

kubectl get hpa -n testserver

Here, we see that the Horizontal Pod Autoscaler has successfully been set up.

[Image: autoscaler deployed terminal output]
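
The same autoscaler can also be described declaratively, which is convenient if you keep your manifests in version control. The snippet below is a sketch that writes an HPA manifest with the thresholds used above. It assumes a cluster that serves the autoscaling/v2 API (Kubernetes 1.23 or later), and the file name testserver-hpa.yaml is an arbitrary choice for this example.

```shell
# Write a declarative HPA manifest with the same intent as the
# `kubectl autoscale` command: 50% average CPU target, 1 to 10 replicas.
cat > testserver-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: testserver-deployment
  namespace: testserver
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: testserver-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF
```

You could then apply it with kubectl apply -f testserver-hpa.yaml instead of running kubectl autoscale. The manifest uses the Deployment’s name for the HPA, so it targets the same object as the HPA created above.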

3.2 Verifying the HPA

Let’s generate a load on the app to see if it is scaled or not. For this purpose, we can execute this:

kubectl run -i \
    --tty load-generator \
    --namespace testserver \
    --rm --image=busybox \
    --restart=Never \
    -- /bin/sh -c "while sleep 0.1; do wget -q -O- http://testserver-service:8200/computation?fib=5; done"

Let’s first see what this command does:

  • kubectl run: Runs an image on the Kubernetes cluster
  • -i --tty: Keeps stdin open and allocates a terminal, so the Pod runs interactively.
  • load-generator: Assigns the name load-generator to the Pod we are starting up
  • --namespace testserver: Sets the namespace of the Pod to testserver. This is useful as it allows the Pod to refer to the service of our app using only its name.
  • --rm: Removes the Pod once it finishes execution
  • --image=busybox: Specifies the image of the Pod. BusyBox is a lightweight image that bundles common Unix utilities.
  • --restart=Never: Prevents the Pod from being restarted when its container exits
  • -- /bin/sh -c "while sleep 0.1; do wget -q -O- http://testserver-service:8200/computation?fib=5; done": Runs a shell command that sends requests to our computation endpoint in an infinite loop, pausing 0.1 seconds between requests. You can change fib=5 to any other value to increase/decrease the load.

Before running the command, verify that we have a single Pod hosting our app:

kubectl get deployment testserver-deployment -n testserver

[Image: single pod terminal output]

Now let’s run the command. You will see this:

[Image: Kubernetes load generation terminal output]

After waiting for a while, go ahead and run this to check the status of our test app:

kubectl get deployment testserver-deployment -n testserver

You can see that the number of Pods has increased to 2 due to the load.

[Image: kubectl command terminal output]

You can experiment with the sleep duration and the Fibonacci query parameter to see how the number of Pods changes. You can also run this command to see the status of the HPA:

kubectl get hpa -n testserver

The image below shows that even though there are 2 replicas, the average CPU usage (65%) still exceeds our threshold of 50%, so we can expect new Pods to appear. After waiting a while and re-executing the command, we can see that we now have 3 Pods.

[Image: HPA Threshold Exceeded with 3 Pods]

Now go to the terminal where you started the load generator and stop it (for example, with Ctrl+C). After waiting a while, you can see that the number of Pods is reduced to 1, the minimum value we specified in the HPA. Scaling down is deliberately slower than scaling up: by default, the HPA waits through a five-minute stabilization window before removing Pods.

[Image: Scaled Down App Terminal Output]


As we have seen, the Kubernetes Horizontal Pod Autoscaler (HPA) lets the platform scale containerized applications automatically in response to load.

Through its intelligent monitoring and automated Kubernetes scaling capabilities, HPA ensures that applications can effortlessly adapt to changing demands, maintaining optimal performance while efficiently utilizing resources.

By aligning resource allocation with actual demand, HPA not only maximizes application performance but also contributes to a more sustainable and cost-effective use of computing resources. And with the Kubernetes autoscaling example in this post, we demonstrated how you can make use of HPA.

If you want to learn more about the intricacies of HPA in Kubernetes, you can check here.

Until the next blog post, happy monitoring!
