The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods based on CPU utilization or an application's custom metrics. It works with replication controllers, deployments, and replica sets.
The controller manager queries resource usage against the configured metrics every 15 seconds (adjustable via `--horizontal-pod-autoscaler-sync-period`).
It can work with three types of metrics:

- Predefined metrics (such as a Pod's CPU) are calculated as a utilization ratio
- Custom Pod metrics are calculated as raw values
- Custom object metrics

Metrics can be retrieved through Heapster or a custom REST API, and multiple metrics can be used together.
Note that the discussion here is limited to automatic Pod scaling; for automatic Node scaling, see Cluster Autoscaler. Furthermore, before using the HPA, make sure that metrics-server is properly deployed.
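As a quick sanity check before creating an HPA, the following commands verify that the resource metrics API is available. This is a sketch assuming a running cluster with metrics-server installed; output will vary by environment:

```shell
# Confirm the resource metrics API (metrics.k8s.io) is registered
kubectl get apiservices v1beta1.metrics.k8s.io

# Spot-check that per-Pod CPU and memory metrics are being collected
kubectl top pods
```

If either command fails, the HPA controller will not be able to fetch metrics and the target will show as `<unknown>`.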
API Version Comparison Table
| Kubernetes Version | Autoscaling API Version | Supported Metrics |
| ------------------ | ----------------------- | ----------------- |
| v1.5+              | autoscaling/v1          | CPU               |
| v1.6+              | autoscaling/v2beta1     | Memory and custom metrics |
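To target the `autoscaling/v2beta1` API directly (needed for memory or custom metrics), the HPA can be written as a manifest instead of using `kubectl autoscale`. The sketch below assumes a Deployment named `php-apache` already exists; the memory target value is illustrative:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # CPU as a utilization ratio against the Pod's request
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  # Memory as a raw average value per Pod (example threshold)
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
```

With multiple metrics, the HPA computes a desired replica count for each metric and uses the largest.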
Examples
```shell
# Create a pod and service
$ kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created

# Create the autoscaler
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
deployment "php-apache" autoscaled
...
```
The snippet above walks through the full lifecycle: creating a pod and service, creating the autoscaler, increasing the load, and finally watching the Pod count scale back down as the load subsides. It offers an end-to-end illustration of how autoscaling works.
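The load-increase step can be reproduced with a temporary traffic-generating pod, following the pattern from the upstream HPA walkthrough; the `load-generator` name is illustrative and the commands assume a running cluster:

```shell
# Start an interactive busybox pod to generate traffic (separate terminal)
$ kubectl run -i --tty load-generator --image=busybox /bin/sh

# Inside the busybox container, hit the service in a tight loop:
while true; do wget -q -O- http://php-apache; done

# Back on the host, watch the HPA react; replicas should climb toward
# --max under load, then fall back toward --min a few minutes after
# the load generator is stopped
$ kubectl get hpa php-apache --watch
```

Scale-down is deliberately slower than scale-up, so expect a delay of several minutes before the replica count returns to the minimum.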