Kubernetes autoscaling is the key to maintaining performance while optimizing costs. However, with multiple tools available — HPA, VPA, KEDA, and Cluster Autoscaler — knowing when and how to combine them remains a challenge for many teams.
According to CNCF 2025 benchmarks, properly combining these tools can achieve 30-50% cost reduction compared to statically provisioned clusters. This guide deep-dives into each tool's strengths and provides practical combination patterns.
Kubo is a managed Kubernetes platform from ¥48,000/month (~$320/month) that simplifies autoscaling configuration and operations.
HPA (Horizontal Pod Autoscaler): The Scaling Foundation
How HPA Works
According to the Kubernetes official documentation, HPA automatically adjusts the number of pod replicas based on observed metrics. CloudPilot AI's comparison rates HPA as "best for stateless workloads where CPU or memory is the bottleneck."
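The replica calculation described in the Kubernetes documentation can be written as a single formula. The numbers in the worked line (4 replicas, 90% observed CPU, 70% target) are illustrative assumptions, not measurements.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

By default the controller skips scaling when the ratio of current to desired metric value is within 10% of 1.0, which avoids churn from small fluctuations.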
Basic Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute stabilization
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60  # Max 10% reduction per minute
    scaleUp:
      stabilizationWindowSeconds: 0  # Immediate scale-up
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15  # Up to 100% increase in 15 seconds
Custom Metrics Scaling
Scale on Prometheus custom metrics beyond CPU and memory:
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
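The Pods metric above only resolves if some component serves http_requests_per_second through the custom metrics API. One common option is Prometheus Adapter. The rule below is a sketch, assuming the application exports a counter named http_requests_total (a hypothetical name, not from the original) that carries namespace and pod labels.

```yaml
# Prometheus Adapter rules file (sketch).
# Assumption: the app exposes a counter called http_requests_total.
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"   # exposed to HPA as http_requests_per_second
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The 2-minute rate window is a starting point. A shorter window reacts faster but produces a noisier signal for the autoscaler.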
HPA Best Practices
- Set minReplicas to at least 2 based on availability requirements
- Configure behavior with scale-down stabilization to prevent flapping
- CPU utilization threshold of 70-80% is standard (see Sedai's guide)
VPA (Vertical Pod Autoscaler): Right-Sizing Resources
How VPA Works
According to Kubeify's analysis, VPA automatically adjusts pod CPU and memory requests and limits based on actual usage. It is best suited for stateful workloads, monoliths, and batch jobs with stable, predictable usage patterns.
Three Operating Modes
| Mode | Behavior | Recommended Use |
|---|---|---|
| Off | Generates recommendations without applying | Analysis and planning |
| Initial | Applies recommendations only at pod creation | Gradual adoption |
| Auto | Automatically applies recommendations (pod restart) | Stable workloads |
Visualization with Goldilocks
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "4Gi"
Use Goldilocks alongside VPA to view namespace-wide resource recommendations in a dashboard.
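Goldilocks picks up namespaces by label and creates recommendation-only VPA objects for the workloads inside them. A minimal sketch, assuming Goldilocks and the VPA recommender are already installed and that production is the namespace to analyze:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    goldilocks.fairwinds.com/enabled: "true"  # opt this namespace in to Goldilocks
```

Once the label is applied, the dashboard lists suggested requests and limits for each container in the namespace.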
Critical VPA Warning
VPA and HPA conflict: Do not use VPA in Auto mode alongside HPA scaling on CPU or memory. The two controllers will compete. The correct combination is "VPA (Off/Initial mode) + HPA (custom metrics)" or "replace HPA with KEDA entirely."
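The first safe combination can be sketched as a pair of manifests targeting the same Deployment. The VPA name web-app-vpa is hypothetical, and the custom metric is assumed to be served through the custom metrics API.

```yaml
# VPA sets requests only when pods are created; it never evicts running pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa          # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Initial"
---
# HPA scales replicas on a custom metric only, with no cpu or memory Resource metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
```

Because the HPA no longer reads CPU or memory utilization, a change in requests made by VPA does not shift the HPA's scaling signal.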
For Captain.AI workloads on Kubo, AI worker resource requirements fluctuate dynamically, making continuous right-sizing with VPA recommendations essential.
KEDA: Event-Driven Autoscaling
What Makes KEDA Different
KEDA is a Kubernetes-based event-driven autoscaler that enables scaling based on external event sources beyond what HPA alone can handle. As of version 2.19, it provides 70+ built-in scalers.
The Killer Feature: Scale to Zero
According to Spectro Cloud's analysis, when no events are present, KEDA can scale deployments to zero pods — a massive cost saver for development environments and sporadic batch processing.
Message Queue-Based Scaling
Scaling based on RabbitMQ queue depth:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  pollingInterval: 10   # Check every 10 seconds
  cooldownPeriod: 60    # 60-second cooldown
  minReplicaCount: 0    # Scale to Zero!
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        host: "amqp://rabbitmq.production.svc.cluster.local:5672"
        queueName: tasks
        mode: QueueLength  # replaces the deprecated queueLength parameter
        value: "5"         # 1 pod per 5 queue messages
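In most real clusters the RabbitMQ connection string carries credentials, so it is usually kept out of the ScaledObject. The sketch below moves it into a Secret referenced through a TriggerAuthentication. The names rabbitmq-conn and rabbitmq-auth and the USER:PASSWORD placeholder are assumptions for illustration.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn        # hypothetical name
  namespace: production
stringData:
  # Placeholder credentials; substitute real values through your secret tooling.
  host: "amqp://USER:PASSWORD@rabbitmq.production.svc.cluster.local:5672/"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth        # hypothetical name
  namespace: production
spec:
  secretTargetRef:
    - parameter: host        # fills the trigger's host parameter
      name: rabbitmq-conn
      key: host
# The ScaledObject trigger then references the auth object instead of an inline host:
#   triggers:
#     - type: rabbitmq
#       authenticationRef:
#         name: rabbitmq-auth
#       metadata:
#         queueName: tasks
#         mode: QueueLength
#         value: "5"
```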
Prometheus Metrics Scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        metricName: http_request_duration_seconds_p99
        threshold: "0.5"  # Scale up when P99 latency exceeds 500ms
        query: |
          histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="api"}[5m])) by (le))
Cron-Based Scaling
For predictable traffic patterns:
triggers:
  - type: cron
    metadata:
      timezone: Asia/Tokyo
      start: "0 8 * * 1-5"   # Scale up at 8 AM weekdays
      end: "0 20 * * 1-5"    # Scale down at 8 PM weekdays
      desiredReplicas: "10"
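For a development environment, the same cron trigger can be paired with a zero minimum so the workload runs only during working hours. This is a sketch. The namespace, Deployment name, and replica counts are hypothetical.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-hours-scaler     # hypothetical name
  namespace: development     # hypothetical namespace
spec:
  scaleTargetRef:
    name: dev-api            # hypothetical Deployment
  minReplicaCount: 0         # outside the cron window the trigger is inactive, so KEDA scales to zero
  maxReplicaCount: 3
  cooldownPeriod: 300        # wait 5 minutes after the window closes before scaling to zero
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Tokyo
        start: "0 8 * * 1-5"
        end: "0 20 * * 1-5"
        desiredReplicas: "2"
```

With this window the workload runs 60 of the 168 hours in a week, roughly 36% of the time.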
Combination Patterns and Best Practices
Recommended Configurations
Based on DEV Community comparisons and Tasrie IT benchmarks:
| Workload Type | Recommended Setup |
|---|---|
| Web API (stateless) | HPA (CPU/custom metrics) + Cluster Autoscaler |
| Batch processing | KEDA (queue depth) + Scale to Zero |
| Database (stateful) | VPA (Auto) + manual node management |
| AI inference | KEDA (Prometheus metrics) + Karpenter |
| Development environments | KEDA (Cron) + Scale to Zero |
Karpenter Integration
Karpenter demonstrates 40% faster node provisioning than Cluster Autoscaler in CNCF 2025 benchmarks. Combining pod-level scaling (HPA/KEDA) with node-level scaling (Karpenter) delivers maximum efficiency.
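On the node side, Karpenter is configured through NodePool objects. The following is a rough sketch, assuming the Karpenter v1 API on AWS and an existing EC2NodeClass named default. The limits and consolidation timing are placeholder values, not recommendations from the sources above.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumed to exist already
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  limits:
    cpu: "100"                   # cap on total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m         # remove or repack nodes once pods scale in
```

When HPA or KEDA adds pods that cannot be scheduled, Karpenter provisions nodes for them. When replicas drop, consolidation removes the spare capacity.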
Quantifying Cost Impact
| Optimization Approach | Expected Reduction |
|---|---|
| HPA only | 10-20% |
| HPA + VPA (Off mode analysis) | 20-30% |
| KEDA (Scale to Zero) | 30-50% (non-production) |
| All tools integrated + Karpenter | 30-50% (including production) |
With Kubo, you get transparent pricing from ¥48,000/month to fully leverage these autoscaling capabilities.
Conclusion: Choose the Right Tools for Cost and Performance
Kubernetes autoscaling achieves maximum impact not through a single tool, but through the right combination of multiple tools:
- HPA: The standard for CPU- and memory-based horizontal scaling. Best for stateless workloads
- VPA: Resource right-sizing. Start with Off mode for analysis
- KEDA: 70+ event source scalers and Scale to Zero capability
- Karpenter / Cluster Autoscaler: Node-level auto-provisioning
Kubo provides a managed Kubernetes environment from ¥48,000/month where all these autoscaling tools are ready to use. With Captain.AI integration, auto-scaling AI workloads becomes seamless.
For autoscaling design and implementation consulting, contact us.