
Kubernetes VPA: Rightsize Pod Resource Requests

Alberto Grande

Head of Marketing

May 22, 2025 · 11 min read

Kubernetes makes it easy to run workloads in containers, but setting the right CPU and memory requests is still a guessing game. Over-provisioned pods waste resources. Under-provisioned pods get throttled—or worse, evicted.

The Vertical Pod Autoscaler (VPA) helps solve this. It automatically adjusts CPU and memory requests and limits for your pods based on observed usage. Instead of scaling out like HPA, VPA scales up or down the resources each pod needs.

This guide is part of our autoscaling series and focuses specifically on how VPA works, when to use it, what limitations to watch for, and how to go beyond it with real-time, cost-aware optimization.


How Kubernetes VPA Works#

The Vertical Pod Autoscaler (VPA) continuously monitors the resource usage of your pods and recommends updated CPU and memory requests. Unlike the Horizontal Pod Autoscaler (HPA), VPA doesn’t add or remove pod replicas—it adjusts the size of each pod.

VPA is composed of three components, each handling a different part of the process:

| Component | Role | Triggers |
| --- | --- | --- |
| Recommender | Analyzes historical CPU and memory usage and generates resource suggestions | Runs continuously |
| Updater | Decides when to apply new recommendations by evicting pods | Pod lifecycle events or thresholds exceeded |
| Admission Controller | Injects recommended resources at pod creation time | Every new pod start |

Here’s how it works in practice:

  1. Recommender collects metrics from pods and generates target CPU/memory values.
  2. Updater decides whether to evict a pod to apply the recommendation (based on policies).
  3. Admission Controller mutates pod specs on startup to apply recommendations automatically.

⚠️ Note: If the updateMode is set to "Auto", VPA may evict and restart pods to apply new values. This can cause downtime if not planned for.

This architecture allows VPA to gradually adapt pod sizing over time, but it also means updates aren’t instant—and pod restarts can affect service stability.

When to Use VPA#

VPA is best suited for workloads where scaling out (adding replicas) isn’t effective—or where tuning resource requests manually is inefficient. It helps teams rightsize CPU and memory for individual pods, especially in systems where consistent performance depends on how much is allocated per instance.

Ideal Use Cases#

  • Memory-bound applications like Java, Spark, or ML workloads
  • Batch jobs that vary in resource usage over time
  • Internal APIs or services where you prefer fewer, well-sized pods
  • Stateful applications that can’t scale horizontally easily
  • Development and test clusters where developers often guess resource requests

VPA vs HPA vs Cluster Autoscaler#

| Capability | VPA | HPA | Cluster Autoscaler (CA) |
| --- | --- | --- | --- |
| What it scales | Pod CPU/memory requests/limits | Pod replica count | Number of cluster nodes |
| Acts on live pods? | Only with eviction or restart | Yes | No (infra-level only) |
| Scaling trigger | Historical resource usage | Current CPU/memory or custom metrics | Pending pods / idle nodes |
| Can it downscale? | Yes (with restarts) | Yes | Yes |
| Works with stateful apps? | Yes | ⚠️ Limited | ✅ Yes |
| Main benefit | Rightsizes containers | Scales out with load | Optimizes node-level capacity |

If your workload is CPU-light but memory-heavy—or if you’re constantly adjusting requests to avoid throttling or OOM kills—VPA may be a better fit than HPA. Just keep in mind that updates often require a pod restart, so it’s best used when downtime is acceptable or easily mitigated.

Limitations and Trade-offs#

While the Vertical Pod Autoscaler (VPA) solves important problems—like reducing over-provisioning and automating resource tuning—it also comes with trade-offs that make it unsuitable for certain workloads or setups.

Pod Restarts Are Required for Updates#

VPA cannot resize a running pod. To apply new resource requests, it needs to evict and restart the pod. This creates a few challenges:

  • Stateful or long-lived apps may experience downtime
  • Pods using emptyDir or non-persistent storage lose data on restart
  • If the app isn’t restart-friendly, updates can introduce risk

This is why many teams run VPA in updateMode: "Off" to collect recommendations first, then apply them manually.
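A minimal `VerticalPodAutoscaler` object in recommendation-only mode might look like this (the Deployment name `my-app` is a placeholder):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical target Deployment
  updatePolicy:
    updateMode: "Off"     # recommendations only; no pods are evicted
```

Recommendations then accumulate in the object's status and can be inspected with `kubectl describe vpa my-app-vpa`.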

Conflicts with HPA#

VPA and HPA don’t work well together if both are configured to manage the same resource, like CPU or memory. Kubernetes doesn’t resolve conflicts—it just creates unpredictable behavior.

Safe patterns include:

  • HPA for replicas, VPA for memory only
  • Or using VPA in recommendation mode alongside HPA for scaling
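The first pattern can be expressed with `resourcePolicy.containerPolicies`, which lets VPA manage memory while leaving CPU to an HPA (the names here are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api           # hypothetical Deployment, scaled by an HPA on CPU
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"              # applies to all containers in the pod
        controlledResources: ["memory"] # VPA adjusts memory only; CPU stays with the HPA
```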

Limited Signal Awareness#

VPA only uses historical CPU and memory usage to generate recommendations. It does not consider:

  • Request rate
  • Latency
  • I/O
  • Business-level metrics

This makes it less effective for:

  • Highly dynamic workloads
  • Latency-sensitive systems where usage ≠ demand

Metrics Need Time to Stabilize#

VPA relies on aggregated metrics. Short-lived or bursty pods may not generate enough consistent data for meaningful recommendations.

Summary#

VPA is a powerful tool for container rightsizing—but it’s not a drop-in solution. It’s best deployed with awareness of pod lifecycle, scaling strategy, and observability needs.

Installing and Configuring VPA#

VPA is not included in Kubernetes by default—you’ll need to deploy it as a set of components maintained by the SIG Autoscaling group. Setup is straightforward, but configuration choices (especially update modes) will affect how safely VPA operates.

Step 1: Install VPA Components#

You can deploy the official Vertical Pod Autoscaler from the VPA GitHub repo. The project ships an install script rather than a single manifest:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

This installs the three core components:

  • vpa-recommender
  • vpa-updater
  • vpa-admission-controller

Make sure your cluster has:

  • Metrics Server installed and working
  • RBAC enabled
  • Webhooks enabled (for the Admission Controller)

Step 2: Create a Deployment#

Here’s a basic deployment using static resource requests:
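For example (the image and resource values are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25        # placeholder image
          resources:
            requests:              # static starting values; VPA will tune these
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```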

Step 3: Attach a VPA Resource#

Now define the VerticalPodAutoscaler object to manage the resource requests.
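A sketch of a VPA targeting the deployment from Step 2, with optional min/max bounds so recommendations stay within a safe range (names and values are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app           # must match the Deployment created in Step 2
  updatePolicy:
    updateMode: "Auto"     # evicts pods to apply new values; see modes below
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:        # floor for recommendations
          cpu: 50m
          memory: 64Mi
        maxAllowed:        # ceiling for recommendations
          cpu: "1"
          memory: 1Gi
```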

Update Modes Explained#

| Mode | What it does | When to use it |
| --- | --- | --- |
| Off | Only generates recommendations (no action taken) | Safest for testing and monitoring |
| Initial | Applies recommendations only when pods are created; never evicts running pods | When you want to control restart timing yourself |
| Recreate | Evicts pods to apply new values whenever recommendations change | Non-critical workloads that tolerate restarts |
| Auto | Currently equivalent to Recreate | Non-critical workloads or dev environments |

🛑 Be cautious with "Auto" mode in production—it may restart pods at inconvenient times.

Once installed, VPA runs continuously in the background and updates recommendations as it observes usage patterns.

Best Practices for Using VPA in Production#

Running VPA in production requires careful planning. It’s not just about enabling autoscaling—it’s about controlling when and how resource updates happen, and avoiding disruptions in critical workloads.

1. Start in Observation Mode (updateMode: "Off")#

Begin with VPA in passive mode to collect recommendations without applying changes. This lets you:

  • Validate whether your resource requests are misaligned
  • Understand usage patterns before taking action
  • Avoid surprises in production

Use this data to rightsize manually, or switch to automated modes once you’re confident.

2. Choose the Right Update Mode#

Here’s a quick breakdown of when to use each mode:

| Mode | Applies Changes Automatically? | Causes Pod Restarts? | Recommended For |
| --- | --- | --- | --- |
| Off | No | No | Baseline visibility, production clusters |
| Initial | Only at pod creation | No (waits for the next restart) | Stateful apps, canary or rolling deploys |
| Auto / Recreate | Yes | Yes (evicts pods) | Non-critical workloads, dev/test environments |

3. Avoid Conflicts with HPA#

If you’re using both VPA and HPA:

  • Do not target the same resource (e.g. CPU)
  • Safe pattern: HPA scales replicas based on CPU; VPA adjusts memory requests only
  • Alternatively, use VPA in Off mode for recommendations while HPA handles live scaling

4. Combine VPA with Cluster Autoscaler#

VPA can reduce resource requests, which allows the Cluster Autoscaler to pack more pods onto fewer nodes. This improves binpacking and can reduce cloud spend—especially in clusters with mixed workloads.

But remember: if VPA suddenly increases resource requests, pods may go unschedulable unless the Cluster Autoscaler is fast enough to provision space.

5. Monitor Impact with Cost and Usage Metrics#

Adjusting resource requests affects:

  • Node binpacking efficiency
  • Pod priority and scheduling
  • Overall cluster cost

It’s important to track how VPA decisions translate to infrastructure behavior. This is where a Kubernetes cost monitoring tool helps—by connecting usage changes to real spend.

Advanced VPA Use Cases#

VPA is often treated as a basic resource tuning tool—but it can also support more advanced scenarios, especially when combined with observability and deployment automation.

1. Memory-Bound or ML Workloads#

Machine learning jobs and JVM-based services (like Spark, Java, or Scala) often don’t scale well horizontally. They need:

  • High memory per pod
  • Stable performance across execution cycles

VPA helps here by gradually learning resource profiles over time. It allows teams to:

  • Avoid manual tuning per job run
  • Reduce OOM kills and inefficient over-provisioning
  • Adapt to seasonal or dataset-based memory usage shifts

2. Batch Jobs and CronJobs#

Short-lived jobs often have unpredictable spikes in resource use. VPA can:

  • Recommend requests based on past executions
  • Allow tighter binpacking across job waves
  • Work well with updateMode: "Recreate" for predictable deployment cycles

If you’re running time-sensitive ETL, data prep, or distributed compute jobs, VPA helps avoid both under- and over-resourcing.
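VPA can also target job controllers directly. Assuming your VPA version supports `CronJob` target references, a sketch using updateMode "Initial"—which sizes each new pod at creation and never evicts a run in progress—might look like this (`nightly-etl` is a hypothetical job name):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: etl-job-vpa
spec:
  targetRef:
    apiVersion: batch/v1
    kind: CronJob
    name: nightly-etl      # hypothetical CronJob
  updatePolicy:
    updateMode: "Initial"  # size each new run at creation; no mid-run evictions
```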

3. Scheduled Resource Resetting#

Some teams use VPA to reset resource requests during off-peak hours:

  • Run VPA in Auto mode during maintenance windows
  • Let it update requests and evict pods without user impact
  • Switch back to Off mode during peak hours

This hybrid approach blends automation with operational control—especially useful for clusters with strict uptime or compliance requirements.

4. VPA + Observability Tools#

VPA only acts on CPU and memory metrics. But when paired with observability platforms, you can:

  • Validate VPA behavior against latency and SLOs
  • Flag over-aggressive recommendations
  • Feed insights into custom dashboards or cost analysis tools

If you use something like Prometheus + Grafana or a Kubernetes cost optimization tool, this unlocks much deeper tuning.

Operational Constraints of VPA#

The Vertical Pod Autoscaler helps solve a common Kubernetes problem: poorly sized workloads. It analyzes historical CPU and memory usage and recommends better resource requests—reducing over-provisioning and manual tuning.

But while useful, VPA has real limitations when used in production. It isn’t designed for real-time responsiveness, introduces disruption when applying changes, and lacks broader context like cost or scheduling efficiency. These constraints make it helpful for offline recommendations—but difficult to rely on for live, automated optimization.

DevZero extends the same intent behind VPA, but addresses these operational gaps.

  • Workload Rightsizing: VPA suggests better resource values but requires restarts to apply them. DevZero adjusts CPU and memory requests on running pods, in real time, without evictions. This eliminates the downtime and complexity associated with production resizing.
  • Live Migration: One of the biggest risks of VPA is that it triggers restarts. DevZero safely migrates workloads across nodes by pausing and resuming execution—avoiding cold starts and service disruption during optimization cycles.
  • Binpacking: VPA reduces pod size, which helps indirectly with binpacking—but DevZero goes further. It actively redistributes workloads across nodes based on updated resource profiles, improving density and reducing the number of active nodes needed.
  • Visibility into cost impact: VPA has no awareness of infrastructure cost. DevZero ties resource decisions to actual spend, so platform teams can see how changes affect node utilization and cloud cost—closing the loop between tuning and business impact.

In short, VPA shows you what to fix—DevZero makes it actionable, safe, and continuous. It’s the next step for teams that want the benefits of autoscaling without the operational trade-offs. Learn more →
