8 practical controls for real Kubernetes cost optimization (beyond rightsizing theater)

Learn eight practical controls for Kubernetes cost optimization, from bin packing and autoscaling to spot instances, chargeback, and multi-tenancy, with realistic figures, caveats, and a 90-day FinOps roadmap.

1. From rightsizing theater to real Kubernetes cost optimization

Most organisations say they care about Kubernetes cost optimization, yet their clusters still burn money. The FinOps Foundation reports that a large share of total Kubernetes cloud cost is tied to idle capacity in the cluster, which means the problem is architectural rather than a matter of tweaking a single pod. Its FinOps for Kubernetes report (2023) highlights that underutilised nodes and low bin packing efficiency are the dominant drivers of waste, not a few oversized deployments. The survey data suggests that more than half of spend in many environments is associated with unused or underused capacity, although exact percentages vary by provider and workload mix. If your teams only talk about rightsizing workloads and never about bin packing efficiency or multi-tenant density, you are optimising the wrong resource and leaving savings on the table.

The first mental shift is to treat Kubernetes cost as a product surface, with explicit cost management requirements and non-functional performance budgets. That means your platform teams own a roadmap for cost optimization, just as they own reliability and security, and they use real-time data from tools such as Datadog or Kubecost to steer decisions about nodes and clusters. When you frame Kubernetes costs as a question of unit economics for each workload rather than a generic cloud spend line item, you finally get cost visibility that engineering leaders can act on and compare across products.

Rightsizing a single pod without changing the cluster footprint is theatre because the control plane and nodes still run mostly idle. You can cut CPU and memory requests for a deployment by 30 %, but if the cluster autoscaler never scales down nodes, your cloud providers still bill you for the same number of instances and storage volumes. A simple example: if a team reduces requests on a service from 4 vCPU to 2 vCPU but still runs on a three-node pool of 4 vCPU instances, the monthly bill barely moves. Real savings only appear when cost allocation is tied to concrete actions on clusters, such as consolidating workloads, shrinking node groups, and enforcing requests/limits hygiene across namespaces so that bin packing decisions actually remove capacity.
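The three-node example can be checked with a few lines of arithmetic. This is a minimal sketch, not a billing model: the price per node hour and the 730-hour month are assumptions chosen for illustration, and the functions `nodes_needed` and `monthly_bill` are hypothetical helpers that ignore system-reserved capacity and high-availability constraints.

```python
import math

HOURS_PER_MONTH = 730  # common approximation for one month of node hours


def nodes_needed(total_cpu_requests: float, cpu_per_node: float, min_nodes: int) -> int:
    """Nodes required to fit the CPU requests, never below the pool's floor."""
    return max(min_nodes, math.ceil(total_cpu_requests / cpu_per_node))


def monthly_bill(node_count: int, price_per_node_hour: float) -> float:
    """The provider bills for running nodes, not for what pods request."""
    return node_count * price_per_node_hour * HOURS_PER_MONTH


PRICE = 0.20  # hypothetical on-demand price per 4 vCPU node hour

# Pool pinned at three nodes (no scale-down): halving requests changes nothing.
before = monthly_bill(nodes_needed(4, 4, min_nodes=3), PRICE)
after_rightsizing = monthly_bill(nodes_needed(2, 4, min_nodes=3), PRICE)

# Same rightsizing, but the pool is allowed to shrink to a single node.
after_scale_down = monthly_bill(nodes_needed(2, 4, min_nodes=1), PRICE)

print(before, after_rightsizing, after_scale_down)
```

The first two figures are identical, which is the point of the paragraph above: the saving only shows up in the third figure, once the node floor is lowered and scale-down is actually allowed to happen.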

2. Control one: bin packing efficiency beats heroic rightsizing

The highest ROI control in Kubernetes cost optimization is ruthless bin packing of workloads onto fewer nodes. When you treat each cluster as a bin packing problem, you design requests and limits, pod autoscaler settings, and node sizes so that CPU, memory, and storage resources are tightly packed instead of scattered. This is where the FinOps mindset meets scheduling theory, because cost optimization becomes a question of how efficiently you can map pods to nodes while preserving performance and respecting SLOs.

Bin packing efficiency matters more than perfect rightsizing of every pod, because cloud cost is dominated by the number and size of nodes in your clusters. If you run three half-empty clusters instead of one well-packed multi-tenant cluster, your Kubernetes cost will stay high even if each individual workload looks lean in isolation. The FinOps Foundation FinOps for Kubernetes report (2023) notes that teams with strong bin packing practices and active node right sizing often achieve savings in the 20–40 % range compared with peers who only focus on workload-level tuning, although exact percentages vary by environment and the figures are based on self-reported survey responses rather than controlled benchmarks.
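To make the bin packing framing concrete, here is a minimal sketch of the classic first-fit-decreasing heuristic applied to CPU requests only. It is an illustration of the idea, not a description of how the Kubernetes scheduler works: the real scheduler scores nodes on several resources and constraints. The function name `pack_first_fit_decreasing` and the sample requests are hypothetical.

```python
def pack_first_fit_decreasing(pod_requests_m, node_capacity_m):
    """Place pods (CPU requests in millicores) onto as few nodes as the
    first-fit-decreasing heuristic manages. Returns free CPU per node."""
    free = []  # remaining millicores on each node opened so far
    for request in sorted(pod_requests_m, reverse=True):
        if request > node_capacity_m:
            raise ValueError(f"pod requesting {request}m cannot fit on any node")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request  # reuse an existing node
                break
        else:
            free.append(node_capacity_m - request)  # open a new node
    return free


pods = [2000, 2000, 1500, 1500, 1000, 500, 500]  # hypothetical requests
free = pack_first_fit_decreasing(pods, node_capacity_m=4000)

node_count = len(free)
utilisation = sum(pods) / (node_count * 4000)
print(node_count, utilisation)
```

With these sample values the seven pods fit on three nodes at 75 % requested utilisation, whereas spreading them one per node would pay for seven. Integer millicores are used deliberately so that the fit check never trips over floating-point rounding.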

For senior architects, this is where FinOps becomes an architecture decision rather than a monthly report. When you evaluate FinOps as an architecture decision, you are really deciding how aggressively you will consolidate workloads, share clusters, and accept multi-tenancy trade-offs. The control plane, cluster autoscaler, and pod autoscaler must be configured together so that bin packing decisions translate into fewer nodes, lower cloud spend, and measurable cost savings for each team, instead of just prettier utilisation graphs. A common failure mode is improving pod density but pinning critical workloads to dedicated node pools “just in case”, which quietly recreates the same waste under a different name.

3. Controls two and three: requests/limits hygiene and HPA tuning

Once bin packing is on the table, the next control is requests/limits hygiene across all workloads. The most common anti-pattern in real clusters is setting requests equal to limits with no horizontal pod autoscaler configured, which blocks bin packing and keeps nodes artificially full on paper while they are underused in practice. Datadog’s Container Report (2023) shows that misconfigured requests and limits are among the top drivers of unnecessary Kubernetes costs in production clusters. When teams fix this by separating request and limit values and by enabling a pod autoscaler for variable workloads, they unlock both performance stability and cost optimization.
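The anti-pattern described above is easy to search for once workload specs are exported. The sketch below assumes a simplified, hypothetical input shape (a list of dicts with `name` and `containers`, plus a set of deployment names that an HPA targets); in practice you would build these from your own cluster export. `find_rigid_workloads` is an illustrative helper, not part of any tool.

```python
def find_rigid_workloads(deployments, hpa_targets):
    """Return names of deployments where every container sets requests
    equal to limits and no HPA targets the deployment."""
    flagged = []
    for deployment in deployments:
        containers = deployment["containers"]
        rigid = bool(containers) and all(
            c.get("requests") and c.get("requests") == c.get("limits")
            for c in containers
        )
        if rigid and deployment["name"] not in hpa_targets:
            flagged.append(deployment["name"])
    return flagged


# Hypothetical sample data in the simplified shape described above.
deployments = [
    {"name": "checkout", "containers": [
        {"requests": {"cpu": "1", "memory": "1Gi"},
         "limits": {"cpu": "1", "memory": "1Gi"}}]},
    {"name": "search", "containers": [
        {"requests": {"cpu": "500m", "memory": "512Mi"},
         "limits": {"cpu": "2", "memory": "1Gi"}}]},
    {"name": "reports", "containers": [
        {"requests": {"cpu": "2", "memory": "4Gi"},
         "limits": {"cpu": "2", "memory": "4Gi"}}]},
]
hpa_targets = {"reports"}  # deployments that already have an HPA

print(find_rigid_workloads(deployments, hpa_targets))
```

Here only `checkout` is flagged: `search` already separates requests from limits, and `reports` is rigid but covered by an autoscaler. The comparison is on the raw strings, so `"1000m"` and `"1"` would be treated as different values; normalising quantities is left out to keep the sketch short.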

HPA tuning is the third control, and it is where many teams quietly sabotage Kubernetes cost optimization. If your pod autoscaler scales on CPU or memory alone, without any business-based metrics such as queue depth or request latency, you will either over-scale and inflate cloud cost or under-scale and hurt performance. The Datadog Container Report (2023) highlights that misconfigured HPAs and aggressive scale-out thresholds are a major driver of Kubernetes costs, because they trigger unnecessary node scale-ups that the cluster autoscaler cannot later reverse without risking disruption.
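It helps to see how sensitive replica counts are to the target you pick. The Kubernetes documentation describes the HPA calculation as the current replica count multiplied by the ratio of current to target metric, rounded up, with a tolerance band (0.1 by default) inside which no scaling happens. The sketch below reproduces only that core formula; it leaves out stabilisation windows, scaling policies, and handling of unready pods, and `desired_replicas` is a hypothetical helper name.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Core HPA formula: ceil(current_replicas * current / target),
    skipped when the ratio is within the tolerance band around 1.0."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Four replicas averaging 90 % CPU, compared across two targets.
print(desired_replicas(4, 90, 65))  # target 65 % utilisation
print(desired_replicas(4, 90, 50))  # more aggressive target of 50 %
print(desired_replicas(4, 68, 65))  # within tolerance, stays at 4
```

With the same load, the 65 % target asks for six replicas and the 50 % target asks for eight. That two-replica difference is the kind of aggressive scale-out threshold that can tip the cluster into adding a node.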

Vertical Pod Autoscaler looks attractive for automatic resource management, but it is often the wrong answer for latency-sensitive workloads. VPA can restart pods at awkward times, and it fights with bin packing when it inflates resource requests based on short-term usage spikes. Before you invest in VPA, make sure your teams have mastered HPA tuning, namespace-level policies, and platform-as-a-product practices, otherwise you risk building what one KubeCon speaker called the internal developer platform nobody asked for, with opaque behaviour and rising Kubernetes costs. A pragmatic pattern is to start with HPA plus conservative requests, then introduce VPA in recommendation-only mode (updateMode set to "Off") for a few back-end jobs before allowing it to apply changes automatically.

4. Controls four and five: smarter autoscaling and spot instances strategy

The fourth control is the configuration of your cluster autoscaler or its modern replacements such as Karpenter on AWS. Public talks and blog posts from AWS in 2023 suggest that roughly a third of EKS clusters had adopted Karpenter by late 2023. The figure is directional rather than a full census, but the industry is clearly betting on more intelligent node management that reacts in real time to pod scheduling events. When you align Karpenter or a traditional cluster autoscaler with your bin packing and HPA strategies, you finally connect workload-level decisions to actual reductions in nodes and cluster costs.

The fifth control is your strategy for spot instances and other preemptible capacity across cloud providers. Public benchmarks from Spot.io and CAST AI, published between 2022 and 2023, show that teams can reach around 40 % spend reduction when they move stateless workloads to spot instances while keeping stateful storage on more stable nodes. These benchmarks are usually based on vendor-run experiments or customer case studies, so treat the numbers as indicative upper bounds rather than guaranteed outcomes. The Datadog Container Report (2023) also notes that the share of spot and preemptible workloads is expected to grow from about 18 % of capacity to nearly half of all nodes in some environments, which means spot-based architectures are becoming the default rather than an exotic optimisation.

To make this work, you need clear workload classifications and cost management policies that define which services can tolerate interruption. Stateless APIs, batch jobs, and asynchronous data processing pipelines are ideal candidates for spot instances, while databases and critical control plane components stay on on-demand nodes. A simple cost sketch: if 50 % of your node hours move from on-demand to spot with a 60 % discount, your theoretical saving is about 30 % on compute, before accounting for extra engineering effort and occasional fallbacks to on-demand. The main failure modes are silent data loss when stateful workloads accidentally land on spot nodes, and cascading outages when interruption handling is not tested. When teams combine smart autoscaling, spot-based capacity, and strong cost allocation tagging, they gain cost visibility down to each workload and can explain every euro of cloud spend to finance without hand-waving.
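The cost sketch in the paragraph above is a one-line formula: the saving on compute is the share of node hours on spot multiplied by the spot discount. The version below adds one assumed refinement, a `fallback_share` parameter for the fraction of intended spot hours that end up running on-demand after interruptions. That parameter and the function name are illustrative, not taken from any vendor benchmark.

```python
def blended_compute_saving(spot_share, spot_discount, fallback_share=0.0):
    """Fraction of the compute bill saved versus running fully on-demand.

    spot_share:     share of node hours scheduled onto spot (0..1)
    spot_discount:  discount of spot versus on-demand price (0..1)
    fallback_share: share of those spot hours that fall back to on-demand
    """
    for value in (spot_share, spot_discount, fallback_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be fractions between 0 and 1")
    effective_spot_share = spot_share * (1.0 - fallback_share)
    return effective_spot_share * spot_discount


# The example from the text: half the node hours on spot at a 60 % discount.
print(blended_compute_saving(0.5, 0.6))

# Same setup, assuming 10 % of spot hours fall back to on-demand.
print(blended_compute_saving(0.5, 0.6, fallback_share=0.1))
```

The first call reproduces the roughly 30 % figure from the text, and the second drops to about 27 %. Neither includes the engineering effort of interruption handling, which is why the text calls the number theoretical.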

5. Controls six and seven: idle namespace policies and namespace chargeback

The sixth control is brutally simple, yet rarely enforced in real clusters. Idle namespace policies define how long a namespace, environment, or workload can sit unused before the platform team reclaims its resources and storage, which directly reduces Kubernetes costs. Without these policies, development and test clusters quietly accumulate zombie pods, orphaned volumes, and forgotten nodes that inflate cloud cost without delivering any business value, especially in long-running staging environments.
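An idle namespace policy boils down to a threshold check over a "last activity" timestamp per namespace. How you derive that timestamp (last deployment, last inbound request, last log line) is a local decision. The sketch below simply assumes it is already available as a mapping, and the names `idle_namespaces`, `max_idle`, and `exempt` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def idle_namespaces(last_activity, now, max_idle, exempt=frozenset()):
    """Return namespaces whose last activity is older than max_idle,
    skipping any namespace listed as exempt."""
    return sorted(
        namespace
        for namespace, seen in last_activity.items()
        if namespace not in exempt and now - seen > max_idle
    )


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
last_activity = {  # hypothetical sample data
    "feature-login-v2": now - timedelta(days=45),
    "staging": now - timedelta(days=2),
    "perf-test-2023": now - timedelta(days=120),
    "kube-system": now - timedelta(days=400),
}

candidates = idle_namespaces(
    last_activity, now, max_idle=timedelta(days=30), exempt={"kube-system"}
)
print(candidates)
```

In this sample two namespaces cross the 30-day threshold and the system namespace is protected by the exemption list. A cautious rollout would notify owners about the candidates first and only reclaim resources after a grace period.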

Namespace chargeback is the seventh control, and it turns cost visibility into behavioural change. When you allocate Kubernetes cost back to product teams based on real-time usage of CPU, memory, storage, and network resources, you create a feedback loop that encourages better cost management decisions. Tools such as Kubecost, CAST AI, and native cloud providers’ billing APIs make this cost allocation feasible, but the real work is aligning it with your internal unit economics and incentives so that teams feel accountable rather than punished.

Chargeback only works if the data is trusted and the rules are clear. Teams must see a transparent mapping from their workloads and pods to the underlying nodes, clusters, and cloud spend, otherwise they will treat cost reports as noise. This is where a platform product manager can use insights from articles such as how SyncGrades are shaping the future of software development to define shared KPIs that balance performance, reliability, and cost optimization across the portfolio, instead of chasing a single vanity metric like raw cloud spend. A common anti-pattern is “shadow chargeback”, where finance publishes cost numbers that engineers cannot reconcile with their own dashboards, leading to endless disputes instead of optimisation.
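One way to keep the mapping from pods to cloud spend transparent is to publish the allocation rule itself. The sketch below shows one possible rule, assumed here for illustration: each namespace pays for the share of a node it requests, and the node's idle cost is spread across namespaces in proportion to those requests. It covers CPU only, and `allocate_node_cost` is a hypothetical helper rather than the method used by Kubecost or any other tool.

```python
def allocate_node_cost(node_cost, node_capacity_m, requests_m):
    """Split one node's cost across namespaces by CPU requests (millicores).

    Returns {namespace: {"direct": ..., "idle": ..., "total": ...}}.
    """
    requested = sum(requests_m.values())
    if requested == 0:
        return {}  # nothing scheduled: the whole node is platform overhead
    if requested > node_capacity_m:
        raise ValueError("requests exceed node capacity")

    idle_cost = node_cost * (node_capacity_m - requested) / node_capacity_m
    bill = {}
    for namespace, request in requests_m.items():
        direct = node_cost * request / node_capacity_m
        idle_share = idle_cost * request / requested
        bill[namespace] = {
            "direct": direct,
            "idle": idle_share,
            "total": direct + idle_share,
        }
    return bill


# Hypothetical node: 300 per month, 4000m CPU, two tenants.
bill = allocate_node_cost(300.0, 4000, {"checkout": 2000, "search": 1000})
for namespace, lines in bill.items():
    print(namespace, lines)
```

Showing the direct and idle lines separately matters for trust: a team can reconcile the direct line with its own requests, and the idle line makes poor bin packing visible as a shared cost instead of hiding it in a blended rate.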

6. Control eight and the FinOps maturity ladder for Kubernetes

The eighth control is multi-tenancy density, which is the deliberate choice to run more workloads per cluster while managing blast radius through policy rather than physical isolation. Many organisations default to one cluster per team or per environment, which multiplies control plane costs, wastes nodes, and fragments storage resources across dozens of underused clusters. When you consolidate into fewer, larger clusters with strong network policies, RBAC, and resource quotas, you unlock both savings and simpler management, while still meeting compliance requirements.
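The fixed overhead of cluster sprawl is simple to estimate. The sketch below assumes each cluster carries a monthly control plane fee plus a number of nodes reserved for system components, regardless of how many tenant workloads it hosts. All figures are hypothetical placeholders, and `fixed_overhead` is an illustrative helper; substitute your provider's actual pricing.

```python
def fixed_overhead(cluster_count, control_plane_fee, system_nodes_per_cluster, node_cost):
    """Monthly cost paid per cluster before any tenant workload runs."""
    per_cluster = control_plane_fee + system_nodes_per_cluster * node_cost
    return cluster_count * per_cluster


# Hypothetical monthly prices, in whole currency units.
CONTROL_PLANE_FEE = 73
NODE_COST = 140

one_cluster_per_team = fixed_overhead(12, CONTROL_PLANE_FEE, 2, NODE_COST)
consolidated = fixed_overhead(3, CONTROL_PLANE_FEE, 2, NODE_COST)

print(one_cluster_per_team, consolidated, one_cluster_per_team - consolidated)
```

This only captures the floor cost of each cluster. The larger effect described above, better bin packing once workloads share nodes, comes on top of it and depends on the workload mix.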

A practical way to operationalise these eight controls is to adopt a Kubernetes-specific FinOps maturity ladder. At the crawl stage, you focus on basic cost visibility, tagging, and a single source of truth for Kubernetes cost and cloud spend data, often using Kubecost or Datadog dashboards. At the walk stage, you implement bin packing best practices, requests/limits hygiene, and initial cluster autoscaler tuning, while at the run stage you embrace multi-cloud strategies, spot instances at scale, and namespace chargeback aligned with unit economics and product-level profitability.

By month three of a serious Kubernetes FinOps programme, you should have dashboards that show cost per namespace, cost per workload, and cost per business transaction in near real time. These views must connect performance metrics, such as latency and error rates, with resource usage and costs, so that teams can see the trade-offs of every optimisation. In the end, clusters cost what the team allows them to cost, and the real test is not the keynote demo but the third quarter in production, when budgets tighten and every optimisation is scrutinised. The main risk at this stage is declaring victory after the first round of savings and letting entropy creep back in as new services launch without the same discipline.

Key figures that shape Kubernetes cost optimization

Survey data in the FinOps Foundation FinOps for Kubernetes report (2023) indicates that a majority of spend in the average Kubernetes cluster is tied to idle or underutilised capacity, which highlights the impact of poor bin packing and oversized nodes. The often-quoted figures of around 65 % idle cost are based on aggregated self-reported data rather than direct billing exports, so treat them as directional.
The share of workloads running on spot or preemptible instances is expected to grow from roughly 18 % of capacity to around 40–50 % in many organisations, based on public data and benchmarks from Spot.io and CAST AI (2022–2023), showing a structural shift toward interruption-tolerant architectures. These numbers typically come from vendor customer bases and may not represent the entire market.
AWS public statements in 2023 suggested that on the order of a third of EKS clusters were already using Karpenter for node provisioning, indicating rapid adoption of more dynamic cluster autoscaler alternatives. Exact adoption rates depend on how clusters are counted and which regions or account types are included.
Public benchmarks from Spot.io and CAST AI (2022–2023) report up to about 40 % Kubernetes cloud cost reduction when organisations combine spot instances, aggressive bin packing, and automated node right sizing, though actual savings depend on workload mix, operational maturity, and the willingness to accept interruption risk.
Datadog’s Container Report (2023) shows that misconfigured requests and limits, especially when requests equal limits with no HPA, are among the top drivers of unnecessary Kubernetes costs in production clusters, based on anonymised telemetry from their customer base.

FAQ about Kubernetes cost optimization

How do I start a Kubernetes FinOps programme without slowing delivery?

Begin by creating a single, trusted view of Kubernetes cost and cloud spend per namespace and workload. Use that data to identify the no-regret actions, such as fixing extreme requests/limits and consolidating obviously idle clusters, before you touch application code. Once teams see real-time cost visibility, you can introduce bin packing policies and autoscaling best practices as part of normal delivery work, not as a separate cost-cutting project.

What is the fastest way to reduce Kubernetes costs by double digits?

The quickest path is usually a combination of bin packing improvements and a structured move to spot instances for stateless workloads. Start by tightening requests and limits and enabling the cluster autoscaler to scale nodes down when utilisation stays low, then migrate non-critical services to spot-based node groups. Many organisations report 20 to 40 % savings within a few months using this approach, without major application rewrites, as reflected in public case studies from Spot.io and CAST AI (2022–2023). The main caveat is that savings depend on regional spot availability and on how well your workloads tolerate interruptions.

When should I use Vertical Pod Autoscaler instead of Horizontal Pod Autoscaler?

Vertical Pod Autoscaler is useful for stable, long-running workloads where you want the platform to adjust resource requests over time without changing replica counts. Horizontal Pod Autoscaler is better for spiky, user-facing services where scaling out replicas preserves performance under load. In many environments, HPA plus good requests/limits hygiene delivers most of the cost optimization benefits, while VPA is reserved for specific back-end services with predictable traffic patterns.

How do I handle multi-tenancy without compromising security and compliance?

Secure multi-tenancy depends on strong isolation controls rather than separate clusters for every team. Use network policies, strict RBAC, resource quotas, and dedicated namespaces to isolate workloads while still benefiting from shared nodes and a single control plane. This approach increases density and reduces costs, but it requires disciplined platform management, regular audits, and clear runbooks for incident response in shared clusters. Typical failure modes include overly permissive cluster-wide roles and missing network policies that allow unintended lateral movement between tenants.

What dashboards are essential for Kubernetes cost management?

By the third month, you should have dashboards that show cost per namespace, cost per workload, and cost per business transaction, alongside utilisation metrics for CPU, memory, storage, and network. These views must be updated in near real time and linked to both performance and reliability indicators, so teams can see the impact of their changes. Combining these dashboards with clear chargeback rules turns raw data into actionable cost management behaviour and makes Kubernetes FinOps part of everyday engineering decisions.

Appendix: practical controls, configs, and a 90-day Kubernetes FinOps roadmap

Sample HPA configuration (CPU-based)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65

Sample VPA configuration (conservative, for a batch job)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: batch-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-worker
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["cpu", "memory"]
      minAllowed:
        cpu: "200m"
        memory: "256Mi"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"

Example node size choices for better bin packing

Prefer a small set of standardised node types (for example, 2–4 vCPU and 8–16 GiB RAM) instead of many exotic sizes.
Align pod requests so that common workloads fit cleanly (for example, three 500m CPU / 1 GiB pods on a 2 vCPU / 4 GiB node, which leaves headroom for system components).
Use taints and tolerations sparingly so that nodes remain broadly usable across namespaces.

Bin-packing checklist

Audit all namespaces for pods with requests equal to limits and no HPA.
Standardise a small set of node groups and deprecate rarely used instance types.
Enable cluster autoscaler or Karpenter with scale-down enabled and realistic cooldowns.
Set namespace-level resource quotas to prevent a single team from blocking bin packing.
Review DaemonSets and system pods that consume capacity on every node.

90-day Kubernetes FinOps roadmap (crawl, walk, run)

Days 1–30 (crawl): establish a single cost dashboard per cluster, tag namespaces and workloads, identify idle clusters and zombie resources, and agree on basic SLOs for cost per transaction.
Days 31–60 (walk): fix extreme requests/limits, roll out HPAs for spiky services, standardise node sizes, and tune cluster autoscaler or Karpenter to allow aggressive scale-down.
Days 61–90 (run): migrate stateless workloads to spot instances, introduce idle namespace policies, pilot namespace chargeback, and consolidate low-risk clusters into shared multi-tenant environments.