K8s Cloud Costs Exploding? 2026 FinOps Practices: 5 Strategies to Cut Kubernetes Costs by 60%
Checking the cloud bill at month-end and finding K8s costs 30% over budget again? CPU utilization sitting at 15% while you pay for 100%? Spot instance reclamation causing service outages? These scenarios are all too common for cloud-native teams in 2026. FinOps is not just about saving money; it is about making every cent of cloud spend produce business value. This article walks through 5 practical strategies that together can cut K8s costs by roughly 60%.
Background: FinOps Framework
FinOps (Financial Operations) is the practice of bringing financial accountability to cloud consumption:
| Phase | Goal | Key Actions | Owner |
| --- | --- | --- | --- |
| Inform | Cost visibility | Tag governance, cost allocation, billing analysis | FinOps team |
| Optimize | Reduce waste | Right-sizing, Spot instances, Reserved instances | Engineering team |
| Operate | Continuous control | Budget alerts, autoscaling, policy enforcement | Platform team |
Problem Analysis: Where Does K8s Cost Waste Come From?
| Waste Source | Share of Waste | Typical Scenario |
| --- | --- | --- |
| Resource over-provisioning (request >> actual) | 40% | CPU request 2 cores, actual usage 0.3 cores |
| Idle resources (running with no traffic) | 25% | Dev environment running 24h, no traffic after hours |
| Not using Spot/Preemptible instances | 20% | All Pods use On-Demand instances |
| Missing autoscaling | 10% | HPA not configured, resources not released during low traffic |
| Storage waste | 5% | Oversized PVCs, uncleaned logs and images |
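To make the over-provisioning row concrete, here is a minimal sketch that turns a request and an observed usage into an idle fraction. The function name `overprovision_ratio` is an illustrative assumption, not part of any tool mentioned in this article.

```python
def overprovision_ratio(requested: float, used: float) -> float:
    """Fraction of a resource request that goes unused (0.0 to 1.0)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    # Clamp at 0 so a pod that bursts above its request is not "negative waste".
    return max(0.0, (requested - used) / requested)


# The table's example: 2 cores requested, 0.3 cores actually used.
waste = overprovision_ratio(requested=2.0, used=0.3)
print(f"{waste:.0%} of the CPU request is idle")
```

For the table's example the idle fraction is (2 - 0.3) / 2 = 0.85, which is why right-sizing tends to be the largest single lever.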
Strategy 1: Resource Right-Sizing
Step 1: Install metrics-server and kube-resource-report
```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
helm install kube-resource-report helm/kube-resource-report \
  --set prometheus.url=http://prometheus:9090
```
Step 2: Analyze Resource Utilization
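```bash
# Live usage per pod, highest CPU first
kubectl top pods -A --sort-by=cpu
# Requests vs. capacity; needs the kube-capacity plugin
# (kubectl krew install resource-capacity)
kubectl resource-capacity --sort cpu.request --pods
```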
Step 3: Automated Right-Sizing Recommendations
```yaml
apiVersion: apps.kubecost.com/v1beta1
kind: RightSizingRecommendation
metadata:
  name: all-deployments
spec:
  targetRef:
    kind: Deployment
    name: order-service
  current:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  recommended:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"
  savingsPercent: 85
```
Step 4: P99-Based Right-Sizing Script
The script below reads a single usage snapshot from the metrics API and adds a safety buffer. The metrics API does not keep history, so for a true P99 the same calculation should be run over usage samples collected for at least 7 days (for example from Prometheus).
```python
# right_sizing.py
import json
import subprocess


def get_pod_metrics(namespace: str) -> dict:
    """Return current CPU (millicores) and memory (MiB) usage per pod."""
    result = subprocess.run(
        ['kubectl', 'get', '--raw',
         f'/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods'],
        capture_output=True, text=True, check=True,  # fail loudly if kubectl errors
    )
    pods = json.loads(result.stdout)
    metrics = {}
    for item in pods.get('items', []):
        pod_name = item['metadata']['name']
        total_cpu = 0
        total_mem = 0
        # Sum usage across all containers in the pod
        for c in item.get('containers', []):
            usage = c.get('usage', {})
            total_cpu += parse_cpu(usage.get('cpu', '0m'))
            total_mem += parse_memory(usage.get('memory', '0Ki'))
        metrics[pod_name] = {'cpu_millicores': total_cpu, 'memory_mib': total_mem}
    return metrics


def parse_cpu(s: str) -> int:
    """Convert a Kubernetes CPU quantity to millicores."""
    if s.endswith('n'):  # nanocores, the unit metrics-server usually reports
        return int(s[:-1]) // 1_000_000
    if s.endswith('u'):  # microcores
        return int(s[:-1]) // 1_000
    if s.endswith('m'):  # millicores
        return int(s[:-1])
    return int(float(s) * 1000)  # whole or fractional cores


def parse_memory(s: str) -> int:
    """Convert a Kubernetes memory quantity to MiB."""
    if s.endswith('Ki'):
        return int(s[:-2]) // 1024
    if s.endswith('Mi'):
        return int(s[:-2])
    if s.endswith('Gi'):
        return int(s[:-2]) * 1024
    return int(s) // (1024 * 1024)  # plain bytes


def generate_recommendations(metrics: dict, buffer_percent: int = 20) -> list:
    """Recommend requests as observed usage plus a safety buffer."""
    recommendations = []
    for pod, usage in metrics.items():
        cpu_rec = int(usage['cpu_millicores'] * (1 + buffer_percent / 100))
        mem_rec = int(usage['memory_mib'] * (1 + buffer_percent / 100))
        recommendations.append({
            'pod': pod,
            'recommended_cpu': f'{cpu_rec}m',
            'recommended_memory': f'{mem_rec}Mi',
            # "current" here means current observed usage, not the current request
            'current_cpu': usage['cpu_millicores'],
            'current_memory': usage['memory_mib'],
        })
    return recommendations


if __name__ == '__main__':
    metrics = get_pod_metrics('production')
    recs = generate_recommendations(metrics, buffer_percent=20)
    for r in recs:
        print(f"{r['pod']}: CPU {r['recommended_cpu']}, Memory {r['recommended_memory']}")
```
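Because the metrics API only returns the current usage, the P99 part of this step needs historical samples from somewhere else. The following is a minimal sketch of the percentile calculation itself, assuming you already have a list of usage samples (for example millicores scraped every few minutes over 7 days). The helper names `percentile` and `recommend_request` are illustrative, and the nearest-rank method is one of several common percentile definitions.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def recommend_request(samples: list, pct: float = 99, buffer_percent: int = 20) -> int:
    """Recommended request: the chosen percentile of usage plus a safety buffer."""
    return math.ceil(percentile(samples, pct) * (1 + buffer_percent / 100))


# Hypothetical data: 100 CPU samples ranging from 1m to 100m.
cpu_samples = list(range(1, 101))
print(f"P99 usage: {percentile(cpu_samples, 99)}m")
print(f"Recommended request: {recommend_request(cpu_samples)}m")
```

Sizing to P99 rather than the maximum ignores the rarest spikes, which is usually acceptable for CPU (throttling) but riskier for memory (OOM kills), so a larger buffer or a higher percentile may be appropriate for memory.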
Strategy 2: Spot/Preemptible Instances
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      nodeSelector:
        node-type: spot
      tolerations:
        - key: "cloud.google.com/gke-provisioning"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      containers:
        - name: processor
          image: myapp/batch-processor:latest
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
      terminationGracePeriodSeconds: 60
```
Spot Instance Interruption Handling
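```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-aware-service
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the Pod template labels
  selector:
    matchLabels:
      app: spot-aware-service
  template:
    metadata:
      labels:
        app: spot-aware-service
    spec:
      containers:
        - name: app
          image: myapp:latest
          lifecycle:
            preStop:
              exec:
                # Give load balancers time to stop routing before SIGTERM arrives
                command: ["/bin/sh", "-c", "sleep 15"]
      # Spread replicas across zones so one reclamation wave cannot take them all
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: spot-aware-service
```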
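The preStop hook only buys time; the application still has to react to the SIGTERM that Kubernetes sends once the hook finishes. Here is a minimal sketch of that application-side handling, assuming a worker that processes items in a loop. The names `ShutdownFlag` and `drain` are hypothetical and not taken from any framework.

```python
import signal
import threading
from typing import Callable, Iterable


class ShutdownFlag:
    """Records that SIGTERM arrived so the work loop can stop cleanly."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def install(self) -> None:
        # signal.signal must be called from the main thread in CPython.
        signal.signal(signal.SIGTERM, self.handle)

    def handle(self, signum, frame) -> None:
        self._event.set()

    @property
    def requested(self) -> bool:
        return self._event.is_set()


def drain(items: Iterable, process: Callable, flag: ShutdownFlag):
    """Process items until finished or until shutdown is requested.

    Returns (results, unprocessed) so leftovers can be checkpointed or requeued.
    """
    pending = list(items)
    results = []
    while pending and not flag.requested:
        results.append(process(pending.pop(0)))
    return results, pending
```

In a real worker you would call `flag.install()` at startup and persist the `unprocessed` list before exiting, keeping the total shutdown time below `terminationGracePeriodSeconds`.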
Strategy 3: Cluster Autoscaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
---
apiVersion: autoscaling.k8s.io/v1   # v1beta2 is deprecated
kind: VerticalPodAutoscaler
metadata:
  name: api-gateway-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  updatePolicy:
    updateMode: "Auto"   # start with "Off" to review recommendations first
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        # The HPA above already scales on CPU, so VPA manages memory only
        controlledResources: ["memory"]
        minAllowed:
          memory: 128Mi
        maxAllowed:
          memory: 4Gi
```
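To see what a 60% utilization target means in practice, the sketch below reproduces the core replica calculation described in the Kubernetes HPA documentation: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). It is a simplification: it ignores the `behavior` rate limits and stabilization windows, and the function name `desired_replicas` is an assumption.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Core HPA formula, clamped to the configured replica bounds."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no scaling
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 5 replicas at 90% CPU against a 60% target, bounds 3..20
print(desired_replicas(5, 90, 60, 3, 20))
# Low traffic: 5 replicas at 15% CPU, clamped to minReplicas
print(desired_replicas(5, 15, 60, 3, 20))
```

A lower target leaves more headroom per Pod but keeps more replicas running, so the target value is itself a cost lever.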
Strategy 4: Cost Monitoring and Alerts
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-alerts
data:
  alerts.json: |
    [
      {
        "name": "daily-budget-alert",
        "type": "budget",
        "threshold": 500,
        "window": "1d",
        "aggregation": "cluster",
        "notification": {
          "type": "slack",
          "channel": "#finops-alerts"
        }
      },
      {
        "name": "namespace-spike-alert",
        "type": "spendChange",
        "threshold": 0.3,
        "window": "7d",
        "baselineWindow": "7d",
        "aggregation": "namespace",
        "notification": {
          "type": "email",
          "email": "finops-team@company.com"
        }
      }
    ]
```
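The two alert types boil down to two comparisons: an absolute budget check and a relative change against a baseline. The sketch below shows that logic under the assumption that spend totals per window are already available. The function names are hypothetical and do not correspond to any cost tool's API.

```python
def budget_breached(spend: float, threshold: float) -> bool:
    """Budget alert: fire when spend in the window exceeds the threshold."""
    return spend > threshold


def relative_change(current: float, baseline: float) -> float:
    """Relative change of current spend versus the baseline window."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def spend_spike(current: float, baseline: float, threshold: float = 0.3) -> bool:
    """Spend-change alert: fire when spend grew by more than `threshold`."""
    return relative_change(current, baseline) > threshold


# Hypothetical numbers: daily cluster spend of 520 against a 500 budget,
# and a namespace that went from 1000 to 1400 week over week.
print(budget_breached(520, 500))
print(spend_spike(1400, 1000))
```

A relative threshold catches small namespaces that double overnight, which an absolute cluster budget would miss.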
Strategy 5: Dev Environment Scheduled Scaling
```yaml
apiVersion: zalando.org/v1
kind: ScheduleSwitch
metadata:
  name: dev-environment-schedule
  namespace: dev
spec:
  switches:
    - startTime: "0 8 * * 1-5"
      endTime: "0 20 * * 1-5"
      replicas: 1
      description: "Weekdays 8:00-20:00"
    - startTime: "0 20 * * 1-5"
      endTime: "0 8 * * 1-5"
      replicas: 0
      description: "Off-hours scale to 0"
    - startTime: "0 0 * * 0,6"
      endTime: "0 0 * * 1"
      replicas: 0
      description: "Weekends scale to 0"
```
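The schedule amounts to a simple rule: one replica on weekdays between 08:00 and 20:00, zero otherwise. A minimal sketch of that rule follows, assuming the time passed in is already in the team's local timezone. The function name `dev_replicas` is illustrative.

```python
from datetime import datetime


def dev_replicas(now: datetime, start_hour: int = 8, end_hour: int = 20) -> int:
    """Replica count for a dev workload: 1 during weekday working hours, else 0."""
    is_weekday = now.weekday() < 5  # Monday=0 .. Friday=4
    in_hours = start_hour <= now.hour < end_hour
    return 1 if is_weekday and in_hours else 0


print(dev_replicas(datetime(2026, 1, 5, 9, 0)))   # Monday morning
print(dev_replicas(datetime(2026, 1, 3, 12, 0)))  # Saturday noon
```

With this window the environment runs 12 hours × 5 days = 60 of the 168 hours in a week, so roughly 64% of dev compute hours are switched off.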
Pitfall Guide
| # | Pitfall | Symptom | Solution |
| --- | --- | --- | --- |
| 1 | VPA Auto mode causes frequent Pod restarts | Service availability drops | Use Off mode first to observe recommendations, then switch to Auto |
| 2 | Spot reclamation causes batch Pod interruption | Widespread 503 errors | Use topologySpreadConstraints for cross-AZ distribution, set preStop graceful shutdown |
| 3 | HPA and VPA conflict | Replica count and resource size oscillate | HPA manages replicas, VPA manages resources, don't use both on same metric |
| 4 | Cluster Autoscaler shrinks critical nodes | Stateful Pods evicted | Add cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation |
| 5 | Inconsistent cost allocation tags | Cannot attribute costs by team/project | Establish Label Policy, enforce with Kyverno/OPA |
Error Troubleshooting
| Error Message | Cause | Solution |
| --- | --- | --- |
| metrics-server: no metrics available | metrics-server not installed or not ready | Install metrics-server, check kubectl top nodes |
| VPA recommender: OOMKilled | VPA recommender out of memory | Increase VPA recommender memory request |
| HPA: unable to get metric | Custom metric not registered | Check Prometheus Adapter and custom metrics API |
| ClusterAutoscaler: node group not found | Node group misconfigured | Check --nodes parameter format: min:max:node-group-name |
| Spot node: preempted | Spot instance reclaimed by cloud provider | Normal behavior, ensure sufficient PodDisruptionBudget |
| PodDisruptionBudget: not enough replicas | PDB too strict, blocking scale-down | Adjust PDB minAvailable or maxUnavailable |
| Kubecost: pricing data not available | Cloud pricing API unreachable | Configure custom pricing or use defaultCPUPrice/defaultRAMPrice |
| Scale to 0: jobs still running | Unfinished Jobs blocking scale-down | Wait for Jobs to complete or set activeDeadlineSeconds |
| ResourceQuota: exceeded quota | Right-sizing exceeds tenant quota | Adjust ResourceQuota or coordinate with platform team |
| Node: NotReady after scale-up | New node initialization failed | Check node startup scripts and init containers |
Advanced Optimization
1. Reserved Instance / Savings Plans
| Option | Discount | Flexibility | Use Case |
| --- | --- | --- | --- |
| On-Demand | 0% | Highest | Temporary/burst workloads |
| Spot/Preemptible | 60-90% | Low (reclaimable) | Stateless/interruptible |
| 1-Year Reserved | 30-40% | Medium | Stable baseline load |
| 3-Year Reserved | 50-60% | Low | Long-term core services |
| Savings Plans | 20-40% | High (cross-instance) | Mixed workloads |
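In practice a cluster mixes these purchase options, so the number that matters is the blended discount across the whole compute bill. The sketch below computes it as a share-weighted average. The mix and the discount values (midpoints of the ranges in the table) are hypothetical, and `blended_discount` is an illustrative name.

```python
def blended_discount(mix: list) -> float:
    """Share-weighted average discount.

    mix: (share_of_compute, discount) pairs; shares must sum to 1.
    """
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * discount for share, discount in mix)


# Hypothetical mix: 50% on 1-year reserved (35% off), 30% on Spot (70% off),
# 20% left on On-Demand (no discount).
mix = [(0.5, 0.35), (0.3, 0.70), (0.2, 0.0)]
print(f"Blended discount: {blended_discount(mix):.1%}")
```

Moving workloads between rows changes the blended figure more than negotiating any single rate, which is why the baseline/burst split is worth measuring before committing to reservations.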
2. Multi-Cluster Cost Optimization
```yaml
apiVersion: kubecost.com/v1
kind: MultiClusterCost
spec:
  clusters:
    - name: us-east-prod
      apiEndpoint: https://k8s-us-east.example.com
      costWeight: 1.0
    - name: eu-west-prod
      apiEndpoint: https://k8s-eu-west.example.com
      costWeight: 0.8
```
Comparison Analysis
| FinOps Tool | Cost Visibility | Right-Sizing | Spot Support | Open Source | Pricing |
| --- | --- | --- | --- | --- | --- |
| Kubecost | ★★★★★ | ★★★★★ | ★★★★ | Yes | Free / paid Enterprise |
| CloudHealth | ★★★★★ | ★★★★ | ★★★ | No | % of cloud spend |
| AWS Cost Explorer | ★★★★ | ★★★ | ★★★ | No | Free |
| Prometheus + Grafana | ★★★ | ★★ | ★ | Yes | Free |
| Vantage | ★★★★ | ★★★★ | ★★★ | No | Per seat |
Summary: K8s cost optimization is not a one-size-fits-all fix; it is a systematic three-phase FinOps process. The Inform phase makes costs transparent, the Optimize phase uses right-sizing, Spot instances, and autoscaling to eliminate waste, and the Operate phase uses alerts and policies for continuous control. The five strategies contribute as follows: right-sizing saves about 40%, Spot instances 20%, autoscaling 15%, and dev scheduled scaling 10%, while cost monitoring guards against roughly 5% of regression. These figures do not simply add up, because each saving applies to a bill the previous ones have already reduced: 0.60 × 0.80 × 0.85 × 0.90 ≈ 0.37 of the original spend remains, a total reduction of roughly 60%. In 2026, running K8s without FinOps is like shopping without checking the bill.
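The per-strategy percentages in the summary sum to more than 60%, so it helps to check how they combine. Assuming each saving applies to whatever spend is left after the previous one (a simplifying assumption, since real strategies overlap), the total can be sketched as follows. `combined_savings` is an illustrative name.

```python
import math


def combined_savings(rates: list) -> float:
    """Total reduction when each saving applies to the already reduced bill."""
    remaining = math.prod(1 - r for r in rates)
    return 1 - remaining


# Right-sizing, Spot, autoscaling, dev scheduled scaling
rates = [0.40, 0.20, 0.15, 0.10]
print(f"Naive sum:  {sum(rates):.0%}")
print(f"Compounded: {combined_savings(rates):.1%}")
```

The compounded figure lands a little above 60%, while the naive sum overstates what is achievable. Cost monitoring is left out of the product because it protects savings rather than creating new ones.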