Kubernetes HPA Autoscaling: 7 Key Tuning Strategies, from Metric Configuration to Production Stability

Your K8s cluster folds like paper under peak traffic

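At 3 a.m. traffic spikes and Pods get OOM-killed. During a big sales promotion the HPA scales out frantically and the database connection pool fills up in an instant. Scale-down is too aggressive, and freshly added Pods are killed before they finish warming up. The Kubernetes HPA (Horizontal Pod Autoscaler) is not production-ready just because you set a CPU threshold: the default metric window, scaling policies, and cooldown behavior are tuned for generic workloads rather than yours, and shipping them unchanged to production is asking for an incident.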

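Starting from a basic HPA configuration, this article walks through seven key tuning strategies, covering metric configuration, custom metrics, scaling-behavior tuning, and production hardening, so that no step is skipped on the way from the development environment to production.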


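HPA core concepts

| Concept | Description |
| --- | --- |
| Horizontal Pod Autoscaler | Adjusts the number of Pod replicas automatically based on metrics |
| Metrics Server | Resource metrics collector; supplies basic metrics such as CPU and memory |
| Custom Metrics | User-defined metrics such as QPS, queue depth, or connection count |
| External Metrics | Metrics from outside the cluster, such as message queue length or cloud service metrics |
| Target Utilization | The target value; the HPA adjusts replicas to keep the metric close to it |
| Scale Target Ref | Reference to the workload being scaled, e.g. a Deployment or StatefulSet |
| Behavior | Scaling behavior configuration; controls scaling speed and policy |
| Stabilization Window | Look-back window that keeps metric fluctuations from causing frequent scaling |
| Cooldown/Delay | Minimum spacing between scaling actions; in autoscaling/v2 it is expressed through the stabilization window and each policy's periodSeconds |
| VPA | Vertical Pod Autoscaler; adjusts Pod resource requests |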

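How the HPA works

1. Every 15s (the default), the HPA controller fetches metrics from the metrics APIs (Metrics Server for CPU/memory)
2. It computes the ratio between the current metric value and the target value
3. It derives the desired replica count from that ratio: desiredReplicas = ceil[currentReplicas * (currentMetric / targetMetric)]
4. It applies the Behavior policies to limit scaling speed
5. It updates the replicas field of the Scale Target
6. The Deployment controller creates or deletes Pods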
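
The replica formula in step 3 can be made concrete with a short Python sketch. It is a simplification under stated assumptions: the function name and arguments are illustrative, the 10% tolerance mirrors the controller's default --horizontal-pod-autoscaler-tolerance, and details such as unready Pods and missing metrics are ignored.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative, not the controller code)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is clamped to [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, desired))


# 4 Pods averaging 90% CPU against a 70% target
print(desired_replicas(4, 90, 70, min_replicas=3, max_replicas=50))
# 4 Pods averaging 72%: inside the 10% tolerance, so no change
print(desired_replicas(4, 72, 70, min_replicas=3, max_replicas=50))
```

The first call proposes 6 replicas (4 × 90/70 ≈ 5.14, rounded up); the second stays at 4 because the ratio is within the tolerance band.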

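Problem analysis: five challenges of running HPA in production

  1. Metric lag: Metrics Server scrapes on a fixed interval (set by --metric-resolution, commonly 15s to 60s depending on version and manifest), so during a burst the metrics trail the real load and scale-out comes late
  2. Scaling oscillation: when a metric hovers around the threshold, Pods are created and destroyed repeatedly, which hurts service stability
  3. Missing custom metrics: CPU and memory do not faithfully reflect business load; business metrics such as QPS and queue depth are needed
  4. Scale-down avalanche: scaling down too fast cuts freshly established connections and the request failure rate spikes
  5. Inaccurate resource requests: when a Pod's resources.requests is unrealistic, the HPA's percentage-based calculation is distorted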

Step by step: 7 key tuning strategies

Strategy 1: Basic HPA configuration with CPU/memory metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
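
To see how the two scaleUp policies above interact, the sketch below computes the ceiling they impose for one 60-second period. It assumes no other scaling happened inside the period and that Percent limits are rounded up; the helper name is hypothetical, so treat the numbers as indicative rather than exact.

```python
import math


def scale_up_ceiling(replicas_at_period_start, policies, select_policy="Max"):
    """Upper bound on replicas after one period (illustrative helper)."""
    ceilings = []
    for policy in policies:
        if policy["type"] == "Percent":
            growth = 1 + policy["value"] / 100
            ceilings.append(math.ceil(replicas_at_period_start * growth))
        else:  # "Pods": a fixed number of Pods may be added
            ceilings.append(replicas_at_period_start + policy["value"])
    # Max picks the policy that permits the largest change, Min the smallest.
    return max(ceilings) if select_policy == "Max" else min(ceilings)


policies = [
    {"type": "Percent", "value": 100},
    {"type": "Pods", "value": 4},
]
for start in (3, 10):
    print(start, "->", scale_up_ceiling(start, policies))
```

With 3 replicas the Pods policy wins (up to 7); with 10 replicas the Percent policy wins (up to 20). Combining both under selectPolicy: Max gives small deployments a meaningful first step while still letting large ones double.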

Strategy 2: Custom metrics with Prometheus Adapter

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total"
          as: "${1}_per_second"
        metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-custom-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Strategy 3: Fine-grained tuning of scaling behavior

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 5
  maxReplicas: 200
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 200
          periodSeconds: 60
        - type: Pods
          value: 10
          periodSeconds: 60
        # Redundant under selectPolicy: Max, since 200% per 60s always permits more
        - type: Percent
          value: 50
          periodSeconds: 120
    scaleDown:
      stabilizationWindowSeconds: 600
      selectPolicy: Min
      policies:
        - type: Percent
          value: 5
          periodSeconds: 120
        - type: Pods
          value: 1
          periodSeconds: 120
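
A quick way to sanity-check a conservative scaleDown block like the one above is to estimate how long a full drain would take. The sketch below is a rough model with hypothetical helper names: it assumes both policies share the same period, applies them once per period, truncates Percent results as an approximation of the controller's rounding, and ignores the stabilization window.

```python
def scale_down_floor(replicas, policies, select_policy="Min"):
    """Lowest replica count allowed after one period (illustrative helper)."""
    floors = []
    for policy in policies:
        if policy["type"] == "Percent":
            # Integer arithmetic avoids float rounding surprises.
            floors.append(replicas * (100 - policy["value"]) // 100)
        else:  # "Pods": a fixed number of Pods may be removed
            floors.append(replicas - policy["value"])
    # For scale-down, Min (least change) means the highest floor wins.
    return max(floors) if select_policy == "Min" else min(floors)


def periods_to_drain(start, target, policies, select_policy="Min"):
    """Number of policy periods needed to go from start down to target."""
    replicas, periods = start, 0
    while replicas > target:
        replicas = max(target, scale_down_floor(replicas, policies, select_policy))
        periods += 1
    return periods


policies = [{"type": "Percent", "value": 5}, {"type": "Pods", "value": 1}]
periods = periods_to_drain(200, 5, policies)
print(periods, "periods,", periods * 120 / 3600, "hours")
```

Under selectPolicy: Min the 1-Pod policy always wins, so draining from 200 to 5 replicas takes 195 periods of 120 seconds, about 6.5 hours. If that is slower than intended, selectPolicy: Max or a larger Pods value shortens it.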

Strategy 4: External metrics, scaling on message queue depth

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages
          selector:
            matchLabels:
              queue: order-processing
        target:
          type: AverageValue
          averageValue: "30"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      selectPolicy: Max
      policies:
        - type: Pods
          value: 5
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      selectPolicy: Min
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
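
For an External metric with an AverageValue target, the desired replica count is roughly the total metric value divided by the per-Pod target. A minimal sketch, assuming the queue depth has already been read from the metrics API and ignoring the controller's tolerance band (the function name is illustrative):

```python
import math


def workers_for_queue(queue_depth, target_per_pod, min_replicas, max_replicas):
    """Replicas needed so each worker handles about target_per_pod messages."""
    desired = math.ceil(queue_depth / target_per_pod)
    return max(min_replicas, min(max_replicas, desired))


for depth in (0, 45, 600, 3000):
    print(depth, "->", workers_for_queue(depth, 30, min_replicas=2, max_replicas=50))
```

With the values from the manifest above (30 messages per Pod, 2 to 50 replicas), a backlog of 600 messages calls for 20 workers, while a backlog of 3000 is capped at maxReplicas.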

Strategy 5: Combining multiple metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 5
  maxReplicas: 100
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "2000"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
    - type: External
      external:
        metric:
          name: redis_connected_clients
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120
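
When several metrics are listed, the HPA evaluates each one on its own and acts on the largest proposal. The following sketch illustrates that rule with made-up readings; the names and numbers are hypothetical, and tolerance and min/max clamping are left out for brevity.

```python
import math


def combined_proposal(current_replicas, readings):
    """readings maps metric name -> (current value, target value)."""
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in readings.items()
    }
    # The metric asking for the most replicas decides the outcome.
    winner = max(proposals, key=proposals.get)
    return proposals[winner], winner, proposals


readings = {
    "http_requests_per_second": (2400, 2000),
    "cpu": (50, 65),
    "memory": (80, 75),
}
print(combined_proposal(10, readings))
```

Here CPU alone would allow shrinking to 8, but memory still asks for 11 and request rate for 12, so the result is 12. A metric that rarely falls, memory being the usual suspect, can therefore hold the replica count up by itself.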

Strategy 6: Running VPA alongside HPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: web-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi
        controlledResources:
          - cpu
          - memory
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

Strategy 7: Production readiness checklist

apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-ready-app
  namespace: production
spec:
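  # Once the HPA owns this Deployment, consider omitting replicas: re-applying the manifest resets the count to this value
  replicas: 3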
  selector:
    matchLabels:
      app: production-ready-app
  template:
    metadata:
      labels:
        app: production-ready-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: app:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: production-ready-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-ready-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120

Pitfalls to avoid

Pitfall 1: Missing resources.requests leaves the HPA unable to work

# ❌ Wrong: no requests set, so the HPA cannot compute utilization
resources:
  limits:
    cpu: "1"
    memory: 1Gi

# ✅ Right: requests must be set
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi
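
The reason requests matter is that utilization is a ratio against them. A small sketch, assuming utilization is computed as total usage over total requests across the Pods (helper name and figures are illustrative):

```python
def average_utilization(usage_millicores, request_millicores):
    """Percent utilization across Pods: total usage / total requests."""
    return 100 * sum(usage_millicores) / sum(request_millicores)


usage = [150, 150, 150]  # three Pods, each using 150m CPU
print(average_utilization(usage, [200] * 3))   # requests of 200m per Pod
print(average_utilization(usage, [1000] * 3))  # requests of 1000m per Pod
```

The same 150m of real usage reads as 75% against a 200m request but only 15% against a 1000m request, so an inflated request can keep a 70% target from ever firing.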

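Pitfall 2: A scale-down stabilization window that is too short causes flapping

# ❌ Wrong: window forced to 0 seconds (the scaleDown default is 300), so any dip in the metric triggers scale-down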
behavior:
  scaleDown:
    stabilizationWindowSeconds: 0

# ✅ Right: at least 300 seconds in production
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600
    selectPolicy: Min
    policies:
      - type: Percent
        value: 10
        periodSeconds: 120
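
The effect of the scale-down stabilization window can be pictured as "act on the highest recommendation seen during the window". The class below is a simplified model with hypothetical names, not the controller's implementation:

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent recommendations and returns the highest one in the window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp, recommended_replicas)

    def stabilized(self, now, recommendation):
        self.history.append((now, recommendation))
        # Drop recommendations that have aged out of the window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=600)
for now, raw in [(0, 20), (120, 12), (300, 15), (700, 12), (1000, 12)]:
    print(now, raw, "->", stabilizer.stabilized(now, raw))
```

A brief dip to 12 replicas does not take effect while a higher recommendation is still inside the window; the count drops only after the higher readings have aged out.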

Pitfall 3: An oversized maxReplicas exhausts cluster resources

# ❌ Wrong: maxReplicas set so high that it offers no real ceiling
spec:
  maxReplicas: 1000

# ✅ Right: set a ceiling that matches cluster capacity, paired with LimitRange and ResourceQuota
spec:
  maxReplicas: 50
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "200"

Pitfall 4: Without a readinessProbe, traffic reaches Pods that are not ready

# ❌ Wrong: no readinessProbe, so a new Pod receives traffic as soon as its containers start
spec:
  containers:
    - name: app
      image: app:1.0.0

# ✅ Right: define a readinessProbe so the Pod receives traffic only once it is ready
spec:
  containers:
    - name: app
      image: app:1.0.0
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10

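Pitfall 5: VPA and HPA both acting on CPU conflict with each other

# ❌ Wrong: VPA and HPA both react to CPU and interfere with each other
# VPA changes CPU requests → the utilization the HPA computes shifts → scaling is triggered again

# ✅ Right: HPA scales on custom metrics, VPA manages resource requests
# HPA: business metrics such as QPS
# VPA: CPU/memory resource metrics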
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

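Troubleshooting

| No. | Error message | Cause | Fix |
| --- | --- | --- | --- |
| 1 | the HPA was unable to compute the replica count | Metrics Server not installed or unavailable | Install Metrics Server; check kubectl top pods |
| 2 | missing request for cpu | Pod has no resources.requests | Add resources.requests.cpu to the container |
| 3 | failed to get cpu utilization | Metric collection lag | Wait 1-2 minutes; check the Metrics Server logs |
| 4 | the desired replica count is below the minimum | Load is lower than what minReplicas already covers | Expected; the HPA never scales below minReplicas |
| 5 | the desired replica count is above the maximum | Load exceeds what maxReplicas can absorb | Raise maxReplicas or optimize service performance |
| 6 | invalid metrics source | Custom metrics API not registered | Install Prometheus Adapter; check the APIService status |
| 7 | could not resolve external metric | External metric query failed | Check the metric name and selector; confirm Prometheus has data |
| 8 | scaling limited because of pod disruption budget | A PDB is restricting Pod removal (PDBs apply to evictions such as node drains, not to HPA-driven replica reductions) | Adjust the PDB's minAvailable or maxUnavailable; for slow HPA scale-down, check behavior.scaleDown instead |
| 9 | back-off period: scaling is rate limited | A scaling policy's rate limit is still in effect | Wait for the period to elapse, or adjust periodSeconds in the behavior.scaleUp / behavior.scaleDown policies |
| 10 | insufficient quota to scale | Namespace resource quota exhausted | Raise the ResourceQuota or lower maxReplicas |

Exact wording varies across Kubernetes versions; the messages surface in the conditions and events shown by kubectl describe hpa.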

Advanced optimization

1. Event-driven autoscaling (KEDA)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicaCount: 2
  maxReplicaCount: 50
  cooldownPeriod: 300
  triggers:
    - type: rabbitmq
      metadata:
        queueName: order-processing
        host: amqp://rabbitmq.production.svc:5672
        # KEDA v2 deprecated queueLength in favor of mode + value
        mode: QueueLength
        value: "30"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        # metricName is deprecated since KEDA v2.10 and omitted here
        threshold: "0.5"
        query: "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{namespace='production'}[5m])) by (le))"

2. Protecting critical services with Pod priority and preemption

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service
value: 1000000
globalDefault: false
description: "Priority for critical services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-job
value: 100
globalDefault: false
description: "Priority for batch jobs"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: critical-service
  containers:
    - name: app
      image: app:1.0.0
      resources:
        requests:
          cpu: 500m
          memory: 512Mi

3. HPA monitoring and alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
spec:
  groups:
    - name: hpa.rules
      rules:
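        # Metric names follow kube-state-metrics v2.x; v1.x exposed them as kube_hpa_* with an hpa label
        - alert: HPAAtMaxReplicas
          expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} has reached max replicas"
        - alert: HPAUnstableScaling
          # changes() counts replica-count transitions; count_over_time() would only count samples
          expr: |
            changes(kube_horizontalpodautoscaler_status_current_replicas[30m]) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} is scaling frequently"
        - alert: HPAMetricsUnavailable
          # ScalingActive=false means the HPA cannot compute a replica count from its metrics
          expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} metrics unavailable"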

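Comparison

| Dimension | HPA | VPA | KEDA | Cluster Autoscaler | Knative |
| --- | --- | --- | --- | --- | --- |
| Scaling dimension | Horizontal (replica count) | Vertical (resource size) | Horizontal + event-driven | Node count | Horizontal + scale to zero |
| Metric types | CPU/memory/custom | CPU/memory | 50+ event sources | Node resources | Concurrent requests |
| Scale to zero | | | | | |
| Responsiveness | 15s-60s | Minutes | Seconds | Minutes | Seconds |
| Production maturity | ✅ GA | ✅ GA | ✅ CNCF graduated | ✅ GA | ✅ GA |
| Complexity | | | | | |
| Use case | Stateless services | Resource tuning | Event-driven workloads | Cluster capacity | Serverless |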

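Summary: HPA is not a matter of "set a CPU threshold and be done"; it is systems engineering that spans metric selection, scaling behavior, and cluster capacity. Core principles: drive scale-out with business metrics (QPS, queue depth) rather than resource metrics, because high CPU is a symptom, not a cause; keep scale-down conservative, with stabilizationWindowSeconds of at least 300 seconds and a scale-down rate of no more than 10% per 2 minutes; let VPA manage resource requests and HPA manage replica count, using different metrics for each to avoid conflicts; and back production with a ResourceQuota so the HPA cannot scale without bound and exhaust cluster resources.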

