K8s HPA Autoscaling: 7 Key Tuning Strategies, from Metrics Configuration to Production Stability

Your K8s cluster folds like paper at peak traffic

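Traffic spikes at 3 a.m. and Pods get OOM-killed. During a sale, the HPA scales out frantically and the database connection pool is exhausted in an instant. Scale-in is so aggressive that freshly created Pods are killed before they finish warming up. Kubernetes HPA (Horizontal Pod Autoscaler) is not something you can ship to production just by setting a CPU threshold: the default metrics window, scaling policies, and cooldown behavior are generic starting points rather than values tuned for your workload, and carrying them into production unchanged invites an outage.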

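Starting from a basic HPA configuration, this article walks through seven key tuning strategies that cover metrics configuration, custom metrics, scaling behavior tuning, and production stability.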

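HPA core concepts

Concept	Description
Horizontal Pod Autoscaler	Automatically adjusts the number of Pod replicas based on metrics
Metrics Server	Resource metrics collector; provides basic metrics such as CPU and memory
Custom Metrics	Custom metrics such as QPS, queue depth, and connection count
External Metrics	Metrics from outside the cluster, such as message queue length or cloud service metrics
Target Utilization	Target utilization; the HPA keeps the metric close to this value
Scale Target Ref	Reference to the scaling target, such as a Deployment or StatefulSet
Behavior	Scaling behavior configuration; controls scaling speed and policies
Stabilization Window	Stabilization window; prevents frequent scaling caused by metric fluctuation
Cooldown/Delay	Scaling cooldown, the minimum interval between two scaling operations (in autoscaling/v2 this is expressed through stabilizationWindowSeconds and each policy's periodSeconds)
VPA	Vertical Pod Autoscaler; adjusts Pod resource requests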

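HPA workflow

1. The HPA controller fetches metrics every 15 seconds (default) from Metrics Server
2. It computes the ratio of the current metric value to the target value
3. It computes the desired replica count from that ratio: desiredReplicas = ceil[currentReplicas * (currentMetric / targetMetric)]
4. It applies the Behavior policies to limit the scaling speed
5. It updates the replicas field of the scale target
6. The Deployment controller creates or deletes Pods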
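
The formula in step 3 can be tried out with a few numbers. The sketch below is a minimal Python model, not the controller's code: the function name is made up for illustration, and the 10% tolerance band is assumed to be the cluster default (it is configurable on the controller manager).

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count proposed by the formula in step 3 (simplified model)."""
    ratio = current_metric / target_metric
    # Assumption: when the ratio is within the tolerance band around 1.0,
    # the controller leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(4, 90, 70))   # 4 * 90/70 = 5.14 -> 6
print(desired_replicas(4, 74, 70))   # ratio 1.057 is inside the tolerance -> 4
print(desired_replicas(10, 35, 70))  # 10 * 0.5 -> 5
```

Because of the ceil, even a modest overshoot rounds up to a whole extra Pod, which is one reason small deployments tend to scale out more eagerly than they scale in.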

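Problem analysis: 5 challenges of running HPA in production

Metrics lag: Metrics Server collects on a fixed interval (set by --metric-resolution, typically 15 to 60 seconds depending on version and manifest), so during a traffic burst the metrics lag and scale-out starts late
Scaling oscillation: when a metric hovers around the threshold, Pods are created and deleted repeatedly, which hurts service stability
Missing custom metrics: CPU and memory do not truly reflect business load; business metrics such as QPS and queue depth are needed
Scale-in avalanche: scale-in that is too fast cuts off newly established connections, and the request failure rate spikes
Inaccurate resource requests: when a Pod's resources.requests is set unreasonably, the HPA's percentage-based calculation is distorted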

Step by step: 7 key tuning strategies

Strategy 1: Basic HPA configuration with CPU/memory metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60

Strategy 2: Custom metrics with Prometheus Adapter

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true  # demo shortcut; in production supply caBundle and drop this flag
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total"
          as: "${1}_per_second"
        metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-custom-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
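
To make the adapter rule above more concrete, the following sketch shows roughly how a monotonically increasing counter such as http_requests_total turns into a per-second value that can be compared with the averageValue target. It is a simplification: Prometheus's real rate() also handles counter resets and extrapolation, and the Pod names and sample values here are invented.

```python
def per_second_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    # Simplified rate: counter increase divided by elapsed time.
    return (v1 - v0) / (t1 - t0)

# Hypothetical counter samples for two Pods over a 2-minute window.
pods = {
    "web-api-a": [(0, 1000), (60, 73000), (120, 145000)],
    "web-api-b": [(0, 500), (60, 48500), (120, 96500)],
}

rates = {pod: per_second_rate(s) for pod, s in pods.items()}
average = sum(rates.values()) / len(rates)

print(rates)    # {'web-api-a': 1200.0, 'web-api-b': 800.0}
print(average)  # 1000.0, exactly at the averageValue target of "1000"
```

The HPA compares the average across Pods with the target, so one hot Pod and one idle Pod can look perfectly healthy in aggregate.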

Strategy 3: Fine-grained tuning of scaling behavior

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 5
  maxReplicas: 200
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 200
          periodSeconds: 60
        - type: Pods
          value: 10
          periodSeconds: 60
        - type: Percent
          value: 50
          periodSeconds: 120
    scaleDown:
      stabilizationWindowSeconds: 600
      selectPolicy: Min
      policies:
        - type: Percent
          value: 5
          periodSeconds: 120
        - type: Pods
          value: 1
          periodSeconds: 120
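
It can help to see what these policies allow at a given replica count. The sketch below is a simplified model under stated assumptions: it ignores periodSeconds and the history of recent scale events, the function names are illustrative, and the rounding direction (round up when growing, round the remaining count down when shrinking) is an assumption about controller behavior rather than something this article specifies.

```python
def scale_up_limit(current, policies, select="Max"):
    """Highest replica count the scaleUp policies allow in one step."""
    proposals = []
    for kind, value in policies:
        if kind == "Pods":
            proposals.append(current + value)
        else:  # "Percent": grow by value percent, rounded up
            proposals.append((current * (100 + value) + 99) // 100)
    return max(proposals) if select == "Max" else min(proposals)

def scale_down_limit(current, policies, select="Min"):
    """Lowest replica count the scaleDown policies allow in one step."""
    proposals = []
    for kind, value in policies:
        if kind == "Pods":
            proposals.append(current - value)
        else:  # "Percent": remaining replicas, rounded down
            proposals.append(current * (100 - value) // 100)
    # selectPolicy Min means the smallest change, i.e. the highest remaining count.
    return max(proposals) if select == "Min" else min(proposals)

up = [("Percent", 200), ("Pods", 10), ("Percent", 50)]
down = [("Percent", 5), ("Pods", 1)]

print(scale_up_limit(20, up))       # 60: the 200% policy wins under Max
print(scale_down_limit(100, down))  # 99: the 1-Pod policy wins under Min
print(scale_down_limit(5, down))    # 4
```

With selectPolicy: Min on scale-down, the Pods: 1 policy dominates at larger replica counts, so draining from 100 replicas proceeds one Pod per period no matter how low the load drops.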

Strategy 4: External metrics, scaling driven by message queue depth

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages
          selector:
            matchLabels:
              queue: order-processing
        target:
          type: AverageValue
          averageValue: "30"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      selectPolicy: Max
      policies:
        - type: Pods
          value: 5
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      selectPolicy: Min
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
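
With an AverageValue target on an external metric, the total queue depth is divided across the workers, so the proposed replica count is roughly the queue depth divided by the per-Pod target. A minimal sketch, assuming the values from the manifest above and ignoring the tolerance band and the behavior rate limits (the function name is illustrative):

```python
import math

def replicas_for_queue(queue_depth, target_per_pod, min_replicas, max_replicas):
    """Replica count for an AverageValue target, clamped to the HPA bounds."""
    wanted = math.ceil(queue_depth / target_per_pod)
    return max(min_replicas, min(max_replicas, wanted))

print(replicas_for_queue(450, 30, 2, 50))   # 15
print(replicas_for_queue(10, 30, 2, 50))    # 2, held up by minReplicas
print(replicas_for_queue(3000, 30, 2, 50))  # 50, capped by maxReplicas
```

The last case is the one to watch: once the queue is deeper than maxReplicas * averageValue, the backlog can only grow until producers slow down or the ceiling is raised.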

Strategy 5: Combining multiple metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 5
  maxReplicas: 100
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "2000"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
    - type: External
      external:
        metric:
          name: redis_connected_clients
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120
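
When several metrics are listed, the HPA evaluates each one separately and acts on the largest proposal. The sketch below illustrates that with invented readings for the four metrics in the manifest above; it treats every metric with the same ratio formula and leaves out the tolerance band, so it is an approximation rather than the controller's exact logic.

```python
import math

def proposal(current_replicas, current_value, target_value):
    """Replica count one metric would ask for on its own."""
    return math.ceil(current_replicas * current_value / target_value)

def combined(current_replicas, readings):
    """readings: metric name -> (current value, target value)."""
    per_metric = {
        name: proposal(current_replicas, cur, tgt)
        for name, (cur, tgt) in readings.items()
    }
    # The largest per-metric proposal decides the outcome.
    return max(per_metric.values()), per_metric

# Hypothetical readings for a 10-replica api-gateway.
readings = {
    "http_requests_per_second": (1500, 2000),
    "cpu": (78, 65),
    "memory": (60, 75),
    "redis_connected_clients": (90, 100),
}

winner, detail = combined(10, readings)
print(detail)  # {'http_requests_per_second': 8, 'cpu': 12, 'memory': 8, 'redis_connected_clients': 9}
print(winner)  # 12
```

A consequence is that every added metric can only make the HPA scale out more readily, never less, so each one should be a signal you are prepared to pay replicas for.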

Strategy 6: Using VPA together with HPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: web-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi
        controlledResources:
          - cpu
          - memory
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

Strategy 7: Production readiness checklist

apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-ready-app
  namespace: production
spec:
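  replicas: 3  # initial value only; consider omitting it once the HPA manages this Deployment so that re-applying the manifest does not reset the replica count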
  selector:
    matchLabels:
      app: production-ready-app
  template:
    metadata:
      labels:
        app: production-ready-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: app:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: production-ready-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-ready-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120

Pitfall guide

Pitfall 1: HPA does not work when resources.requests is not set

# ❌ Wrong: without requests, the HPA cannot compute utilization
resources:
  limits:
    cpu: "1"
    memory: 1Gi

# ✅ Correct: always set requests
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi
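
Utilization targets are percentages of the request, not of the limit or of the node, which is why a missing or badly sized request breaks or distorts the HPA. A minimal sketch of that calculation (the function name and the usage figures are illustrative; the error text mirrors the `missing request for cpu` message listed under error troubleshooting):

```python
def cpu_utilization_percent(usage_m, request_m):
    """usage_m: per-Pod CPU usage in millicores; request_m: per-Pod CPU request."""
    if request_m is None:
        # Without a request there is no denominator to compute a percentage from.
        raise ValueError("missing request for cpu")
    return sum(usage_m) * 100 // (request_m * len(usage_m))

usage = [150, 170, 160]  # hypothetical usage of three Pods

print(cpu_utilization_percent(usage, 200))   # 80: above a 70% target, scale out
print(cpu_utilization_percent(usage, 1000))  # 16: same load looks idle with an oversized request
```

The second call shows the distortion described earlier: identical load, but an inflated request makes the HPA believe there is plenty of headroom.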

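Pitfall 2: Scale-in stabilization window too short, causing oscillation

# ❌ Wrong: stabilization window overridden to 0 seconds (the scaleDown default is 300), so any dip in the metric triggers immediate scale-in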
behavior:
  scaleDown:
    stabilizationWindowSeconds: 0

# ✅ Correct: at least 300 seconds in production
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600
    selectPolicy: Min
    policies:
      - type: Percent
        value: 10
        periodSeconds: 120
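
The effect of the scale-down stabilization window is easiest to see as "use the highest recommendation seen within the window". The following sketch models only that rule, with an invented recommendation history and an illustrative function name; rate-limit policies are left out.

```python
def stabilized_scale_down(recommendations, now, window_seconds):
    """recommendations: list of (timestamp_seconds, desired_replicas)."""
    # Only recommendations made inside the window are considered.
    recent = [r for t, r in recommendations if now - t <= window_seconds]
    # Scale-in follows the highest recent recommendation, not the latest one.
    return max(recent)

# Hypothetical recommendations recorded once per minute.
history = [(0, 10), (60, 6), (120, 9), (180, 5), (240, 4)]

print(stabilized_scale_down(history, now=240, window_seconds=0))    # 4: follows every dip
print(stabilized_scale_down(history, now=240, window_seconds=120))  # 9
print(stabilized_scale_down(history, now=240, window_seconds=300))  # 10: holds through the dips
```

With a zero-second window, the replica count would have bounced 10, 6, 9, 5, 4; with 300 seconds it stays at 10 until the load has been low for the whole window.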

Pitfall 3: maxReplicas set too high, exhausting resources

# ❌ Wrong: no effective upper bound
spec:
  maxReplicas: 1000

# ✅ Correct: set a reasonable ceiling based on cluster capacity, combined with LimitRange and ResourceQuota
spec:
  maxReplicas: 50
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "200"
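
A quick way to sanity-check maxReplicas is to work out how many Pods the quota can actually admit. The sketch below does that arithmetic for the request-side limits of the ResourceQuota above; the function name is illustrative, and it assumes a single workload owns the whole namespace quota, which is rarely true in practice.

```python
def max_pods_under_quota(quota_cpu_m, quota_mem_mi, quota_pods, req_cpu_m, req_mem_mi):
    """Largest Pod count that fits every quota dimension at once."""
    return min(quota_cpu_m // req_cpu_m, quota_mem_mi // req_mem_mi, quota_pods)

# Quota above: requests.cpu "100" = 100000m, requests.memory 200Gi = 204800Mi, pods "200".
quota = dict(quota_cpu_m=100_000, quota_mem_mi=204_800, quota_pods=200)

# Small Pods (200m / 256Mi): the pods count is the binding limit.
print(max_pods_under_quota(**quota, req_cpu_m=200, req_mem_mi=256))    # 200

# Large Pods (2000m / 4096Mi): CPU and memory both cap the namespace at 50.
print(max_pods_under_quota(**quota, req_cpu_m=2000, req_mem_mi=4096))  # 50
```

If the result is lower than the sum of maxReplicas across the HPAs in the namespace, some of them will hit the quota before they hit their own ceiling.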

Pitfall 4: Without a readinessProbe, traffic reaches Pods that are not ready

# ❌ Wrong: no readinessProbe, so a new Pod receives traffic immediately
spec:
  containers:
    - name: app
      image: app:1.0.0

# ✅ Correct: configure a readinessProbe so the Pod receives traffic only once it is ready
spec:
  containers:
    - name: app
      image: app:1.0.0
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10

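Pitfall 5: VPA and HPA conflict when both act on the same CPU metric

# ❌ Wrong: VPA and HPA both act on CPU, so they interfere with each other
# VPA adjusts CPU requests → HPA recomputes utilization → another scaling action is triggered

# ✅ Correct: HPA scales on custom metrics, VPA manages resource requests
# HPA: based on business metrics such as QPS
# VPA: based on CPU/memory resource metrics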
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
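
The conflict is easier to appreciate with numbers. In the sketch below the load never changes, yet a VPA-style increase of the CPU request alone is enough to make a CPU-utilization HPA want to halve the replica count. All figures and function names are invented for illustration, and the tolerance band and behavior policies are ignored.

```python
import math

def utilization(total_usage_m, request_m, replicas):
    """Average CPU utilization in percent, relative to the request."""
    return total_usage_m * 100 // (request_m * replicas)

def hpa_desired(replicas, current_util, target_util):
    return math.ceil(replicas * current_util / target_util)

total_usage_m = 3500  # constant load across the Deployment, in millicores
replicas = 10
target = 70

before = utilization(total_usage_m, 500, replicas)   # request 500m
after = utilization(total_usage_m, 1000, replicas)   # VPA raises the request to 1000m

print(before, hpa_desired(replicas, before, target))  # 70 10 -> stable
print(after, hpa_desired(replicas, after, target))    # 35 5  -> HPA wants to halve replicas
```

Driving the HPA from a per-Pod request rate instead removes the request from the equation, so VPA adjustments no longer feed back into the replica count.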

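Error troubleshooting

#	Error message	Cause	Solution
1	`the HPA was unable to compute the replica count`	Metrics Server is not installed or unavailable	Install Metrics Server and verify with kubectl top pods
2	`missing request for cpu`	The Pod has no resources.requests	Add resources.requests.cpu to every container
3	`failed to get cpu utilization`	Metrics collection is delayed	Wait 1-2 minutes and check the Metrics Server logs
4	`the desired replica count is less than the minimum replica count`	Load is below what minReplicas can serve	Expected behavior; the HPA never scales in below minReplicas
5	`the desired replica count is more than the maximum replica count`	Load exceeds what maxReplicas can serve	Raise maxReplicas or optimize service performance
6	`invalid metrics source`	Custom metrics API is not registered	Install Prometheus Adapter and check the APIService status
7	`could not resolve external metric`	External metric query failed	Check the metric name and selector, and confirm that Prometheus has data
8	`scaling limited because of pod disruption budget`	A PDB is blocking Pod removal	Adjust the PDB's minAvailable or maxUnavailable (a PDB governs evictions such as node drains, so also confirm it is really what is holding Pods back)
9	`back-off period: scaling is rate limited`	Scaling is in its cooldown period	Wait for the cooldown to end, or adjust periodSeconds under behavior.scaleUp.policies or behavior.scaleDown.policies
10	`insufficient quota to scale`	Namespace resource quota is insufficient	Raise the ResourceQuota or lower maxReplicas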

Advanced optimization

1. Event-driven autoscaling (KEDA)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicaCount: 2
  maxReplicaCount: 50
  cooldownPeriod: 300
  triggers:
    - type: rabbitmq
      metadata:
        queueName: order-processing
        host: amqp://rabbitmq.production.svc:5672
        # Recent KEDA releases deprecate queueLength in favor of mode + value
        mode: QueueLength
        value: "30"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: http_request_duration_seconds_p99
        threshold: "0.5"
        query: "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{namespace='production'}[5m])) by (le))"

2. Protecting critical services with Pod priority and preemption

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service
value: 1000000
globalDefault: false
description: "Priority for critical services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-job
value: 100
globalDefault: false
description: "Priority for batch jobs"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: critical-service
  containers:
    - name: app
      image: app:1.0.0
      resources:
        requests:
          cpu: 500m
          memory: 512Mi

3. HPA monitoring and alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
spec:
  groups:
    - name: hpa.rules
      rules:
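        # Metric names follow kube-state-metrics v2+; v1.x exposed them as kube_hpa_* with an `hpa` label
        - alert: HPAAtMaxReplicas
          expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} has reached max replicas"
        - alert: HPAUnstableScaling
          # changes() counts how often the replica count changed; count_over_time() would only count samples
          expr: |
            changes(kube_horizontalpodautoscaler_status_current_replicas[30m]) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} is scaling frequently"
        - alert: HPAMetricsUnavailable
          # ScalingActive=false means the HPA cannot compute a replica count from its metrics
          expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} metrics unavailable"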
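
An oscillation alert is usually about how many times the replica count changed within a window, not how many samples were scraped in it. The sketch below contrasts the two ideas with simplified Python stand-ins for the PromQL functions count_over_time and changes; the sample series are invented and assume one scrape every 30 seconds over 30 minutes.

```python
def count_over_time(samples):
    """Number of samples in the window, regardless of their values."""
    return len(samples)

def changes(samples):
    """Number of times the value differs from the previous sample."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)

steady = [5] * 60        # replica count never moves
flapping = [5, 7] * 30   # replica count flips on every scrape

print(count_over_time(steady), changes(steady))      # 60 0
print(count_over_time(flapping), changes(flapping))  # 60 59
```

A threshold on the sample count fires for the perfectly steady HPA as well, while a threshold on the number of changes separates the two cases.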

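Comparison

Dimension	HPA	VPA	KEDA	Cluster Autoscaler	Knative
Scaling dimension	Horizontal (replica count)	Vertical (resource size)	Horizontal + event-driven	Node count	Horizontal + scale to zero
Metric types	CPU/memory/custom	CPU/memory	50+ event sources	Node resources	Concurrent requests
Scale to zero	❌	❌	✅	❌	✅
Responsiveness	15s-60s	Minutes	Seconds	Minutes	Seconds
Production maturity	✅ GA	✅ GA	✅ CNCF Graduated	✅ GA	✅ GA
Complexity	Low	Medium	Medium	High	High
Use case	Stateless services	Resource tuning	Event-driven workloads	Cluster capacity	Serverless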

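Summary: HPA is not "set a CPU threshold and you are done"; it is systems engineering that spans metric selection, scaling behavior, and cluster capacity. Core principles: drive scaling with business metrics (QPS, queue depth) rather than resource metrics, because high CPU is a symptom, not a cause. Scale in conservatively: keep stabilizationWindowSeconds at 300 seconds or more and the scale-in rate at or below 10% per 2 minutes. Let VPA manage resource requests and HPA manage the replica count, each on different metrics, to avoid conflicts. In production, always add a ResourceQuota as a safety net so that the HPA cannot scale out without bound and exhaust cluster resources.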
