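Python FastAPI Production Deployment: 7 Key Hands-On Strategies from Docker to K8s

FastAPI: A Joy to Develop, a Nightmare to Deploy

It runs fast locally and falls over in production: memory leaks, request timeouts, Pods killed by the OOM killer, and failed health checks that stall rolling updates. You start with uvicorn main:app, and a single process cannot handle the concurrency. You add Gunicorn, and a badly tuned worker count makes things slower. You move to K8s, and a misconfigured readinessProbe drops all traffic. In 2026, production deployment is still the step where Python backends built on FastAPI most often go wrong.

This article walks through 7 key strategies that cover the full chain: Uvicorn configuration → Gunicorn tuning → Docker optimization → K8s deployment → health checks → middleware → observability.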

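Core Concepts of FastAPI Production Deployment

Concept	Description
Uvicorn	ASGI server; a high-performance asynchronous server built on uvloop and httptools
Gunicorn	WSGI application server and process manager; it runs ASGI apps through the Uvicorn worker class and manages multiple worker processes
Worker	A Gunicorn worker process; each one runs a Uvicorn instance
uvloop	libuv-based event loop that replaces the default asyncio event loop; its authors report a 2-4x speedup over the default loop
ASGI	Asynchronous Server Gateway Interface, the protocol FastAPI runs on
Health check	K8s uses liveness and readiness probes to determine application state
Graceful shutdown	After receiving SIGTERM, finish in-flight requests before exiting so that no requests are lost
Rate limiting	Cap the number of requests per unit of time to protect the service from overload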

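Problem Analysis: 5 Major Challenges of FastAPI Production Deployment

Concurrency model: async I/O vs. multiple processes, and how to choose the worker count
Docker image size: base image selection, dependency installation, multi-stage builds
K8s resource planning: CPU/memory requests and limits, HPA autoscaling
Health check configuration: thresholds and path design for liveness and readiness probes
Observability: a unified approach to structured logging, distributed tracing, and metrics collection

Step by Step: 7 Key Hands-On Strategies

Strategy 1: Uvicorn Production Configuration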

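import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",           # an import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,
        loop="uvloop",
        http="httptools",
        log_level="info",
        access_log=True,
        use_colors=False,
        proxy_headers=True,
        # "*" trusts X-Forwarded-* headers from any peer; restrict this to the
        # proxy's address unless the port is only reachable through a trusted proxy.
        forwarded_allow_ips="*",
        timeout_keep_alive=5,
        limit_concurrency=1000,   # applies per worker; excess requests receive 503
        backlog=2048,
    )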

# app/main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from contextlib import asynccontextmanager
import logging

logger = logging.getLogger("app")

@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("Application starting up...")
    yield
    logger.info("Application shutting down...")

app = FastAPI(
    title="My API",
    version="1.0.0",
    lifespan=lifespan,
    docs_url="/docs",
    redoc_url="/redoc",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

@app.get("/ready")
async def readiness_check():
    return {"status": "ready"}

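Strategy 2: Gunicorn + Uvicorn Worker

# gunicorn.conf.py
import multiprocessing
import os

bind = "0.0.0.0:8000"
# cpu_count() reports the host's cores, not the container's CPU limit,
# so set GUNICORN_WORKERS explicitly when running under K8s.
workers = int(os.getenv("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))
# Recent Uvicorn releases deprecate uvicorn.workers in favor of the separate
# uvicorn-worker package ("uvicorn_worker.UvicornWorker"); check your version.
worker_class = "uvicorn.workers.UvicornWorker"
keepalive = 5
timeout = 120
graceful_timeout = 30
# Recycle each worker after roughly 5000 requests to contain slow memory leaks;
# the jitter keeps workers from restarting at the same moment.
max_requests = 5000
max_requests_jitter = 500
preload_app = True
accesslog = "-"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
errorlog = "-"
loglevel = "info"

# Start command
gunicorn app.main:app -c gunicorn.conf.py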
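
The `cpu_count * 2 + 1` default can overshoot inside a container, because `multiprocessing.cpu_count()` reports the host's cores rather than the Pod's CPU limit. The sketch below shows one possible way to derive a worker count from the contents of a cgroup v2 `cpu.max` file. The helpers `parse_cpu_max` and `suggested_workers`, the cap of 8, and the fallback of 4 cores are illustrative assumptions, not part of Gunicorn.

```python
import math
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse the contents of a cgroup v2 cpu.max file ("<quota> <period>").

    Returns the CPU limit in cores, or None when the quota is "max" (no limit).
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def suggested_workers(cpus: float, cap: int = 8) -> int:
    """Apply the 2 * cores + 1 rule to a (possibly fractional) CPU limit."""
    cores = max(1, math.ceil(cpus))
    return min(cap, cores * 2 + 1)


# "100000 100000" corresponds to a 1-CPU limit such as cpu: "1000m".
limit = parse_cpu_max("100000 100000")
# Fall back to an assumed 4 cores when the container has no CPU limit.
print(suggested_workers(limit if limit is not None else 4.0))
```

The result could be exported as GUNICORN_WORKERS, which the config above already reads. Treat the formula as a starting point and confirm it with load tests.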

Strategy 3: Docker Multi-Stage Build Optimization

# Dockerfile
FROM python:3.12-slim AS builder

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-slim AS runtime

WORKDIR /app

RUN groupadd -r appuser && useradd -r -g appuser appuser

COPY --from=builder /install /usr/local
COPY . .

RUN chown -R appuser:appuser /app

USER appuser

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["gunicorn", "app.main:app", "-c", "gunicorn.conf.py"]

Strategy 4: Complete K8s Deployment Configuration

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
  labels:
    app: fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: fastapi-app
          image: myregistry.com/fastapi-app:latest
          ports:
            - containerPort: 8000
              protocol: TCP
          env:
            - name: GUNICORN_WORKERS
              value: "4"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
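
To make the `averageUtilization: 70` target concrete, here is a small sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance (10% by default). Utilization is measured against the container's CPU request (250m in the Deployment above). The function name `desired_replicas` is an assumption, and the sketch ignores stabilization windows, missing metrics, and Pods that are not yet ready.

```python
import math


def desired_replicas(
    current: int,
    current_util: float,
    target_util: float,
    min_replicas: int = 3,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single utilization metric."""
    ratio = current_util / target_util
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


# 3 Pods averaging 90% of their CPU request against a 70% target.
print(desired_replicas(3, 90, 70))
```

Because the percentage is relative to requests rather than limits, a Pod with a 250m request and a 1000m limit can report utilization far above 100%.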

Strategy 5: Health Checks and Graceful Shutdown

# app/health.py
import asyncio
from fastapi import APIRouter, Response
from app.database import check_db_connection
from app.cache import check_redis_connection

router = APIRouter()

is_shutting_down = False

@router.get("/health")
async def liveness(response: Response):
    if is_shutting_down:
        response.status_code = 503
        return {"status": "shutting_down"}
    return {"status": "healthy"}

@router.get("/ready")
async def readiness(response: Response):
    if is_shutting_down:
        response.status_code = 503
        return {"status": "shutting_down"}

    checks = {
        "database": await check_db_connection(),
        "redis": await check_redis_connection(),
    }

    all_healthy = all(checks.values())
    if not all_healthy:
        response.status_code = 503
        return {"status": "not_ready", "checks": checks}

    return {"status": "ready", "checks": checks}

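import signal

def setup_graceful_shutdown():
    # Call this after the server has installed its own signal handlers
    # (for example from the lifespan startup hook) and chain to them:
    # replacing them outright would stop the server from exiting on SIGTERM.
    previous = {}

    def shutdown_handler(signum, frame):
        global is_shutting_down
        is_shutting_down = True
        handler = previous.get(signum)
        if callable(handler):
            handler(signum, frame)  # let the server run its own graceful shutdown

    for sig in (signal.SIGTERM, signal.SIGINT):
        previous[sig] = signal.signal(sig, shutdown_handler)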
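
The /ready handler above awaits its dependency checks one after another with no time limit, so a single hung dependency can push the response past the probe's `timeoutSeconds: 3`. The following is a minimal sketch of running the checks concurrently with a per-check timeout. The helper `run_checks` and the 2-second default are assumptions chosen to stay under the probe timeout.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Check = Callable[[], Awaitable[bool]]


async def run_checks(checks: Dict[str, Check], timeout: float = 2.0) -> Dict[str, bool]:
    """Run all checks concurrently; a timeout or an exception counts as unhealthy."""

    async def guarded(check: Check) -> bool:
        try:
            return bool(await asyncio.wait_for(check(), timeout=timeout))
        except Exception:
            # Covers timeouts and any error raised by the check itself.
            return False

    names = list(checks)
    results = await asyncio.gather(*(guarded(checks[name]) for name in names))
    return dict(zip(names, results))


# Stand-in checks used only for this demonstration.
async def fast_ok() -> bool:
    return True


async def hangs() -> bool:
    await asyncio.sleep(1)
    return True


print(asyncio.run(run_checks({"database": fast_ok, "redis": hangs}, timeout=0.05)))
```

In the readiness endpoint, the dictionary passed in would map "database" and "redis" to check_db_connection and check_redis_connection.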

Strategy 6: Rate Limiting and Middleware

# app/middleware/rate_limit.py
import time
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse
from typing import Dict, Tuple

class RateLimitMiddleware(BaseHTTPMiddleware):
    """Fixed-window limiter. State is per process, so with N workers the
    effective limit is N times higher; use a shared store such as Redis
    when a global limit is required."""

    def __init__(self, app, requests_per_minute: int = 60, burst: int = 10):
        super().__init__(app)
        self.requests_per_minute = requests_per_minute
        self.burst = burst  # accepted for compatibility; unused by this fixed-window logic
        # NOTE: entries are never evicted; prune stale IPs in long-running services.
        self._clients: Dict[str, Tuple[int, float]] = {}

    async def dispatch(self, request: Request, call_next):
        client_ip = request.client.host if request.client else "unknown"
        now = time.monotonic()
        count, window_start = self._clients.get(client_ip, (0, now))
        elapsed = now - window_start

        if elapsed > 60:
            count, window_start, elapsed = 0, now, 0.0  # start a new window

        if count >= self.requests_per_minute:
            # Return the response directly: an HTTPException raised inside
            # BaseHTTPMiddleware bypasses FastAPI's exception handlers and
            # surfaces as a 500.
            return JSONResponse(
                status_code=429,
                content={"detail": "Too many requests"},
                headers={"Retry-After": str(max(1, int(60 - elapsed)))},
            )

        self._clients[client_ip] = (count + 1, window_start)
        return await call_next(request)
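
The middleware above accepts a `burst` argument but counts requests in a fixed 60-second window, so the argument has no effect. If burst tolerance matters, a token bucket is one common alternative. The sketch below is a swapped-in technique, not the article's implementation. The class name `TokenBucket` and the injectable `clock` parameter are assumptions made for illustration and testability.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `burst` requests, refilled at a steady rate."""

    def __init__(
        self,
        rate_per_minute: int = 60,
        burst: int = 10,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_minute / 60.0   # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)           # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_minute=60, burst=2)
print([bucket.allow() for _ in range(3)])
```

A per-client limiter would keep one bucket per IP address, with the same eviction and multi-process caveats as the fixed-window version.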

# app/middleware/logging.py
import time
import json
import logging
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

logger = logging.getLogger("app.access")

class LoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start_time = time.time()

        response = await call_next(request)

        duration_ms = (time.time() - start_time) * 1000

        log_entry = {
            "method": request.method,
            "path": request.url.path,
            "status_code": response.status_code,
            "duration_ms": round(duration_ms, 2),
            "client_ip": request.client.host if request.client else None,
        }

        if response.status_code >= 500:
            logger.error(json.dumps(log_entry))
        elif response.status_code >= 400:
            logger.warning(json.dumps(log_entry))
        else:
            logger.info(json.dumps(log_entry))

        response.headers["X-Process-Time"] = f"{duration_ms:.2f}ms"
        return response
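
The logging middleware serializes each entry with json.dumps and passes it as the log message, so the final line is only valid JSON if the handler's format string adds nothing around it. One alternative is to move the serialization into a logging.Formatter, sketched below with the standard library only. The class name `JsonFormatter` and the `http` key passed through `extra=` are assumptions for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Request fields supplied via logger.info(..., extra={"http": {...}}).
        http_fields = getattr(record, "http", None)
        if isinstance(http_fields, dict):
            payload.update(http_fields)
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
access_logger = logging.getLogger("app.access")
access_logger.addHandler(handler)
access_logger.setLevel(logging.INFO)
access_logger.propagate = False  # avoid duplicate lines from the root logger

access_logger.info(
    "request",
    extra={"http": {"method": "GET", "path": "/health", "status_code": 200}},
)
```

With this in place the middleware could pass its log_entry dictionary through `extra=` rather than building the JSON string itself.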

Strategy 7: Observability (OpenTelemetry)

# app/telemetry.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.sdk.resources import Resource

def setup_telemetry(app, service_name: str = "fastapi-app", otlp_endpoint: str = "http://otel-collector:4317"):
    resource = Resource.create({"service.name": service_name})

    provider = TracerProvider(resource=resource)
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True))
    )
    trace.set_tracer_provider(provider)

    FastAPIInstrumentor.instrument_app(app)
    RedisInstrumentor().instrument()
    SQLAlchemyInstrumentor().instrument()

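Pitfalls to Avoid

Pitfall 1: Running Production on a Single Worker

# ❌ Wrong: a single process cannot use multiple cores
uvicorn app.main:app --host 0.0.0.0 --port 8000

# ✅ Right: Gunicorn with multiple Uvicorn workers
gunicorn app.main:app -c gunicorn.conf.py
# workers = cpu_count * 2 + 1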

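Pitfall 2: Using python:latest as the Docker Image

# ❌ Wrong: the latest tag is not reproducible, and the image is over 1GB
FROM python:latest

# ✅ Right: pin a slim version and use a multi-stage build
FROM python:3.12-slim AS builder
# ... build stage
FROM python:3.12-slim AS runtime
# ... runtime stage, about 150MB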

Pitfall 3: Setting K8s Resource Limits Too Low

# ❌ Wrong: the memory limit is too low, so OOM kills are frequent
resources:
  limits:
    memory: "128Mi"

# ✅ Right: set reasonable requests and limits
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
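
Whether 512Mi is reasonable depends on how many workers share it, since every Gunicorn worker is a separate process with its own memory footprint. The arithmetic can be sketched as follows. The helper `max_workers_for_memory`, the 20% headroom, and the per-worker figures are assumptions; measure the real per-worker resident memory of your application under load before relying on any number.

```python
def max_workers_for_memory(limit_mib: int, per_worker_mib: int, headroom: float = 0.2) -> int:
    """Return how many workers fit under a memory limit, keeping some headroom.

    per_worker_mib is a measured value for your app (hypothetical here).
    """
    usable = limit_mib * (1.0 - headroom)
    return max(1, int(usable // per_worker_mib))


# With an assumed 100 MiB per worker, compare the two limits from this pitfall.
print(max_workers_for_memory(128, 100), max_workers_for_memory(512, 100))
```

If the result is lower than GUNICORN_WORKERS, either raise the limit or reduce the worker count and scale out with more Pods.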

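Pitfall 4: Using the Same Path for readinessProbe and livenessProbe

# ❌ Wrong: when the database is unreachable the livenessProbe fails too, and the Pod is restarted
livenessProbe:
  httpGet:
    path: /ready
readinessProbe:
  httpGet:
    path: /ready

# ✅ Right: liveness checks only that the process is alive; readiness checks dependencies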
livenessProbe:
  httpGet:
    path: /health
readinessProbe:
  httpGet:
    path: /ready

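Pitfall 5: Ignoring the SIGTERM Signal

# ❌ Wrong: exit immediately on SIGTERM and drop in-flight requests
import signal
import sys
signal.signal(signal.SIGTERM, lambda s, f: sys.exit(0))

# ✅ Right: Gunicorn handles SIGTERM with a graceful shutdown by default
# Combine it with K8s terminationGracePeriodSeconds and a preStop hook
# Set graceful_timeout = 30 in gunicorn.conf.py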
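
These three settings share one time budget: the preStop hook runs inside terminationGracePeriodSeconds, SIGTERM is sent only after the hook finishes, and the kubelet sends SIGKILL when the grace period expires. A quick check of the values used in this article (60s grace period, 5s preStop sleep, 30s graceful_timeout) can be written as below. The function name `shutdown_slack` is an assumption.

```python
def shutdown_slack(termination_grace: int, pre_stop_sleep: int, graceful_timeout: int) -> int:
    """Seconds left before SIGKILL after preStop and graceful shutdown finish.

    A negative result means in-flight requests can still be cut off.
    """
    return termination_grace - (pre_stop_sleep + graceful_timeout)


# Values from the Deployment and gunicorn.conf.py in this article.
print(shutdown_slack(termination_grace=60, pre_stop_sleep=5, graceful_timeout=30))
```

If graceful_timeout is raised later, terminationGracePeriodSeconds has to grow with it to keep the slack positive.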

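Troubleshooting

No.	Error message	Cause	Fix
1	`Worker failed to boot`	Application import error or missing dependency	Check the import statements and confirm that all dependencies are installed
2	`OOMKilled`	Memory limit exceeded	Raise the memory limit and check for memory leaks
3	`CrashLoopBackOff`	Container crashes right after starting	Inspect the Pod logs: kubectl logs --previous
4	`Readiness probe failed`	A dependency is unavailable	Check the database/Redis connections; increase initialDelaySeconds
5	`Liveness probe failed`	Event loop is blocked	Look for synchronous blocking calls and move them to run_in_executor
6	`429 Too Many Requests`	Rate limit triggered	Tune the rate-limit threshold and check for abnormal traffic
7	`Connection pool exhausted`	Database connection pool exhausted	Increase pool_size and check for connection leaks
8	`Timeout waiting for response`	Request processing timed out	Increase the Gunicorn timeout and optimize slow queries
9	`ImagePullBackOff`	Image pull failed	Check the image name and registry access permissions
10	`Permission denied`	Wrong file permissions inside the container	Make sure the USER instruction is correct and check volume mount permissions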
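
Row 5 deserves a concrete illustration: a synchronous call inside an `async def` handler blocks the event loop, so the /health handler cannot answer the probe in time. Below is a minimal standard-library sketch of offloading such a call with run_in_executor. The functions `blocking_query` and `handler` are stand-ins, and the ticker exists only to show that the loop keeps running while the blocking call is in progress.

```python
import asyncio
import time
from typing import Tuple


def blocking_query(n: int) -> int:
    """Stand-in for a synchronous driver call or other blocking I/O."""
    time.sleep(0.1)
    return n * 2


async def handler(n: int) -> int:
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, blocking_query, n)


async def main() -> Tuple[int, int]:
    ticks = 0

    async def ticker() -> None:
        # Counts how often the event loop gets to run during the blocking call.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    result = await handler(21)
    task.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
print(result, ticks > 0)
```

Calling blocking_query directly inside the coroutine would leave ticks at zero, which is the same starvation that makes a liveness probe time out.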

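Advanced Optimization

1. Async Database Connection Pool

# app/database.py
from sqlalchemy import text  # required by check_db_connection below
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker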

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@db:5432/mydb",
    pool_size=20,
    max_overflow=10,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
)

AsyncSessionLocal = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

async def get_db():
    async with AsyncSessionLocal() as session:
        try:
            yield session
        finally:
            await session.close()

async def check_db_connection() -> bool:
    try:
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False
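
Each worker process creates its own engine and therefore its own pool, so the pool settings multiply across workers and Pods. The sketch below computes the worst case for the values used in this article. The function name `peak_db_connections` is an assumption, and the comparison point is PostgreSQL's default `max_connections` of 100.

```python
def peak_db_connections(replicas: int, workers_per_pod: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections the whole deployment can open at once."""
    return replicas * workers_per_pod * (pool_size + max_overflow)


# 3 replicas x 4 workers x (pool_size=20 + max_overflow=10)
print(peak_db_connections(3, 4, 20, 10))
# The same settings at the HPA maximum of 10 replicas
print(peak_db_connections(10, 4, 20, 10))
```

When the bound exceeds what the database allows, shrink pool_size per worker or put a connection pooler such as PgBouncer in front of the database.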

2. Redis Cache Layer

# app/cache.py
import redis.asyncio as redis
import json
from typing import Optional, Any

class RedisCache:
    def __init__(self, url: str = "redis://redis:6379/0"):
        self.url = url
        self._client: Optional[redis.Redis] = None

    async def connect(self):
        self._client = redis.from_url(
            self.url,
            max_connections=50,
            decode_responses=True,
            socket_timeout=5,
            retry_on_timeout=True,
        )

    async def disconnect(self):
        if self._client:
            await self._client.close()

    async def get(self, key: str) -> Optional[Any]:
        value = await self._client.get(key)
        if value:
            return json.loads(value)
        return None

    async def set(self, key: str, value: Any, ttl: int = 300):
        await self._client.set(key, json.dumps(value), ex=ttl)

cache = RedisCache()

async def check_redis_connection() -> bool:
    try:
        if cache._client:
            return await cache._client.ping()
        return False
    except Exception:
        return False

3. Prometheus Metrics Collection

# app/metrics.py
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
from fastapi import APIRouter, Response

REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status_code"],
)

REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
)

router = APIRouter()

@router.get("/metrics")
async def metrics():
    # With several Gunicorn workers each process keeps its own registry;
    # prometheus_client's multiprocess mode is needed to aggregate them.
    return Response(
        content=generate_latest(),
        media_type=CONTENT_TYPE_LATEST,  # exposition content type with version and charset
    )

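Comparison

Dimension	Uvicorn single process	Gunicorn+Uvicorn	Daphne	Hypercorn	Uvicorn+K8s
Multi-core use	❌ Single core	✅ Multiple workers	✅ Multiple processes	✅ Multiple workers	✅ Multiple Pods
Async support	✅ Native	✅ Native	✅ Native	✅ HTTP/2	✅ Native
Production ready	⚠️ Needs a process manager	✅	✅	✅	✅
Graceful shutdown	⚠️ Basic	✅ Full	✅	✅	✅ + preStop
Autoscaling	❌	❌	❌	❌	✅ HPA
Rolling updates	❌	❌	❌	❌	✅
Resource isolation	❌	⚠️ Process level	⚠️ Process level	⚠️ Process level	✅ Container level
Failure recovery	❌	⚠️ Needs a supervisor	⚠️	⚠️	✅ K8s self-healing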

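Summary: FastAPI production deployment is not a single command but an engineering system. From Uvicorn to K8s, three core principles apply: use multiple processes to exploit multiple cores, separate liveness from readiness in health checks, and shut down gracefully so that no requests are lost. A Docker multi-stage build shrinks the image from over 1GB to about 150MB, and Gunicorn's graceful_timeout combined with K8s terminationGracePeriodSeconds enables zero-downtime updates. Remember: use uvicorn --reload in development and gunicorn -c gunicorn.conf.py in production, and never carry development settings into production.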
