DevOps CI/CD Pipelines in Practice: Docker + Kubernetes, End to End


CI/CD fundamentals and the 2026 landscape

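CI/CD (Continuous Integration / Continuous Deployment) is a core DevOps practice. Its goal is to make the whole path from code commit to production automated, traceable, and reversible.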

Core concepts

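Concept | Full name | Core goal
CI | Continuous Integration | Merge code frequently, build and test automatically, catch problems early
CD (delivery) | Continuous Delivery | Code is always deployable to production; releasing requires manual approval
CD (deployment) | Continuous Deployment | Code that passes its tests is deployed to production automatically, with no manual step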

Comparison of mainstream CI/CD platforms (2026)

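Platform | Best fit | Key strengths | Pipeline definition
GitHub Actions | Open-source projects, small and mid-sized teams | Native GitHub integration, Marketplace ecosystem, generous free tier | .github/workflows/*.yml
GitLab CI | Enterprises, self-hosted installations | Built-in container registry, security scanning, K8s integration | .gitlab-ci.yml
Jenkins | Complex pipelines, traditional enterprises | Richest plugin ecosystem, highly customizable | Jenkinsfile (Groovy)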



Docker best practices

Docker is the foundation of a CI/CD pipeline: every build should produce a deterministic, reproducible container image.

Multi-stage builds

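Multi-stage builds are the most effective way to shrink an image. They separate the build environment from the runtime environment: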

# Stage 1: build
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server ./cmd/server

# Stage 2: runtime
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

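Result: the Go image shrinks from roughly 300MB (the golang:alpine build stage) to about 5MB (the distroless base plus the stripped binary). Actual sizes depend on the application.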

Node.js multi-stage build example:

FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
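# Install all dependencies: `npm run build` usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Remove devDependencies so only runtime packages are copied into the final stage
RUN npm prune --omit=dev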

FROM node:22-alpine AS runner
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/main.js"]

Optimizing layer caching

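The key principle of Docker layer caching is to put instructions that change rarely first and instructions that change often last. When a layer's inputs change, that layer and every layer after it are rebuilt.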

# Good: copy the dependency manifests first so the install layer stays cached
COPY package*.json ./
RUN npm ci
COPY . .

# Bad: copying all source first reinstalls dependencies on every code change
COPY . .
RUN npm ci
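Why does the order matter? The Go sketch below models each layer's cache key as a hash of its parent's key, its instruction, and the files it copies. This is a simplified model for illustration, not BuildKit's actual cache-key algorithm. It counts how many steps re-run when only application source changes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// step is one Dockerfile instruction plus the build-context files it reads.
type step struct {
	instr string
	files []string
}

// cacheKeys gives each step a key derived from its parent's key, its own
// instruction text and the contents of the files it copies. A change to any
// input therefore changes this key and every key after it.
func cacheKeys(steps []step, files map[string]string) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		h := sha256.New()
		h.Write([]byte(parent))
		h.Write([]byte(s.instr))
		for _, f := range s.files {
			h.Write([]byte(files[f]))
		}
		parent = hex.EncodeToString(h.Sum(nil))
		keys[i] = parent
	}
	return keys
}

// rebuilt returns how many steps re-run when the context changes from before to after.
func rebuilt(steps []step, before, after map[string]string) int {
	a, b := cacheKeys(steps, before), cacheKeys(steps, after)
	for i := range a {
		if a[i] != b[i] {
			return len(steps) - i // everything from the first changed key onward
		}
	}
	return 0
}

func main() {
	good := []step{
		{"COPY package*.json ./", []string{"package.json"}},
		{"RUN npm ci", nil},
		{"COPY . .", []string{"package.json", "src/index.js"}},
	}
	bad := []step{
		{"COPY . .", []string{"package.json", "src/index.js"}},
		{"RUN npm ci", nil},
	}
	// Only application source changes between the two builds.
	v1 := map[string]string{"package.json": `{"deps":"same"}`, "src/index.js": "v1"}
	v2 := map[string]string{"package.json": `{"deps":"same"}`, "src/index.js": "v2"}
	fmt.Println("good order re-runs", rebuilt(good, v1, v2), "of", len(good), "steps")
	fmt.Println("bad order re-runs", rebuilt(bad, v1, v2), "of", len(bad), "steps")
}
```

In the good order, only the final `COPY . .` is invalidated. In the bad order, the expensive `RUN npm ci` re-runs on every source edit.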

Image size optimization checklist

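Technique | Effect | When to use
Multi-stage build | 60-90% smaller | All compiled languages
Alpine base image | 50-80% smaller | Applications that don't depend on glibc
distroless image | Minimal runtime with no shell or package manager | Statically compiled Go; Java via the distroless Java images
.dockerignore | Smaller build context | All projects
Merged RUN instructions | Fewer image layers | apt/apk package installs
Stripped binary (-ldflags="-s -w") | 20-30% smaller | Go projects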
# Install packages in one RUN instruction to reduce layers.
# --no-cache keeps the apk index out of the image, so no separate cleanup step is needed.
RUN apk add --no-cache curl git

.dockerignore best practices

# .dockerignore
node_modules
npm-debug.log
.git
.github
.gitlab
.vscode
.idea
*.md
*.test.js
coverage/
dist/
.env
.env.local
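The following Go sketch approximates how the patterns above are applied. Paths are relative to the build context root, and a pattern that matches a directory also excludes everything under it. It deliberately ignores the `**` and `!` exception syntax that real .dockerignore files also support. One practical consequence it shows is that `*.md` only matches Markdown files at the context root. Nested files would need `**/*.md`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded reports whether a context-relative path is excluded by any pattern.
// Simplified: "**" and "!" exception lines are not handled.
func excluded(p string, patterns []string) bool {
	for _, pat := range patterns {
		// Patterns are anchored at the context root: "/foo" and "foo" are equivalent.
		pat = strings.TrimPrefix(path.Clean(pat), "/")
		// Test the path itself and each of its parent directories.
		for q := path.Clean(p); q != "." && q != "/"; q = path.Dir(q) {
			if ok, _ := path.Match(pat, q); ok {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := []string{"node_modules", ".git", "*.md", "coverage/", ".env", ".env.local"}
	for _, p := range []string{
		"node_modules/lodash/index.js",
		"README.md",
		"docs/guide.md",
		"coverage/lcov.info",
		"src/main.js",
	} {
		fmt.Printf("%-30s excluded=%v\n", p, excluded(p, patterns))
	}
}
```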



Kubernetes deployment strategies

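Kubernetes supports several deployment strategies. The right choice depends on your tolerance for business risk and on how quickly you need to be able to roll back.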

Rolling update

This is the default Kubernetes strategy. It replaces old Pods gradually:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # at most 2 Pods above the desired replica count
      maxUnavailable: 1   # at most 1 Pod may be unavailable during the update
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
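To see what the two knobs mean in Pod counts, here is a small Go sketch of the arithmetic. Absolute values are used as-is. Percentages are resolved against `replicas`, with maxSurge rounded up and maxUnavailable rounded down. If both resolve to 0, the controller allows one unavailable Pod. The function names are illustrative, not a Kubernetes API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolve turns an absolute value ("2") or a percentage ("25%") into a Pod count.
// Kubernetes rounds maxSurge percentages up and maxUnavailable percentages down.
func resolve(v string, replicas int, roundUp bool) (int, error) {
	if !strings.HasSuffix(v, "%") {
		return strconv.Atoi(v)
	}
	pct, err := strconv.Atoi(strings.TrimSuffix(v, "%"))
	if err != nil {
		return 0, err
	}
	if roundUp {
		return (replicas*pct + 99) / 100, nil
	}
	return replicas * pct / 100, nil
}

// bounds returns the most Pods that may exist and the fewest that must stay
// available while a RollingUpdate is in progress.
func bounds(replicas int, maxSurge, maxUnavailable string) (maxPods, minAvailable int, err error) {
	surge, err := resolve(maxSurge, replicas, true)
	if err != nil {
		return 0, 0, err
	}
	unavailable, err := resolve(maxUnavailable, replicas, false)
	if err != nil {
		return 0, 0, err
	}
	// If both resolve to 0, the controller allows 1 unavailable Pod so the rollout can progress.
	if surge == 0 && unavailable == 0 {
		unavailable = 1
	}
	return replicas + surge, replicas - unavailable, nil
}

func main() {
	maxPods, minAvail, _ := bounds(6, "2", "1")
	fmt.Printf("manifest above: at most %d Pods, at least %d available\n", maxPods, minAvail)
	maxPods, minAvail, _ = bounds(10, "25%", "25%") // the Kubernetes defaults
	fmt.Printf("10 replicas, 25%%/25%%: at most %d Pods, at least %d available\n", maxPods, minAvail)
}
```

For the manifest above, the rollout therefore runs with between 5 and 8 Pods.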

Blue-green deployment

Run two complete environments side by side and switch the Service's selector between them for zero-downtime releases:

# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
---
# service.yaml (changing the selector switches between blue and green)
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app
    version: blue    # change to green to switch to the new version
  ports:
    - port: 80
      targetPort: 8080
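The switch works because an equality-based selector only picks Pods whose labels contain every selector key with the same value. The Go sketch below shows this with hypothetical Pod names. Blue and green each select one color. Dropping the `version` key would send traffic to both colors at once.

```go
package main

import (
	"fmt"
	"sort"
)

// matches reports whether labels satisfy an equality-based selector:
// every selector key must be present with the same value.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// endpoints returns the names of Pods the Service would route to, sorted.
func endpoints(selector map[string]string, pods map[string]map[string]string) []string {
	var out []string
	for name, labels := range pods {
		if matches(selector, labels) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pods := map[string]map[string]string{ // hypothetical Pod names
		"my-app-blue-1":  {"app": "my-app", "version": "blue"},
		"my-app-green-1": {"app": "my-app", "version": "green"},
	}
	fmt.Println(endpoints(map[string]string{"app": "my-app", "version": "blue"}, pods))
	fmt.Println(endpoints(map[string]string{"app": "my-app", "version": "green"}, pods))
	fmt.Println(endpoints(map[string]string{"app": "my-app"}, pods)) // both colors
}
```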

Canary release

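Shift traffic to the new version gradually, validating with a small share before rolling out fully. With Istio, the stable and canary subsets referenced below must also be defined in a DestinationRule (for example, keyed on the Pods' version label):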

# canary-with-istio.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
    - my-app.example.com
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: canary
          weight: 100
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 90
        - destination:
            host: my-app
            subset: canary
          weight: 10
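The rule order above can be sketched in Go. First, the `x-canary: "true"` header match sends a request to canary. Otherwise the request lands in one of the weighted destinations. Envoy actually chooses per request at random according to the weights. To keep the example deterministic, this sketch instead assumes a bucket number from 0 to 99, for example derived from a hash of a request or user ID (`bucketOf` is a hypothetical helper).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// route is one weighted destination, like an entry under `route:` in the VirtualService.
type route struct {
	subset string
	weight int
}

// bucketOf maps an ID to 0..99 (a hypothetical stand-in for random selection).
func bucketOf(id string) int {
	h := fnv.New32a()
	h.Write([]byte(id))
	return int(h.Sum32() % 100)
}

// pick follows the VirtualService rule order: the header match wins first,
// then the bucket falls into one weighted destination (weights sum to 100).
func pick(headers map[string]string, bucket int, routes []route) string {
	if headers["x-canary"] == "true" {
		return "canary"
	}
	for _, r := range routes {
		if bucket < r.weight {
			return r.subset
		}
		bucket -= r.weight
	}
	return routes[len(routes)-1].subset
}

func main() {
	routes := []route{{"stable", 90}, {"canary", 10}}
	counts := map[string]int{}
	for b := 0; b < 100; b++ { // one request per bucket
		counts[pick(nil, b, routes)]++
	}
	fmt.Println(counts)
	fmt.Println(pick(map[string]string{"x-canary": "true"}, 0, routes))
	fmt.Println(pick(nil, bucketOf("user-42"), routes))
}
```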

Deployment strategy comparison

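Strategy | Downtime | Rollback speed | Resource overhead | Complexity | Best fit
Rolling update | None | Medium (another rolling update) | Low | Low | Everyday releases
Blue-green | None | Fast (switch the Service) | High (double) | Medium | Business-critical services
Canary | None | Fast (shift traffic back) | Low to medium | High | High-risk changes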

GitOps and ArgoCD

In 2026, GitOps is the de facto standard for Kubernetes deployments. A Git repository is the single source of truth, and every change is triggered by a Git commit.

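Core GitOps principles

  1. Declarative: all infrastructure and application configuration is declarative
  2. Versioned: all configuration lives in a Git repository with a complete change history
  3. Pulled automatically: the deployment tool pulls changes from Git and applies them
  4. Continuously reconciled: the live cluster state is continuously compared with what Git declares, and drift is corrected automatically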
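Principles 3 and 4 boil down to a diff between desired state (rendered from Git) and live state (read from the cluster). The Go sketch below is a toy model, not Argo CD's diff engine. It represents each resource as a name mapped to its spec text, and it mirrors the `prune` switch used in the Application below: without pruning, resources deleted from Git keep running.

```go
package main

import (
	"fmt"
	"sort"
)

// action is one change a GitOps controller would apply to converge the cluster.
type action struct {
	kind string // "create", "update" or "prune"
	name string
}

// reconcile diffs desired state (from Git) against live state (from the cluster).
func reconcile(desired, live map[string]string, prune bool) []action {
	var out []action
	for name, spec := range desired {
		cur, ok := live[name]
		if !ok {
			out = append(out, action{"create", name})
		} else if cur != spec {
			out = append(out, action{"update", name}) // drift or a new commit
		}
	}
	if prune {
		for name := range live {
			if _, ok := desired[name]; !ok {
				out = append(out, action{"prune", name})
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].name < out[j].name })
	return out
}

func main() {
	desired := map[string]string{
		"Deployment/my-app": "image=my-app:v2",
		"Service/my-app":    "port=80",
	}
	// Someone edited the Deployment by hand, and an old ConfigMap is no longer in Git.
	live := map[string]string{
		"Deployment/my-app": "image=my-app:v1",
		"Service/my-app":    "port=80",
		"ConfigMap/old":     "k=v",
	}
	for _, a := range reconcile(desired, live, true) {
		fmt.Println(a.kind, a.name)
	}
}
```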

ArgoCD configuration example

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/k8s-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Managing multiple environments with Kustomize

k8s-manifests/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── development/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── production/
        ├── kustomization.yaml
        ├── patch-replicas.yaml
        └── patch-resources.yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:            # replaces the deprecated `bases` field
  - ../../base
patches:              # replaces the deprecated `patchesStrategicMerge` field
  - path: patch-replicas.yaml
  - path: patch-resources.yaml
configMapGenerator:
  - name: app-config
    literals:
      - ENV=production
      - LOG_LEVEL=warn
      - DB_HOST=prod-db.internal

Pipeline as code: a complete GitHub Actions workflow

Below is a complete production-grade CI/CD pipeline covering build, test, security scanning, and deployment:

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  # GHCR image names must be lowercase; normalize owner/repo if it contains capitals
  IMAGE_NAME: ${{ github.repository }}
  K8S_NAMESPACE: my-app

jobs:
  # Job 1: lint and unit tests
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
      - name: Lint
        # golangci-lint is not preinstalled on GitHub-hosted runners; the official action installs and runs it
        uses: golangci/golangci-lint-action@v6
        with:
          version: v1.64
      - name: Unit Test
        run: go test -race -coverprofile=coverage.out ./...
      - name: Upload Coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.out
          token: ${{ secrets.CODECOV_TOKEN }}

  # Job 2: security scanning (source tree and dependencies)
  security-scan:
    runs-on: ubuntu-latest
    needs: lint-and-test
    steps:
      - uses: actions/checkout@v4
      - name: Trivy FS Scan
        # In production, pin third-party actions to a release tag or commit SHA instead of @master
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          scan-ref: .
          severity: CRITICAL,HIGH
          exit-code: '1'
      - name: Snyk Open Source (SCA)
        uses: snyk/actions/golang@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

  # Job 3: build and push the Docker image (push events only, not pull requests)
  build-and-push:
    runs-on: ubuntu-latest
    needs: security-scan
    if: github.event_name == 'push'
    permissions:
      contents: read
      packages: write
    outputs:
      # metadata-action's `tags` output is a multi-line list of full references,
      # so export the immutable commit-SHA tag and the digest instead
      image_tag: ${{ github.sha }}
      image_digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker Metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=,format=long
            type=ref,event=branch
            type=semver,pattern={{version}}
      - name: Build and Push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.head_commit.timestamp }}
            VCS_REF=${{ github.sha }}

  # Job 4: image vulnerability scan
  image-scan:
    runs-on: ubuntu-latest
    needs: build-and-push
    permissions:
      contents: read
      packages: read
      security-events: write   # required to upload SARIF results
    steps:
      # Trivy reuses the Docker credentials to pull private GHCR images
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Trivy Image Scan
        uses: aquasecurity/trivy-action@master
        with:
          # Scan exactly the digest that was pushed
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image_digest }}
          severity: CRITICAL,HIGH
          exit-code: '1'
          format: sarif
          output: trivy-results.sarif
      - name: Upload SARIF
        if: always()   # upload results even when the scan fails the job
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif

  # Job 5: deploy to K8s
  deploy:
    runs-on: ubuntu-latest
    needs: [build-and-push, image-scan]
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      - uses: azure/setup-helm@v3
      - name: Configure kubectl
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > $HOME/.kube/config
          chmod 600 $HOME/.kube/config
      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./helm/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
            --set image.digest=${{ needs.build-and-push.outputs.image_digest }} \
            --values ./helm/my-app/values-production.yaml \
            --timeout 5m \
            --wait
      - name: Verify Deployment
        run: |
          kubectl rollout status deployment/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --timeout=3m
      - name: Smoke Test
        run: |
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
            https://my-app.example.com/healthz)
          if [ "$STATUS" != "200" ]; then
            echo "Smoke test failed: HTTP $STATUS"
            exit 1
          fi
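The Smoke Test step fails on the first non-200 response, even when the new Pods only need a few seconds to become reachable behind the load balancer. One hedged alternative is a small Go checker that retries. The program name, the attempt count (5) and the wait interval (3s) are illustrative assumptions. The check function is injected so the retry logic can be tested without a network.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// probe performs one GET and returns the status code.
func probe(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// smokeTest calls check until it reports HTTP 200 or attempts run out,
// waiting between attempts so a fresh rollout has time to become reachable.
func smokeTest(check func() (int, error), attempts int, wait time.Duration) error {
	var last error
	for i := 0; i < attempts; i++ {
		code, err := check()
		if err == nil && code == http.StatusOK {
			return nil
		}
		if err == nil {
			err = fmt.Errorf("HTTP %d", code)
		}
		last = err
		if i < attempts-1 {
			time.Sleep(wait)
		}
	}
	return fmt.Errorf("smoke test failed after %d attempts: %w", attempts, last)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: smoketest https://my-app.example.com/healthz")
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	url := os.Args[1]
	err := smokeTest(func() (int, error) { return probe(client, url) }, 5, 3*time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // non-zero exit fails the CI step, like `exit 1` in the shell version
	}
	fmt.Println("smoke test passed")
}
```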

Complete GitLab CI configuration

# .gitlab-ci.yml
stages:
  - test
  - security
  - build
  - deploy

variables:
  DOCKER_TLS_CERTDIR: "/certs"
  REGISTRY: $CI_REGISTRY
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

test:
  stage: test
  # Debian-based image: the race detector needs cgo and a C toolchain, which the alpine image lacks
  image: golang:1.23
  script:
    - go test -race -coverprofile=coverage.out ./...
    - go tool cover -func=coverage.out
    # Convert Go's coverage profile into the Cobertura XML that GitLab expects
    - go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml

security-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]   # the image's entrypoint is `trivy`; clear it so GitLab can run the script
  script:
    - trivy fs --severity CRITICAL,HIGH --exit-code 1 .
  allow_failure: false

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # --password-stdin keeps the password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # BUILDKIT_INLINE_CACHE embeds cache metadata so the next build can use --cache-from
    - docker build
        --cache-from $CI_REGISTRY_IMAGE:latest
        --build-arg BUILDKIT_INLINE_CACHE=1
        --tag $IMAGE_TAG
        --tag $CI_REGISTRY_IMAGE:latest
        --build-arg VCS_REF=$CI_COMMIT_SHA
        .
    - docker push $IMAGE_TAG
    - docker push $CI_REGISTRY_IMAGE:latest

deploy:staging:
  stage: deploy
  # bitnami/kubectl does not ship helm; alpine/helm does (clear its `helm` entrypoint)
  image:
    name: alpine/helm:latest
    entrypoint: [""]
  script:
    - helm upgrade --install my-app ./helm/my-app
        --kube-context staging
        --namespace staging
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-staging.yaml
        --wait
  environment:
    name: staging
    url: https://staging.my-app.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:production:
  stage: deploy
  image:
    name: alpine/helm:latest
    entrypoint: [""]
  script:
    - helm upgrade --install my-app ./helm/my-app
        --kube-context production
        --namespace production
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-production.yaml
        --wait
  environment:
    name: production
    url: https://my-app.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual

Jenkins Pipeline (Declarative)

// Jenkinsfile
pipeline {
    agent any

    environment {
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'my-app'
        IMAGE_TAG = "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(8)}"
    }

    stages {
        stage('Test') {
            agent { label 'golang' }
            steps {
                sh 'go test -race -coverprofile=coverage.out ./...'
                sh 'golangci-lint run ./...'
            }
        }

        stage('Security Scan') {
            steps {
                sh "trivy fs --severity CRITICAL,HIGH --exit-code 1 ."
            }
        }

        stage('Build & Push') {
            agent { label 'docker' }
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-credentials') {
                        def image = docker.build(
                            "${IMAGE_NAME}:${IMAGE_TAG}",
                            '--build-arg VCS_REF=${GIT_COMMIT} .'
                        )
                        image.push()
                        image.push('latest')
                    }
                }
            }
        }

        stage('Deploy to Staging') {
            when { branch 'develop' }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \
                        --namespace staging \
                        --set image.tag=${IMAGE_TAG} \
                        --values ./helm/${IMAGE_NAME}/values-staging.yaml \
                        --wait
                """
            }
        }

        stage('Deploy to Production') {
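            when {
                // Check the branch before prompting; by default `input` is evaluated before `when`
                beforeInput true
                branch 'main'
            }
            input {
                message 'Deploy to production?'
                ok 'Deploy'
            }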
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \
                        --namespace production \
                        --set image.tag=${IMAGE_TAG} \
                        --values ./helm/${IMAGE_NAME}/values-production.yaml \
                        --wait
                """
            }
        }
    }

    post {
        failure {
            slackSend(
                channel: '#cicd-alerts',
                color: 'danger',
                message: "Pipeline failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
            )
        }
        success {
            slackSend(
                channel: '#cicd-alerts',
                color: 'good',
                message: "Deployed: ${env.JOB_NAME} #${env.BUILD_NUMBER} → ${IMAGE_TAG}"
            )
        }
    }
}

Container registry management

Image tagging strategy

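Tag type | Example | Lifecycle | Purpose
Immutable | sha-abc1234 | Permanent | Referenced by production deployments
Semantic version | v2.1.0 | Permanent | Releases
Branch | main, develop | Overwritten | Development and previews
latest | latest | Overwritten | Local development only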

Core principle: production must never use mutable tags such as latest. It must use immutable tags such as the Git SHA.

Image cleanup policy

# GitHub Actions: periodically clean up old images
name: Registry Cleanup
on:
  schedule:
    - cron: '0 2 * * 0'  # every Sunday at 02:00 UTC

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - name: Delete untagged images
        uses: actions/delete-package-versions@v5
        with:
          package-name: my-app
          package-type: container
          min-versions-to-keep: 10
          delete-only-untagged-versions: true
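The two retention inputs combine roughly as in the Go sketch below. Tagged versions are never touched, and the newest `keep` untagged versions survive. This is a simplified model of the intent, not the action's source. One caution: for multi-platform images, the per-platform manifests can show up as untagged versions, so an untagged-only rule can remove parts of an image that is still tagged.

```go
package main

import (
	"fmt"
	"sort"
)

// pkgVersion is a simplified view of one container package version.
type pkgVersion struct {
	id      int
	created int64 // Unix seconds
	tags    []string
}

// untaggedToDelete returns the IDs of untagged versions beyond the newest `keep`.
func untaggedToDelete(versions []pkgVersion, keep int) []int {
	var untagged []pkgVersion
	for _, v := range versions {
		if len(v.tags) == 0 {
			untagged = append(untagged, v)
		}
	}
	// Newest first, so the first `keep` entries are the ones retained.
	sort.Slice(untagged, func(i, j int) bool { return untagged[i].created > untagged[j].created })
	var ids []int
	for i, v := range untagged {
		if i >= keep {
			ids = append(ids, v.id)
		}
	}
	return ids
}

func main() {
	versions := []pkgVersion{
		{1, 100, nil},
		{2, 200, []string{"v1.0.0"}},
		{3, 300, nil},
		{4, 400, nil},
	}
	fmt.Println("delete:", untaggedToDelete(versions, 1))
}
```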

Security scanning integration

Trivy: full-stack security scanning

# Filesystem scan (dependency vulnerabilities)
trivy fs --severity CRITICAL,HIGH --exit-code 1 .

# Image scan
trivy image --severity CRITICAL,HIGH registry.example.com/my-app:v2.0.0

# IaC scan (K8s manifests / Dockerfile)
trivy config --severity CRITICAL,HIGH ./k8s/

# SBOM generation
trivy image --format spdx-json --output sbom.json registry.example.com/my-app:v2.0.0
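Combining `--severity CRITICAL,HIGH` with `--exit-code 1` turns the scanner into a build gate. The Go sketch below models that decision, including an ignore list in the spirit of `.trivyignore`. It is a model of the behavior, not Trivy code, and the vulnerability IDs are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// finding is a simplified scanner result.
type finding struct {
	id       string // vulnerability ID
	severity string // UNKNOWN, LOW, MEDIUM, HIGH or CRITICAL
}

// gate returns exit code 1 if any finding has a listed severity and is not
// ignored, along with the blocking IDs; otherwise it returns 0.
func gate(findings []finding, severities string, ignored map[string]bool) (exitCode int, blocking []string) {
	wanted := map[string]bool{}
	for _, s := range strings.Split(severities, ",") {
		wanted[strings.TrimSpace(s)] = true
	}
	for _, f := range findings {
		if wanted[f.severity] && !ignored[f.id] {
			blocking = append(blocking, f.id)
		}
	}
	if len(blocking) > 0 {
		return 1, blocking
	}
	return 0, nil
}

func main() {
	// Hypothetical IDs for illustration only.
	findings := []finding{
		{"CVE-0000-0001", "CRITICAL"},
		{"CVE-0000-0002", "MEDIUM"},
		{"CVE-0000-0003", "HIGH"},
	}
	code, ids := gate(findings, "CRITICAL,HIGH", map[string]bool{"CVE-0000-0003": true})
	fmt.Println("exit code", code, "blocking", ids)
}
```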

Snyk: a developer-friendly security platform

# GitHub Actions: Snyk integration
- name: Snyk Open Source
  uses: snyk/actions/golang@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high

- name: Snyk Container
  uses: snyk/actions/docker@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    image: registry.example.com/my-app:v2.0.0
    args: --severity-threshold=high --file=Dockerfile

Security scanning layers

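Layer | Tools | What it scans | When it runs
SAST | Snyk Code / SonarQube | Source-code vulnerabilities | Every commit
SCA | Snyk Open Source / Trivy fs | Dependency vulnerabilities | Every commit
Container scanning | Trivy image / Snyk Container | Image vulnerabilities | After the image is built
IaC scanning | Trivy config / Checkov | K8s and Dockerfile misconfigurations | PR stage
DAST | OWASP ZAP | Runtime vulnerabilities | After deployment to staging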



Environment management: Dev / Staging / Prod

Environment isolation strategy

# Helm values per environment
# values-development.yaml
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 128Mi
autoscaling:
  enabled: false
config:
  logLevel: debug
  dbHost: dev-db.internal

# values-staging.yaml
replicaCount: 2
resources:
  requests:
    cpu: 250m
    memory: 256Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
config:
  logLevel: info
  dbHost: staging-db.internal

# values-production.yaml
replicaCount: 3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
config:
  logLevel: warn
  dbHost: prod-db.internal
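Helm resolves values by layering. It starts from the chart's own values.yaml, applies each `--values` file in order, then applies `--set` flags on top. Nested maps merge key by key, while scalars and lists are replaced wholesale. The Go sketch below models that rule. The chart defaults are hypothetical.

```go
package main

import "fmt"

// mergeValues overlays override onto base the way layered values files behave:
// nested maps merge key by key, while scalars and lists are replaced wholesale.
// Inputs are not modified.
func mergeValues(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		bm, baseIsMap := out[k].(map[string]any)
		om, overIsMap := v.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = mergeValues(bm, om)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	chartDefaults := map[string]any{ // hypothetical chart values.yaml
		"replicaCount": 1,
		"resources":    map[string]any{"requests": map[string]any{"cpu": "100m", "memory": "128Mi"}},
	}
	prod := map[string]any{ // a subset of values-production.yaml
		"replicaCount": 3,
		"resources":    map[string]any{"limits": map[string]any{"cpu": "1000m", "memory": "1Gi"}},
	}
	fmt.Println(mergeValues(chartDefaults, prod))
}
```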

GitHub Actions environment protection rules

# Production requires manual approval
deploy-production:
  runs-on: ubuntu-latest
  environment: production    # approvers are configured under the repository's Settings → Environments
  steps:
    - name: Deploy
      run: helm upgrade --install my-app ./helm/my-app

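Configure the following under Settings → Environments in the GitHub repository:

  • production: Required reviewers = the designated approvers (an approval from any one listed reviewer lets the job proceed), Wait timer = 5 minutes
  • staging: no approval required; deploys automatically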

Monitoring and alerting integration

Collecting metrics with Prometheus + Grafana

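# Pod annotations for annotation-based Prometheus scraping. Prometheus discovers
# Pods, so the annotations belong on the Pod template, and they only take effect
# with a scrape config that honors prometheus.io/* (the Prometheus Operator uses
# PodMonitor/ServiceMonitor resources instead).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080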

Deployment alert rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: deployment-alerts
  namespace: monitoring
spec:
  groups:
    - name: deployment
      rules:
        - alert: DeploymentRolloutStuck
          expr: |
            kube_deployment_status_replicas_unavailable / kube_deployment_status_replicas > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} rollout is stuck"
        - alert: HighErrorRateAfterDeploy
          expr: |
            sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
            sum by (job) (rate(http_requests_total[5m])) > 0.05
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "5xx error rate above 5% after deployment"
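Why aggregate before dividing? Dividing two vectors matches series with identical labels. Without `sum`, each 5xx series can only match itself in the denominator, so every ratio is 1 and the alert fires on a single error. The Go sketch below reproduces both calculations on made-up per-status request rates.

```go
package main

import (
	"fmt"
	"strings"
)

// series is one labelled result of rate(http_requests_total[5m]).
type series struct {
	status string
	rate   float64 // requests per second
}

// errorRatio aggregates before dividing, like sum(rate(5xx)) / sum(rate(all)).
func errorRatio(all []series) float64 {
	var errs, total float64
	for _, s := range all {
		total += s.rate
		if strings.HasPrefix(s.status, "5") {
			errs += s.rate
		}
	}
	if total == 0 {
		return 0
	}
	return errs / total
}

// unaggregatedRatios mimics label-by-label division: a 5xx series only
// matches the denominator series with identical labels, which is itself.
func unaggregatedRatios(all []series) []float64 {
	var out []float64
	for _, s := range all {
		if strings.HasPrefix(s.status, "5") && s.rate > 0 {
			out = append(out, s.rate/s.rate)
		}
	}
	return out
}

func main() {
	all := []series{{"200", 90}, {"500", 6}, {"503", 4}} // made-up rates
	fmt.Printf("aggregated: %.2f\n", errorRatio(all))
	fmt.Println("unaggregated:", unaggregatedRatios(all))
}
```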

Slack/DingTalk notifications

# GitHub Actions: deployment notification
- name: Notify Deployment
  if: always()
  uses: 8398a7/action-slack@v3
  with:
    status: ${{ job.status }}
    fields: repo,message,commit,author,action,eventName,ref,workflow
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Rollback strategies

Automatic rollback when health checks fail

# Helm deploy with automatic rollback (Helm 3's --atomic flag provides similar behavior)
- name: Deploy with Auto Rollback
  run: |
    helm upgrade --install my-app ./helm/my-app \
      --namespace production \
      --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
      --values ./helm/my-app/values-production.yaml \
      --timeout 5m \
      --wait || \
    (echo "Deployment failed, rolling back..." && \
     helm rollback my-app --namespace production && \
     exit 1)

Manual rollback: by revision or by Git commit

# Roll back to a specific revision
kubectl rollout undo deployment/my-app --to-revision=3 -n production

# Roll back a Helm release
helm rollback my-app 2 --namespace production

# GitOps rollback: revert the Git commit
git revert <commit-hash>
git push origin main
# ArgoCD detects the change and syncs the cluster back

Automatic canary rollback

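# Argo Rollouts: automated analysis + rollback
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
  analysis:
    successfulRunHistoryLimit: 3
    unsuccessfulRunHistoryLimit: 3
  strategy:
    canary:
      # Background analysis starts at step index 2 (setWeight: 30); a failure aborts the rollout
      analysis:
        templates:
          - templateName: success-rate
            clusterScope: true
        startingStep: 2
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 5m }
        - setWeight: 100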
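The control flow of this canary can be sketched in Go as a simplified model of the intended behavior, not Argo Rollouts' controller. The steps raise the canary weight. Background analysis only counts from `startingStep` (index 2, zero-based) onward. A failed analysis aborts the rollout and returns all traffic to the stable version. The `analysisOK` callback is a hypothetical stand-in for AnalysisRun results.

```go
package main

import "fmt"

// canaryStep is either a setWeight step or a pause.
type canaryStep struct {
	setWeight int
	pause     bool
}

// simulate walks the steps; analysisOK(i) reports the analysis result while
// step i is active. Failures before startingStep are not evaluated.
func simulate(steps []canaryStep, startingStep int, analysisOK func(i int) bool) (weight int, aborted bool) {
	for i, s := range steps {
		if !s.pause {
			weight = s.setWeight
		}
		if i >= startingStep && !analysisOK(i) {
			return 0, true // abort: traffic goes back to stable
		}
	}
	return weight, false
}

func main() {
	steps := []canaryStep{
		{setWeight: 10}, {pause: true},
		{setWeight: 30}, {pause: true},
		{setWeight: 60}, {pause: true},
		{setWeight: 100},
	}
	healthy := func(int) bool { return true }
	failsAt60 := func(i int) bool { return i < 4 }
	fmt.Println(simulate(steps, 2, healthy))
	fmt.Println(simulate(steps, 2, failsAt60))
}
```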

Common pipeline failures and fixes

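Symptom | Root cause | Fix
Docker build cache keeps missing | Missing .dockerignore or wrong COPY order | Reorder Dockerfile instructions and add a .dockerignore
403 on image push | Expired registry credentials or insufficient permissions | Check Service Account / token permissions
K8s ImagePullBackOff | Image tag does not exist or registry is unreachable | Verify the image tag; check registry network access and the pull Secret
Helm deploy times out | Misconfigured readinessProbe or insufficient resources | Tune probe parameters; adjust resource requests/limits
Test and production environments diverge | Configuration differences between environments | Manage configuration with Kustomize/Helm and avoid hard-coding
Security scan false positives | Vulnerabilities pulled in by transitive dependencies | Suppress known false positives with .trivyignore or a Snyk policy
Concurrent deployment conflicts | Several people trigger pipelines at once | Use GitHub Actions concurrency or GitLab resource_group
Secret leaks | Secrets written in plain text to YAML or logs | Use Sealed Secrets / External Secrets Operator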

Concurrency control

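# GitHub Actions: prevent concurrent deployment conflicts
concurrency:
  group: deploy-${{ github.ref }}
  # Queue rather than cancel: killing a running helm upgrade can leave the release stuck in a pending state
  cancel-in-progress: false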

Debugging tips

# Inspect Pod events
kubectl describe pod <pod-name> -n <namespace>

# Show rollout history
kubectl rollout history deployment/my-app -n production

# Show Helm release history
helm history my-app -n production

# Port-forward for local debugging
kubectl port-forward svc/my-app 8080:80 -n staging

# Stream container logs
kubectl logs -f deployment/my-app -n production --all-containers

FAQ

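Q: Is the GitHub Actions free tier enough? A: Public repositories get unlimited minutes. Private repositories on the Free plan get 2,000 minutes per month (Linux). Self-hosted runners have no minute limit.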

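Q: Should Docker images use the latest tag? A: Never in production. Use the Git SHA or a semantic version as an immutable tag so that deployments are traceable and can be rolled back.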

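Q: Should I choose blue-green or canary? A: Blue-green suits business-critical services that need fast rollback (just switch the Service). Canary suits high-risk changes that need gradual validation. For everyday releases, a rolling update is enough.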

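Q: How does GitOps differ from the traditional push-based CI/CD model? A: In the push model, the CI pipeline runs kubectl apply itself. In GitOps, an agent inside the cluster (ArgoCD) pulls changes from Git. The advantages of GitOps are that Git is the single source of truth and that drift in cluster state is corrected automatically.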

Q: How should secrets be handled in CI/CD? A: Use the platform's native secret management (GitHub Secrets / GitLab Variables / Jenkins Credentials). In K8s, use Sealed Secrets or External Secrets Operator. Never commit secrets to Git.

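Q: How do I manage multi-cluster deployments? A: Use ArgoCD ApplicationSet with a Git directory layout, or Helm with multiple kubeconfig contexts. The ArgoCD approach is recommended because it supports multiple clusters natively.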

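Q: How can I speed up a slow pipeline? A: 1) use Docker layer caching and GitHub Actions caching; 2) run independent jobs in parallel; 3) use self-hosted runners to avoid cold starts; 4) run incremental tests that only cover changed modules.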

