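DevOps CI/CD Pipelines in Practice: Docker + Kubernetes End to End
CI/CD Fundamentals and the 2026 Landscape
CI/CD (continuous integration and continuous delivery/deployment) is a core DevOps practice. Its goal is to make the whole path from commit to production automated, traceable, and reversible.
Core Concepts
| Concept | Full name | Core goal |
|---|---|---|
| CI | Continuous Integration | Merge code frequently and build and test it automatically so problems surface early |
| CD (delivery) | Continuous Delivery | Keep code deployable to production at any time; the production release itself requires manual approval |
| CD (deployment) | Continuous Deployment | Deploy to production automatically once tests pass, with no manual step |
Comparison of Mainstream CI/CD Platforms in 2026
| Platform | Best fit | Key strengths | Pipeline definition |
|---|---|---|---|
| GitHub Actions | Open-source projects, small and mid-sized teams | Native GitHub integration, Marketplace ecosystem, generous free tier | .github/workflows/*.yml |
| GitLab CI | Private enterprise deployments, self-hosting | Built-in container registry, security scanning, Kubernetes integration | .gitlab-ci.yml |
| Jenkins | Complex pipelines, traditional enterprises | Largest plugin ecosystem, highly customizable | Jenkinsfile (Groovy) |
💡 Edit and validate CI/CD configuration files with a YAML formatter to avoid indentation errors.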
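Docker Best Practices
Docker is the foundation of a CI/CD pipeline: every build should produce a deterministic, reproducible container image.
Multi-Stage Builds
A multi-stage build is the most effective first step for shrinking an image, because it separates the build environment from the runtime environment:
# Stage 1: build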
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server ./cmd/server
# Stage 2: runtime
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]
Effect: for a small service, the Go image can shrink from roughly 300 MB to roughly 5 MB.
Node.js multi-stage build example:
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
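# Install all dependencies: the build step needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only production modules reach the runtime stage
RUN npm run build && npm prune --omit=dev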
FROM node:22-alpine AS runner
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/main.js"]
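Layer Cache Optimization
The key rule for Docker layer caching: put instructions whose inputs change rarely first, and instructions whose inputs change often last.
# Good: copy the dependency manifests first so the install layer stays cached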
COPY package*.json ./
RUN npm ci
COPY . .
# Bad: copy all sources first, so any source change reinstalls every dependency
COPY . .
RUN npm ci
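The ordering rule can be illustrated with a toy model of the layer cache. This is a minimal sketch, not Docker's actual algorithm: it assumes each layer's cache key is a hash of its parent key plus either the copied content (for COPY) or the command text (for RUN). The function names and the placeholder content hashes are hypothetical.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction; each key chains on its parent layer."""
    keys, parent = [], ""
    for kind, arg in instructions:
        # A COPY layer depends on the content of what is copied;
        # a RUN layer depends only on the command text and the parent layer.
        content = file_hashes[arg] if kind == "COPY" else arg
        parent = hashlib.sha256(f"{parent}|{kind}|{content}".encode()).hexdigest()
        keys.append(parent)
    return keys


def reusable_layers(old_keys, new_keys):
    """Count leading layers whose keys are unchanged between two builds."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


GOOD = [("COPY", "package*.json"), ("RUN", "npm ci"), ("COPY", ".")]
BAD = [("COPY", "."), ("RUN", "npm ci")]

# Placeholder content hashes: only the source tree changes between builds.
before = {"package*.json": "manifest-v1", ".": "source-v1"}
after = {"package*.json": "manifest-v1", ".": "source-v2"}

good_reuse = reusable_layers(layer_keys(GOOD, before), layer_keys(GOOD, after))
bad_reuse = reusable_layers(layer_keys(BAD, before), layer_keys(BAD, after))
print(good_reuse, bad_reuse)  # 2 0
```

With the good ordering, the manifest copy and the `npm ci` layer are both reused after a source-only change; with the bad ordering nothing is reused, so dependencies are reinstalled on every build.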
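Image Size Optimization Checklist
| Technique | Effect | Applies to |
|---|---|---|
| Multi-stage build | 60-90% smaller | All compiled languages |
| Alpine base image | 50-80% smaller | Applications that do not depend on glibc |
| distroless image | Contains only the application and its runtime dependencies | Statically linked binaries (Go, Rust); Java via the distroless Java images |
| .dockerignore | Smaller build context | All projects |
| Merged RUN instructions | Fewer image layers | apt/apk install steps |
| Stripped binary with -ldflags="-s -w" | 20-30% smaller | Go projects |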
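# Merge RUN instructions to reduce the layer count.
# --no-cache already skips the apk index cache, so no cleanup step is needed.
# To pin versions, use the full Alpine package version, including the -rN release suffix.
RUN apk add --no-cache curl git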
.dockerignore Best Practices
# .dockerignore
node_modules
npm-debug.log
.git
.github
.gitlab
.vscode
.idea
*.md
*.test.js
coverage/
dist/
.env
.env.local
💡 Use a JSON formatter to check that dependency versions in package.json are consistent.
Kubernetes Deployment Strategies
Kubernetes supports several deployment strategies; the right choice depends on how much risk the business tolerates and how fast a rollback must be.
Rolling Update
The Kubernetes default, which replaces old Pods gradually:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # at most 2 Pods above the desired replica count
      maxUnavailable: 1  # at most 1 Pod may be unavailable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
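To see what `maxSurge` and `maxUnavailable` mean for capacity during a rollout, the bounds can be computed directly. The sketch below is illustrative only; the helper names are hypothetical. It follows the documented Kubernetes rounding rules: a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down.

```python
def resolve(value, replicas, round_up):
    """Resolve an absolute count or a percentage string such as "25%"."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return the Pod-count envelope a rolling update stays within."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(6, 2, 1))
# {'max_total_pods': 8, 'min_available_pods': 5}
```

For the manifest above (6 replicas, surge 2, unavailable 1), the rollout never runs more than 8 Pods and never drops below 5 available Pods, which is the headroom the cluster needs to have free.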
Blue-Green Deployment
Run two complete environments side by side and switch the Service selector between them for a zero-downtime cutover:
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
---
# service.yaml (changing the selector performs the blue-green switch)
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app
    version: blue  # change to green to route traffic to the new version
  ports:
    - port: 80
      targetPort: 8080
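The switch itself is a one-field change to the Service selector. As a minimal sketch, the helper below (a hypothetical name, not part of any Kubernetes client) builds the merge-patch document that flips the `version` label between `blue` and `green`.

```python
import json


def switch_patch(current_selector):
    """Build a merge patch that points the Service at the other color."""
    target = "green" if current_selector.get("version") == "blue" else "blue"
    return {"spec": {"selector": {**current_selector, "version": target}}}


patch = switch_patch({"app": "my-app", "version": "blue"})
print(json.dumps(patch))
```

The printed JSON could then be applied with something like `kubectl patch service my-app-svc --type merge -p '<json>'`. Rolling back is the same operation with the colors reversed, which is why blue-green rollbacks are fast.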
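Canary Release
Shift traffic to the new version gradually: validate with a small share first, then roll out fully:
# canary-with-istio.yaml
# The stable and canary subsets must be defined in a matching DestinationRule.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
    - my-app.example.com
  http:
    # Requests carrying x-canary: "true" always go to the canary
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: canary
          weight: 100
    # All other traffic is split 90/10
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 90
        - destination:
            host: my-app
            subset: canary
          weight: 10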
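The routing rules above can be modeled in a few lines to check how much traffic the canary actually receives. This is a simulation sketch, not Istio's implementation; `pick_subset` is a hypothetical name, and the random generator is passed in so that runs are reproducible.

```python
import random


def pick_subset(headers, rng, canary_weight=10):
    """Mimic the two rules: header match first, then a weighted split."""
    if headers.get("x-canary") == "true":
        return "canary"
    # Draw 0..99; the lowest `canary_weight` values go to the canary.
    return "canary" if rng.randrange(100) < canary_weight else "stable"


rng = random.Random(42)
forced = pick_subset({"x-canary": "true"}, rng)
share = sum(pick_subset({}, rng) == "canary" for _ in range(10_000)) / 10_000
print(forced, round(share, 2))
```

Testers who send the `x-canary` header always reach the new version, while ordinary traffic converges on the configured 10% share.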
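Deployment Strategy Comparison
| Strategy | Downtime | Rollback speed | Resource cost | Complexity | Best fit |
|---|---|---|---|---|---|
| Rolling update | Low | Medium | Low | Low | Routine releases |
| Blue-green | Zero | Fast (switch the Service) | High (double) | Medium | Critical services |
| Canary | Zero | Fast | Medium | High | High-risk changes |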
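GitOps and ArgoCD
By 2026 GitOps has become a de facto standard for Kubernetes deployments: the Git repository is the single source of truth, and every change is triggered by a Git commit.
GitOps Core Principles
- Declarative: all infrastructure and application configuration is declarative
- Versioned: all configuration lives in Git, with a complete change history
- Pulled automatically: the deployment tool pulls changes from Git and applies them
- Continuously reconciled: cluster state is compared with the Git declaration continuously, and drift is repaired automatically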
ArgoCD Configuration Example
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/k8s-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
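The `retry.backoff` settings describe an exponential schedule. As a rough sketch of what those numbers imply (assuming each delay is the previous one multiplied by `factor` and capped at `maxDuration`; the exact timing inside ArgoCD may differ), the waits between sync attempts can be listed like this. `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(limit, duration_s, factor, max_duration_s):
    """Delay in seconds before each retry, capped at max_duration_s."""
    return [min(duration_s * factor ** attempt, max_duration_s) for attempt in range(limit)]


delays = backoff_delays(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(delays, sum(delays))  # [5, 10, 20, 40, 80] 155
```

Under these assumptions the five retries span about two and a half minutes in total, so a transient failure such as a briefly unavailable webhook has time to clear before the sync is marked as failed.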
Multi-Environment Management with Kustomize
k8s-manifests/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── development/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── production/
        ├── kustomization.yaml
        ├── patch-replicas.yaml
        └── patch-resources.yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:  # replaces the deprecated `bases` field
  - ../../base
patches:    # replaces the deprecated `patchesStrategicMerge` field
  - path: patch-replicas.yaml
  - path: patch-resources.yaml
configMapGenerator:
  - name: app-config
    literals:
      - ENV=production
      - LOG_LEVEL=warn
      - DB_HOST=prod-db.internal
Pipeline as Code: A Complete GitHub Actions Workflow
The workflow below sketches a production-style CI/CD pipeline covering build, test, security scanning, and deployment:
name: CI/CD Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}  # GHCR requires lowercase; assumes a lowercase repository path
  K8S_NAMESPACE: my-app
jobs:
  # Job 1: lint and unit tests
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
      - name: Lint
        # golangci-lint is not preinstalled on hosted runners, so use its action
        uses: golangci/golangci-lint-action@v6
      - name: Unit Test
        run: go test -race -coverprofile=coverage.out ./...
      - name: Upload Coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.out
          token: ${{ secrets.CODECOV_TOKEN }}  # assumed secret name
  # Job 2: source and dependency security scan
  security-scan:
    runs-on: ubuntu-latest
    needs: lint-and-test
    steps:
      - uses: actions/checkout@v4
      - name: Trivy FS Scan
        uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in production
        with:
          scan-type: fs
          severity: CRITICAL,HIGH
          exit-code: '1'
      - name: Snyk SAST
        uses: snyk/actions/golang@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  # Job 3: build and push the Docker image
  build-and-push:
    runs-on: ubuntu-latest
    needs: security-scan
    if: github.event_name == 'push'  # do not publish images from pull requests
    permissions:
      contents: read
      packages: write
    outputs:
      # steps.meta.outputs.tags is a multi-line list, so expose one immutable tag instead
      image_tag: ${{ github.sha }}
      image_digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker Metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=,format=long
            type=ref,event=branch
            type=semver,pattern={{version}}
      - name: Build and Push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.head_commit.timestamp }}
            VCS_REF=${{ github.sha }}
  # Job 4: image security scan
  image-scan:
    runs-on: ubuntu-latest
    needs: build-and-push
    permissions:
      contents: read
      packages: read
      security-events: write  # needed to upload SARIF results
    steps:
      - name: Trivy Image Scan
        uses: aquasecurity/trivy-action@master
        env:
          TRIVY_USERNAME: ${{ github.actor }}
          TRIVY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        with:
          # Scan by digest so the result refers to exactly the image that was built
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image_digest }}
          severity: CRITICAL,HIGH
          exit-code: '1'
          format: sarif
          output: trivy-results.sarif
      - name: Upload SARIF
        if: always()  # upload findings even when the scan step fails
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
  # Job 5: deploy to Kubernetes
  deploy:
    runs-on: ubuntu-latest
    needs: [build-and-push, image-scan]
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      - uses: azure/setup-helm@v3
      - name: Configure kubectl
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > $HOME/.kube/config
          chmod 600 $HOME/.kube/config
      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./helm/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
            --set image.digest=${{ needs.build-and-push.outputs.image_digest }} \
            --values ./helm/my-app/values-production.yaml \
            --timeout 5m \
            --wait
      - name: Verify Deployment
        run: |
          kubectl rollout status deployment/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --timeout=3m
      - name: Smoke Test
        run: |
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 \
            https://my-app.example.com/healthz)
          if [ "$STATUS" != "200" ]; then
            echo "Smoke test failed: HTTP $STATUS"
            exit 1
          fi
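A single request right after a rollout can fail for transient reasons such as DNS propagation or load balancer warm-up, so a smoke test usually benefits from a few retries. The sketch below shows one way to do that with only the standard library. The function names, attempt count, and delay are illustrative assumptions, and the probe and sleep functions are injectable so the logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def smoke_test(url, attempts=5, delay_s=3.0, probe=http_status, sleep=time.sleep):
    """Return True as soon as one probe sees HTTP 200, False after all attempts fail."""
    for attempt in range(1, attempts + 1):
        if probe(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay_s)
    return False


if __name__ == "__main__":
    ok = smoke_test("https://my-app.example.com/healthz")
    raise SystemExit(0 if ok else 1)
```

Exiting non-zero on failure lets the CI step fail the job in the same way as the shell version above.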
A Complete GitLab CI Configuration
# .gitlab-ci.yml
stages:
  - test
  - security
  - build
  - deploy
variables:
  DOCKER_TLS_CERTDIR: "/certs"
  REGISTRY: $CI_REGISTRY
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
test:
  stage: test
  image: golang:1.23  # -race needs cgo and a C toolchain, which the alpine image lacks
  script:
    - go test -race -coverprofile=coverage.out ./...
    - go tool cover -func=coverage.out
    # Convert the Go profile to Cobertura XML so the report declared below exists
    - go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
security-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]  # the image's default entrypoint is trivy, which breaks the job shell
  script:
    - trivy fs --severity CRITICAL,HIGH --exit-code 1 .
  allow_failure: false
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # Pull the previous image so --cache-from has layers to reuse
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build
        --cache-from $CI_REGISTRY_IMAGE:latest
        --build-arg BUILDKIT_INLINE_CACHE=1
        --tag $IMAGE_TAG
        --tag $CI_REGISTRY_IMAGE:latest
        --build-arg VCS_REF=$CI_COMMIT_SHA
        .
    - docker push $IMAGE_TAG
    - docker push $CI_REGISTRY_IMAGE:latest
deploy:staging:
  stage: deploy
  # The image must ship both kubectl and helm (bitnami/kubectl has no helm).
  # alpine/k8s is one option; the tag shown is illustrative.
  image: alpine/k8s:1.30.0
  script:
    - kubectl config use-context staging
    - helm upgrade --install my-app ./helm/my-app
        --namespace staging
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-staging.yaml
        --wait
  environment:
    name: staging
    url: https://staging.my-app.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"
deploy:production:
  stage: deploy
  image: alpine/k8s:1.30.0
  script:
    - kubectl config use-context production
    - helm upgrade --install my-app ./helm/my-app
        --namespace production
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-production.yaml
        --wait
  environment:
    name: production
    url: https://my-app.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual  # production requires a manual trigger
Jenkins Pipeline (Declarative)
// Jenkinsfile
pipeline {
    agent any
    environment {
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'my-app'
        IMAGE_TAG = "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(8)}"
    }
    stages {
        stage('Test') {
            agent { label 'golang' }
            steps {
                sh 'go test -race -coverprofile=coverage.out ./...'
                sh 'golangci-lint run ./...'
            }
        }
        stage('Security Scan') {
            steps {
                sh "trivy fs --severity CRITICAL,HIGH --exit-code 1 ."
            }
        }
        stage('Build & Push') {
            agent { label 'docker' }
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-credentials') {
                        def image = docker.build(
                            "${IMAGE_NAME}:${IMAGE_TAG}",
                            "--build-arg VCS_REF=${env.GIT_COMMIT} ."
                        )
                        image.push()
                        image.push('latest')
                    }
                }
            }
        }
        stage('Deploy to Staging') {
            when { branch 'develop' }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \
                        --namespace staging \
                        --set image.tag=${IMAGE_TAG} \
                        --values ./helm/${IMAGE_NAME}/values-staging.yaml \
                        --wait
                """
            }
        }
        stage('Deploy to Production') {
            when { branch 'main' }
            input {
                message 'Deploy to production?'
                ok 'Deploy'
            }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \
                        --namespace production \
                        --set image.tag=${IMAGE_TAG} \
                        --values ./helm/${IMAGE_NAME}/values-production.yaml \
                        --wait
                """
            }
        }
    }
    post {
        failure {
            slackSend(
                channel: '#cicd-alerts',
                color: 'danger',
                message: "Pipeline failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
            )
        }
        success {
            slackSend(
                channel: '#cicd-alerts',
                color: 'good',
                message: "Pipeline succeeded: ${env.JOB_NAME} #${env.BUILD_NUMBER} → ${IMAGE_TAG}"
            )
        }
    }
}
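Container Registry Management
Image Tagging Strategy
| Tag type | Example | Lifetime | Use |
|---|---|---|---|
| Immutable tag | sha-abc1234 | Permanent | Referenced by production deployments |
| Semantic version | v2.1.0 | Permanent | Version releases |
| Branch tag | main, develop | Overwritable | Development / preview |
| latest | latest | Overwritable | Local development only |
Core rule: production must never use a mutable tag such as latest; it must reference an immutable tag such as the Git SHA.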
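This rule is easy to enforce mechanically, for example as a pre-deploy check. The sketch below accepts a reference only if it is pinned by digest or uses a tag shaped like the immutable examples in the table. The patterns and the function name are assumptions for illustration; a real policy would match the team's own tag format.

```python
import re

IMMUTABLE_TAG_PATTERNS = (
    re.compile(r"^sha-[0-9a-f]{7,40}$"),  # Git SHA tag such as sha-abc1234
    re.compile(r"^v?\d+\.\d+\.\d+$"),     # semantic version such as v2.1.0
)


def is_production_safe(image_ref):
    """Accept digest-pinned references and tags that match an immutable pattern."""
    if "@sha256:" in image_ref:
        return True
    _, separator, tag = image_ref.rpartition(":")
    # No colon, or a colon that belongs to a registry port, means the tag defaults to latest.
    if not separator or "/" in tag:
        return False
    return any(pattern.match(tag) for pattern in IMMUTABLE_TAG_PATTERNS)


for ref in ("registry.example.com/my-app:v2.1.0",
            "registry.example.com/my-app:latest",
            "registry.example.com/my-app"):
    print(ref, is_production_safe(ref))
```

A check like this could run in the deploy job before Helm is invoked, failing fast when a branch tag or `latest` slips into a production values file.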
Image Cleanup Policy
# GitHub Actions: clean up old images on a schedule
name: Registry Cleanup
on:
  schedule:
    - cron: '0 2 * * 0'  # every Sunday at 02:00 UTC
jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - name: Delete untagged images
        uses: actions/delete-package-versions@v5
        with:
          package-name: my-app
          package-type: container  # required input of this action
          min-versions-to-keep: 10
          delete-only-untagged-versions: true
Security Scanning Integration
Trivy: Full-Stack Security Scanning
# Filesystem scan (dependency vulnerabilities)
trivy fs --severity CRITICAL,HIGH --exit-code 1 .
# Image scan
trivy image --severity CRITICAL,HIGH registry.example.com/my-app:v2.0.0
# IaC scan (K8s manifests / Dockerfile)
trivy config --severity CRITICAL,HIGH ./k8s/
# SBOM generation
trivy image --format spdx-json --output sbom.json registry.example.com/my-app:v2.0.0
Snyk: A Developer-Friendly Security Platform
# GitHub Actions: Snyk integration
- name: Snyk Open Source
  uses: snyk/actions/golang@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high
- name: Snyk Container
  uses: snyk/actions/docker@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    image: registry.example.com/my-app:v2.0.0
    args: --severity-threshold=high --file=Dockerfile
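Security Scanning Layers
| Layer | Tools | What is scanned | When it runs |
|---|---|---|---|
| SAST | Snyk Code / SonarQube | Source code vulnerabilities | Every commit |
| SCA | Snyk Open Source / Trivy fs | Dependency vulnerabilities | Every commit |
| Container scan | Trivy image / Snyk Container | Image vulnerabilities | After the image is built |
| IaC scan | Trivy config / Checkov | K8s/Dockerfile configuration risks | At the PR stage |
| DAST | OWASP ZAP | Runtime vulnerabilities | After deployment to staging |
💡 Use a hash tool to generate checksums for sensitive CI/CD configuration so that tampering can be detected. Hashing is one-way, not encryption, so it complements secret storage rather than replacing it.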
Environment Management: Dev / Staging / Prod
Environment Isolation Strategy
# Helm values for multiple environments
# values-development.yaml
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 128Mi
autoscaling:
  enabled: false
config:
  logLevel: debug
  dbHost: dev-db.internal
# values-staging.yaml
replicaCount: 2
resources:
  requests:
    cpu: 250m
    memory: 256Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
config:
  logLevel: info
  dbHost: staging-db.internal
# values-production.yaml
replicaCount: 3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
config:
  logLevel: warn
  dbHost: prod-db.internal
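Each environment file only needs to list what differs from the chart defaults, because values files are layered on top of one another. The sketch below illustrates that layering with a recursive merge. It is a simplified model and not Helm's own code (Helm has extra rules, such as `null` removing a key), and the default values shown are hypothetical.

```python
def deep_merge(base, override):
    """Return a new dict where override wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical chart defaults (values.yaml)
defaults = {
    "replicaCount": 1,
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    "config": {"logLevel": "debug", "dbHost": "dev-db.internal"},
}
# Subset of a production override file
production = {
    "replicaCount": 3,
    "resources": {"limits": {"cpu": "1000m", "memory": "1Gi"}},
    "config": {"logLevel": "warn"},
}

effective = deep_merge(defaults, production)
print(effective["resources"])
```

Note that in this example `dbHost` keeps its default because the override does not set it. That kind of silent inheritance is a common source of drift between environments, which is why the production file above spells out every environment-specific key.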
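GitHub Actions Environment Protection Rules
# Production requires manual approval
deploy-production:
  runs-on: ubuntu-latest
  environment: production  # reviewers are configured in the GitHub repository settings
  steps:
    - name: Deploy
      run: helm upgrade --install my-app ./helm/my-app
Configure the following under Settings → Environments in the GitHub repository:
- production: Required reviewers = 2 listed reviewers (approval from any one of them releases the job), Wait timer = 5 minutes
- staging: no approval required, deploys automatically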
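Monitoring and Alerting Integration
Metrics Collection with Prometheus + Grafana
# Pod scrape annotations (effective only if the Prometheus scrape config uses annotation-based discovery)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      # The annotations must sit on the Pod template; Pod discovery does not see
      # annotations placed on the Deployment object itself.
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080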
Deployment Alert Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: deployment-alerts
  namespace: monitoring
spec:
  groups:
    - name: deployment
      rules:
        - alert: DeploymentRolloutStuck
          expr: |
            kube_deployment_status_replicas_unavailable / kube_deployment_status_replicas > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} rollout is stuck"
        - alert: HighErrorRateAfterDeploy
          # Aggregate with sum() first: dividing the raw series would match each 5xx
          # series only against itself and always yield 1.
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total[5m])) > 0.05
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "5xx error rate above 5% after deployment"
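The 5% threshold refers to the share of all requests that end in a 5xx status, so the per-status rates have to be summed before dividing. A small sketch of that arithmetic, with made-up request rates and a hypothetical helper name:

```python
def error_ratio(rates_by_status):
    """Fraction of total request rate that belongs to 5xx status codes."""
    total = sum(rates_by_status.values())
    if total == 0:
        return 0.0  # no traffic means no error rate to alert on
    errors = sum(rate for status, rate in rates_by_status.items() if status.startswith("5"))
    return errors / total


healthy = {"200": 95.0, "500": 3.0, "503": 2.0}   # requests per second by status
degraded = {"200": 90.0, "500": 10.0}

print(error_ratio(healthy), error_ratio(healthy) > 0.05)
print(error_ratio(degraded), error_ratio(degraded) > 0.05)
```

The first sample sits exactly at 5% and does not fire because the comparison is strict; the second, at 10%, would fire once it has held for the rule's `for` duration.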
Slack / DingTalk Alert Notifications
# GitHub Actions: deployment notification
- name: Notify Deployment
  if: always()
  uses: 8398a7/action-slack@v3
  with:
    status: ${{ job.status }}
    fields: repo,message,commit,author,action,eventName,ref,workflow
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
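Rollback Strategies
Automatic Rollback: When Health Checks Fail
# Helm deployment with automatic rollback.
# Helm's --atomic flag is a built-in alternative that rolls back a failed upgrade.
- name: Deploy with Auto Rollback
  run: |
    helm upgrade --install my-app ./helm/my-app \
      --namespace production \
      --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
      --values ./helm/my-app/values-production.yaml \
      --timeout 5m \
      --wait || \
    (echo "Deployment failed, rolling back..." && \
      helm rollback my-app --namespace production && \
      exit 1)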
Manual Rollback: Based on Git SHA
# Roll back to a specific revision
kubectl rollout undo deployment/my-app --to-revision=3
# Roll back a Helm release
helm rollback my-app 2 --namespace production
# GitOps rollback: revert the Git commit
git revert <commit-hash>
git push origin main
# ArgoCD detects the change and performs the rollback
Automatic Canary Rollback
# Argo Rollouts: automated analysis + rollback (fragment; replicas, selector, and template omitted)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      analysis:  # background analysis; a failed run aborts the rollout and shifts traffic back to stable
        templates:
          - templateName: success-rate
            clusterScope: true
        startingStep: 2
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 5m }
        - setWeight: 100
  analysis:
    successfulRunHistoryLimit: 3
    unsuccessfulRunHistoryLimit: 3
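To reason about how long a canary takes and how much traffic is exposed at each point, the step list can be turned into a timeline. This is an illustrative sketch only: `canary_timeline` is a hypothetical helper, pauses are given in whole minutes, and analysis time is ignored.

```python
def canary_timeline(steps):
    """Return (elapsed_minutes, weight) pairs for each setWeight step."""
    timeline, elapsed = [], 0
    for step in steps:
        if "setWeight" in step:
            timeline.append((elapsed, step["setWeight"]))
        elif "pause" in step:
            elapsed += step["pause"]["minutes"]
    return timeline


STEPS = [
    {"setWeight": 10}, {"pause": {"minutes": 5}},
    {"setWeight": 30}, {"pause": {"minutes": 5}},
    {"setWeight": 60}, {"pause": {"minutes": 5}},
    {"setWeight": 100},
]

timeline = canary_timeline(STEPS)
print(timeline)  # [(0, 10), (5, 30), (10, 60), (15, 100)]
```

With the steps above, full traffic is reached after 15 minutes, and at most 10% of users are exposed during the first five minutes, when a bad release is most likely to be caught.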
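Common Pipeline Failures and Fixes
| Symptom | Root cause | Fix |
|---|---|---|
| Docker build cache misses | Missing .dockerignore or wrong COPY order | Reorder Dockerfile instructions and add a .dockerignore |
| Image push returns 403 | Registry credentials expired or insufficient permissions | Check Service Account / token permissions |
| K8s ImagePullBackOff | Image tag does not exist or the registry is unreachable | Verify the tag; check registry networking and the pull Secret |
| Helm deployment times out | Misconfigured readinessProbe or insufficient resources | Tune probe parameters; raise resource requests/limits or add capacity |
| Test and production environments differ | Configuration differences between environments | Manage configuration with Kustomize/Helm and reduce hard-coded values |
| Security scan false positives | Vulnerabilities pulled in through transitive dependencies | Use .trivyignore or a Snyk policy to ignore confirmed false positives |
| Concurrent deployment conflicts | Several people trigger the pipeline at once | Use GitHub concurrency or GitLab resource_group |
| Secret leaks | Secrets written in plain text to YAML or logs | Use Sealed Secrets / External Secrets Operator |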
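Concurrency Control
# GitHub Actions: prevent concurrent deployment conflicts
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false  # let a running deployment finish; later runs wait instead of interrupting it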
Debugging Tips
# Inspect Pod events
kubectl describe pod <pod-name> -n <namespace>
# Show rollout history
kubectl rollout history deployment/my-app -n production
# Show Helm release history
helm history my-app -n production
# Port-forward for debugging
kubectl port-forward svc/my-app 8080:80 -n staging
# Follow container logs
kubectl logs -f deployment/my-app -n production --all-containers
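FAQ
Q: Is the GitHub Actions free tier enough?
A: Standard GitHub-hosted runners are free for public repositories; private repositories on the Free plan get 2,000 minutes per month (Linux). Self-hosted runners do not consume these minutes.
Q: Should Docker images use the latest tag?
A: Never in production. Use the Git SHA or a semantic version as an immutable tag so that deployments are traceable and reversible.
Q: How do I choose between blue-green and canary?
A: Blue-green suits critical services that need fast rollback (switching the Service is enough); canary suits high-risk changes that need gradual validation. For routine releases, a rolling update is sufficient.
Q: How does GitOps differ from the traditional CI/CD push model?
A: In the push model the CI pipeline runs kubectl apply against the cluster. With GitOps, an in-cluster agent (ArgoCD) pulls changes from Git. The advantages: Git is the single source of truth, and drift in cluster state can be repaired automatically.
Q: How should secrets be handled in CI/CD?
A: Use the platform's native secret management (GitHub Secrets / GitLab Variables / Jenkins Credentials); inside Kubernetes use Sealed Secrets or External Secrets Operator. Never commit a secret to Git.
Q: How do I manage multi-cluster deployments?
A: Use ArgoCD ApplicationSet with a Git directory structure, or Helm with multiple kubeconfig contexts. The ArgoCD approach is recommended because it supports multiple clusters natively.
Q: The pipeline is too slow. How can I speed it up?
A: 1) Use Docker layer caching and GitHub Actions caching; 2) run independent jobs in parallel; 3) use self-hosted runners to reduce cold starts; 4) test incrementally (only the modules that changed).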