DevOps CI/CD Pipeline in Practice: Docker + Kubernetes Full Chain

CI/CD Fundamentals & 2026 Landscape

CI/CD (Continuous Integration and Continuous Delivery/Deployment) is the core DevOps practice. Its goal is to make the whole path from code commit to production automated, traceable, and reversible.

Core Concepts

| Concept | Full Name | Core Goal |
|---|---|---|
| CI | Continuous Integration | Merge code frequently, auto build+test, catch issues early |
| CD (Delivery) | Continuous Delivery | Code is always ready to deploy, requires manual approval |
| CD (Deployment) | Continuous Deployment | Code auto-deploys to production after passing tests, no manual gate |

2026 Mainstream CI/CD Platform Comparison

| Platform | Use Case | Core Strength | Pipeline Definition |
|---|---|---|---|
| GitHub Actions | Open source, small-medium teams | Native integration, Marketplace ecosystem, generous free tier | .github/workflows/*.yml |
| GitLab CI | Enterprise self-hosted | Built-in container registry, security scanning, K8s integration | .gitlab-ci.yml |
| Jenkins | Complex pipelines, traditional enterprise | Richest plugin ecosystem, highly customizable | Jenkinsfile (Groovy) |


Docker Best Practices

Docker is the cornerstone of CI/CD pipelines: every build should produce a deterministic, reproducible container image.

Multi-Stage Build

Multi-stage builds are the top technique for reducing image size, separating build environment from runtime:

# Stage 1: Build
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server ./cmd/server

# Stage 2: Runtime
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

Result: the Go image shrinks from roughly 300 MB to roughly 5 MB, because the final stage carries only the static binary.

Node.js multi-stage build example:

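FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages reach the final stage
RUN npm prune --omit=dev

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./package.json
# Run as the unprivileged user that the official image provides
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]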

Image Layer Caching Optimization

The key principle of Docker layer caching: instructions that change less frequently go first, those that change more go last.

# Good: copy dependency files first, leverage cache
COPY package*.json ./
RUN npm ci
COPY . .

# Bad: copy all source first, reinstall deps every time
COPY . .
RUN npm ci
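
To see why ordering matters, the cache rule can be modeled in a few lines: a layer is reused only if its own inputs and every earlier layer are unchanged. This is a simplified sketch, not Docker's actual implementation; the function name layers_to_rebuild and the input names are hypothetical.

```python
def layers_to_rebuild(instructions, changed_inputs):
    """Return the instructions that must re-run.

    instructions: list of (instruction_text, set_of_inputs_it_depends_on)
    changed_inputs: set of inputs that changed since the last build
    Simplified model: once one layer misses the cache, every later layer misses too.
    """
    rebuilt = []
    cache_broken = False
    for text, inputs in instructions:
        if cache_broken or inputs & changed_inputs:
            cache_broken = True
            rebuilt.append(text)
    return rebuilt


good = [
    ("COPY package*.json ./", {"package.json"}),
    ("RUN npm ci", set()),
    ("COPY . .", {"package.json", "src"}),
]
bad = [
    ("COPY . .", {"package.json", "src"}),
    ("RUN npm ci", set()),
]

# Only application source changed
print(layers_to_rebuild(good, {"src"}))  # ['COPY . .']
print(layers_to_rebuild(bad, {"src"}))   # ['COPY . .', 'RUN npm ci']
```

With the good ordering, a source-only change leaves the dependency install cached; with the bad ordering, the install re-runs on every build.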

Image Size Optimization Checklist

| Technique | Effect | Use Case |
|---|---|---|
| Multi-stage build | Reduce 60-90% | All compiled languages |
| Alpine base image | Reduce 50-80% | Apps not dependent on glibc |
| distroless image | App binary only | Go, Java static compilation |
| .dockerignore | Reduce build context | All projects |
| Merge RUN instructions | Reduce image layers | apt/apk install scenarios |
| Strip binary -ldflags="-s -w" | Reduce 20-30% | Go projects |
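
# Merge RUN instructions to reduce layers
# --no-cache already skips the local index cache, so no cleanup step is needed.
# The pinned versions are illustrative; use the versions your base image's repositories provide.
RUN apk add --no-cache \
    curl=8.11.0 \
    git=2.45.0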

.dockerignore Best Practices

# .dockerignore
node_modules
npm-debug.log
.git
.github
.gitlab
.vscode
.idea
*.md
*.test.js
coverage/
dist/
.env
.env.local

💡 Use the JSON Formatter tool to check package.json dependency version consistency.


Kubernetes Deployment Strategies

Kubernetes offers multiple deployment strategies; the choice depends on business risk tolerance and rollback speed requirements.

Rolling Update

K8s default strategy, gradually replaces old Pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Max 2 extra Pods at a time
      maxUnavailable: 1   # Max 1 Pod unavailable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
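
The effect of maxSurge and maxUnavailable is easier to reason about as two bounds: the fewest Pods that stay available and the most Pods that can exist during the rollout. The sketch below computes both, assuming the Kubernetes rounding rule for percentages (maxSurge rounds up, maxUnavailable rounds down). The function names are hypothetical.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an absolute number or a percentage string against replicas."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rolling_update_bounds(replicas, max_surge, max_unavailable):
    """Return (minimum available Pods, maximum total Pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


print(rolling_update_bounds(6, 2, 1))          # (5, 8)
print(rolling_update_bounds(6, "25%", "25%"))  # (5, 8)
```

For the manifest above, at least 5 Pods keep serving traffic and at most 8 Pods run at once, so the cluster needs headroom for 2 extra Pods.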

Blue-Green Deployment

Run two complete environments simultaneously, switch Service selector for zero-downtime:

# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
---
# service.yaml (switch selector for blue-green switch)
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app
    version: blue    # Change to green to switch to new version
  ports:
    - port: 80
      targetPort: 8080
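
The switch works because a Service routes to every Pod whose labels contain all selector key/value pairs. The following sketch models that matching rule; the Pod names are hypothetical and the function is an illustration, not the Kubernetes implementation.

```python
def matching_pods(selector, pods):
    """Return names of Pods whose labels contain every selector key/value."""
    return [
        name for name, labels in pods.items()
        if all(labels.get(k) == v for k, v in selector.items())
    ]


pods = {
    "my-app-blue-1": {"app": "my-app", "version": "blue"},
    "my-app-blue-2": {"app": "my-app", "version": "blue"},
    "my-app-green-1": {"app": "my-app", "version": "green"},
    "my-app-green-2": {"app": "my-app", "version": "green"},
}

selector = {"app": "my-app", "version": "blue"}
print(matching_pods(selector, pods))   # ['my-app-blue-1', 'my-app-blue-2']

# Cut over: the only change is the value of the "version" label in the selector
selector["version"] = "green"
print(matching_pods(selector, pods))   # ['my-app-green-1', 'my-app-green-2']
```

Rolling back is the same operation in reverse, which is why blue-green rollback is fast as long as the blue Deployment is still running.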

Canary Release

Gradually shift traffic to the new version, validate at small scale before full rollout:

# canary-with-istio.yaml
# The stable and canary subsets must be defined in a DestinationRule for host my-app
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
    - my-app.example.com
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: canary
          weight: 100
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 90
        - destination:
            host: my-app
            subset: canary
          weight: 10
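
The routing decision in this VirtualService can be sketched as a small function: a request carrying the opt-in header always reaches the canary, and everything else is split by weight. This is a simulation of the behavior under those assumptions, not Istio code; choose_subset is a hypothetical name.

```python
import random


def choose_subset(headers, rng, canary_weight=10):
    """Pick the subset for one request, mirroring the two routes above."""
    # First route: an explicit opt-in header always goes to the canary
    if headers.get("x-canary") == "true":
        return "canary"
    # Default route: weighted split between stable and canary
    return "canary" if rng.random() * 100 < canary_weight else "stable"


rng = random.Random(42)  # fixed seed so the run is repeatable
print(choose_subset({"x-canary": "true"}, rng))  # canary

total = 10_000
hits = sum(choose_subset({}, rng) == "canary" for _ in range(total))
print(f"canary share: {hits / total:.1%}")  # close to 10%, exact value depends on the seed
```

The header route lets testers exercise the canary on demand while ordinary users only see it for about one request in ten.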

Deployment Strategy Comparison

| Strategy | Downtime | Rollback Speed | Resource Cost | Complexity | Use Case |
|---|---|---|---|---|---|
| Rolling Update | Low | Medium | Low | Low | Routine releases |
| Blue-Green | Zero | Fast (switch Service) | High (2x) | Medium | Critical services |
| Canary | Zero | Fast | Medium | High | High-risk changes |

GitOps and ArgoCD

GitOps is the de facto standard for Kubernetes deployment in 2026: a Git repository is the single source of truth, and every change is triggered by a Git commit.

GitOps Core Principles

  1. Declarative: All infrastructure and application configs are declarative
  2. Versioned: All configs stored in Git, complete change history
  3. Auto-pull: Deployment tools automatically pull and apply changes from Git
  4. Continuous reconciliation: Continuously compare cluster state with Git declarations, auto-fix drift
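
The fourth principle is the one that sets GitOps apart, and it can be sketched as a diff between desired state (Git) and live state (the cluster). The code below is a minimal model under that assumption, not how ArgoCD is implemented; plan_sync and the resource names are hypothetical.

```python
def plan_sync(desired, live, prune=True):
    """Compare desired state (from Git) with live state (from the cluster).

    Both arguments map a resource name to its spec. Returns a list of
    (action, resource_name) tuples that would bring live in line with desired.
    """
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))  # drift or a new commit
    if prune:
        for name in live:
            if name not in desired:
                actions.append(("delete", name))
    return actions


desired = {
    "deployment/my-app": {"replicas": 3, "image": "my-app:v2.0.0"},
    "service/my-app-svc": {"port": 80},
}
live = {
    "deployment/my-app": {"replicas": 5, "image": "my-app:v2.0.0"},  # scaled by hand
    "configmap/old-config": {"ENV": "production"},                   # removed from Git
}

print(plan_sync(desired, live))
# [('update', 'deployment/my-app'), ('create', 'service/my-app-svc'), ('delete', 'configmap/old-config')]
```

Running this comparison on a loop is what turns a manual kubectl edit into drift that gets corrected rather than a silent divergence.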

ArgoCD Configuration Example

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/k8s-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
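
To get a feel for the retry block, the wait before each retry can be computed by hand. The sketch assumes the delay is the base duration multiplied by the factor after each failed attempt and capped at maxDuration; retry_delays is a hypothetical helper, not part of ArgoCD.

```python
def retry_delays(limit, duration_s, factor, max_duration_s):
    """Delay before each retry: duration * factor**n, capped at max_duration."""
    return [min(duration_s * factor ** n, max_duration_s) for n in range(limit)]


# Values from the retry block above: 5s base, factor 2, 3m cap, 5 attempts
print(retry_delays(limit=5, duration_s=5, factor=2, max_duration_s=180))
# [5, 10, 20, 40, 80]
```

Under these assumptions the five retries span about two and a half minutes, so the 3m cap is never reached with this configuration.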

Kustomize Multi-Environment Management

k8s-manifests/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── development/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── production/
        ├── kustomization.yaml
        ├── patch-replicas.yaml
        └── patch-resources.yaml

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:          # "bases" is deprecated; list the base under resources
  - ../../base
patches:            # replaces the deprecated patchesStrategicMerge
  - path: patch-replicas.yaml
  - path: patch-resources.yaml
configMapGenerator:
  - name: app-config
    literals:
      - ENV=production
      - LOG_LEVEL=warn
      - DB_HOST=prod-db.internal
Pipeline as Code: Complete GitHub Actions Workflow

This is a complete implementation of a production-grade CI/CD pipeline, covering the full chain of build, test, security scanning, and deployment:

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  K8S_NAMESPACE: my-app

jobs:
  # Job 1: Lint and Unit Test
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
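      - name: Lint
        # golangci-lint is not part of the default runner image, so install it via its action
        uses: golangci/golangci-lint-action@v6
      - name: Unit Test
        run: go test -race -coverprofile=coverage.out ./...
      - name: Upload Coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.out
          token: ${{ secrets.CODECOV_TOKEN }}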

  # Job 2: Security Scan
  security-scan:
    runs-on: ubuntu-latest
    needs: lint-and-test
    steps:
      - uses: actions/checkout@v4
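      - name: Trivy FS Scan
        # @master tracks a moving branch; pin to a release tag or commit SHA for production use
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          severity: CRITICAL,HIGH
          exit-code: '1'
      - name: Snyk Open Source (dependency scan)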
        uses: snyk/actions/golang@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

  # Job 3: Build and Push Docker Image
  build-and-push:
    runs-on: ubuntu-latest
    needs: security-scan
    permissions:
      contents: read
      packages: write
    outputs:
      # metadata-action's "tags" output is a multi-line list of full image
      # references, so expose a single tag (the full commit SHA) instead
      image_tag: ${{ github.sha }}
      image_digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker Metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=,format=long
            type=ref,event=branch
            type=semver,pattern={{version}}
      - name: Build and Push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.head_commit.timestamp }}
            VCS_REF=${{ github.sha }}

  # Job 4: Image Security Scan
  image-scan:
    runs-on: ubuntu-latest
    needs: build-and-push
    permissions:
      contents: read
      packages: read          # pull the image from GHCR
      security-events: write  # upload SARIF results
    steps:
      - name: Trivy Image Scan
        uses: aquasecurity/trivy-action@master
        env:
          TRIVY_USERNAME: ${{ github.actor }}
          TRIVY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        with:
          # Scan by digest so the result applies to exactly the image that was pushed
          # (assumes the repository name is lowercase, as GHCR requires)
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image_digest }}
          severity: CRITICAL,HIGH
          exit-code: '1'
          format: sarif
          output: trivy-results.sarif
      - name: Upload SARIF
        if: always()   # upload findings even when the scan step fails the job
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif

  # Job 5: Deploy to K8s
  deploy:
    runs-on: ubuntu-latest
    needs: [build-and-push, image-scan]
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      - uses: azure/setup-helm@v3
      - name: Configure kubectl
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > $HOME/.kube/config
          chmod 600 $HOME/.kube/config
      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./helm/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
            --set image.digest=${{ needs.build-and-push.outputs.image_digest }} \
            --values ./helm/my-app/values-production.yaml \
            --timeout 5m \
            --wait
      - name: Verify Deployment
        run: |
          kubectl rollout status deployment/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --timeout=3m
      - name: Smoke Test
        run: |
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
            https://my-app.example.com/healthz)
          if [ "$STATUS" != "200" ]; then
            echo "Smoke test failed: HTTP $STATUS"
            exit 1
          fi
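
The single curl call above fails on the first non-200 response, which can be fragile in the seconds right after a rollout. One possible refinement is a smoke test that retries a few times before giving up. The sketch below shows only the retry logic; smoke_test and its parameters are hypothetical, and the status source is injected so the example runs without a network.

```python
import time


def smoke_test(fetch_status, attempts=5, delay_s=3.0, sleep=time.sleep):
    """Call fetch_status() until it returns 200 or attempts run out.

    fetch_status is any zero-argument callable returning an HTTP status code;
    injecting it keeps the retry logic testable without a network.
    """
    for attempt in range(1, attempts + 1):
        status = fetch_status()
        if status == 200:
            return True
        print(f"attempt {attempt}/{attempts}: HTTP {status}")
        if attempt < attempts:
            sleep(delay_s)
    return False


# Simulated endpoint that needs two tries before it becomes healthy
responses = iter([503, 503, 200])
ok = smoke_test(lambda: next(responses), sleep=lambda s: None)
print("smoke test passed" if ok else "smoke test failed")
```

In a real pipeline, fetch_status would wrap an HTTP request against the /healthz endpoint, and a False result would exit non-zero to fail the job.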

GitLab CI Complete Configuration

# .gitlab-ci.yml
stages:
  - test
  - security
  - build
  - deploy

variables:
  DOCKER_TLS_CERTDIR: "/certs"
  REGISTRY: $CI_REGISTRY
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

test:
  stage: test
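  # The race detector needs cgo and a C toolchain, which the Alpine image lacks
  image: golang:1.23
  script:
    - go test -race -coverprofile=coverage.out ./...
    - go tool cover -func=coverage.out
    # Convert Go's coverage profile to the Cobertura XML that GitLab expects
    - go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml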

security-scan:
  stage: security
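  image:
    name: aquasec/trivy:latest
    entrypoint: [""]   # the image's default entrypoint is trivy itself, which breaks GitLab's shell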
  script:
    - trivy fs --severity CRITICAL,HIGH --exit-code 1 .
  allow_failure: false

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
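  before_script:
    # Pass the password on stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # Pull the previous image so --cache-from has layers to reuse
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build
        --cache-from $CI_REGISTRY_IMAGE:latest
        --build-arg BUILDKIT_INLINE_CACHE=1
        --tag $IMAGE_TAG
        --tag $CI_REGISTRY_IMAGE:latest
        --build-arg VCS_REF=$CI_COMMIT_SHA
        .
    - docker push $IMAGE_TAG
    - docker push $CI_REGISTRY_IMAGE:latest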

deploy:staging:
  stage: deploy
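  # The job runs helm, so the image must ship both kubectl and helm;
  # dtzar/helm-kubectl is one example, substitute your own if preferred
  image: dtzar/helm-kubectl:latest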
  script:
    - kubectl config use-context staging
    - helm upgrade --install my-app ./helm/my-app
        --namespace staging
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-staging.yaml
        --wait
  environment:
    name: staging
    url: https://staging.my-app.example.com
  only:
    - develop

deploy:production:
  stage: deploy
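  # The job runs helm, so the image must ship both kubectl and helm;
  # dtzar/helm-kubectl is one example, substitute your own if preferred
  image: dtzar/helm-kubectl:latest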
  script:
    - kubectl config use-context production
    - helm upgrade --install my-app ./helm/my-app
        --namespace production
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-production.yaml
        --wait
  environment:
    name: production
    url: https://my-app.example.com
  when: manual
  only:
    - main

Jenkins Pipeline (Declarative)

// Jenkinsfile
pipeline {
    agent any

    environment {
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'my-app'
        IMAGE_TAG = "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(8)}"
    }

    stages {
        stage('Test') {
            agent { label 'golang' }
            steps {
                sh 'go test -race -coverprofile=coverage.out ./...'
                sh 'golangci-lint run ./...'
            }
        }

        stage('Security Scan') {
            steps {
                sh "trivy fs --severity CRITICAL,HIGH --exit-code 1 ."
            }
        }

        stage('Build & Push') {
            agent { label 'docker' }
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-credentials') {
                        def image = docker.build(
                            "${IMAGE_NAME}:${IMAGE_TAG}",
                            '--build-arg VCS_REF=${GIT_COMMIT} .'
                        )
                        image.push()
                        image.push('latest')
                    }
                }
            }
        }

        stage('Deploy to Staging') {
            when { branch 'develop' }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \\
                        --namespace staging \\
                        --set image.tag=${IMAGE_TAG} \\
                        --values ./helm/${IMAGE_NAME}/values-staging.yaml \\
                        --wait
                """
            }
        }

        stage('Deploy to Production') {
            when { branch 'main' }
            input {
                message 'Confirm deployment to production?'
                ok 'Deploy'
            }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \\
                        --namespace production \\
                        --set image.tag=${IMAGE_TAG} \\
                        --values ./helm/${IMAGE_NAME}/values-production.yaml \\
                        --wait
                """
            }
        }
    }

    post {
        failure {
            slackSend(
                channel: '#cicd-alerts',
                color: 'danger',
                message: "Pipeline failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
            )
        }
        success {
            slackSend(
                channel: '#cicd-alerts',
                color: 'good',
                message: "Pipeline succeeded: ${env.JOB_NAME} #${env.BUILD_NUMBER} -> ${IMAGE_TAG}"
            )
        }
    }
}

Container Registry Management

Image Tagging Strategy

| Tag Type | Example | Lifecycle | Use Case |
|---|---|---|---|
| Immutable tag | sha-abc1234 | Permanent | Production deployment reference |
| Semantic version | v2.1.0 | Permanent | Version release |
| Branch tag | main, develop | Overwritable | Dev/preview |
| latest | latest | Overwritable | Local dev only |

Core principle: production must never use mutable tags (such as latest); always deploy immutable tags (such as a Git SHA).
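
This rule is simple enough to enforce automatically before a deploy. The sketch below rejects image references that are not pinned by digest or by an immutable-looking tag. The tag patterns are assumptions derived from the table above, and both function names are hypothetical.

```python
import re

# Patterns follow the table above; the exact formats are assumptions
SHA_TAG = re.compile(r"^(sha-)?[0-9a-f]{7,40}$")
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")


def is_immutable_tag(tag):
    """True for Git-SHA or semantic-version tags, False for anything else."""
    return bool(SHA_TAG.match(tag) or SEMVER_TAG.match(tag))


def check_production_image(image_ref):
    """Raise ValueError if image_ref is not pinned by digest or immutable tag."""
    if "@sha256:" in image_ref:
        return  # digest references are always immutable
    name, sep, tag = image_ref.rpartition(":")
    if not sep or "/" in tag:
        # No colon, or the last colon belongs to a registry port
        raise ValueError(f"{image_ref}: no tag given, which implies latest")
    if not is_immutable_tag(tag):
        raise ValueError(f"{image_ref}: mutable tag '{tag}'")


check_production_image("registry.example.com/my-app:sha-abc1234")  # passes
check_production_image("registry.example.com/my-app:v2.1.0")       # passes
try:
    check_production_image("registry.example.com/my-app:latest")
except ValueError as err:
    print(err)  # registry.example.com/my-app:latest: mutable tag 'latest'
```

A pattern check cannot prove that a registry forbids overwriting a tag, so pairing it with registry-side tag immutability is the stronger guarantee.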

Image Cleanup Strategy

# GitHub Actions: Periodic old image cleanup
name: Registry Cleanup
on:
  schedule:
    - cron: '0 2 * * 0'  # Every Sunday 2am

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - name: Delete untagged images
        uses: actions/delete-package-versions@v5
        with:
          package-name: my-app
          package-type: container
          min-versions-to-keep: 10
          delete-only-untagged-versions: true

Security Scanning Integration

Trivy: Full-Stack Security Scanning

# Filesystem scan (dependency vulnerabilities)
trivy fs --severity CRITICAL,HIGH --exit-code 1 .

# Image scan
trivy image --severity CRITICAL,HIGH registry.example.com/my-app:v2.0.0

# IaC scan (K8s manifest / Dockerfile)
trivy config --severity CRITICAL,HIGH ./k8s/

# SBOM generation
trivy image --format spdx-json --output sbom.json registry.example.com/my-app:v2.0.0
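
When a pipeline needs more control than --exit-code offers (for example, custom reporting), the JSON report can be post-processed. The sketch below filters a report for blocking severities. It assumes the report shape produced by --format json (a top-level Results list whose entries carry a Vulnerabilities list); the sample data is hand-written and the CVE IDs are placeholders.

```python
import json

BLOCKING = {"CRITICAL", "HIGH"}


def blocking_findings(report):
    """Return (target, vulnerability id, severity) for each blocking finding."""
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" can be missing or null when a target is clean
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                findings.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln["Severity"])
                )
    return findings


# Hand-written sample in the shape of a Trivy JSON report;
# the IDs below are placeholders, not real advisories.
sample = json.loads("""
{
  "Results": [
    {"Target": "go.mod", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "PkgName": "example/a", "Severity": "HIGH"},
      {"VulnerabilityID": "CVE-0000-0002", "PkgName": "example/b", "Severity": "LOW"}
    ]},
    {"Target": "Dockerfile"}
  ]
}
""")

found = blocking_findings(sample)
for target, vuln_id, severity in found:
    print(f"{severity}: {vuln_id} in {target}")
print("gate:", "fail" if found else "pass")
```

In CI, the report would come from a file written with --format json --output, and a non-empty result would fail the job.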

Snyk: Developer-Friendly Security Platform

# GitHub Actions: Snyk integration
- name: Snyk Open Source
  uses: snyk/actions/golang@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high

- name: Snyk Container
  uses: snyk/actions/docker@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    image: registry.example.com/my-app:v2.0.0
    args: --severity-threshold=high --file=Dockerfile

Security Scanning Layers

| Layer | Tool | Scan Target | Trigger |
|---|---|---|---|
| SAST | Snyk Code / SonarQube | Source code vulnerabilities | Every commit |
| SCA | Snyk Open Source / Trivy fs | Dependency vulnerabilities | Every commit |
| Container scan | Trivy image / Snyk Container | Image vulnerabilities | After image build |
| IaC scan | Trivy config / Checkov | K8s/Dockerfile config risks | PR stage |
| DAST | OWASP ZAP | Runtime vulnerabilities | After staging deploy |


Environment Management: Dev / Staging / Prod

Environment Isolation Strategy

# Helm values multi-environment config
# values-development.yaml
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 128Mi
autoscaling:
  enabled: false
config:
  logLevel: debug
  dbHost: dev-db.internal

# values-staging.yaml
replicaCount: 2
resources:
  requests:
    cpu: 250m
    memory: 256Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
config:
  logLevel: info
  dbHost: staging-db.internal

# values-production.yaml
replicaCount: 3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
config:
  logLevel: warn
  dbHost: prod-db.internal
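
Each environment file only needs to list what differs from the chart defaults, because values files are layered on top of values.yaml. The sketch below models that layering as a recursive merge in which nested maps merge key by key and scalars are replaced. The defaults shown are hypothetical, and merge_values is an illustration rather than Helm's own code.

```python
def merge_values(base, override):
    """Recursively merge override into base, returning a new dict.

    Nested maps are merged key by key; scalars and lists in override
    replace the value in base.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical chart defaults (values.yaml)
defaults = {
    "replicaCount": 1,
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    "config": {"logLevel": "info", "dbHost": "localhost"},
}
# Subset of values-production.yaml from above
production = {
    "replicaCount": 3,
    "resources": {"requests": {"cpu": "500m", "memory": "512Mi"},
                  "limits": {"cpu": "1000m", "memory": "1Gi"}},
    "config": {"logLevel": "warn", "dbHost": "prod-db.internal"},
}

effective = merge_values(defaults, production)
print(effective["replicaCount"])          # 3
print(effective["resources"]["limits"])   # {'cpu': '1000m', 'memory': '1Gi'}
print(effective["config"]["logLevel"])    # warn
```

Keeping environment files small in this way makes differences between staging and production easy to review in a diff.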

GitHub Actions Environment Protection Rules

# Production requires manual approval
deploy-production:
  runs-on: ubuntu-latest
  environment: production    # Configure approvers in GitHub Settings
  steps:
    - name: Deploy
      run: helm upgrade --install my-app ./helm/my-app

In GitHub repo Settings → Environments:

  • production: Required reviewers = 2, Wait timer = 5 minutes
  • staging: No approval needed, auto-deploy

Monitoring and Alerting Integration

Prometheus + Grafana Metrics Collection

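# Pod annotations for annotation-based Prometheus scraping
# (takes effect only with a scrape config that honors prometheus.io/* annotations)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:          # must sit on the Pod template, not on the Deployment itself
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080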

Deployment Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: deployment-alerts
  namespace: monitoring
spec:
  groups:
    - name: deployment
      rules:
        - alert: DeploymentRolloutStuck
          expr: |
            kube_deployment_status_replicas_unavailable / kube_deployment_status_replicas > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} rollout stuck"
        - alert: HighErrorRateAfterDeploy
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m])) > 0.05
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "5xx error rate exceeds 5% after deployment"
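
The error-rate condition boils down to one ratio: the increase in 5xx requests divided by the increase in all requests over the window, aggregated across every series. The sketch below computes that ratio from two counter snapshots. The label layout and the counter values are hypothetical.

```python
def error_ratio(before, after):
    """Share of 5xx requests between two counter snapshots.

    before/after map a (pod, status) label pair to the cumulative value of
    http_requests_total at the start and end of the window.
    """
    total = errors = 0.0
    for labels, end_value in after.items():
        increase = end_value - before.get(labels, 0.0)
        total += increase
        if labels[1].startswith("5"):
            errors += increase
    return errors / total if total else 0.0


# Hypothetical counter values five minutes apart
before = {("pod-a", "200"): 1000, ("pod-a", "500"): 10, ("pod-b", "200"): 900}
after = {("pod-a", "200"): 1900, ("pod-a", "500"): 70,
         ("pod-b", "200"): 1840, ("pod-b", "503"): 20}

ratio = error_ratio(before, after)
print(f"{ratio:.2%}")                      # 4.17%
print("alert" if ratio > 0.05 else "ok")   # ok
```

Summing before dividing matters: dividing series by series would compare each 5xx series only with itself instead of with total traffic.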

Slack/DingTalk Alert Notification

# GitHub Actions: Deployment notification
- name: Notify Deployment
  if: always()
  uses: 8398a7/action-slack@v3
  with:
    status: ${{ job.status }}
    fields: repo,message,commit,author,action,eventName,ref,workflow
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Rollback Strategies

Auto Rollback: On Health Check Failure

# Helm deploy + auto rollback
# (Helm 3's --atomic flag provides the same rollback-on-failure behavior natively)
- name: Deploy with Auto Rollback
  run: |
    helm upgrade --install my-app ./helm/my-app \
      --namespace production \
      --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
      --values ./helm/my-app/values-production.yaml \
      --timeout 5m \
      --wait || \
    (echo "Deploy failed, rolling back..." && \
     helm rollback my-app --namespace production && \
     exit 1)

Manual Rollback: Based on Git SHA

# Rollback to specific revision
kubectl rollout undo deployment/my-app --to-revision=3

# Rollback Helm release
helm rollback my-app 2 --namespace production

# GitOps-based rollback: revert Git commit
git revert <commit-hash>
git push origin main
# ArgoCD auto-detects change and executes rollback

Canary Auto Rollback

# Argo Rollouts: Auto analysis + rollback
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  # selector and Pod template omitted for brevity
  analysis:
    successfulRunHistoryLimit: 3
    unsuccessfulRunHistoryLimit: 3
  strategy:
    canary:
      analysis:
        templates:
          - templateName: success-rate
            clusterScope: true
        startingStep: 2   # start background analysis at step index 2 (setWeight: 30)
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 5m }
        - setWeight: 100
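
The control flow behind an analysis-driven canary can be sketched as a loop over traffic weights that aborts as soon as a measurement falls below a threshold. This is a simplified model, not Argo Rollouts code: run_canary, the 95% threshold, and the way analysis_from_step indexes the weight list are all assumptions for illustration.

```python
def run_canary(steps, success_rate_at, threshold=0.95, analysis_from_step=1):
    """Walk through canary weights and abort when analysis fails.

    steps: list of traffic weights in percent
    success_rate_at: callable(weight) -> measured success rate (0..1)
    analysis_from_step: index of the first weight at which analysis is evaluated
    Returns (final_weight, outcome).
    """
    for index, weight in enumerate(steps):
        if index >= analysis_from_step:
            rate = success_rate_at(weight)
            if rate < threshold:
                # Abort: all traffic returns to the stable version
                return 0, f"aborted at {weight}% (success rate {rate:.0%})"
    return steps[-1], "promoted"


weights = [10, 30, 60, 100]

# Healthy release: every measurement clears the threshold
print(run_canary(weights, lambda w: 0.99))
# (100, 'promoted')

# Regression that only shows under higher load
print(run_canary(weights, lambda w: 0.99 if w < 60 else 0.90))
# (0, 'aborted at 60% (success rate 90%)')
```

The second run shows the value of stepping gradually: the regression is caught at 60% of traffic instead of after full promotion.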

Common Pipeline Failures and Fixes

| Failure Symptom | Root Cause | Fix |
|---|---|---|
| Docker build cache invalid | Missing .dockerignore or wrong COPY order | Optimize Dockerfile instruction order, add .dockerignore |
| Image push 403 | Registry auth expired or insufficient permissions | Check Service Account / Token permissions |
| K8s ImagePullBackOff | Image tag doesn't exist or Registry unreachable | Verify image tag, check Registry network and Secret |
| Helm deploy timeout | readinessProbe misconfigured or insufficient resources | Adjust probe params, increase resources limits |
| Staging vs prod inconsistency | Environment config differences | Use Kustomize/Helm for unified management, reduce hardcoding |
| Security scan false positive | Vulnerability from indirect dependency | Configure .trivyignore or Snyk policy to ignore known false positives |
| Concurrent deploy conflict | Multiple people triggering pipeline simultaneously | Use GitHub Concurrency or GitLab resource_group |
| Secret leak | Plaintext in YAML or logs | Use Sealed Secrets / External Secrets Operator |

Concurrency Control

# GitHub Actions: Prevent concurrent deploy conflicts
concurrency:
  group: deploy-${{ github.ref }}
  # Let the running deploy finish; cancelling mid-rollout can leave a partial release
  cancel-in-progress: false

Debugging Tips

# View Pod events
kubectl describe pod <pod-name> -n <namespace>

# View deployment history
kubectl rollout history deployment/my-app -n production

# View Helm release history
helm history my-app -n production

# Port-forward for debugging
kubectl port-forward svc/my-app 8080:80 -n staging

# View container logs
kubectl logs -f deployment/my-app -n production --all-containers

FAQ

Q: Is the GitHub Actions free tier sufficient? A: Standard GitHub-hosted runners are free for public repos, and the Free plan includes 2,000 minutes/month for private repos (Linux). Self-hosted runners do not count against that minute quota.

Q: Should Docker images use the latest tag? A: Never use latest in production. Use Git SHA or semantic version as immutable tags to ensure deployments are traceable and rollbackable.

Q: How to choose between blue-green and canary? A: Blue-green for critical services needing fast rollback (just switch Service), canary for high-risk changes needing gradual validation. Use rolling update for routine releases.

Q: What's the difference between GitOps and the traditional push-based CI/CD model? A: In the push model, the CI pipeline runs kubectl apply against the cluster. With GitOps, an in-cluster agent (such as ArgoCD) pulls changes from Git. The advantage of GitOps is that Git is the single source of truth and cluster state drift is repaired automatically.

Q: How to handle Secrets in CI/CD? A: Use platform-native Secret management (GitHub Secrets / GitLab Variables / Jenkins Credentials), use Sealed Secrets or External Secrets Operator in K8s, never commit Secrets to Git.

Q: How to manage multi-cluster deployment? A: Use ArgoCD ApplicationSet + Git directory structure, or Helm + kubeconfig multi-context switching. ArgoCD is recommended as it natively supports multi-cluster.

Q: Pipeline too slow, how to optimize? A: 1) Leverage Docker layer cache and GitHub Actions cache; 2) Run independent Jobs in parallel; 3) Use self-hosted runners to reduce cold start; 4) Incremental testing (only test changed modules).
