DevOps CI/CD Pipeline in Practice: Docker + Kubernetes Full Chain

CI/CD Fundamentals & 2026 Landscape

CI/CD (Continuous Integration / Continuous Delivery and Deployment) is a core DevOps practice. Its goal is to make the whole path from code commit to production automated, traceable, and easy to roll back.

Core Concepts

Concept	Full Name	Core Goal
CI	Continuous Integration	Merge code frequently, auto build+test, catch issues early
CD (Delivery)	Continuous Delivery	Code is always ready to deploy, requires manual approval
CD (Deployment)	Continuous Deployment	Code auto-deploys to production after passing tests, no manual gate

2026 Mainstream CI/CD Platform Comparison

Platform	Use Case	Core Strength	Pipeline Definition
GitHub Actions	Open source, small-medium teams	Native integration, Marketplace ecosystem, generous free tier	`.github/workflows/*.yml`
GitLab CI	Enterprise self-hosted	Built-in container registry, security scanning, K8s integration	`.gitlab-ci.yml`
Jenkins	Complex pipelines, traditional enterprise	Richest plugin ecosystem, highly customizable	`Jenkinsfile` (Groovy)

💡 Use the YAML Formatter tool to edit and validate CI/CD configuration files, avoiding indentation errors.

Docker Best Practices

Docker is the cornerstone of most CI/CD pipelines: every build should produce a deterministic, reproducible container image.

Multi-Stage Build

Multi-stage builds are the top technique for reducing image size, separating build environment from runtime:

# Stage 1: Build
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server ./cmd/server

# Stage 2: Runtime
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

Result: the final image shrinks from roughly 300 MB (the Go toolchain image) to around 5 MB, depending on the size of the compiled binary.

Node.js multi-stage build example:

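FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies (compilers, bundlers)
RUN npm ci
COPY . .
RUN npm run build
# Remove devDependencies so only runtime packages reach the final image
RUN npm prune --omit=dev

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./package.json
# Run as the unprivileged user shipped with the official Node image
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]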

Image Layer Caching Optimization

The key principle of Docker layer caching: instructions that change less frequently go first, those that change more go last.

# Good: copy dependency files first, leverage cache
COPY package*.json ./
RUN npm ci
COPY . .

# Bad: copy all source first, reinstall deps every time
COPY . .
RUN npm ci
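
To see why instruction order matters, the following Python sketch models the cache in a simplified way: each layer's key is derived from its parent's key, the instruction text, and the checksums of the files it copies. This is an illustrative model rather than Docker's actual implementation; the function names and the file checksums are made up for the example.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction.

    instructions: list of (instruction_text, [files the instruction copies])
    file_hashes:  mapping of file path -> content checksum
    """
    keys = []
    parent = ""
    for text, files in instructions:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(text.encode())
        for path in sorted(files):
            digest.update(file_hashes[path].encode())
        parent = digest.hexdigest()  # a changed parent changes every later key
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, before, after):
    """List the instructions whose cache key differs between two builds."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [text for (text, _), a, b in zip(instructions, old, new) if a != b]


ALL_FILES = ["package.json", "src/index.js"]
GOOD = [("COPY package*.json ./", ["package.json"]),
        ("RUN npm ci", []),
        ("COPY . .", ALL_FILES)]
BAD = [("COPY . .", ALL_FILES),
       ("RUN npm ci", [])]

before = {"package.json": "p1", "src/index.js": "s1"}
after = {"package.json": "p1", "src/index.js": "s2"}  # only source code changed

print(rebuilt_layers(GOOD, before, after))  # ['COPY . .']
print(rebuilt_layers(BAD, before, after))   # ['COPY . .', 'RUN npm ci']
```

In the good ordering a source-only change rebuilds just the final copy; in the bad ordering the dependency install is rebuilt as well.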

Image Size Optimization Checklist

Technique	Effect	Use Case
Multi-stage build	Reduce 60-90%	All compiled languages
Alpine base image	Reduce 50-80%	Apps not dependent on glibc
distroless image	Runtime files and app only, no shell or package manager	Go static binaries, Java (distroless Java images)
`.dockerignore`	Reduce build context	All projects
Merge RUN instructions	Reduce image layers	apt/apk install scenarios
Strip binary `-ldflags="-s -w"`	Reduce 20-30%	Go projects

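# Merge RUN instructions to reduce layers
# --no-cache avoids storing the package index, so no separate cache cleanup is needed.
# To pin versions, use the full Alpine package version including its -rN release suffix.
RUN apk add --no-cache \
    curl \
    git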

.dockerignore Best Practices

# .dockerignore
node_modules
npm-debug.log
.git
.github
.gitlab
.vscode
.idea
*.md
*.test.js
coverage/
dist/
.env
.env.local

💡 Use the JSON Formatter tool to check package.json dependency version consistency.

Kubernetes Deployment Strategies

Kubernetes offers multiple deployment strategies; the choice depends on business risk tolerance and rollback speed requirements.

Rolling Update

Rolling update is the default Deployment strategy in Kubernetes. It replaces old Pods with new ones gradually:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Max 2 extra Pods at a time
      maxUnavailable: 1   # Max 1 Pod unavailable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
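
The effect of `maxSurge` and `maxUnavailable` can be worked out with a little arithmetic. The Python sketch below computes the upper bound on total Pods and the lower bound on available Pods during a rollout. The helper name is hypothetical; the rounding rules (percentages round up for surge and down for unavailable) follow the documented Deployment behavior.

```python
import math


def rolling_update_bounds(replicas, max_surge, max_unavailable):
    """Return (max_total_pods, min_available_pods) during a rolling update.

    Values may be absolute integers or percentage strings such as "25%".
    """
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            raw = replicas * int(value[:-1]) / 100
            return math.ceil(raw) if round_up else math.floor(raw)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas + surge, replicas - unavailable


# The manifest above: 6 replicas, maxSurge 2, maxUnavailable 1
print(rolling_update_bounds(6, 2, 1))          # (8, 5)
# The Kubernetes defaults (25% / 25%) applied to 6 replicas
print(rolling_update_bounds(6, "25%", "25%"))  # (8, 5)
```

With the values from the manifest, the cluster must have room for up to 8 Pods, and at least 5 keep serving traffic throughout the rollout.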

Blue-Green Deployment

Run two complete environments side by side and switch the Service selector to cut over with zero downtime:

# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
---
# service.yaml (switch selector for blue-green switch)
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app
    version: blue    # Change to green to switch to new version
  ports:
    - port: 80
      targetPort: 8080
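
The switch works because a Service sends traffic to every Pod whose labels contain all of its selector's key/value pairs. The Python sketch below imitates that matching rule with plain dictionaries; the Pod names are hypothetical and no cluster is involved.

```python
def select_pods(selector, pods):
    """Return names of pods whose labels contain every selector key/value pair."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


pods = [
    {"name": "my-app-blue-1", "labels": {"app": "my-app", "version": "blue"}},
    {"name": "my-app-blue-2", "labels": {"app": "my-app", "version": "blue"}},
    {"name": "my-app-green-1", "labels": {"app": "my-app", "version": "green"}},
]

service_selector = {"app": "my-app", "version": "blue"}
print(select_pods(service_selector, pods))  # ['my-app-blue-1', 'my-app-blue-2']

# Blue-green cutover: only the selector changes, the Pods stay untouched
service_selector["version"] = "green"
print(select_pods(service_selector, pods))  # ['my-app-green-1']
```

Rolling back is the same operation in reverse, which is why blue-green rollback is fast as long as the blue Deployment is still running.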

Canary Release

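Gradually shift traffic to the new version and validate it at small scale before the full rollout. The example uses Istio; the stable and canary subsets it references must be defined in a matching DestinationRule: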

# canary-with-istio.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
    - my-app.example.com
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: canary
          weight: 100
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 90
        - destination:
            host: my-app
            subset: canary
          weight: 10
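
The routing logic of this VirtualService can be paraphrased in a few lines of Python: the header rule is checked first, and only requests that do not match it fall through to the 90/10 weighted split. This is a simplified sketch with a hypothetical function name; it does not reproduce how the Envoy proxy actually selects a destination.

```python
import random


def pick_subset(headers, roll, stable_weight=90):
    """Mimic the two routes: header match first, then the weighted split.

    roll is a number in [0, 100) standing in for the proxy's random choice.
    """
    if headers.get("x-canary") == "true":
        return "canary"
    return "stable" if roll < stable_weight else "canary"


rng = random.Random(42)  # fixed seed so the run is repeatable
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_subset({}, rng.uniform(0, 100))] += 1

print(counts)                                 # roughly 90% stable, 10% canary
print(pick_subset({"x-canary": "true"}, 5))   # canary
```

The header route gives testers a deterministic way to reach the canary while ordinary users are only exposed to it about one request in ten.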

Deployment Strategy Comparison

Strategy	Downtime	Rollback Speed	Resource Cost	Complexity	Use Case
Rolling Update	Low	Medium	Low	Low	Routine releases
Blue-Green	Zero	Fast (switch Service)	High (2x)	Medium	Critical services
Canary	Zero	Fast	Medium	High	High-risk changes

GitOps and ArgoCD

GitOps is the de facto standard for Kubernetes deployment in 2026: a Git repository serves as the single source of truth, and every change is triggered by a Git commit.

GitOps Core Principles

Declarative: All infrastructure and application configs are declarative
Versioned: All configs stored in Git, complete change history
Auto-pull: Deployment tools automatically pull and apply changes from Git
Continuous reconciliation: Continuously compare cluster state with Git declarations, auto-fix drift
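
Continuous reconciliation is essentially a diff between the state declared in Git and the live cluster state, followed by the actions needed to close the gap. The Python sketch below shows that idea with dictionaries standing in for manifests. The function and resource names are illustrative assumptions; real tools compare full Kubernetes objects.

```python
def plan_sync(desired, live, prune=True):
    """Return the (action, resource) pairs needed to make live match desired."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))  # drift: the live spec differs from Git
    if prune:
        # Resources that exist in the cluster but are no longer declared in Git
        for name in sorted(live.keys() - desired.keys()):
            actions.append(("delete", name))
    return actions


desired = {
    "deployment/my-app": {"replicas": 3},
    "service/my-app": {"port": 80},
}
live = {
    "deployment/my-app": {"replicas": 5},  # someone scaled it by hand
    "configmap/old": {},
}

print(plan_sync(desired, live))
# [('update', 'deployment/my-app'), ('create', 'service/my-app'), ('delete', 'configmap/old')]
```

Running this comparison on a loop and applying the resulting actions is what turns a manual edit in the cluster into drift that gets repaired.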

ArgoCD Configuration Example

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/k8s-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
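
The retry settings describe an exponential backoff. As a rough illustration, the Python sketch below computes the wait before each retry, assuming the first retry waits `duration` and each later wait is multiplied by `factor` up to `maxDuration`. The exact timing inside Argo CD may differ; the function name is hypothetical.

```python
def backoff_delays(limit, duration_s, factor, max_duration_s):
    """Return the wait in seconds before each of the `limit` retries."""
    return [
        min(duration_s * factor ** attempt, max_duration_s)
        for attempt in range(limit)
    ]


# limit: 5, duration: 5s, factor: 2, maxDuration: 3m (180 s)
print(backoff_delays(5, 5, 2, 180))  # [5, 10, 20, 40, 80]
# With more retries the cap starts to apply
print(backoff_delays(8, 5, 2, 180))  # [5, 10, 20, 40, 80, 160, 180, 180]
```

Under these assumptions the five retries spread over about two and a half minutes, which gives a briefly unavailable API server or webhook time to recover.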

Kustomize Multi-Environment Management

k8s-manifests/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── development/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── production/
        ├── kustomization.yaml
        ├── patch-replicas.yaml
        └── patch-resources.yaml

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
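# `bases` and `patchesStrategicMerge` are deprecated in current Kustomize releases;
# `resources` and `patches` are their replacements
resources:
  - ../../base
patches:
  - path: patch-replicas.yaml
  - path: patch-resources.yaml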
configMapGenerator:
  - name: app-config
    literals:
      - ENV=production
      - LOG_LEVEL=warn
      - DB_HOST=prod-db.internal

Pipeline as Code: Complete GitHub Actions Workflow

The workflow below sketches a production-style CI/CD pipeline that covers the full chain of lint, test, security scanning, image build, and deployment:

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  K8S_NAMESPACE: my-app

jobs:
  # Job 1: Lint and Unit Test
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
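      - name: Lint
        # golangci-lint is not preinstalled on GitHub-hosted runners; the official action installs and runs it
        uses: golangci/golangci-lint-action@v6
        with:
          args: ./...
      - name: Unit Test
        run: go test -race -coverprofile=coverage.out ./...
      - name: Upload Coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.out
          # v4 needs an upload token; CODECOV_TOKEN is an assumed secret name
          token: ${{ secrets.CODECOV_TOKEN }}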

  # Job 2: Security Scan
  security-scan:
    runs-on: ubuntu-latest
    needs: lint-and-test
    steps:
      - uses: actions/checkout@v4
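      - name: Trivy FS Scan
        uses: aquasecurity/trivy-action@master   # pin to a release tag or commit SHA in real pipelines
        with:
          scan-type: fs
          scan-ref: .
          severity: CRITICAL,HIGH
          exit-code: '1'
      - name: Snyk Open Source (SCA)
        # snyk/actions/golang runs `snyk test`, which checks dependencies (SCA) rather than source code (SAST)
        uses: snyk/actions/golang@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}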

  # Job 3: Build and Push Docker Image
  build-and-push:
    runs-on: ubuntu-latest
    needs: security-scan
    # Only build and push on branch pushes; pull requests from forks cannot write to the registry
    if: github.event_name == 'push'
    permissions:
      contents: read
      packages: write
    outputs:
      # steps.meta.outputs.tags is a newline-separated list of full image references,
      # so expose the single immutable tag (the full commit SHA) instead
      image_tag: ${{ github.sha }}
      image_digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker Metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=,format=long
            type=ref,event=branch
            type=semver,pattern={{version}}
      - name: Build and Push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.head_commit.timestamp }}
            VCS_REF=${{ github.sha }}

  # Job 4: Image Security Scan
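  image-scan:
    runs-on: ubuntu-latest
    needs: build-and-push
    permissions:
      contents: read
      packages: read
      security-events: write   # required by upload-sarif
    steps:
      - name: Trivy Image Scan
        uses: aquasecurity/trivy-action@master   # pin to a release tag or commit SHA in real pipelines
        with:
          # Scan exactly the image that was pushed by referencing its digest
          # (assumes a lowercase owner/repo name, as image names must be lowercase)
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image_digest }}
          severity: CRITICAL,HIGH
          exit-code: '1'
          format: sarif
          output: trivy-results.sarif
        env:
          # Registry credentials so Trivy can pull a private GHCR image
          TRIVY_USERNAME: ${{ github.actor }}
          TRIVY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
      - name: Upload SARIF
        if: always()   # upload findings even when the scan step fails the job
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif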

  # Job 5: Deploy to K8s
  deploy:
    runs-on: ubuntu-latest
    needs: [build-and-push, image-scan]
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      - uses: azure/setup-helm@v3
      - name: Configure kubectl
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > $HOME/.kube/config
          # Restrict access to the credentials file
          chmod 600 $HOME/.kube/config
      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./helm/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
            --set image.digest=${{ needs.build-and-push.outputs.image_digest }} \
            --values ./helm/my-app/values-production.yaml \
            --timeout 5m \
            --wait
      - name: Verify Deployment
        run: |
          kubectl rollout status deployment/my-app \
            --namespace ${{ env.K8S_NAMESPACE }} \
            --timeout=3m
      - name: Smoke Test
        run: |
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
            https://my-app.example.com/healthz)
          if [ "$STATUS" != "200" ]; then
            echo "Smoke test failed: HTTP $STATUS"
            exit 1
          fi

GitLab CI Complete Configuration

# .gitlab-ci.yml
stages:
  - test
  - security
  - build
  - deploy

variables:
  DOCKER_TLS_CERTDIR: "/certs"
  REGISTRY: $CI_REGISTRY
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

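test:
  stage: test
  # Debian-based image: the race detector needs cgo and a C toolchain, which the Alpine image lacks
  image: golang:1.23
  script:
    - go test -race -coverprofile=coverage.out ./...
    - go tool cover -func=coverage.out
    # Convert the Go coverage profile to Cobertura XML so GitLab can read it
    - go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml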

security-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy fs --severity CRITICAL,HIGH --exit-code 1 .
  allow_failure: false

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
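  before_script:
    # --password-stdin keeps the password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # Pull the previous image to seed the layer cache; ignore the failure on the very first build
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build
        --cache-from $CI_REGISTRY_IMAGE:latest
        --build-arg BUILDKIT_INLINE_CACHE=1
        --tag $IMAGE_TAG
        --tag $CI_REGISTRY_IMAGE:latest
        --build-arg VCS_REF=$CI_COMMIT_SHA
        .
    - docker push $IMAGE_TAG
    - docker push $CI_REGISTRY_IMAGE:latest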

deploy:staging:
  stage: deploy
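  # bitnami/kubectl ships no helm binary; dtzar/helm-kubectl is an assumed substitute, any image with both tools works
  image: dtzar/helm-kubectl:latest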
  script:
    - kubectl config use-context staging
    - helm upgrade --install my-app ./helm/my-app
        --namespace staging
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-staging.yaml
        --wait
  environment:
    name: staging
    url: https://staging.my-app.example.com
  only:
    - develop

deploy:production:
  stage: deploy
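  # bitnami/kubectl ships no helm binary; dtzar/helm-kubectl is an assumed substitute, any image with both tools works
  image: dtzar/helm-kubectl:latest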
  script:
    - kubectl config use-context production
    - helm upgrade --install my-app ./helm/my-app
        --namespace production
        --set image.tag=$CI_COMMIT_SHORT_SHA
        --values ./helm/my-app/values-production.yaml
        --wait
  environment:
    name: production
    url: https://my-app.example.com
  when: manual
  only:
    - main

Jenkins Pipeline (Declarative)

// Jenkinsfile
pipeline {
    agent any

    environment {
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'my-app'
        IMAGE_TAG = "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(8)}"
    }

    stages {
        stage('Test') {
            agent { label 'golang' }
            steps {
                sh 'go test -race -coverprofile=coverage.out ./...'
                sh 'golangci-lint run ./...'
            }
        }

        stage('Security Scan') {
            steps {
                sh "trivy fs --severity CRITICAL,HIGH --exit-code 1 ."
            }
        }

        stage('Build & Push') {
            agent { label 'docker' }
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-credentials') {
                        def image = docker.build(
                            "${IMAGE_NAME}:${IMAGE_TAG}",
                            '--build-arg VCS_REF=${GIT_COMMIT} .'
                        )
                        image.push()
                        image.push('latest')
                    }
                }
            }
        }

        stage('Deploy to Staging') {
            when { branch 'develop' }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \\
                        --namespace staging \\
                        --set image.tag=${IMAGE_TAG} \\
                        --values ./helm/${IMAGE_NAME}/values-staging.yaml \\
                        --wait
                """
            }
        }

        stage('Deploy to Production') {
            when { branch 'main' }
            input {
                message 'Confirm deployment to production?'
                ok 'Deploy'
            }
            steps {
                sh """
                    helm upgrade --install ${IMAGE_NAME} ./helm/${IMAGE_NAME} \\
                        --namespace production \\
                        --set image.tag=${IMAGE_TAG} \\
                        --values ./helm/${IMAGE_NAME}/values-production.yaml \\
                        --wait
                """
            }
        }
    }

    post {
        failure {
            slackSend(
                channel: '#cicd-alerts',
                color: 'danger',
                message: "Pipeline failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
            )
        }
        success {
            slackSend(
                channel: '#cicd-alerts',
                color: 'good',
                message: "Deploy succeeded: ${env.JOB_NAME} #${env.BUILD_NUMBER} -> ${IMAGE_TAG}"
            )
        }
    }
}

Container Registry Management

Image Tagging Strategy

Tag Type	Example	Lifecycle	Use Case
Immutable tag	`sha-abc1234`	Permanent	Production deployment reference
Semantic version	`v2.1.0`	Permanent	Version release
Branch tag	`main`, `develop`	Overwritable	Dev/preview
`latest`	`latest`	Overwritable	Local dev only

Core principle: production must never deploy mutable tags such as latest. Reference images by an immutable tag (for example, the Git SHA) or by digest.
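
A pipeline can enforce this rule with a small check before deployment. The Python sketch below accepts digest references, SHA-style tags, and semantic versions, and rejects everything else. The patterns and the list of mutable tag names are assumptions for illustration; tags are immutable only by convention unless the registry enforces it.

```python
import re

MUTABLE_TAGS = {"latest", "main", "develop"}
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
SHA_TAG_RE = re.compile(r"^(sha-)?[0-9a-f]{7,40}$")
SEMVER_TAG_RE = re.compile(r"^v?\d+\.\d+\.\d+$")


def is_immutable_reference(image):
    """Return True if the image is pinned by digest, Git SHA tag, or semver tag."""
    if DIGEST_RE.search(image):
        return True
    # Look for the tag only in the last path segment so a registry port is not mistaken for one
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means the implicit "latest"
    tag = last_segment.rsplit(":", 1)[1]
    if tag in MUTABLE_TAGS:
        return False
    return bool(SHA_TAG_RE.match(tag) or SEMVER_TAG_RE.match(tag))


for ref in [
    "registry.example.com/my-app:latest",
    "registry.example.com/my-app:sha-abc1234",
    "registry.example.com/my-app:v2.1.0",
    "registry.example.com:5000/my-app",
]:
    print(ref, is_immutable_reference(ref))
# latest -> False, sha-abc1234 -> True, v2.1.0 -> True, untagged -> False
```

Failing the deploy job when this check returns False keeps a stray branch tag or `latest` from reaching production.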

Image Cleanup Strategy

# GitHub Actions: Periodic old image cleanup
name: Registry Cleanup
on:
  schedule:
    - cron: '0 2 * * 0'  # Every Sunday 2am

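jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write   # needed to delete package versions
    steps:
      - name: Delete untagged images
        uses: actions/delete-package-versions@v5
        with:
          package-name: my-app
          package-type: container   # required input; GHCR images are "container" packages
          min-versions-to-keep: 10
          delete-only-untagged-versions: true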

Security Scanning Integration

Trivy: Full-Stack Security Scanning

# Filesystem scan (dependency vulnerabilities)
trivy fs --severity CRITICAL,HIGH --exit-code 1 .

# Image scan
trivy image --severity CRITICAL,HIGH registry.example.com/my-app:v2.0.0

# IaC scan (K8s manifest / Dockerfile)
trivy config --severity CRITICAL,HIGH ./k8s/

# SBOM generation
trivy image --format spdx-json --output sbom.json registry.example.com/my-app:v2.0.0

Snyk: Developer-Friendly Security Platform

# GitHub Actions: Snyk integration
- name: Snyk Open Source
  uses: snyk/actions/golang@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high

- name: Snyk Container
  uses: snyk/actions/docker@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    image: registry.example.com/my-app:v2.0.0
    args: --severity-threshold=high --file=Dockerfile

Security Scanning Layers

Layer	Tool	Scan Target	Trigger
SAST	Snyk Code / SonarQube	Source code vulnerabilities	Every commit
SCA	Snyk Open Source / Trivy fs	Dependency vulnerabilities	Every commit
Container scan	Trivy image / Snyk Container	Image vulnerabilities	After image build
IaC scan	Trivy config / Checkov	K8s/Dockerfile config risks	PR stage
DAST	OWASP ZAP	Runtime vulnerabilities	After staging deploy

💡 Use the Hash Encryption tool to generate checksums for CI/CD Secrets, ensuring sensitive configs are not tampered with.

Environment Management: Dev / Staging / Prod

Environment Isolation Strategy

# Helm values multi-environment config
# values-development.yaml
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 128Mi
autoscaling:
  enabled: false
config:
  logLevel: debug
  dbHost: dev-db.internal

# values-staging.yaml
replicaCount: 2
resources:
  requests:
    cpu: 250m
    memory: 256Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
config:
  logLevel: info
  dbHost: staging-db.internal

# values-production.yaml
replicaCount: 3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
config:
  logLevel: warn
  dbHost: prod-db.internal
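
The production autoscaling values feed the Horizontal Pod Autoscaler, whose documented core formula is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The Python sketch below applies that formula with the production limits from above. It deliberately ignores the tolerance band and stabilization windows that the real controller applies, and the function name is hypothetical.

```python
import math


def desired_replicas(current, current_cpu_pct, target_cpu_pct=70,
                     min_replicas=3, max_replicas=20):
    """Apply the basic HPA formula, then clamp to the configured bounds."""
    raw = math.ceil(current * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(3, 140))   # 6  (CPU at twice the target doubles the replicas)
print(desired_replicas(3, 35))    # 3  (formula gives 2, clamped to minReplicas)
print(desired_replicas(10, 200))  # 20 (formula gives 29, clamped to maxReplicas)
```

The percentages are relative to the CPU request, so the 500m request in the production values determines what 70% utilization means in absolute terms.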

GitHub Actions Environment Protection Rules

# Production requires manual approval
deploy-production:
  runs-on: ubuntu-latest
  environment: production    # Configure approvers in GitHub Settings
  steps:
    - name: Deploy
      run: helm upgrade --install my-app ./helm/my-app

In GitHub repo Settings → Environments:

production: Required reviewers = 2, Wait timer = 5 minutes
staging: No approval needed, auto-deploy

Monitoring and Alerting Integration

Prometheus + Grafana Metrics Collection

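# Prometheus scrape annotations (set on the Pod template)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      # Annotation-based discovery reads Pod annotations, not Deployment annotations,
      # and only works if the Prometheus scrape config is set up to honor them
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
          ports:
            - containerPort: 8080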

Deployment Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: deployment-alerts
  namespace: monitoring
spec:
  groups:
    - name: deployment
      rules:
        - alert: DeploymentRolloutStuck
          expr: |
            kube_deployment_status_replicas_unavailable / kube_deployment_status_replicas > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} rollout stuck"
        - alert: HighErrorRateAfterDeploy
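          # Aggregate both sides with sum(): dividing the raw series would only
          # match each 5xx series against itself and always yield 1
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m])) > 0.05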
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "5xx error rate exceeds 5% after deployment"
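
The error-rate rule comes down to a ratio of two counter increases over the same window, compared against a threshold. The Python sketch below reproduces that arithmetic on made-up counter samples. It ignores counter resets and the `for:` duration, both of which Prometheus handles for you; the function names are hypothetical.

```python
def error_ratio(errors_start, errors_end, total_start, total_end):
    """Share of requests that failed between two counter samples."""
    total = total_end - total_start
    if total <= 0:
        return 0.0  # no traffic in the window, nothing to alert on
    return (errors_end - errors_start) / total


def should_alert(ratio, threshold=0.05):
    """Mirror the `> 0.05` comparison in the alert expression."""
    return ratio > threshold


# 60 failures out of 1000 requests in the window: 6% error rate
ratio = error_ratio(errors_start=100, errors_end=160, total_start=10_000, total_end=11_000)
print(ratio, should_alert(ratio))  # 0.06 True

# 40 failures out of 1000 requests: 4% error rate
ratio = error_ratio(errors_start=100, errors_end=140, total_start=10_000, total_end=11_000)
print(ratio, should_alert(ratio))  # 0.04 False
```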

Slack/DingTalk Alert Notification

# GitHub Actions: Deployment notification
- name: Notify Deployment
  if: always()
  uses: 8398a7/action-slack@v3
  with:
    status: ${{ job.status }}
    fields: repo,message,commit,author,action,eventName,ref,workflow
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Rollback Strategies

Auto Rollback: On Health Check Failure

# Helm deploy + auto rollback
- name: Deploy with Auto Rollback
  run: |
    # The tag comes from the build job's outputs; step outputs of other jobs are not visible here
    helm upgrade --install my-app ./helm/my-app \
      --namespace production \
      --set image.tag=${{ needs.build-and-push.outputs.image_tag }} \
      --values ./helm/my-app/values-production.yaml \
      --timeout 5m \
      --wait || \
    (echo "Deploy failed, rolling back..." && \
     helm rollback my-app --namespace production && \
     exit 1)

Manual Rollback: Based on Git SHA

# Rollback to specific revision
kubectl rollout undo deployment/my-app --to-revision=3 -n production

# Rollback Helm release
helm rollback my-app 2 --namespace production

# GitOps-based rollback: revert Git commit
git revert <commit-hash>
git push origin main
# ArgoCD auto-detects change and executes rollback

Canary Auto Rollback

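# Argo Rollouts: Auto analysis + rollback
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.0.0
  # Number of finished AnalysisRuns to keep
  analysis:
    successfulRunHistoryLimit: 3
    unsuccessfulRunHistoryLimit: 3
  strategy:
    canary:
      # Background analysis; a failed run aborts the rollout and returns traffic to stable
      analysis:
        templates:
          - templateName: success-rate   # ClusterAnalysisTemplate, defined separately
            clusterScope: true
        startingStep: 2                  # start once the step at index 2 (setWeight: 30) is reached
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 5m }
        - setWeight: 100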
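
Conceptually, the canary steps above form a loop: raise the weight, measure, and either continue or abort. The Python sketch below walks through that loop with a pluggable measurement function. The 95% success threshold and all names are assumptions for illustration, since the success-rate template itself is not shown here.

```python
def run_canary(weights, success_rate_at, threshold=0.95):
    """Step through canary weights; roll back as soon as a measurement fails.

    success_rate_at(weight) stands in for the metrics query an analysis would run.
    Returns (outcome, final_canary_weight, history).
    """
    history = []
    for weight in weights:
        rate = success_rate_at(weight)
        history.append((weight, rate))
        if rate < threshold:
            return "rolled_back", 0, history  # all traffic returns to stable
    return "promoted", 100, history


steps = [10, 30, 60, 100]

# Healthy release: every step passes
print(run_canary(steps, lambda weight: 0.99)[:2])  # ('promoted', 100)

# Release that degrades under load: fails once it receives 60% of traffic
outcome, weight, history = run_canary(steps, lambda w: 0.99 if w < 60 else 0.90)
print(outcome, weight, history)
# rolled_back 0 [(10, 0.99), (30, 0.99), (60, 0.9)]
```

The second run shows the benefit of gradual exposure: the problem surfaces at 60% and the final 100% step is never reached.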

Common Pipeline Failures and Fixes

Failure Symptom	Root Cause	Fix
Docker build cache invalid	Missing `.dockerignore` or wrong COPY order	Optimize Dockerfile instruction order, add `.dockerignore`
Image push 403	Registry auth expired or insufficient permissions	Check Service Account / Token permissions
K8s ImagePullBackOff	Image tag doesn't exist or Registry unreachable	Verify image tag, check Registry network and Secret
Helm deploy timeout	readinessProbe misconfigured or insufficient resources	Adjust probe parameters, raise resource requests/limits
Staging vs prod inconsistency	Environment config differences	Use Kustomize/Helm for unified management, reduce hardcoding
Security scan false positive	Vulnerability from indirect dependency	Configure `.trivyignore` or Snyk policy to ignore known false positives
Concurrent deploy conflict	Multiple people triggering pipeline simultaneously	Use GitHub Concurrency or GitLab resource_group
Secret leak	Plaintext in YAML or logs	Use Sealed Secrets / External Secrets Operator

Concurrency Control

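# GitHub Actions: Prevent concurrent deploy conflicts
concurrency:
  group: deploy-${{ github.ref }}
  # Queue the newer run instead of cancelling; stopping a deploy midway can leave a release half-applied
  cancel-in-progress: false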

Debugging Tips

# View Pod events
kubectl describe pod <pod-name> -n <namespace>

# View deployment history
kubectl rollout history deployment/my-app -n production

# View Helm release history
helm history my-app -n production

# Port-forward for debugging
kubectl port-forward svc/my-app 8080:80 -n staging

# View container logs
kubectl logs -f deployment/my-app -n production --all-containers

FAQ

Q: Is the GitHub Actions free tier sufficient? A: Standard GitHub-hosted runners are free for public repos. Private repos on the Free plan get 2000 minutes/month (Linux). Jobs on self-hosted runners do not count against that quota.

Q: Should Docker images use the latest tag? A: Never use latest in production. Use Git SHA or semantic version as immutable tags to ensure deployments are traceable and rollbackable.

Q: How to choose between blue-green and canary? A: Blue-green for critical services needing fast rollback (just switch Service), canary for high-risk changes needing gradual validation. Use rolling update for routine releases.

Q: What's the difference between GitOps and the traditional push-based CI/CD model? A: In the push model, the CI pipeline runs kubectl apply against the cluster. In GitOps, an in-cluster agent (ArgoCD) pulls changes from Git. The advantage of GitOps is that Git stays the single source of truth and cluster state drift is repaired automatically.

Q: How to handle Secrets in CI/CD? A: Use platform-native Secret management (GitHub Secrets / GitLab Variables / Jenkins Credentials), use Sealed Secrets or External Secrets Operator in K8s, never commit Secrets to Git.

Q: How to manage multi-cluster deployment? A: Use ArgoCD ApplicationSet + Git directory structure, or Helm + kubeconfig multi-context switching. ArgoCD is recommended as it natively supports multi-cluster.

Q: Pipeline too slow, how to optimize? A: 1) Leverage Docker layer cache and GitHub Actions cache; 2) Run independent Jobs in parallel; 3) Use self-hosted runners to reduce cold start; 4) Incremental testing (only test changed modules).
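
As an illustration of point 4, the Python sketch below maps a list of changed files to the Go package patterns worth testing. It is a deliberately naive approach with hypothetical names: it does not follow reverse dependencies, so packages that import a changed package are missed, and it falls back to the full suite whenever go.mod or go.sum changes.

```python
from pathlib import PurePosixPath


def packages_to_test(changed_files):
    """Return `go test` package patterns for the directories that changed."""
    # A dependency change can affect any package, so test everything
    if any(PurePosixPath(f).name in {"go.mod", "go.sum"} for f in changed_files):
        return ["./..."]
    dirs = {str(PurePosixPath(f).parent) for f in changed_files if f.endswith(".go")}
    return sorted("./..." if d == "." else f"./{d}/..." for d in dirs)


changed = ["cmd/server/main.go", "internal/auth/token.go", "README.md"]
print(packages_to_test(changed))
# ['./cmd/server/...', './internal/auth/...']

print(packages_to_test(["go.mod", "internal/auth/token.go"]))
# ['./...']
```

The changed-file list would typically come from `git diff --name-only` against the target branch, and the result is passed to `go test` as its package arguments.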