Terraform IaC Best Practices: 5 Production Patterns from Module Design to GitOps

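In 2026, the question for Terraform IaC is no longer "can we" but "how well"

After HashiCorp relicensed Terraform under BSL 1.1 in 2023, the community forked OpenTofu. Whichever of the two you choose, HCL remains the most widely used language in the IaC space. The question is no longer "should we use IaC" but "how do we make IaC production-ready".

Too many teams' Terraform code looks like this: one giant main.tf, state files stored locally, every environment sharing one set of variables, unversioned modules, and terraform apply run by hand in CI/CD. That is not IaC; it is manual operations written as code.

This article covers 5 production-grade IaC patterns, from composable module design to GitOps automation, to take your Terraform from "it works" to "it works well".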

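Key takeaways

  • Master composable module design: a module architecture that is reusable, testable, and versionable
  • Understand the three layers of remote state protection: remote backend, state locking, drift detection
  • Implement workspace environment isolation and variable management best practices
  • Complete a seamless migration from Terraform to OpenTofu
  • Integrate a GitOps workflow: Atlantis + CI/CD with automated plan/apply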

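Contents

  • Terraform IaC core concepts
  • Pattern 1: Module composition design
  • Pattern 2: State management
  • Pattern 3: Workspace environment isolation
  • Pattern 4: OpenTofu migration
  • Pattern 5: GitOps integration
  • 5 common pitfalls and how to fix them
  • 10 common errors and how to troubleshoot them
  • Advanced optimization tips
  • Comparative analysis
  • Recommended online tools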

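Terraform IaC core concepts

IaC maturity model

┌───────────────────────────────────────────────────────────────────────────┐
│                            IaC Maturity Model                             │
├───────────────┬─────────────────────┬─────────────────────────────────────┤
│ Level 1       │ Level 2             │ Level 3                             │
│ Scripted      │ Modular             │ Platform                            │
├───────────────┼─────────────────────┼─────────────────────────────────────┤
│ Single file   │ Split into modules  │ Module composition + registry       │
│ Local state   │ Remote state        │ Layered, isolated state             │
│ Manual runs   │ CI/CD triggered     │ GitOps automation                   │
│ No tests      │ Basic tests         │ Policy as Code                      │
│ No versioning │ Git versioning      │ Semantic versioning + changelog     │
│ Coupled envs  │ Workspace isolation │ Multi-environment abstraction layer │
└───────────────┴─────────────────────┴─────────────────────────────────────┘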

Terraform core workflow

┌────────────┐     ┌────────────┐     ┌────────────┐     ┌────────────┐
│   Write    │────▶│    Plan    │────▶│   Apply    │────▶│   State    │
│ (edit HCL) │     │ (preview)  │     │ (execute)  │     │ (update)   │
└────────────┘     └────────────┘     └────────────┘     └────────────┘
      │                  │                  │                  │
      ▼                  ▼                  ▼                  ▼
  Git Commit       terraform plan     terraform apply    Remote Backend
  Pull Request     plan file output   create/update      S3/GCS/Cloud

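Key changes in the Terraform ecosystem in 2026

┌────────────────────┬───────────────────────────────────┬──────────────────────────────────┐
│ Change             │ Impact                            │ Response                         │
├────────────────────┼───────────────────────────────────┼──────────────────────────────────┤
│ BSL 1.1 license    │ Competitive use restricted        │ Evaluate migrating to OpenTofu   │
│ OpenTofu 1.9+      │ Community-driven alternative      │ Prefer it for new projects       │
│ Terraform 1.10+    │ Native test framework (since 1.6) │ Adopt terraform test             │
│ Rise of Crossplane │ Kubernetes-native IaC             │ Complementary, not a replacement │
│ Pulumi maturing    │ IaC in general-purpose languages  │ Choose by team skills            │
└────────────────────┴───────────────────────────────────┴──────────────────────────────────┘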

Pattern 1: Module composition design

Modules are the foundation of Terraform IaC. Most teams, however, only get as far as splitting files and never reach composability. Production-grade module design uses three layers: base modules, composition modules, and environment modules.

Three-layer module architecture

┌────────────────────────────────────────────────────────┐
│                   Environment Module                   │
│  ┌──────────────────────────────────────────────────┐  │
│  │                Composition Module                │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐        │  │
│  │  │  Base    │  │  Base    │  │  Base    │        │  │
│  │  │ Module   │  │ Module   │  │ Module   │        │  │
│  │  │ (VPC)    │  │ (RDS)    │  │ (ECS)    │        │  │
│  │  └──────────┘  └──────────┘  └──────────┘        │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

Base module: VPC

modules/
└── vpc/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── versions.tf
    └── README.md
# modules/vpc/versions.tf
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0, < 6.0"
    }
  }
}
# modules/vpc/variables.tf
variable "cidr_block" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "public_subnets" {
  description = "List of public subnet CIDR blocks"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "private_subnets" {
  description = "List of private subnet CIDR blocks"
  type        = list(string)
  default     = ["10.0.10.0/24", "10.0.11.0/24"]
}

variable "enable_nat_gateway" {
  description = "Enable NAT Gateway for private subnets"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use single NAT Gateway to reduce cost"
  type        = bool
  default     = false
}

variable "tags" {
  description = "Additional tags for all resources"
  type        = map(string)
  default     = {}
}
# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(
    {
      Name        = "${var.environment}-vpc"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = merge(
    {
      Name        = "${var.environment}-igw"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_subnet" "public" {
  count                   = length(var.public_subnets)
  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnets[count.index]
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = merge(
    {
      Name        = "${var.environment}-public-${count.index + 1}"
      Environment = var.environment
      Tier        = "public"
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnets)
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = merge(
    {
      Name        = "${var.environment}-private-${count.index + 1}"
      Environment = var.environment
      Tier        = "private"
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_eip" "nat" {
  count  = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
  domain = "vpc"

  tags = merge(
    {
      Name        = "${var.environment}-nat-eip-${count.index + 1}"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_nat_gateway" "this" {
  count         = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index % length(aws_subnet.public)].id

  tags = merge(
    {
      Name        = "${var.environment}-nat-${count.index + 1}"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )

  depends_on = [aws_internet_gateway.this]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }

  tags = merge(
    {
      Name        = "${var.environment}-public-rt"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_route_table" "private" {
  count  = length(var.private_subnets)
  vpc_id = aws_vpc.this.id

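  # Add the default route only when NAT Gateways exist; indexing
  # aws_nat_gateway.this fails when enable_nat_gateway = false.
  dynamic "route" {
    for_each = var.enable_nat_gateway ? [1] : []
    content {
      cidr_block     = "0.0.0.0/0"
      nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.this[0].id : aws_nat_gateway.this[count.index].id
    }
  }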

  tags = merge(
    {
      Name        = "${var.environment}-private-rt-${count.index + 1}"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

resource "aws_route_table_association" "public" {
  count          = length(var.public_subnets)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.private_subnets)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

data "aws_availability_zones" "available" {
  state = "available"
}
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.this.id
}

output "vpc_cidr" {
  description = "VPC CIDR block"
  value       = aws_vpc.this.cidr_block
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "nat_gateway_ids" {
  description = "List of NAT Gateway IDs"
  value       = aws_nat_gateway.this[*].id
}

output "igw_id" {
  description = "Internet Gateway ID"
  value       = aws_internet_gateway.this.id
}

Composition module: a complete application stack

# modules/app-stack/main.tf
module "vpc" {
  source = "../vpc"

  cidr_block         = var.vpc_cidr
  environment        = var.environment
  public_subnets     = var.public_subnet_cidrs
  private_subnets    = var.private_subnet_cidrs
  enable_nat_gateway = true
  single_nat_gateway = var.environment != "prod"
  tags               = local.common_tags
}

module "rds" {
  source = "../rds"

  environment       = var.environment
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnet_ids
  engine            = var.db_engine
  engine_version    = var.db_engine_version
  instance_class    = var.db_instance_class
  allocated_storage = var.db_allocated_storage
  database_name     = var.database_name
  username          = var.db_username
  password          = var.db_password
  tags              = local.common_tags
}

module "ecs" {
  source = "../ecs"

  environment         = var.environment
  vpc_id              = module.vpc.vpc_id
  subnet_ids          = module.vpc.private_subnet_ids
  cluster_name        = "${var.environment}-cluster"
  container_image     = var.container_image
  container_port      = var.container_port
  desired_count       = var.desired_count
  cpu                 = var.cpu
  memory              = var.memory
  environment_variables = merge(
    {
      DATABASE_URL = "postgresql://${var.db_username}:${var.db_password}@${module.rds.endpoint}/${var.database_name}"
      ENVIRONMENT  = var.environment
    },
    var.extra_environment_variables
  )
  tags = local.common_tags
}

locals {
  common_tags = merge(
    {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    },
    var.tags
  )
}

Module versioning

# Terraform Registry module
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"
}

# Git repository module (pinned to a tag)
module "app_stack" {
  source = "git::https://github.com/myorg/terraform-modules.git//modules/app-stack?ref=v2.1.0"
}

# Local module (during development)
module "vpc" {
  source = "../../modules/vpc"
}

# Module archive stored in S3
module "app_stack" {
  source = "s3::https://my-terraform-modules.s3.amazonaws.com/app-stack/v2.1.0.zip"
}

Module testing

# modules/vpc/tests/main.tftest.hcl
run "validate_vpc_cidr" {
  command = plan

  variables {
    cidr_block  = "10.0.0.0/16"
    environment = "test"
  }

  assert {
    condition     = aws_vpc.this.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block should match input"
  }
}

run "validate_subnets" {
  command = plan

  variables {
    cidr_block      = "10.0.0.0/16"
    environment     = "test"
    public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
    private_subnets = ["10.0.10.0/24", "10.0.11.0/24"]
  }

  assert {
    condition     = length(aws_subnet.public) == 2
    error_message = "Should create 2 public subnets"
  }

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Should create 2 private subnets"
  }
}

run "validate_nat_gateway_production" {
  command = plan

  variables {
    cidr_block         = "10.0.0.0/16"
    environment        = "prod"
    enable_nat_gateway = true
    single_nat_gateway = false
    private_subnets    = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
  }

  assert {
    condition     = length(aws_nat_gateway.this) == 3
    error_message = "Production should have one NAT Gateway per AZ"
  }
}
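# Run the tests for one module (terraform test requires Terraform 1.6+)
cd modules/vpc
terraform init -backend=false
terraform test

# Run the tests for every module (from the repository root);
# terraform test has no recursive flag, so loop over the module directories
for dir in modules/*/; do
  (cd "$dir" && terraform init -backend=false && terraform test)
done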
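
The run blocks above plan against the real AWS provider, so they need credentials. A minimal sketch of a credential-free variant, assuming Terraform 1.7+ (mock providers); the file name, run name, and availability zone names are illustrative, not part of the original module:

```hcl
# modules/vpc/tests/mocked.tftest.hcl (hypothetical file name)
# Assumes Terraform 1.7+; the mock replaces the real AWS provider.
mock_provider "aws" {
  # The module indexes into this data source, so give it fixed AZ names
  # instead of the empty list a mock would otherwise generate.
  mock_data "aws_availability_zones" {
    defaults = {
      names = ["us-east-1a", "us-east-1b", "us-east-1c"]
    }
  }
}

run "single_nat_gateway_creates_one" {
  command = plan

  variables {
    environment        = "test"
    enable_nat_gateway = true
    single_nat_gateway = true
  }

  assert {
    condition     = length(aws_nat_gateway.this) == 1
    error_message = "single_nat_gateway should create exactly one NAT Gateway"
  }
}
```

Because the assertion only checks a resource count, it can be evaluated at plan time; assertions on provider-computed attributes would need an apply or explicit mock values.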

Pattern 2: State management

The Terraform state file is the most critical data in IaC. Losing state means losing control of your infrastructure. Production environments must use a remote backend, enable state locking, and check for drift regularly.

Remote backend architecture

┌──────────────────────────────────────────────────────┐
│           Three layers of state protection           │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 1: Remote backend                       │  │
│  │  S3 + DynamoDB / GCS / Azure Blob              │  │
│  │  (durable state, shared by the team)           │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 2: State locking                        │  │
│  │  DynamoDB / GCS native lock / Azure Blob lease │  │
│  │  (no concurrent writes, serialized apply)      │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 3: Drift detection                      │  │
│  │  plan -refresh-only + scheduled CI/CD checks   │  │
│  │  (catch manual changes, keep state in sync)    │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘

S3 + DynamoDB backend configuration

# backend.tf
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "app-infra/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/abc123"

    # The lock wait time is not a backend argument; pass it on the CLI,
    # for example: terraform apply -lock-timeout=30m
  }
}

Bootstrapping the backend infrastructure

# bootstrap/main.tf
resource "aws_s3_bucket" "terraform_state" {
  bucket = "myorg-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    id     = "cleanup-old-versions"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    noncurrent_version_transition {
      noncurrent_days = 90
      storage_class   = "GLACIER"
    }

    noncurrent_version_expiration {
      noncurrent_days = 365
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

resource "aws_kms_key" "terraform_state" {
  description             = "Terraform state encryption key"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_kms_alias" "terraform_state" {
  name          = "alias/terraform-state"
  target_key_id = aws_kms_key.terraform_state.key_id
}
# Bootstrap
cd bootstrap
terraform init
terraform apply

# Migrate to the remote backend
# After adding backend.tf, run:
terraform init -migrate-state

# Verify that the state now lives in S3
aws s3 ls s3://myorg-terraform-state/app-infra/

State layering

┌──────────────────────────────────────────────────────┐
│              Layered state architecture              │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 0: bootstrap (local state)              │  │
│  │  S3 Bucket / DynamoDB / KMS / IAM              │  │
│  └────────────────────────────────────────────────┘  │
│                     │ data flow                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 1: networking (remote state)            │  │
│  │  VPC / Subnets / Route Tables / NAT GW         │  │
│  └────────────────────────────────────────────────┘  │
│                     │ data flow                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 2: compute (remote state)               │  │
│  │  ECS / EKS / RDS / ElastiCache                 │  │
│  └────────────────────────────────────────────────┘  │
│                     │ data flow                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Layer 3: services (remote state)              │  │
│  │  DNS / CDN / Monitoring / Alerts               │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
# Layer 2 reads the outputs of Layer 1
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "myorg-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

module "ecs" {
  source = "../../modules/ecs"

  vpc_id     = data.terraform_remote_state.networking.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
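
One detail the snippet above relies on: terraform_remote_state only exposes outputs declared in the root module of the referenced configuration. A minimal sketch of what the networking layer might declare, assuming it instantiates the VPC module as module.vpc (the file path is hypothetical):

```hcl
# networking/outputs.tf (hypothetical path for the Layer 1 root module)
# Outputs of child modules are not visible through terraform_remote_state,
# so the root module re-exports the values other layers consume.
output "vpc_id" {
  description = "VPC ID consumed by the compute layer"
  value       = module.vpc.vpc_id
}

output "private_subnet_ids" {
  description = "Private subnet IDs consumed by the compute layer"
  value       = module.vpc.private_subnet_ids
}
```

Keeping this list short is a deliberate choice: every output becomes part of the contract between layers.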

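Drift detection

# Manual drift detection
terraform plan -detailed-exitcode
# 0 = no changes
# 1 = error
# 2 = changes present (drift exists)

# Reconcile state with the real infrastructure
# (terraform refresh is deprecated in favor of -refresh-only)
terraform apply -refresh-only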
# .github/workflows/drift-detection.yml
name: Drift Detection
on:
  schedule:
    - cron: "0 8 * * 1-5"
  workflow_dispatch:

jobs:
  drift-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.10.0"
          # Disable the wrapper script so piping terraform output into jq works
          terraform_wrapper: false

      - name: Terraform Init
        run: terraform init -backend-config=backend.hcl

      - name: Check Drift
        run: |
          exit_code=0
          terraform plan -detailed-exitcode -out=plan.out || exit_code=$?
          if [ "$exit_code" -eq 2 ]; then
            echo "::warning::Infrastructure drift detected!"
            terraform show -json plan.out | jq -r '.resource_changes[] | select(.change.actions != ["no-op"]) | "\(.type).\(.name): \(.change.actions | join(", "))"'
            exit 1
          elif [ "$exit_code" -ne 0 ]; then
            # Exit code 1 means the plan itself failed; do not report success
            echo "::error::terraform plan failed"
            exit "$exit_code"
          fi
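
The workflow above runs terraform init -backend-config=backend.hcl, which implies a partial backend configuration. A sketch of how the two files might fit together, reusing the bucket and table names from the earlier backend example; the file layout is an assumption:

```hcl
# backend.tf: leave the block empty so that every setting
# comes from the file passed with -backend-config
terraform {
  backend "s3" {}
}

# backend.hcl: plain key/value pairs, one file per environment or pipeline
# (shown in the same listing for brevity; it is a separate file on disk)
bucket         = "myorg-terraform-state"
key            = "app-infra/terraform.tfstate"
region         = "us-east-1"
encrypt        = true
dynamodb_table = "terraform-locks"
```

This keeps account-specific values out of the shared code while the backend type stays fixed in version control.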

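Pattern 3: Workspace environment isolation

Managing multiple environments is the most basic IaC requirement: dev, staging, and prod share modules but differ in configuration. Terraform workspaces offer lightweight environment isolation, but they only work well when paired with disciplined variable management.

Workspace vs directory isolation

┌──────────────────────────────────────────────────────────────────┐
│             Two approaches to environment isolation              │
├─────────────────────────────┬────────────────────────────────────┤
│ Workspace isolation         │ Directory isolation                │
├─────────────────────────────┼────────────────────────────────────┤
│ One state per workspace     │ Separate state per environment     │
│ One shared codebase         │ One code directory per environment │
│ Switch via workspace select │ Switch by changing directory       │
│ Suits simple environments   │ Suits complex environments         │
│ State paths:                │ State paths:                       │
│ env:/dev/state              │ dev/terraform.tfstate              │
│ env:/prod/state             │ prod/terraform.tfstate             │
└─────────────────────────────┴────────────────────────────────────┘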
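
To make the workspace column concrete, here is a minimal sketch of per-workspace settings driven by terraform.workspace. The map keys and values are illustrative and mirror the dev and prod sizes used later in this article:

```hcl
# Sketch: one codebase, settings selected by the active workspace.
locals {
  env_config = {
    dev = {
      instance_class = "db.t3.micro"
      desired_count  = 1
    }
    prod = {
      instance_class = "db.r6g.xlarge"
      desired_count  = 3
    }
  }

  # Fails at plan time if the workspace has no entry (for example "default"),
  # which guards against applying from an unexpected workspace.
  env = local.env_config[terraform.workspace]
}

output "selected_instance_class" {
  description = "Instance class chosen for the active workspace"
  value       = local.env.instance_class
}
```

Switching environments is then a matter of terraform workspace select dev or terraform workspace select prod before running plan.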

Recommended: directory isolation + shared modules

infra/
├── modules/
│   ├── vpc/
│   ├── rds/
│   └── ecs/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── backend.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── backend.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
│       ├── backend.tf
│       └── terraform.tfvars
└── shared/
    └── locals.tf

Environment configuration

# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "dev/app-infra/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
# environments/dev/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block         = "10.0.0.0/16"
  environment        = "dev"
  public_subnets     = ["10.0.1.0/24"]
  private_subnets    = ["10.0.10.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
}

module "rds" {
  source = "../../modules/rds"

  environment       = "dev"
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnet_ids
  engine            = "postgres"
  engine_version    = "16.4"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  database_name     = "appdb_dev"
  username          = "appadmin"
  password          = var.db_password
}

module "ecs" {
  source = "../../modules/ecs"

  environment     = "dev"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:dev-latest"
  desired_count   = 1
  cpu             = 256
  memory          = 512
}
# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block         = "10.100.0.0/16"
  environment        = "prod"
  public_subnets     = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
  private_subnets    = ["10.100.10.0/24", "10.100.11.0/24", "10.100.12.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = false
}

module "rds" {
  source = "../../modules/rds"

  environment       = "prod"
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnet_ids
  engine            = "postgres"
  engine_version    = "16.4"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 500
  database_name     = "appdb_prod"
  username          = "appadmin"
  password          = var.db_password
  multi_az          = true
}

module "ecs" {
  source = "../../modules/ecs"

  environment     = "prod"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:v2.1.0"
  desired_count   = 3
  cpu             = 1024
  memory          = 2048
}

Variable validation

# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true

  validation {
    condition     = length(var.db_password) >= 16
    error_message = "Database password must be at least 16 characters."
  }
}
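
The same validation mechanism can catch malformed network inputs before any API call is made. A sketch, assuming the cidr_block and private_subnets variables of the VPC module shown earlier; can(), cidrhost(), and alltrue() are built-in functions:

```hcl
variable "cidr_block" {
  description = "VPC CIDR block"
  type        = string

  validation {
    # cidrhost() raises an error on a malformed CIDR; can() turns it into false.
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "The cidr_block value must be a valid CIDR, for example 10.0.0.0/16."
  }
}

variable "private_subnets" {
  description = "List of private subnet CIDR blocks"
  type        = list(string)

  validation {
    # Every element of the list must parse as a CIDR.
    condition     = alltrue([for s in var.private_subnets : can(cidrhost(s, 0))])
    error_message = "Every entry in private_subnets must be a valid CIDR block."
  }
}
```

This only checks syntax; it does not verify that the subnets fall inside the VPC range.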

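Pattern 4: OpenTofu migration

In August 2023, HashiCorp changed Terraform's license from Mozilla Public License 2.0 to Business Source License 1.1. BSL 1.1 prohibits "competitive use", which includes using Terraform to offer a competing IaC service. That is not a problem for most enterprises, but the open-source community's reaction gave rise to OpenTofu.

Migration assessment

┌────────────────────────┬──────────────────────────────────┬───────────────────────────────────┐
│ Dimension              │ Terraform (BSL 1.1)              │ OpenTofu (MPL 2.0)                │
├────────────────────────┼──────────────────────────────────┼───────────────────────────────────┤
│ License                │ BSL 1.1 (competitive-use limits) │ MPL 2.0 (fully open source)       │
│ CLI compatibility      │ Native                           │ Drop-in for Terraform 1.6 configs │
│ Provider compatibility │ Native                           │ Works with community providers    │
│ State file             │ Native format                    │ Compatible format                 │
│ Enterprise support     │ HashiCorp support                │ Linux Foundation community        │
│ New features           │ Native testing (1.6+)            │ State encryption (1.7+)           │
│ Registry               │ Terraform Registry               │ OpenTofu Registry                 │
└────────────────────────┴──────────────────────────────────┴───────────────────────────────────┘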

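Migration steps

# Step 1: Install OpenTofu
# macOS
brew install opentofu

# Linux (the installer script expects an explicit --install-method)
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh -o install-opentofu.sh
chmod +x install-opentofu.sh
./install-opentofu.sh --install-method standalone

# Check the installed version
tofu version
# OpenTofu v1.9.0

# Step 2: Swap the CLI commands
# terraform init → tofu init
# terraform plan → tofu plan
# terraform apply → tofu apply
# terraform destroy → tofu destroy

# Step 3: Verify compatibility
tofu init
tofu plan
# If the plan output matches terraform plan, the migration succeeded

# Step 4: Update the CI/CD configuration
# Replace every terraform command with tofu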

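OpenTofu-only feature: state encryption

# Native state encryption (available since OpenTofu 1.7;
# referencing a variable inside this block needs 1.8+)
terraform {
  encryption {
    key_provider "pbkdf2" "mykey" {
      passphrase = var.encryption_passphrase
      key_length = 32
      iterations = 600000
      # pbkdf2 generates a random salt itself; only its length is configurable
      salt_length = 32
    }

    method "aes_gcm" "myencryption" {
      keys = key_provider.pbkdf2.mykey
    }

    # Lets OpenTofu read the existing unencrypted state during migration;
    # remove it (and the fallback blocks) once everything is encrypted
    method "unencrypted" "migrate" {}

    state {
      method = method.aes_gcm.myencryption
      fallback {
        method = method.unencrypted.migrate
      }
    }

    plan {
      method = method.aes_gcm.myencryption
      fallback {
        method = method.unencrypted.migrate
      }
    }
  }
}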

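Phased migration strategy

┌──────────────────────────────────────────────────────┐
│               Phased migration roadmap               │
│                                                      │
│  Phase 1: Assessment (1-2 weeks)                     │
│  ├── Inventory every Terraform project               │
│  ├── Check provider and module compatibility         │
│  └── Set migration priorities                        │
│                                                      │
│  Phase 2: Non-production migration (2-4 weeks)       │
│  ├── Switch dev/staging to OpenTofu                  │
│  ├── Verify plan output is identical                 │
│  └── Update CI/CD pipelines                          │
│                                                      │
│  Phase 3: Production migration (1-2 weeks)           │
│  ├── Switch production to OpenTofu                   │
│  ├── Enable state encryption                         │
│  └── Monitor for 1 week to confirm stability         │
│                                                      │
│  Phase 4: Cleanup (1 week)                           │
│  ├── Remove the Terraform CLI dependency             │
│  ├── Update docs and runbooks                        │
│  └── Complete team training                          │
└──────────────────────────────────────────────────────┘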

Pattern 5: GitOps integration

The end goal of Terraform IaC is GitOps: a commit triggers plan, and apply runs automatically after approval. Atlantis is one of the most mature Terraform GitOps tools today; it runs terraform plan and terraform apply directly inside the pull request.

Atlantis architecture

┌──────────────────────────────────────────────────────┐
│             Atlantis GitOps architecture             │
│                                                      │
│  ┌──────────┐     webhook     ┌──────────────────┐   │
│  │  GitHub  │────────────────▶│     Atlantis     │   │
│  │  /GitLab │                 │      Server      │   │
│  │          │◀────────────────│                  │   │
│  │          │   PR comment    │ ┌──────────────┐ │   │
│  │          │  (plan/apply)   │ │  terraform   │ │   │
│  └──────────┘                 │ │  plan/apply  │ │   │
│                               │ └──────────────┘ │   │
│                               └────────┬─────────┘   │
│                                        │             │
│                               ┌────────▼─────────┐   │
│                               │   AWS / GCP /    │   │
│                               │   Azure API      │   │
│                               └──────────────────┘   │
└──────────────────────────────────────────────────────┘

Deploying Atlantis

# atlantis-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlantis
  namespace: atlantis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: atlantis
  template:
    metadata:
      labels:
        app: atlantis
    spec:
      containers:
        - name: atlantis
          image: ghcr.io/runatlantis/atlantis:v0.30.0
          ports:
            - containerPort: 4141
          env:
            - name: ATLANTIS_GH_USER
              value: "myorg-bot"
            - name: ATLANTIS_GH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: atlantis-secrets
                  key: github-token
            - name: ATLANTIS_GH_WEBHOOK_SECRET
              valueFrom:
                secretKeyRef:
                  name: atlantis-secrets
                  key: webhook-secret
            # Atlantis refuses to start without a repo allowlist
            - name: ATLANTIS_REPO_ALLOWLIST
              value: "github.com/myorg/*"
            # Upper bound on concurrent plan/apply operations;
            # autoplan is enabled by default and needs no flag
            - name: ATLANTIS_PARALLEL_POOL_SIZE
              value: "4"
            - name: ATLANTIS_REPO_CONFIG_JSON
              value: |
                {
                  "repos": [
                    {
                      "id": "/.*/",
                      "apply_requirements": ["approved", "mergeable"],
                      "plan_requirements": [],
                      "import_requirements": [],
                      "allowed_overrides": ["apply_requirements", "workflow"],
                      "allow_custom_workflows": true
                    }
                  ]
                }
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - name: atlantis-data
              mountPath: /home/atlantis
            - name: repo-config
              mountPath: /etc/atlantis
      volumes:
        - name: atlantis-data
          persistentVolumeClaim:
            claimName: atlantis-data
        - name: repo-config
          configMap:
            name: atlantis-config

Atlantis repository configuration

# atlantis.yaml (repository root)
version: 3
projects:
  - name: dev-infra
    dir: environments/dev
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable

  - name: staging-infra
    dir: environments/staging
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable

  - name: prod-infra
    dir: environments/prod
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
      - undiverged

workflows:
  terraform:
    plan:
      steps:
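        - env:
            name: TF_VAR_db_password
            # `value` is a static string; use `command` to read DB_PASSWORD
            # from the Atlantis server's environment at run time
            command: 'printenv DB_PASSWORD'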
        - run: terraform init -backend-config=backend.hcl -reconfigure
        - run: terraform plan -out=$PLANFILE -var-file=terraform.tfvars
        - run: terraform show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
        - run: terraform apply $PLANFILE

Policy as Code: Sentinel / OPA

# sentinel/require-tags.sentinel
import "tfplan/v2" as tfplan

allResources = filter tfplan.resource_changes as _, rc {
    rc.mode == "managed" and
    rc.type != "null_resource" and
    rc.change.actions != ["delete"]
}

tagsRequired = rule {
    all allResources as _, rc {
        rc.change.after.tags contains "Environment" and
        rc.change.after.tags contains "ManagedBy"
    }
}

main = rule {
    tagsRequired
}
# opa/require-tags.rego
package terraform

# rego.v1 enables the if/in/contains keywords (OPA 0.59+)
import rego.v1

# A set-building rule needs `contains`; `deny[msg] if` would define an
# object keyed by msg instead of a set of messages.
deny contains msg if {
    some rc in input.resource_changes
    rc.mode == "managed"
    rc.change.actions[_] != "delete"
    not "Environment" in object.keys(rc.change.after.tags)
    msg := sprintf("Resource %s of type %s missing Environment tag", [rc.name, rc.type])
}

deny contains msg if {
    some rc in input.resource_changes
    rc.mode == "managed"
    rc.change.actions[_] != "delete"
    not "ManagedBy" in object.keys(rc.change.after.tags)
    msg := sprintf("Resource %s of type %s missing ManagedBy tag", [rc.name, rc.type])
}
# Check a Terraform plan with OPA
terraform plan -out=plan.out
terraform show -json plan.out > plan.json

# Evaluate the OPA policy
opa eval --data opa/ --input plan.json "data.terraform.deny"

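5 common pitfalls and how to fix them

Pitfall 1: Corrupted state file

Symptom: terraform plan fails with "state snapshot was created by a newer version" or "invalid character".

Cause: the state file was edited by hand, hit a disk failure, or was rolled back to an older S3 version.

Solution: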

# 1. Restore from S3 version history
aws s3api list-object-versions \
  --bucket myorg-terraform-state \
  --prefix dev/app-infra/terraform.tfstate

# Restore the previous version (quote the source so the shell does not glob "?")
aws s3api copy-object \
  --bucket myorg-terraform-state \
  --copy-source "myorg-terraform-state/dev/app-infra/terraform.tfstate?versionId=PREVIOUS_VERSION" \
  --key dev/app-infra/terraform.tfstate

# 2. Pull, repair, and push back
terraform state pull > state.json
# Repair the JSON by hand (with care) and increment "serial" before pushing
terraform state push state.json

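Pitfall 2: Unpinned provider versions break CI

Symptom: terraform plan works locally, but CI/CD reports a provider download failure or a version incompatibility.

Cause: the local machine has a provider cache while CI installs from scratch on every run, and the provider version is not pinned.

Solution: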

# versions.tf - pin the provider version
terraform {
  required_version = ">= 1.5.0, < 2.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.80.0"
    }
  }
}
# 提交鎖定檔案到Git
git add .terraform.lock.hcl
git commit -m "chore: lock provider versions"
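
An exact pin is the strictest option. Since the committed lock file already records the exact version and checksums, a pessimistic constraint is a common alternative that still allows patch upgrades through terraform init -upgrade. A sketch of that variant:

```hcl
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.80.0" accepts 5.80.x patch releases but not 5.81.0 or later
      version = "~> 5.80.0"
    }
  }
}
```

Which style to prefer is a team decision: the exact pin makes the constraint self-documenting, while the pessimistic form leaves the precise version to the lock file.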

Pitfall 3: Dependency cycles

Symptom: terraform plan fails with "Cycle: module.x, module.y".

Cause: an output of module A depends on an output of module B, while module B in turn depends on an output of module A.

Solution

# Wrong: circular dependency
module "vpc" {
  source                = "./vpc"
  ecs_security_group_id = module.ecs.security_group_id
}

module "ecs" {
  source     = "./ecs"
  subnet_ids = module.vpc.private_subnet_ids
}

# Right: split into 3 layers with one-way dependencies
# Layer 1: network
module "vpc" {
  source = "./vpc"
}

# Layer 2: security groups
module "security_groups" {
  source = "./security-groups"
  vpc_id = module.vpc.vpc_id
}

# Layer 3: compute
module "ecs" {
  source             = "./ecs"
  subnet_ids         = module.vpc.private_subnet_ids
  security_group_ids = module.security_groups.app_ids
}

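Pitfall 4: Sensitive variables leak into the state file

Symptom: terraform show displays the password in plain text, and the state file contains sensitive data.

Cause: sensitive = true only redacts CLI output; it does not encrypt the value stored in the state file.

Solution

# Both options keep the secret out of code and tfvars files.
# Values read by data sources are still recorded in state,
# so the backend must also be encrypted and access-restricted.

# Option 1: AWS SSM Parameter Store
data "aws_ssm_parameter" "db_password" {
  name            = "/app/${var.environment}/db-password"
  with_decryption = true
}

module "rds" {
  source   = "../../modules/rds"
  password = data.aws_ssm_parameter.db_password.value
}

# Option 2 (alternative to option 1): AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_creds" {
  secret_id = "app/${var.environment}/db-credentials"
}

module "rds" {
  source   = "../../modules/rds"
  password = jsondecode(data.aws_secretsmanager_secret_version.db_creds.secret_string)["password"]
}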
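
Because sensitive = true does not protect the state itself, two complementary measures are worth sketching: encrypting the state object at rest, and letting RDS generate the master password so it is never supplied through Terraform. The region, identifier, username, and sizing below are assumptions for illustration only.

```hcl
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "dev/app-infra/terraform.tfstate"
    region         = "us-east-1" # assumed region
    dynamodb_table = "terraform-locks"
    encrypt        = true # server-side encryption of the state object
  }
}

resource "aws_db_instance" "app" {
  identifier        = "app-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app_admin" # hypothetical

  # RDS generates the master password and keeps it in Secrets Manager;
  # no password argument is set, so nothing secret is passed in as a variable.
  manage_master_user_password = true

  skip_final_snapshot = true # convenient for a sketch; reconsider for production
}
```

Applications can then read the generated secret from Secrets Manager at runtime instead of receiving it from Terraform outputs.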

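Pitfall 5: Module refactoring forces resource re-creation

Symptom: after renaming a module or moving a resource, terraform plan wants to destroy and re-create it.

Cause: Terraform identifies resources by address (for example module.vpc.aws_vpc.this). When the address changes, Terraform treats the resource as new.

Solution

# Before refactoring: move the state first
# Moving the module address moves every resource inside it,
# e.g. module.vpc.aws_vpc.this becomes module.networking_vpc.aws_vpc.this
terraform state mv module.vpc module.networking_vpc

# Then change the code
# Move the module files
mv modules/vpc modules/networking/vpc

# Update the module reference
# module "vpc" → module "networking_vpc"

# Verify
terraform plan  # should report no changes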
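
Since Terraform 1.1 the same rename can be declared in code with a moved block, which is reviewed in the pull request and applied for every environment instead of being run by hand per state. A minimal sketch, assuming the renamed module call lives in the root module (the source path is an assumption based on the file move above):

```hcl
# Renamed module call
module "networking_vpc" {
  source = "./modules/networking/vpc" # assumed path
}

# Objects tracked under module.vpc are now addressed as module.networking_vpc,
# so the plan shows a move instead of destroy/create.
moved {
  from = module.vpc
  to   = module.networking_vpc
}
```

Once every environment has applied the change, the moved block can be deleted, although keeping it is harmless.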

Troubleshooting 10 Common Errors

1. Error: Failed to load plugin

# Clear the provider cache and download again
rm -rf .terraform/providers
terraform init -upgrade

# Check the network proxy
export HTTPS_PROXY=http://proxy.internal:8080
terraform init

2. Error: Error locking state: Error acquiring the state lock

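# Inspect the lock entries in DynamoDB
aws dynamodb scan --table-name terraform-locks

# Make sure no other process is still running
# If the lock is confirmed to be stale, force-unlock it (the lock ID is printed in the error message)
terraform force-unlock LOCK_ID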
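
The lock entries inspected above live in a DynamoDB table with a specific shape. A sketch of how that table could be defined, assuming it is created in a bootstrap configuration (the resource label is illustrative):

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks" # table name from the scan command above
  billing_mode = "PAY_PER_REQUEST" # lock traffic is tiny, so on-demand billing is usually enough
  hash_key     = "LockID"          # the S3 backend expects exactly this key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```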

3. Error: Provider produced inconsistent result after apply

# This is a provider bug; the following usually helps
# 1. Upgrade the provider
terraform init -upgrade

# 2. If it is a known provider bug, ignore the affected attribute with lifecycle
resource "aws_instance" "app" {
  lifecycle {
    ignore_changes = [user_data_replace_on_change]
  }
}

4. Error: Resource already managed by Terraform

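# The resource is still in state but has been deleted from the code
# List the resources in state
terraform state list

# Remove it from state (the real resource is left untouched)
terraform state rm aws_instance.old_resource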
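
On Terraform 1.7 or later, the same "forget but do not destroy" step can be expressed declaratively with a removed block, which keeps the change visible in code review. A minimal sketch using the address from the command above:

```hcl
# Replaces the deleted resource block for aws_instance.old_resource
removed {
  from = aws_instance.old_resource

  lifecycle {
    destroy = false # drop it from state only; the EC2 instance keeps running
  }
}
```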

5. Error: Module not found

# Clear the module cache
rm -rf .terraform/modules
terraform init -upgrade

# Check the module source path
# A local module path is relative to the directory of the .tf file that declares it
module "vpc" {
  source = "../../modules/vpc"
}

6. Error: Invalid for_each argument

# Wrong: the for_each value is not known at plan time
resource "aws_subnet" "private" {
  for_each = toset(module.vpc.private_subnet_cidrs)
}

# Right: use a value that is known at plan time
variable "private_subnets" {
  type = list(string)
}

resource "aws_subnet" "private" {
  for_each = toset(var.private_subnets)
}
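
Only the keys of for_each must be known at plan time; the values may still be computed. A variant that uses a map with static keys is sketched below. The variable names, CIDR ranges, and availability zones are illustrative assumptions.

```hcl
variable "vpc_id" {
  type = string
}

variable "private_subnet_map" {
  # Static keys ("a", "b") give each subnet a stable address in state
  type = map(object({
    cidr = string
    az   = string
  }))
  default = {
    a = { cidr = "10.0.1.0/24", az = "us-east-1a" }
    b = { cidr = "10.0.2.0/24", az = "us-east-1b" }
  }
}

resource "aws_subnet" "private" {
  for_each = var.private_subnet_map

  vpc_id            = var.vpc_id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az

  tags = {
    Name = "private-${each.key}"
  }
}
```

With map keys as identifiers, changing a CIDR modifies aws_subnet.private["a"] in place in the plan rather than removing one address and adding another, as happens when the CIDR string itself is the key.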

7. Error: Value for unconfigurable attribute

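# Wrong: trying to set a read-only (computed) attribute
resource "aws_eip" "nat" {
  domain    = "vpc"
  public_ip = "203.0.113.10" # assigned by AWS, cannot be configured
}

# Right: check the provider docs and set only writable arguments
resource "aws_eip" "nat" {
  domain = "vpc"
}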

8. Error: Backend configuration changed

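# After changing the backend configuration, re-initialize and migrate the state
terraform init -migrate-state

# If the migration fails, migrate manually
terraform state pull > state.json
# Edit backend.tf
terraform init -reconfigure  # initialize the new backend without attempting a migration
terraform state push state.json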

9. Error: Invalid terraform configuration: No required_providers

# Every module should declare its required_providers
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

10. Error: Incompatible API version

# The provider version is incompatible with the Terraform version
# Check compatibility
terraform version
terraform providers

# Update to compatible versions
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.80.0"
    }
  }
}

Advanced Optimization Tips

1. Terraform Cloud / Enterprise

# terraform-cloud.tf
terraform {
  cloud {
    organization = "myorg"
    workspaces {
      tags = ["app-infra"]
    }
  }
}

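2. Terraform test framework

# tests/integration/main.tftest.hcl
run "create_infrastructure" {
  command = apply

  module {
    source = "../../environments/dev"
  }

  variables {
    db_password = "test-password-12345678"
  }
}

run "validate_endpoints" {
  command = apply

  # Assumption: a small helper module (hypothetical path) that declares
  # data "http" "check" { url = var.api_endpoint }
  module {
    source = "./helpers/http-check"
  }

  variables {
    api_endpoint = run.create_infrastructure.api_url
  }

  assert {
    # Compare the status code directly; wrapping it in can() would pass for any status
    condition     = data.http.check.status_code == 200
    error_message = "API endpoint should return 200"
  }
}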
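
The endpoint check needs a configuration that actually issues the HTTP request. One possible shape, sketched here under assumptions, is a small helper module built on the http data source from the hashicorp/http provider. The file path and variable name are hypothetical.

```hcl
# helpers/http-check/main.tf (hypothetical path)
terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = ">= 3.0"
    }
  }
}

variable "api_endpoint" {
  type        = string
  description = "URL to probe after the infrastructure is created"
}

# Performs a GET request; status_code is then available to test assertions
data "http" "check" {
  url = var.api_endpoint
}
```

A run block that loads this module can assert on data.http.check.status_code, which keeps the probing logic out of the production configuration.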

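3. Cost estimation with Infracost

# Estimate costs with Infracost
infracost breakdown --path=plan.json \
  --format=json \
  --out-file=infracost.json

# Post the cost estimate as a PR comment
# (GITHUB_REPOSITORY, PR_NUMBER and GITHUB_TOKEN are assumed CI environment variables)
infracost comment github --path=infracost.json \
  --repo="$GITHUB_REPOSITORY" \
  --pull-request="$PR_NUMBER" \
  --github-token="$GITHUB_TOKEN" \
  --behavior=update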

4. Generating module documentation automatically

# Install terraform-docs
brew install terraform-docs

# Generate the README
terraform-docs markdown table ./modules/vpc > ./modules/vpc/README.md

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.92.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_docs
        args:
          - '--args=--lockfile=false'
      - id: terraform_tflint
      - id: terraform_trivy
      - id: terraform_checkov

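Comparison: Terraform vs OpenTofu vs Pulumi vs CDK

┌────────────────────┬────────────────┬───────────────────┬─────────────────┬───────────────────┐
│ Dimension          │ Terraform      │ OpenTofu          │ Pulumi          │ CDK               │
├────────────────────┼────────────────┼───────────────────┼─────────────────┼───────────────────┤
│ Language           │ HCL            │ HCL               │ TS/Python/Go/C# │ TS/Python/Java/C# │
│ License            │ BSL 1.1        │ MPL 2.0           │ Apache 2.0      │ Apache 2.0        │
│ Providers          │ 3000+          │ 3000+             │ 200+            │ Mainly AWS        │
│ State encryption   │ S3 + KMS       │ Native encryption │ Pulumi Cloud    │ CloudFormation    │
│ Testing            │ terraform test │ tofu test         │ Mocha/Jest      │ Jest              │
│ GitOps             │ Atlantis       │ Atlantis          │ Pulumi Cloud    │ CDK Pipelines     │
│ Learning curve     │                │                   │                 │                   │
│ Multi-cloud        │ Native         │ Native            │ Native          │ Mainly AWS        │
│ Community          │ Largest        │ Growing           │ Growing         │ AWS ecosystem     │
│ Enterprise support │ HashiCorp      │ Linux Foundation  │ Pulumi Corp     │ AWS               │
└────────────────────┴────────────────┴───────────────────┴─────────────────┴───────────────────┘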

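Selection decision tree

Is the team familiar with HCL?
├── Yes → Is license compliance a concern?
│         ├── Yes → OpenTofu
│         └── No → Terraform
└── No → Is AWS the main platform?
         ├── Yes → CDK
         └── No → Is a general-purpose programming language preferred?
                  ├── Yes → Pulumi
                  └── No → Terraform/OpenTofu (HCL is easy to learn)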


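Summary: Terraform IaC best practice comes down to 5 production patterns: composable module design makes code reusable and testable, remote state management keeps state safe and reliable, Workspace environment isolation handles multiple environments, OpenTofu migration addresses license compliance, and GitOps integration automates plan/apply. In 2026, whether you choose Terraform or OpenTofu, HCL remains the most mature choice in IaC. Key practices: a three-layer module architecture, an S3 + DynamoDB remote backend, directory-isolated environments, Atlantis GitOps, and Policy as Code. IaC is not a one-off project but a platform that keeps evolving.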

