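Terraform IaC Best Practices: 5 Production Patterns from Module Design to GitOps
In 2026, the question about Terraform IaC is no longer "can you use it" but "how well do you use it"
After Terraform changed its license to BSL 1.1 in 2023, the community forked it as OpenTofu. Whether you choose Terraform or OpenTofu, HCL is still the most widely used language in the IaC space. The question is no longer "should we use IaC" but "how do we make IaC production-ready".
Too many teams' Terraform code looks like this: one giant main.tf, state files stored locally, all environments sharing the same variables, modules without versioning, and terraform apply run by hand in CI/CD. That is not IaC; it is "manual operations written as code".
This article covers 5 production-grade IaC patterns, from module composition to GitOps automation, to take your Terraform from "it works" to "it works well".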
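Key Takeaways
- Master module composition: a reusable, testable, versionable module architecture
- Understand the 3 layers of remote state protection: remote backend, state locking, drift detection
- Implement environment isolation and variable-management best practices
- Migrate from Terraform to OpenTofu with minimal disruption
- Integrate a GitOps workflow: Atlantis + CI/CD automated plan/apply
Table of Contents
- Terraform IaC Core Concepts
- Pattern 1: Module Composition Design
- Pattern 2: State Management
- Pattern 3: Workspace Environment Isolation
- Pattern 4: OpenTofu Migration
- Pattern 5: GitOps Integration
- 5 Common Pitfalls and Solutions
- Troubleshooting 10 Common Errors
- Advanced Optimization Techniques
- Comparison Analysis
- Recommended Online Tools
Terraform IaC Core Concepts
IaC Maturity Model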
┌──────────────────────────────────────────────────────────────────┐
│                        IaC Maturity Model                        │
├──────────────┬─────────────────────┬─────────────────────────────┤
│ Level 1      │ Level 2             │ Level 3                     │
│ Scripted     │ Modular             │ Platform                    │
├──────────────┼─────────────────────┼─────────────────────────────┤
│ Single file  │ Split modules       │ Module composition+registry │
│ Local state  │ Remote state        │ Layered, isolated state     │
│ Manual runs  │ CI/CD-triggered     │ GitOps automation           │
│ No tests     │ Basic tests         │ Policy as Code              │
│ No versions  │ Git versioning      │ SemVer + changelog          │
│ Env-coupled  │ Workspace isolation │ Multi-env abstraction layer │
└──────────────┴─────────────────────┴─────────────────────────────┘
Terraform Core Workflow
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Write   │────▶│   Plan   │────▶│  Apply   │────▶│  State   │
│(edit HCL)│     │ (preview)│     │ (execute)│     │ (update) │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
     │                │                │                │
     ▼                ▼                ▼                ▼
 Git Commit     terraform plan   terraform apply  Remote Backend
 Pull Request   plan file output create/update    S3/GCS/Cloud
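Key Changes in the Terraform Ecosystem in 2026
| Change | Impact | Response |
|---|---|---|
| BSL 1.1 license | Restricts competitive use (offering a competing service) | Evaluate migrating to OpenTofu |
| OpenTofu 1.9+ | Community-driven alternative | Consider it first for new projects |
| Terraform 1.6+ | Native test framework (terraform test) | Adopt terraform test |
| Rise of Crossplane | Kubernetes-native IaC | Complements rather than replaces |
| Pulumi maturing | IaC in general-purpose languages | Choose based on team skills |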
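Pattern 1: Module Composition Design
Modules are the foundation of Terraform IaC, but most teams stop at "splitting files" and never reach "composability". Production-grade module design uses a 3-layer architecture: Base Modules, Composition Modules, and Environment Modules. Base modules wrap one building block (VPC, RDS, ECS), composition modules wire base modules into a complete stack, and environment modules supply the per-environment values.
Three-Layer Module Architecture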
┌──────────────────────────────────────────────────────┐
│ Environment Module                                   │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Composition Module                               │ │
│ │ ┌──────────┐  ┌──────────┐  ┌──────────┐         │ │
│ │ │ Base     │  │ Base     │  │ Base     │         │ │
│ │ │ Module   │  │ Module   │  │ Module   │         │ │
│ │ │ (VPC)    │  │ (RDS)    │  │ (ECS)    │         │ │
│ │ └──────────┘  └──────────┘  └──────────┘         │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
Base Module: VPC
modules/
└── vpc/
├── main.tf
├── variables.tf
├── outputs.tf
├── versions.tf
└── README.md
# modules/vpc/versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0, < 6.0"
}
}
}
# modules/vpc/variables.tf
variable "cidr_block" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "environment" {
description = "Environment name"
type = string
}
variable "public_subnets" {
description = "List of public subnet CIDR blocks"
type = list(string)
default = ["10.0.1.0/24", "10.0.2.0/24"]
}
variable "private_subnets" {
description = "List of private subnet CIDR blocks"
type = list(string)
default = ["10.0.10.0/24", "10.0.11.0/24"]
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Use single NAT Gateway to reduce cost"
type = bool
default = false
}
variable "tags" {
description = "Additional tags for all resources"
type = map(string)
default = {}
}
# modules/vpc/main.tf
resource "aws_vpc" "this" {
cidr_block = var.cidr_block
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(
{
Name = "${var.environment}-vpc"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = merge(
{
Name = "${var.environment}-igw"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_subnet" "public" {
count = length(var.public_subnets)
vpc_id = aws_vpc.this.id
cidr_block = var.public_subnets[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = merge(
{
Name = "${var.environment}-public-${count.index + 1}"
Environment = var.environment
Tier = "public"
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_subnet" "private" {
count = length(var.private_subnets)
vpc_id = aws_vpc.this.id
cidr_block = var.private_subnets[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = merge(
{
Name = "${var.environment}-private-${count.index + 1}"
Environment = var.environment
Tier = "private"
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
domain = "vpc"
tags = merge(
{
Name = "${var.environment}-nat-eip-${count.index + 1}"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_nat_gateway" "this" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index % length(aws_subnet.public)].id
tags = merge(
{
Name = "${var.environment}-nat-${count.index + 1}"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
depends_on = [aws_internet_gateway.this]
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.this.id
}
tags = merge(
{
Name = "${var.environment}-public-rt"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_route_table" "private" {
count = length(var.private_subnets)
vpc_id = aws_vpc.this.id
# Add the default route only when NAT Gateways exist; a plain route block
# would index an empty aws_nat_gateway.this when enable_nat_gateway = false
dynamic "route" {
for_each = var.enable_nat_gateway ? [1] : []
content {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.this[var.single_nat_gateway ? 0 : count.index].id
}
}
tags = merge(
{
Name = "${var.environment}-private-rt-${count.index + 1}"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_route_table_association" "public" {
count = length(var.public_subnets)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(var.private_subnets)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
data "aws_availability_zones" "available" {
state = "available"
}
# modules/vpc/outputs.tf
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.this.id
}
output "vpc_cidr" {
description = "VPC CIDR block"
value = aws_vpc.this.cidr_block
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}
output "nat_gateway_ids" {
description = "List of NAT Gateway IDs"
value = aws_nat_gateway.this[*].id
}
output "igw_id" {
description = "Internet Gateway ID"
value = aws_internet_gateway.this.id
}
Composition Module: Complete Application Infrastructure
# modules/app-stack/main.tf
module "vpc" {
source = "../vpc"
cidr_block = var.vpc_cidr
environment = var.environment
public_subnets = var.public_subnet_cidrs
private_subnets = var.private_subnet_cidrs
enable_nat_gateway = true
single_nat_gateway = var.environment != "prod"
tags = local.common_tags
}
module "rds" {
source = "../rds"
environment = var.environment
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
engine = var.db_engine
engine_version = var.db_engine_version
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
database_name = var.database_name
username = var.db_username
password = var.db_password
tags = local.common_tags
}
module "ecs" {
source = "../ecs"
environment = var.environment
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
cluster_name = "${var.environment}-cluster"
container_image = var.container_image
container_port = var.container_port
desired_count = var.desired_count
cpu = var.cpu
memory = var.memory
environment_variables = merge(
{
DATABASE_URL = "postgresql://${var.db_username}:${var.db_password}@${module.rds.endpoint}/${var.database_name}"
ENVIRONMENT = var.environment
},
var.extra_environment_variables
)
tags = local.common_tags
}
locals {
common_tags = merge(
{
Environment = var.environment
Project = var.project_name
ManagedBy = "terraform"
},
var.tags
)
}
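Once the composition module exists, an environment module can shrink to a single call. The sketch below assumes that `app-stack` declares the input variables referenced above (`vpc_cidr`, `public_subnet_cidrs`, `db_engine`, and so on) and that optional inputs such as `tags` and `extra_environment_variables` have defaults. The repository URL, tag, and all values are hypothetical.

```hcl
# environments/staging/main.tf (hypothetical): the environment layer only picks
# a composition-module version and supplies environment-specific values.
provider "aws" {
  region = "us-east-1"
}

variable "db_password" {
  description = "Database password, injected via TF_VAR_db_password"
  type        = string
  sensitive   = true
}

module "app_stack" {
  # Hypothetical repository and tag; pinning a tag makes upgrades deliberate
  source = "git::https://github.com/myorg/terraform-modules.git//modules/app-stack?ref=v2.1.0"

  environment          = "staging"
  project_name         = "myapp"
  vpc_cidr             = "10.50.0.0/16"
  public_subnet_cidrs  = ["10.50.1.0/24", "10.50.2.0/24"]
  private_subnet_cidrs = ["10.50.10.0/24", "10.50.11.0/24"]

  db_engine            = "postgres"
  db_engine_version    = "16.4"
  db_instance_class    = "db.t3.medium"
  db_allocated_storage = 50
  database_name        = "appdb_staging"
  db_username          = "appadmin"
  db_password          = var.db_password

  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:v2.1.0"
  container_port  = 8080
  desired_count   = 2
  cpu             = 512
  memory          = 1024
}
```

Upgrading staging to a new stack version then becomes a one-line change to `ref`, which is easy to review in a pull request.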
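Module Version Management
# The blocks below are alternative ways to source a module, not one file.
# Terraform Registry module, pinned with a version constraint
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
}
# Git repository module, pinned to a tag
module "app_stack" {
source = "git::https://github.com/myorg/terraform-modules.git//modules/app-stack?ref=v2.1.0"
}
# Local module (development only; the version argument is not supported here)
module "vpc" {
source = "../../modules/vpc"
}
# Module package stored in S3 (the version is encoded in the object key)
module "app_stack" {
source = "s3::https://my-terraform-modules.s3.amazonaws.com/app-stack/v2.1.0.zip"
}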
Module Testing
# modules/vpc/tests/main.tftest.hcl
run "validate_vpc_cidr" {
command = plan
variables {
cidr_block = "10.0.0.0/16"
environment = "test"
}
assert {
condition = aws_vpc.this.cidr_block == "10.0.0.0/16"
error_message = "VPC CIDR block should match input"
}
}
run "validate_subnets" {
command = plan
variables {
cidr_block = "10.0.0.0/16"
environment = "test"
public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets = ["10.0.10.0/24", "10.0.11.0/24"]
}
assert {
condition = length(aws_subnet.public) == 2
error_message = "Should create 2 public subnets"
}
assert {
condition = length(aws_subnet.private) == 2
error_message = "Should create 2 private subnets"
}
}
run "validate_nat_gateway_production" {
command = plan
variables {
cidr_block = "10.0.0.0/16"
environment = "prod"
enable_nat_gateway = true
single_nat_gateway = false
private_subnets = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
}
assert {
condition = length(aws_nat_gateway.this) == 3
error_message = "Production should have one NAT Gateway per AZ"
}
}
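# Run one module's tests (terraform test needs an initialized directory)
cd modules/vpc
terraform init -backend=false
terraform test
cd ../..
# Run every module's tests from the repository root; terraform test has no
# recursive flag, so loop over the module directories
for dir in modules/*/; do (cd "$dir" && terraform init -backend=false && terraform test) || exit 1; done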
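The tests above still configure the real AWS provider. Because the VPC module reads `data.aws_availability_zones.available`, even `command = plan` needs credentials. On Terraform 1.7+, a test file can mock the provider so that unit tests run offline. The sketch below assumes a separate file name (hypothetical) and feeds the mocked data source a fixed AZ list, because the module indexes `names[count.index]`.

```hcl
# modules/vpc/tests/mocked.tftest.hcl (hypothetical file name)
# Requires Terraform 1.7+ for mock_provider; no AWS credentials needed.
mock_provider "aws" {
  # Give the mocked data source deterministic AZ names for names[count.index]
  mock_data "aws_availability_zones" {
    defaults = {
      names = ["us-east-1a", "us-east-1b", "us-east-1c"]
    }
  }
}

run "single_nat_gateway_in_dev" {
  command = plan

  variables {
    environment        = "dev"
    single_nat_gateway = true
  }

  assert {
    condition     = length(aws_nat_gateway.this) == 1
    error_message = "single_nat_gateway = true should create exactly one NAT Gateway"
  }
}
```

A mock only affects the file it is declared in, so you can keep the credentialed tests for CI and use the mocked ones for fast local feedback.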
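Pattern 2: State Management
The Terraform state file is the most critical data in IaC. It maps your configuration to real resources, so losing it means losing Terraform's control over that infrastructure. Production environments must use a remote backend, enable state locking, and detect drift regularly.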
Remote Backend Architecture
┌──────────────────────────────────────────────────────┐
│ Three Layers of State Protection                     │
│                                                      │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 1: Remote backend                          │ │
│ │ S3 + DynamoDB / GCS / Azure Blob                 │ │
│ │ (persistent, shared across the team)             │ │
│ └──────────────────────────────────────────────────┘ │
│                                                      │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 2: State locking                           │ │
│ │ DynamoDB / GCS native lock / Azure Blob lease    │ │
│ │ (blocks concurrent writes, serializes apply)     │ │
│ └──────────────────────────────────────────────────┘ │
│                                                      │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 3: Drift detection                         │ │
│ │ terraform plan -refresh-only + scheduled CI/CD   │ │
│ │ (catches manual changes, keeps state accurate)   │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
S3 + DynamoDB Backend Configuration
# backend.tf
terraform {
backend "s3" {
bucket = "myorg-terraform-state"
key = "app-infra/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abc123"
# The lock wait time is not a backend argument; pass it on the CLI,
# e.g. terraform apply -lock-timeout=30m
}
}
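On Terraform 1.10 and later, the S3 backend can also lock with a lock object stored next to the state (`use_lockfile`). This removes the need for the DynamoDB table. As I understand the release notes, the feature was experimental in 1.10 and generally available in 1.11, where `dynamodb_table` was deprecated, so check the notes for the version you run before switching. The bucket, key, and KMS ARN reuse the example values above.

```hcl
# backend.tf: sketch for Terraform >= 1.10 using S3-native state locking
terraform {
  backend "s3" {
    bucket     = "myorg-terraform-state"
    key        = "app-infra/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abc123"

    # Lock by writing a "<key>.tflock" object next to the state
    # instead of an item in a DynamoDB table
    use_lockfile = true
  }
}
```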
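Bootstrapping the Backend Infrastructure (this configuration itself uses local state, since the bucket it creates does not exist yet)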
# bootstrap/main.tf
resource "aws_s3_bucket" "terraform_state" {
bucket = "myorg-terraform-state"
lifecycle {
prevent_destroy = true
}
}
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.terraform_state.arn
}
}
}
resource "aws_s3_bucket_public_access_block" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
id = "cleanup-old-versions"
status = "Enabled"
noncurrent_version_transition {
noncurrent_days = 90
storage_class = "GLACIER"
}
noncurrent_version_expiration {
noncurrent_days = 365
}
}
}
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
resource "aws_kms_key" "terraform_state" {
description = "Terraform state encryption key"
deletion_window_in_days = 30
enable_key_rotation = true
}
resource "aws_kms_alias" "terraform_state" {
name = "alias/terraform-state"
target_key_id = aws_kms_key.terraform_state.key_id
}
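# Bootstrap process (keeps local state)
cd bootstrap
terraform init
terraform apply
# Migrate a configuration to the remote backend:
# after adding backend.tf, run
terraform init -migrate-state
# Verify the state now lives in S3
aws s3 ls s3://myorg-terraform-state/app-infra/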
State Layering
┌──────────────────────────────────────────────────────┐
│ State Layering Architecture                          │
│                                                      │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 0: bootstrap (local state)                 │ │
│ │ S3 Bucket / DynamoDB / KMS / IAM                 │ │
│ └──────────────────────────────────────────────────┘ │
│                 │ outputs flow down                  │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 1: networking (remote state)               │ │
│ │ VPC / Subnets / Route Tables / NAT GW            │ │
│ └──────────────────────────────────────────────────┘ │
│                 │ outputs flow down                  │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 2: compute (remote state)                  │ │
│ │ ECS / EKS / RDS / ElastiCache                    │ │
│ └──────────────────────────────────────────────────┘ │
│                 │ outputs flow down                  │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Layer 3: services (remote state)                 │ │
│ │ DNS / CDN / Monitoring / Alerts                  │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
# Layer 2 reads Layer 1's outputs
data "terraform_remote_state" "networking" {
backend = "s3"
config = {
bucket = "myorg-terraform-state"
key = "networking/terraform.tfstate"
region = "us-east-1"
}
}
module "ecs" {
source = "../../modules/ecs"
vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
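`terraform_remote_state` only exposes the root module's outputs. Outputs declared inside a child module are not visible, so the networking layer has to re-export whatever Layer 2 reads. Below is a minimal sketch of that layer's root configuration. The directory layout and the `environment` value are assumptions.

```hcl
# networking/main.tf + outputs.tf (hypothetical layer root): wrap the VPC module
# and re-export the values the compute layer reads via terraform_remote_state.
module "vpc" {
  source = "../../modules/vpc"

  environment = "prod"
}

output "vpc_id" {
  description = "VPC ID consumed by the compute layer"
  value       = module.vpc.vpc_id
}

output "private_subnet_ids" {
  description = "Private subnet IDs consumed by the compute layer"
  value       = module.vpc.private_subnet_ids
}
```

Treat these outputs as the layer's public contract. Renaming one breaks every downstream layer that reads it.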
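Drift Detection
# Manual drift detection
terraform plan -detailed-exitcode
# 0 = no changes
# 1 = error
# 2 = changes present (drift, or code changes not yet applied)
# Review drift without proposing changes to real resources
terraform plan -refresh-only
# Accept the drifted values into state (replaces the deprecated terraform refresh)
terraform apply -refresh-only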
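# .github/workflows/drift-detection.yml
name: Drift Detection
on:
  schedule:
    - cron: "0 8 * * 1-5"
  workflow_dispatch:
permissions:
  contents: read
  id-token: write # for OIDC-based AWS credentials
jobs:
  drift-detection:
    runs-on: ubuntu-latest
    env:
      TF_VAR_db_password: ${{ secrets.DB_PASSWORD }} # hypothetical secret name
    defaults:
      run:
        working-directory: environments/prod # hypothetical; one job per environment directory
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TF_DRIFT_ROLE_ARN }} # hypothetical secret name
          aws-region: us-east-1
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.10.0"
          # The wrapper script adds extra stdout that breaks piping into jq
          terraform_wrapper: false
      - name: Terraform Init
        run: terraform init -input=false
      - name: Check Drift
        run: |
          terraform plan -detailed-exitcode -input=false -out=plan.out || exit_code=$?
          if [ "${exit_code:-0}" -eq 1 ]; then
            echo "::error::terraform plan failed"
            exit 1
          fi
          if [ "${exit_code:-0}" -eq 2 ]; then
            echo "::warning::Infrastructure drift detected!"
            terraform show -json plan.out | jq -r '.resource_changes[] | select(.change.actions != ["no-op"]) | "\(.address): \(.change.actions | join(", "))"'
            exit 1
          fi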
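Not every difference is drift worth alerting on. When another system legitimately owns an attribute, a `lifecycle { ignore_changes }` entry keeps the scheduled plan quiet. For example, ECS Service Auto Scaling adjusts `desired_count`. This is a sketch; the two input variables are hypothetical placeholders for an existing cluster and task definition.

```hcl
variable "cluster_arn" {
  description = "ARN of an existing ECS cluster (hypothetical input)"
  type        = string
}

variable "task_definition_arn" {
  description = "ARN of the task definition to run (hypothetical input)"
  type        = string
}

resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 2 # initial value only; Auto Scaling adjusts it afterwards

  lifecycle {
    # Auto Scaling changes desired_count outside Terraform; without this,
    # every drift-detection run would report it as a change
    ignore_changes = [desired_count]
  }
}
```

Keep such exceptions narrow and commented. Every ignored attribute is a place where real drift also goes unnoticed.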
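Pattern 3: Workspace Environment Isolation
Multi-environment management is the most basic IaC requirement. The three environments (dev, staging, prod) share the same modules but use different settings. Terraform workspaces offer a lightweight form of isolation, but they only work well when paired with disciplined variable management.
Workspaces vs. Directory Isolation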
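┌─────────────────────────────────────────────────────────────┐
│ Two Approaches to Environment Isolation                     │
├──────────────────────┬──────────────────────────────────────┤
│ Workspace isolation  │ Directory isolation                  │
├──────────────────────┼──────────────────────────────────────┤
│ One state per        │ One state file per environment       │
│ workspace, same      │ directory, separate backend keys     │
│ backend              │                                      │
│ One shared codebase  │ Separate code dir per environment    │
│ terraform workspace  │ cd into the environment directory    │
│ select               │                                      │
│ Suits simple setups  │ Suits complex setups                 │
│ State paths (S3):    │ State paths:                         │
│ env:/dev/<key>       │ dev/terraform.tfstate                │
│ env:/prod/<key>      │ prod/terraform.tfstate               │
└──────────────────────┴──────────────────────────────────────┘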
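For comparison, here is a sketch of the workspace approach. It keeps one configuration and branches on the built-in `terraform.workspace` value. It assumes workspaces named `dev` and `prod` were created with `terraform workspace new`, and it assumes the same `../../modules/vpc` path as the other examples. The per-environment values are illustrative.

```hcl
locals {
  # Per-workspace settings; terraform.workspace is the selected workspace name
  env_config = {
    dev = {
      cidr_block         = "10.0.0.0/16"
      single_nat_gateway = true
    }
    prod = {
      cidr_block         = "10.100.0.0/16"
      single_nat_gateway = false
    }
  }

  # Fails the plan with an "Invalid index" error for unlisted workspaces
  # (including "default"), which prevents accidental deploys
  config = local.env_config[terraform.workspace]
}

module "vpc" {
  source = "../../modules/vpc"

  environment        = terraform.workspace
  cidr_block         = local.config.cidr_block
  single_nat_gateway = local.config.single_nat_gateway
}
```

Usage: `terraform workspace select prod && terraform plan`. The differences between environments all live in one map, which is convenient but also means every environment shares the same code version. That trade-off is why the article prefers directory isolation for complex setups.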
Recommended: Directory Isolation + Shared Modules
infra/
├── modules/
│ ├── vpc/
│ ├── rds/
│ └── ecs/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── backend.tf
│ └── terraform.tfvars
└── shared/
└── locals.tf
Environment Configuration
# environments/dev/backend.tf
terraform {
backend "s3" {
bucket = "myorg-terraform-state"
key = "dev/app-infra/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
# environments/dev/main.tf
module "vpc" {
source = "../../modules/vpc"
cidr_block = "10.0.0.0/16"
environment = "dev"
public_subnets = ["10.0.1.0/24"]
private_subnets = ["10.0.10.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
}
module "rds" {
source = "../../modules/rds"
environment = "dev"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
engine = "postgres"
engine_version = "16.4"
instance_class = "db.t3.micro"
allocated_storage = 20
database_name = "appdb_dev"
username = "appadmin"
password = var.db_password
}
module "ecs" {
source = "../../modules/ecs"
environment = "dev"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:dev-latest"
desired_count = 1
cpu = 256
memory = 512
}
# environments/prod/main.tf
module "vpc" {
source = "../../modules/vpc"
cidr_block = "10.100.0.0/16"
environment = "prod"
public_subnets = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
private_subnets = ["10.100.10.0/24", "10.100.11.0/24", "10.100.12.0/24"]
enable_nat_gateway = true
single_nat_gateway = false
}
module "rds" {
source = "../../modules/rds"
environment = "prod"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
engine = "postgres"
engine_version = "16.4"
instance_class = "db.r6g.xlarge"
allocated_storage = 500
database_name = "appdb_prod"
username = "appadmin"
password = var.db_password
multi_az = true
}
module "ecs" {
source = "../../modules/ecs"
environment = "prod"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:v2.1.0"
desired_count = 3
cpu = 1024
memory = 2048
}
Variable Validation
# variables.tf
variable "environment" {
description = "Environment name"
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "db_password" {
description = "Database password"
type = string
sensitive = true
validation {
condition = length(var.db_password) >= 16
error_message = "Database password must be at least 16 characters."
}
}
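The same mechanism catches malformed network inputs before any API call is made. This is a sketch for the VPC module's `cidr_block` and `private_subnets` variables. It relies on `cidrhost()` raising an error for input that is not a valid CIDR block, and on `can()` converting that error into `false`.

```hcl
variable "cidr_block" {
  description = "VPC CIDR block"
  type        = string

  validation {
    # cidrhost() errors on malformed input; can() turns that error into false
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "cidr_block must be a valid CIDR block, e.g. 10.0.0.0/16."
  }
}

variable "private_subnets" {
  description = "List of private subnet CIDR blocks"
  type        = list(string)

  validation {
    condition     = alltrue([for cidr in var.private_subnets : can(cidrhost(cidr, 0))])
    error_message = "Every entry in private_subnets must be a valid CIDR block."
  }
}
```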
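Pattern 4: OpenTofu Migration
In August 2023, HashiCorp changed Terraform's license from the Mozilla Public License 2.0 to the Business Source License 1.1. BSL 1.1 prohibits "competitive use", which includes using Terraform to offer a competing IaC service. For most companies that use Terraform internally this is not a problem, but the open-source community's reaction led to the OpenTofu fork.
Migration Assessment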
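| Dimension | Terraform (BSL 1.1) | OpenTofu (MPL 2.0) |
|---|---|---|
| License | BSL 1.1 (restricts competitive use) | MPL 2.0 (fully open source) |
| CLI compatibility | Native | Drop-in compatible with Terraform 1.5.x (the fork point); later features diverge |
| Provider compatibility | Native | Same provider protocol; works with community providers |
| State files | Native format | Compatible |
| Enterprise support | HashiCorp | Linux Foundation (community) |
| New features | Native test framework (1.6+) | Native state encryption (1.7+) |
| Registry | Terraform Registry | OpenTofu Registry |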
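Migration Steps
# Step 1: Install OpenTofu
# macOS
brew install opentofu
# Linux (the installer script expects an --install-method, e.g. deb, rpm or standalone)
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh -o install-opentofu.sh
chmod +x install-opentofu.sh
./install-opentofu.sh --install-method standalone
# Verify the installed version
tofu version
# OpenTofu v1.9.0
# Step 2: Replace CLI commands
# terraform init → tofu init
# terraform plan → tofu plan
# terraform apply → tofu apply
# terraform destroy → tofu destroy
# Step 3: Verify compatibility
tofu init
tofu plan
# If tofu plan reports no changes against the existing state, the switch is safe
# Step 4: Update the CI/CD configuration
# Replace every terraform command with tofu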
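OpenTofu-Only Feature: State Encryption
# OpenTofu 1.7+ supports native state and plan encryption
# (referencing a variable here relies on early evaluation, OpenTofu 1.8+;
# declare variable "encryption_passphrase" with sensitive = true)
terraform {
encryption {
key_provider "pbkdf2" "mykey" {
# Passphrase must be at least 16 characters; a random salt is generated
# automatically and stored with the encrypted data
passphrase = var.encryption_passphrase
key_length = 32
iterations = 600000
}
method "aes_gcm" "myencryption" {
keys = key_provider.pbkdf2.mykey
}
# Needed only during migration: lets OpenTofu read the existing unencrypted
# state; remove it once the state has been rewritten encrypted
method "unencrypted" "migrate" {}
state {
method = method.aes_gcm.myencryption
fallback {
method = method.unencrypted.migrate
}
}
plan {
method = method.aes_gcm.myencryption
fallback {
method = method.unencrypted.migrate
}
}
}
}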
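A passphrase has to be distributed to every machine that runs `tofu`. An alternative is to let AWS KMS generate the data key. The sketch below uses OpenTofu's `aws_kms` key provider. The key alias is hypothetical (it matches the alias created in the bootstrap example), and the migration fallback is omitted on the assumption that the state is already encrypted or newly created.

```hcl
# OpenTofu only (1.7+): derive the state encryption key from AWS KMS
terraform {
  encryption {
    key_provider "aws_kms" "state" {
      kms_key_id = "alias/terraform-state" # hypothetical; reuses the bootstrap alias
      region     = "us-east-1"
      key_spec   = "AES_256"
    }

    method "aes_gcm" "kms" {
      keys = key_provider.aws_kms.state
    }

    state {
      method = method.aes_gcm.kms
    }

    plan {
      method = method.aes_gcm.kms
    }
  }
}
```

Access to the state is then governed by IAM permissions on the KMS key rather than by who knows a passphrase.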
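Incremental Migration Strategy
┌──────────────────────────────────────────────────────┐
│ Incremental Migration Roadmap                        │
│                                                      │
│ Phase 1: Assessment (1-2 weeks)                      │
│ ├── Inventory all Terraform projects                 │
│ ├── Check provider and module compatibility          │
│ └── Set migration priorities                         │
│                                                      │
│ Phase 2: Non-production migration (2-4 weeks)        │
│ ├── Switch dev/staging to OpenTofu                   │
│ ├── Verify tofu plan shows no changes                │
│ └── Update CI/CD pipelines                           │
│                                                      │
│ Phase 3: Production migration (1-2 weeks)            │
│ ├── Switch production to OpenTofu                    │
│ ├── Enable state encryption                          │
│ └── Monitor for 1 week to confirm stability          │
│                                                      │
│ Phase 4: Cleanup (1 week)                            │
│ ├── Remove the Terraform CLI dependency              │
│ ├── Update documentation and runbooks                │
│ └── Complete team training                           │
└──────────────────────────────────────────────────────┘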
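Pattern 5: GitOps Integration
The end goal of Terraform IaC is GitOps: a commit triggers plan, and an approved change is applied automatically. Atlantis is one of the most mature open-source Terraform GitOps tools. It runs terraform plan and terraform apply directly from Pull Request comments.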
Atlantis Architecture
┌──────────────────────────────────────────────────────┐
│ Atlantis GitOps Architecture                         │
│                                                      │
│ ┌──────────┐   webhook     ┌──────────────────┐      │
│ │ GitHub   │──────────────▶│ Atlantis         │      │
│ │ /GitLab  │               │ Server           │      │
│ │          │◀──────────────│                  │      │
│ │          │  PR comment   │ ┌──────────────┐ │      │
│ │          │ (plan/apply)  │ │ terraform    │ │      │
│ └──────────┘               │ │ plan/apply   │ │      │
│                            │ └──────────────┘ │      │
│                            └────────┬─────────┘      │
│                                     │                │
│                            ┌────────▼─────────┐      │
│                            │ AWS / GCP /      │      │
│                            │ Azure API        │      │
│                            └──────────────────┘      │
└──────────────────────────────────────────────────────┘
Deploying Atlantis
# atlantis-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlantis
  namespace: atlantis
spec:
  replicas: 1 # Atlantis keeps locks and plans on local disk; run a single replica
  selector:
    matchLabels:
      app: atlantis
  template:
    metadata:
      labels:
        app: atlantis
    spec:
      containers:
        - name: atlantis
          image: ghcr.io/runatlantis/atlantis:v0.30.0
          ports:
            - containerPort: 4141
          env:
            - name: ATLANTIS_GH_USER
              value: "myorg-bot"
            - name: ATLANTIS_GH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: atlantis-secrets
                  key: github-token
            - name: ATLANTIS_GH_WEBHOOK_SECRET
              valueFrom:
                secretKeyRef:
                  name: atlantis-secrets
                  key: webhook-secret
            # Required: repositories this server accepts webhooks from
            - name: ATLANTIS_REPO_ALLOWLIST
              value: "github.com/myorg/*"
            # Autoplan is on by default; plan/apply parallelism is set per repo
            # in atlantis.yaml (parallel_plan / parallel_apply).
            # Server-side repo config: what each repo's atlantis.yaml may override
            - name: ATLANTIS_REPO_CONFIG_JSON
              value: |
                {
                  "repos": [
                    {
                      "id": "/.*/",
                      "apply_requirements": ["approved", "mergeable"],
                      "plan_requirements": [],
                      "import_requirements": [],
                      "allowed_overrides": ["apply_requirements", "workflow"],
                      "allow_custom_workflows": true
                    }
                  ]
                }
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - name: atlantis-data
              mountPath: /home/atlantis
      volumes:
        - name: atlantis-data
          persistentVolumeClaim:
            claimName: atlantis-data
Atlantis Repository Configuration
# atlantis.yaml (repository root)
version: 3
parallel_plan: true
parallel_apply: true
projects:
  - name: dev-infra
    dir: environments/dev
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: staging-infra
    dir: environments/staging
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: prod-infra
    dir: environments/prod
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
      - undiverged
workflows:
  terraform:
    plan:
      steps:
        - env:
            name: TF_VAR_db_password
            # "value" is used literally; "command" output becomes the value.
            # DB_PASSWORD must be set in the Atlantis server's environment.
            command: 'echo "$DB_PASSWORD"'
        - run: terraform init -input=false
        - run: terraform plan -input=false -out=$PLANFILE -var-file=terraform.tfvars
        - run: terraform show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
        - run: terraform apply $PLANFILE
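Policy as Code: Sentinel / OPA
Sentinel policies run inside HCP Terraform / Terraform Enterprise. OPA evaluates the JSON plan from any Terraform or OpenTofu run, so it also fits Atlantis or plain CI.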
# sentinel/require-tags.sentinel
import "tfplan/v2" as tfplan
allResources = filter tfplan.resource_changes as _, rc {
rc.mode == "managed" and
rc.type != "null_resource" and
rc.change.actions != ["delete"]
}
tagsRequired = rule {
all allResources as _, rc {
rc.change.after.tags contains "Environment" and
rc.change.after.tags contains "ManagedBy"
}
}
main = rule {
tagsRequired
}
# opa/require-tags.rego
package terraform
import rego.v1
required_tags := ["Environment", "ManagedBy"]
# Resources that still exist after the change and expose a tags attribute
# (resources such as aws_route_table_association have no tags to check)
taggable contains rc if {
some rc in input.resource_changes
rc.mode == "managed"
rc.change.actions != ["delete"]
"tags" in object.keys(rc.change.after)
}
deny contains msg if {
some rc in taggable
some tag in required_tags
not has_tag(rc.change.after.tags, tag)
msg := sprintf("Resource %s of type %s missing %s tag", [rc.address, rc.type, tag])
}
# tags is null in the plan JSON when a resource sets no tags
has_tag(tags, tag) if {
tags != null
tag in object.keys(tags)
}
# Check a Terraform plan with OPA
terraform plan -out=plan.out
terraform show -json plan.out > plan.json
# Run the OPA policy check
opa eval --data opa/ --input plan.json "data.terraform.deny"
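Policies catch missing tags. The AWS provider's `default_tags` block prevents most of them in the first place by applying tags to every taggable resource the provider creates. This is a sketch with a hypothetical region. One caveat: default tags appear in `tags_all`, not `tags`, in the plan JSON, so policies like the ones above would need to read `tags_all` to see them.

```hcl
variable "environment" {
  description = "Environment name"
  type        = string
}

provider "aws" {
  region = "us-east-1"

  # Merged into every taggable resource created through this provider;
  # resource-level tags with the same key take precedence
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}
```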
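5 Common Pitfalls and Solutions
Pitfall 1: Corrupted State File
Symptom: terraform plan fails with invalid character (the state JSON cannot be parsed), or with state snapshot was created by Terraform vX, which is newer than current vY.
Cause: the state file was edited by hand, hit a storage failure, or was rolled back to the wrong S3 version. The "newer version" error is not corruption: someone ran a newer Terraform against the same state, so upgrade your CLI to match instead of editing the file.
Solution: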
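# 1. Restore from S3 version history
aws s3api list-object-versions \
--bucket myorg-terraform-state \
--prefix dev/app-infra/terraform.tfstate
# Restore the previous version (replace PREVIOUS_VERSION with a VersionId from the list;
# quote the source because ? is a shell glob character)
aws s3api copy-object \
--bucket myorg-terraform-state \
--copy-source "myorg-terraform-state/dev/app-infra/terraform.tfstate?versionId=PREVIOUS_VERSION" \
--key dev/app-infra/terraform.tfstate
# With DynamoDB locking, Terraform may then report a checksum mismatch; update the
# Digest value in the lock table to the value printed in the error message
# 2. Pull, repair, and push back
terraform state pull > state.json
# Repair the JSON by hand (carefully) and increment its "serial" field,
# otherwise terraform state push rejects it as older than the remote copy
terraform state push state.json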
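Pitfall 2: Unpinned Provider Versions Break CI
Symptom: terraform plan works locally, but CI/CD fails to download a provider or reports an incompatible version.
Cause: your machine has providers cached while CI installs from scratch on every run, and the provider version is not pinned.
Solution: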
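# versions.tf: pin the provider version (exact pins suit root modules;
# reusable modules should declare ranges such as ">= 5.0, < 6.0")
terraform {
required_version = ">= 1.5.0, < 2.0.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.80.0"
}
}
}
# Record provider checksums for every platform that CI and developers use
terraform providers lock -platform=linux_amd64 -platform=darwin_arm64
# Commit the dependency lock file to Git
git add .terraform.lock.hcl
git commit -m "chore: lock provider versions"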
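Pitfall 3: Circular Dependencies
Symptom: terraform plan fails with Cycle: module.x, module.y.
Cause: an input of module A depends on an output of module B, while an input of module B depends on an output of module A.
Solution: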
# Wrong: circular dependency
module "vpc" {
source = "./vpc"
ecs_security_group_id = module.ecs.security_group_id
}
module "ecs" {
source = "./ecs"
subnet_ids = module.vpc.private_subnet_ids
}
# Right: split into 3 layers with one-way dependencies
# Layer 1: network
module "vpc" {
source = "./vpc"
}
# Layer 2: security groups
module "security_groups" {
source = "./security-groups"
vpc_id = module.vpc.vpc_id
}
# Layer 3: compute
module "ecs" {
source = "./ecs"
subnet_ids = module.vpc.private_subnet_ids
security_group_ids = module.security_groups.app_ids
}
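Pitfall 4: Sensitive Variables Leaking into State
Symptom: terraform show reveals plaintext passwords, and the state file contains secrets.
Cause: sensitive = true only redacts CLI output; the value is still stored in plain text in the state file.
Solution: keep secrets out of code and tfvars files by reading them from a secret store at plan time. Values read this way, and any password passed to a resource, are still written to state, so also encrypt the backend and restrict who can read it.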
# Option 1: AWS SSM Parameter Store
data "aws_ssm_parameter" "db_password" {
name = "/app/${var.environment}/db-password"
with_decryption = true
}
module "rds" {
source = "../../modules/rds"
password = data.aws_ssm_parameter.db_password.value
}
# Option 2 (alternative to Option 1): AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_creds" {
secret_id = "app/${var.environment}/db-credentials"
}
module "rds" {
source = "../../modules/rds"
password = jsondecode(data.aws_secretsmanager_secret_version.db_creds.secret_string)["password"]
}
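If the goal is for the database password never to pass through Terraform at all, recent AWS provider versions let RDS generate the master password and keep it in Secrets Manager. The sketch below uses `manage_master_user_password` on a plain `aws_db_instance`, not on the article's `rds` module. The identifiers and sizes are illustrative. Newer Terraform releases also offer write-only arguments for this purpose, so check your provider documentation for what your versions support.

```hcl
# Sketch: RDS creates and stores the master password in Secrets Manager,
# so Terraform never sees or stores the value
resource "aws_db_instance" "app" {
  identifier        = "app-prod"
  engine            = "postgres"
  engine_version    = "16.4"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 500
  db_name           = "appdb_prod"
  username          = "appadmin"

  # Mutually exclusive with the password argument
  manage_master_user_password = true

  final_snapshot_identifier = "app-prod-final"
}

output "db_master_secret_arn" {
  description = "ARN of the Secrets Manager secret holding the master password"
  value       = aws_db_instance.app.master_user_secret[0].secret_arn
}
```

Applications then read the secret by ARN at runtime, and rotation no longer requires a Terraform run.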
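Pitfall 5: Module Refactoring Recreates Resources
Symptom: after renaming a module or moving a resource, terraform plan wants to destroy and recreate resources.
Cause: Terraform identifies resources by address (for example module.vpc.aws_vpc.this). When the address changes, Terraform treats it as a new resource.
Solution:
# Before refactoring: move the state to the new address (run in each environment directory)
terraform state mv module.vpc module.networking_vpc
# Then change the code, from the repository root:
# move the module files
mv modules/vpc modules/networking/vpc
# update the module block: module "vpc" → module "networking_vpc",
# with source = "../../modules/networking/vpc"
# Verify (in the environment directory)
terraform plan # should report no changes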
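On Terraform 1.1 and later, a `moved` block records the same rename in code, so no manual state surgery is needed. The change is reviewed in the pull request, and every environment (and Atlantis) applies it automatically. This sketch mirrors the rename above and assumes it lives in `environments/dev/main.tf`.

```hcl
# Tells Terraform that objects tracked under module.vpc now live under
# module.networking_vpc; plan reports "has moved to" instead of destroy/create
moved {
  from = module.vpc
  to   = module.networking_vpc
}

module "networking_vpc" {
  source = "../../modules/networking/vpc"

  environment = "dev"
}
```

Keep the `moved` block until every state that used the old address has been applied, then it can be removed.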
Troubleshooting 10 Common Errors
1. Error: Failed to load plugin
# Clear the provider cache and download again
rm -rf .terraform/providers
terraform init -upgrade
# Check the network proxy
export HTTPS_PROXY=http://proxy.internal:8080
terraform init
2. Error: Error locking state: Error acquiring the state lock
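# Inspect the locks in DynamoDB
aws dynamodb scan --table-name terraform-locks
# Confirm that no other process (local run, CI job, Atlantis) is running
# Only if the lock is definitely stale, force-unlock it
# (the lock ID is shown in the error message)
terraform force-unlock <lock-id>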
3. Error: Provider produced inconsistent result after apply
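# This indicates a provider bug, not a configuration error
# 1. Upgrade the provider, which often contains the fix
terraform init -upgrade
# 2. Re-run apply so Terraform refreshes the resource and converges
terraform apply
# 3. If an attribute then shows a perpetual diff caused by a known provider bug,
# ignore that attribute with lifecycle
resource "aws_instance" "app" {
ami = var.ami_id
instance_type = "t3.micro"
lifecycle {
ignore_changes = [user_data]
}
}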
4. Error: Resource already managed by Terraform
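# Raised by terraform import when the target address already exists in state
# List the resources in state
terraform state list
# If the existing entry is stale, remove it from state
# (this does not delete the real resource), then re-run the import
terraform state rm aws_instance.old_resource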
5. Error: Module not found
# Clear the module cache
rm -rf .terraform/modules
terraform init -upgrade
# Check the module source path:
# local module paths are relative to the directory of the calling configuration
module "vpc" {
source = "../../modules/vpc"
}
6. Error: Invalid for_each argument
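# Wrong: the for_each keys come from a value that may be unknown at plan time
resource "aws_subnet" "private" {
for_each = toset(module.vpc.private_subnet_cidrs)
vpc_id = module.vpc.vpc_id
cidr_block = each.value
}
# Right: derive the keys from values known at plan time
variable "private_subnets" {
type = list(string)
}
resource "aws_subnet" "private" {
for_each = toset(var.private_subnets)
vpc_id = module.vpc.vpc_id
cidr_block = each.value
}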
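Only the keys of a `for_each` map must be known during plan; the values may still be unknown. A map with static, human-chosen keys therefore avoids the error even when attributes such as `vpc_id` are only known after apply, and the keys also make resource addresses stable (`aws_subnet.private["app"]`). A sketch with illustrative key names:

```hcl
variable "vpc_id" {
  description = "VPC to create the subnets in"
  type        = string
}

variable "private_subnets" {
  description = "Private subnet CIDRs keyed by a static name"
  type        = map(string)
  default = {
    app = "10.0.10.0/24"
    db  = "10.0.11.0/24"
  }
}

resource "aws_subnet" "private" {
  # Keys come from configuration, so they are known at plan time
  # even if some values are only known after apply
  for_each = var.private_subnets

  vpc_id     = var.vpc_id
  cidr_block = each.value

  tags = {
    Name = "private-${each.key}"
  }
}
```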
7. Error: Value for unconfigurable attribute
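# Wrong: setting an attribute the provider computes (public_ip is read-only on aws_eip)
resource "aws_eip" "nat" {
domain = "vpc"
public_ip = "203.0.113.10"
}
# Right: check the provider documentation and set only configurable arguments;
# read the computed value as aws_eip.nat.public_ip instead
resource "aws_eip" "nat" {
domain = "vpc"
}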
8. Error: Backend configuration changed
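# After changing the backend configuration, re-initialize and migrate the state
terraform init -migrate-state
# Or, if the state should not be copied (e.g. only credentials changed)
terraform init -reconfigure
# If the migration fails, migrate by hand:
terraform state pull > state.json
# edit backend.tf, then
terraform init -reconfigure
terraform state push state.json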
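9. Missing required_providers declaration (Error: Failed to query available provider packages)
# Without required_providers, Terraform assumes the source hashicorp/<name>,
# which fails for third-party providers. Every module should declare the
# source of each provider it uses
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0"
}
}
}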
10. Error: Incompatible API version
# The provider version is incompatible with the Terraform version
# Check compatibility
terraform version
terraform providers
# Pin compatible versions
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.80.0"
}
}
}
Advanced Optimization Techniques
1. Terraform Cloud / Enterprise
# terraform-cloud.tf
terraform {
cloud {
organization = "myorg"
workspaces {
tags = ["app-infra"]
}
}
}
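2. Terraform Testing Framework (Integration Tests)
# tests/integration/main.tftest.hcl
# Run with: terraform test -test-directory=tests/integration
# command = apply creates real resources; terraform test destroys them afterwards
run "create_infrastructure" {
command = apply
module {
source = "../../environments/dev"
}
variables {
db_password = "test-password-12345678"
}
}
run "validate_endpoints" {
command = plan
# Hypothetical helper module (adjust the path to your layout) containing:
#   variable "url" { type = string }
#   data "http" "check" { url = var.url }   (hashicorp/http provider)
module {
source = "./http-check"
}
variables {
# Assumes environments/dev declares an api_url output
url = run.create_infrastructure.api_url
}
assert {
condition = data.http.check.status_code == 200
error_message = "API endpoint should return 200"
}
}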
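3. Infracost Cost Estimation
# Estimate costs with Infracost (plan.json comes from terraform show -json)
infracost breakdown --path=plan.json \
--format=json \
--out-file=infracost.json
# Post or update a cost comment on the PR (PR_NUMBER is a placeholder)
infracost comment github --path=infracost.json \
--repo=$GITHUB_REPOSITORY \
--pull-request=$PR_NUMBER \
--github-token=$GITHUB_TOKEN \
--behavior=update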
4. Auto-Generating Module Documentation
# Install terraform-docs
brew install terraform-docs
# Generate the README
terraform-docs markdown table ./modules/vpc > ./modules/vpc/README.md
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.92.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_docs
        args:
          - '--args=--lockfile=false'
      - id: terraform_tflint
      - id: terraform_trivy
      - id: terraform_checkov
Comparison: Terraform vs OpenTofu vs Pulumi vs CDK
| Dimension | Terraform | OpenTofu | Pulumi | CDK |
|---|---|---|---|---|
| Language | HCL | HCL | TS/Python/Go/C# | TS/Python/Java/C# |
| License | BSL 1.1 | MPL 2.0 | Apache 2.0 | Apache 2.0 |
| Providers | 3000+ | 3000+ | 200+ | Mainly AWS |
| State encryption | Backend-side (e.g. S3 + KMS) | Native client-side encryption | Secrets encrypted via Pulumi Cloud or a secrets provider | State managed by CloudFormation |
| Testing | terraform test | tofu test | Mocha/Jest | Jest |
| GitOps | Atlantis | Atlantis | Pulumi Cloud | CDK Pipelines |
| Learning curve | Low | Low | Low | Medium |
| Multi-cloud | Native | Native | Native | Mainly AWS |
| Community | Largest | Growing | Growing | AWS ecosystem |
| Enterprise support | HashiCorp | Linux Foundation | Pulumi Corp | AWS |
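Selection Decision Tree
Is the team familiar with HCL?
├── Yes → Is license compliance a concern?
│   ├── Yes → OpenTofu
│   └── No → Terraform
└── No → Do you mainly use AWS?
    ├── Yes → CDK
    └── No → Do you prefer a general-purpose programming language?
        ├── Yes → Pulumi
        └── No → Terraform/OpenTofu (HCL is easy to learn)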
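Recommended Online Tools
- JSON formatter: /zh-TW/json/format (format Terraform state files and plan output)
- Base64 encoder/decoder: /zh-TW/encode/base64 (handle Base64-encoded data in Terraform)
- Hash calculator: /zh-TW/encode/hash (compute configuration file hashes and verify state file integrity)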
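Summary: Terraform IaC best practice comes down to 5 production patterns. Module composition keeps code reusable and testable. Remote state management keeps state safe and reliable. Environment isolation handles multi-environment management. OpenTofu migration addresses license compliance. GitOps integration automates plan/apply. In 2026, whether you choose Terraform or OpenTofu, HCL remains the most mature choice in IaC. The key practices are a three-layer module architecture, an S3 + DynamoDB remote backend, directory-isolated environments, Atlantis GitOps, and Policy as Code. IaC is not a one-off project but a platform that keeps evolving.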