Terraform IaC Best Practices: 5 Production Patterns from Module Design to GitOps
In 2026, Terraform IaC Is No Longer About "Whether" But "How Well"
After HashiCorp moved Terraform to the BSL 1.1 license in 2023, the community forked it as OpenTofu. Whichever you choose, HCL remains one of the most widely used IaC languages. The question is no longer "should we use IaC" but "how do we make IaC production-ready."
Too many teams still have Terraform code that looks like this: one giant main.tf, state files stored locally, all environments sharing the same variables, modules without version management, and terraform apply run by hand from someone's laptop instead of from a pipeline. That's not IaC; it's manual ops written in code.
This article covers 5 production-grade IaC patterns, from module composition design to GitOps automation, helping you upgrade Terraform from "works" to "works well."
Key Takeaways
- Master module composition design: reusable, testable, versionable module architecture
- Understand 3-layer state management protection: remote backend, state locking, drift detection
- Implement workspace environment isolation and variable management best practices
- Plan a low-risk, phased migration from Terraform to OpenTofu
- Integrate GitOps workflows: Atlantis + CI/CD automated plan/apply
Table of Contents
- Terraform IaC Core Concepts
- Pattern 1: Module Composition Design
- Pattern 2: State Management
- Pattern 3: Workspace Environment Isolation
- Pattern 4: OpenTofu Migration
- Pattern 5: GitOps Integration
- 5 Common Pitfalls and Solutions
- 10 Common Error Troubleshooting
- Advanced Optimization Tips
- Comparison Analysis
- Recommended Online Tools
Terraform IaC Core Concepts
IaC Maturity Model
| Level 1: Scripted | Level 2: Modular | Level 3: Platform |
|---|---|---|
| Single file | Module split | Module composition + Registry |
| Local state | Remote state | State layering + isolation |
| Manual execution | CI/CD triggered | GitOps automation |
| No tests | Basic tests | Policy as Code |
| No versioning | Git versioning | Semantic versioning + changelogs |
| Environment-coupled | Workspace isolation | Multi-environment abstraction layer |
Terraform Core Workflow
| Stage | What happens |
|---|---|
| 1. Write (HCL) | Git commit, Pull Request |
| 2. Plan (Preview) | terraform plan, plan file output |
| 3. Apply (Execute) | terraform apply, resources created |
| 4. State (Update) | Remote backend (S3/GCS/Cloud) |
Key Changes in the 2026 Terraform Ecosystem
| Change | Impact | Strategy |
|---|---|---|
| BSL 1.1 License | Restricts offering products that compete with HashiCorp; most internal use is unaffected | Review license terms; evaluate OpenTofu migration |
| OpenTofu 1.9+ | Community-driven, MPL 2.0 alternative | Prefer for new projects |
| Terraform 1.6+ | Native test framework | Adopt terraform test |
| Crossplane rising | K8s-native IaC | Complementary, not replacement |
| Pulumi matured | General-language IaC | Choose based on team skills |
Pattern 1: Module Composition Design
Modules are the cornerstone of Terraform IaC, but most teams only get as far as splitting files, not composing modules. Production-grade module design uses a three-layer architecture: Base Modules, Composition Modules, and Environment Modules.
Three-Layer Module Architecture
┌──────────────────────────────────────────────────────┐
│ Environment Module │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Composition Module │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Base │ │ Base │ │ Base │ │ │
│ │ │ Module │ │ Module │ │ Module │ │ │
│ │ │ (VPC) │ │ (RDS) │ │ (ECS) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
Base Module: VPC
modules/
└── vpc/
├── main.tf
├── variables.tf
├── outputs.tf
├── versions.tf
└── README.md
# modules/vpc/versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0, < 6.0"
}
}
}
# modules/vpc/variables.tf
variable "cidr_block" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "environment" {
description = "Environment name"
type = string
}
variable "public_subnets" {
description = "List of public subnet CIDR blocks"
type = list(string)
default = ["10.0.1.0/24", "10.0.2.0/24"]
}
variable "private_subnets" {
description = "List of private subnet CIDR blocks"
type = list(string)
default = ["10.0.10.0/24", "10.0.11.0/24"]
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Use single NAT Gateway to reduce cost"
type = bool
default = false
}
variable "tags" {
description = "Additional tags for all resources"
type = map(string)
default = {}
}
# modules/vpc/main.tf
resource "aws_vpc" "this" {
cidr_block = var.cidr_block
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(
{
Name = "${var.environment}-vpc"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = merge(
{
Name = "${var.environment}-igw"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_subnet" "public" {
count = length(var.public_subnets)
vpc_id = aws_vpc.this.id
cidr_block = var.public_subnets[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = merge(
{
Name = "${var.environment}-public-${count.index + 1}"
Environment = var.environment
Tier = "public"
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_subnet" "private" {
count = length(var.private_subnets)
vpc_id = aws_vpc.this.id
cidr_block = var.private_subnets[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = merge(
{
Name = "${var.environment}-private-${count.index + 1}"
Environment = var.environment
Tier = "private"
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
domain = "vpc"
tags = merge(
{
Name = "${var.environment}-nat-eip-${count.index + 1}"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_nat_gateway" "this" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index % length(aws_subnet.public)].id
tags = merge(
{
Name = "${var.environment}-nat-${count.index + 1}"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
depends_on = [aws_internet_gateway.this]
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.this.id
}
tags = merge(
{
Name = "${var.environment}-public-rt"
Environment = var.environment
ManagedBy = "terraform"
},
var.tags
)
}
resource "aws_route_table" "private" {
  count  = length(var.private_subnets)
  vpc_id = aws_vpc.this.id

  # Add the default route only when NAT Gateways exist: with
  # enable_nat_gateway = false, aws_nat_gateway.this is empty and indexing it fails.
  dynamic "route" {
    for_each = var.enable_nat_gateway ? [1] : []
    content {
      cidr_block     = "0.0.0.0/0"
      nat_gateway_id = aws_nat_gateway.this[var.single_nat_gateway ? 0 : count.index].id
    }
  }

  tags = merge(
    {
      Name        = "${var.environment}-private-rt-${count.index + 1}"
      Environment = var.environment
      ManagedBy   = "terraform"
    },
    var.tags
  )
}
resource "aws_route_table_association" "public" {
count = length(var.public_subnets)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(var.private_subnets)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
data "aws_availability_zones" "available" {
state = "available"
}
# modules/vpc/outputs.tf
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.this.id
}
output "vpc_cidr" {
description = "VPC CIDR block"
value = aws_vpc.this.cidr_block
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}
output "nat_gateway_ids" {
description = "List of NAT Gateway IDs"
value = aws_nat_gateway.this[*].id
}
output "igw_id" {
description = "Internet Gateway ID"
value = aws_internet_gateway.this.id
}
Composition Module: Complete Application Infrastructure
# modules/app-stack/main.tf
module "vpc" {
source = "../vpc"
cidr_block = var.vpc_cidr
environment = var.environment
public_subnets = var.public_subnet_cidrs
private_subnets = var.private_subnet_cidrs
enable_nat_gateway = true
single_nat_gateway = var.environment != "prod"
tags = local.common_tags
}
module "rds" {
source = "../rds"
environment = var.environment
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
engine = var.db_engine
engine_version = var.db_engine_version
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
database_name = var.database_name
username = var.db_username
password = var.db_password
tags = local.common_tags
}
module "ecs" {
source = "../ecs"
environment = var.environment
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
cluster_name = "${var.environment}-cluster"
container_image = var.container_image
container_port = var.container_port
desired_count = var.desired_count
cpu = var.cpu
memory = var.memory
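# NOTE: this puts the DB password into the ECS task definition in plain text.
# In production, prefer injecting it from Secrets Manager or SSM as an ECS secret.
environment_variables = merge(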
{
DATABASE_URL = "postgresql://${var.db_username}:${var.db_password}@${module.rds.endpoint}/${var.database_name}"
ENVIRONMENT = var.environment
},
var.extra_environment_variables
)
tags = local.common_tags
}
locals {
common_tags = merge(
{
Environment = var.environment
Project = var.project_name
ManagedBy = "terraform"
},
var.tags
)
}
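The three-layer design above names an Environment Module layer, but only the Base and Composition layers are shown. A minimal sketch of an environment root module calling the composition module follows; it assumes modules/app-stack/variables.tf declares every input referenced in modules/app-stack/main.tf, and the project name and container port are placeholders.

```hcl
# Environment layer (sketch): only sets values. All wiring between
# VPC, RDS, and ECS lives in the composition module.
module "app_stack" {
  source = "../../modules/app-stack"

  environment  = "dev"
  project_name = "myapp" # placeholder

  vpc_cidr             = "10.0.0.0/16"
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs = ["10.0.10.0/24", "10.0.11.0/24"]

  db_engine            = "postgres"
  db_engine_version    = "16.4"
  db_instance_class    = "db.t3.micro"
  db_allocated_storage = 20
  database_name        = "appdb_dev"
  db_username          = "appadmin"
  db_password          = var.db_password # supplied via TF_VAR_db_password

  container_image             = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:dev-latest"
  container_port              = 8080 # placeholder
  desired_count               = 1
  cpu                         = 256
  memory                      = 512
  extra_environment_variables = {}
  tags                        = {}
}

variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}
```

Promoting dev to prod then means copying this one block and changing values, not re-wiring three modules.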
Module Versioning
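# Alternative source types: each block shows one option (a real configuration uses one per module name)
# Using Terraform Registry modules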
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
}
# Using Git repository modules (with tags)
module "app_stack" {
source = "git::https://github.com/myorg/terraform-modules.git//modules/app-stack?ref=v2.1.0"
}
# Using local modules (development phase)
module "vpc" {
source = "../../modules/vpc"
}
# Using S3 stored module packages
module "app_stack" {
source = "s3::https://my-terraform-modules.s3.amazonaws.com/app-stack/v2.1.0.zip"
}
Module Testing
# modules/vpc/tests/main.tftest.hcl
run "validate_vpc_cidr" {
command = plan
variables {
cidr_block = "10.0.0.0/16"
environment = "test"
}
assert {
condition = aws_vpc.this.cidr_block == "10.0.0.0/16"
error_message = "VPC CIDR block should match input"
}
}
run "validate_subnets" {
command = plan
variables {
cidr_block = "10.0.0.0/16"
environment = "test"
public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets = ["10.0.10.0/24", "10.0.11.0/24"]
}
assert {
condition = length(aws_subnet.public) == 2
error_message = "Should create 2 public subnets"
}
assert {
condition = length(aws_subnet.private) == 2
error_message = "Should create 2 private subnets"
}
}
run "validate_nat_gateway_production" {
command = plan
variables {
cidr_block = "10.0.0.0/16"
environment = "prod"
enable_nat_gateway = true
single_nat_gateway = false
private_subnets = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
}
assert {
condition = length(aws_nat_gateway.this) == 3
error_message = "Production should have one NAT Gateway per AZ"
}
}
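# Run one module's tests (files in tests/ are discovered automatically)
cd modules/vpc
terraform init
terraform test
# Run every module's tests from the repository root
# (terraform test has no -recursive flag, so loop over the module directories)
for dir in modules/*/; do (cd "$dir" && terraform init && terraform test); done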
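The tests above run plan against the real AWS provider, so they need credentials and call the EC2 API for the availability-zone lookup. Terraform 1.7 added mock providers for offline unit tests. A sketch, assuming Terraform 1.7+, with placeholder AZ names; the file name unit.tftest.hcl is arbitrary.

```hcl
# modules/vpc/tests/unit.tftest.hcl (sketch, Terraform 1.7+)
# The mock provider fabricates values, so no AWS credentials or API calls are needed.
mock_provider "aws" {
  # The module indexes AZ names by count.index, so the mocked data source
  # must return at least as many names as there are subnets.
  mock_data "aws_availability_zones" {
    defaults = {
      names = ["us-east-1a", "us-east-1b", "us-east-1c"]
    }
  }
}

run "private_subnets_spread_across_azs" {
  command = plan

  variables {
    environment     = "test"
    private_subnets = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
  }

  assert {
    condition     = aws_subnet.private[2].availability_zone == "us-east-1c"
    error_message = "The third private subnet should land in the third AZ"
  }
}
```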
Pattern 2: State Management
Terraform state files are the most critical data in IaC. Losing state means losing control over your infrastructure. Production environments must use remote backends, enable state locking, and regularly detect drift.
Remote Backend Architecture
┌──────────────────────────────────────────────────────┐
│ State Management 3-Layer Protection │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 1: Remote Backend │ │
│ │ S3 + DynamoDB / GCS / Azure Blob │ │
│ │ (State persistence, team sharing) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 2: State Locking │ │
│ │ DynamoDB / GCS native lock / Azure Blob lease │ │
│ │ (Prevent concurrent modifications) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 3: Drift Detection │ │
│ │ terraform plan -refresh-only + scheduled CI checks │ │
│ │ (Detect manual changes, maintain consistency)│ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
S3 + DynamoDB Backend Configuration
# backend.tf
terraform {
backend "s3" {
bucket = "myorg-terraform-state"
key = "app-infra/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abc123"
# Lock wait time is a CLI flag, not a backend argument: terraform apply -lock-timeout=30m
}
}
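Since Terraform 1.10, the S3 backend can also lock without DynamoDB: with use_lockfile = true it writes a lock object (the state key plus a .tflock suffix) into the same bucket, and newer releases mark dynamodb_table as deprecated. A sketch of the same backend using it, assuming Terraform 1.10+ and IAM permission to create and delete that lock object; check your CLI version's backend docs before switching.

```hcl
# backend.tf (sketch): S3-native state locking, Terraform 1.10+
terraform {
  backend "s3" {
    bucket       = "myorg-terraform-state"
    key          = "app-infra/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    kms_key_id   = "arn:aws:kms:us-east-1:123456789012:key/abc123"
    use_lockfile = true # lock object lives next to the state; no DynamoDB table needed
  }
}
```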
Bootstrapping Backend Infrastructure
# bootstrap/main.tf
resource "aws_s3_bucket" "terraform_state" {
bucket = "myorg-terraform-state"
lifecycle {
prevent_destroy = true
}
}
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.terraform_state.arn
}
}
}
resource "aws_s3_bucket_public_access_block" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
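rule {
id = "cleanup-old-versions"
status = "Enabled"
# An empty filter applies the rule to every object; the AWS provider warns when neither filter nor prefix is set
filter {}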
noncurrent_version_transition {
noncurrent_days = 90
storage_class = "GLACIER"
}
noncurrent_version_expiration {
noncurrent_days = 365
}
}
}
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
resource "aws_kms_key" "terraform_state" {
description = "Terraform state encryption key"
deletion_window_in_days = 30
enable_key_rotation = true
}
resource "aws_kms_alias" "terraform_state" {
name = "alias/terraform-state"
target_key_id = aws_kms_key.terraform_state.key_id
}
# Bootstrap process
cd bootstrap
terraform init
terraform apply
# Migrate to remote backend
# After creating backend.tf, run:
terraform init -migrate-state
# Verify state migrated to S3
aws s3 ls s3://myorg-terraform-state/app-infra/
GCS Backend Configuration
terraform {
backend "gcs" {
bucket = "myorg-terraform-state"
prefix = "app-infra"
}
}
State Layering
┌──────────────────────────────────────────────────────┐
│ State Layering Architecture │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 0: bootstrap (local state) │ │
│ │ S3 Bucket / DynamoDB / KMS / IAM │ │
│ └──────────────────────────────────────────────┘ │
│ │ data flow │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 1: networking (remote state) │ │
│ │ VPC / Subnets / Route Tables / NAT GW │ │
│ └──────────────────────────────────────────────┘ │
│ │ data flow │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 2: compute (remote state) │ │
│ │ ECS / EKS / RDS / ElastiCache │ │
│ └──────────────────────────────────────────────┘ │
│ │ data flow │
│ ┌──────────────────────────────────────────────┐ │
│ │ Layer 3: services (remote state) │ │
│ │ DNS / CDN / Monitoring / Alerts │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
# Layer 2 references Layer 1 outputs
data "terraform_remote_state" "networking" {
backend = "s3"
config = {
bucket = "myorg-terraform-state"
key = "networking/terraform.tfstate"
region = "us-east-1"
}
}
module "ecs" {
source = "../../modules/ecs"
vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
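terraform_remote_state requires read access to the entire upstream state file, secrets included. One alternative some teams use is to publish only the needed outputs to SSM Parameter Store. A sketch with hypothetical parameter names, split across the two layers' root modules (it uses insecure_value on the data source so plain IDs are not marked sensitive):

```hcl
# --- networking layer (sketch): publish the values other layers need ---
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/myorg/networking/vpc_id" # hypothetical naming convention
  type  = "String"
  value = module.vpc.vpc_id
}

resource "aws_ssm_parameter" "private_subnet_ids" {
  name  = "/myorg/networking/private_subnet_ids"
  type  = "StringList" # stored as a comma-separated string
  value = join(",", module.vpc.private_subnet_ids)
}

# --- compute layer (sketch): read them back without access to networking state ---
data "aws_ssm_parameter" "vpc_id" {
  name = "/myorg/networking/vpc_id"
}

data "aws_ssm_parameter" "private_subnet_ids" {
  name = "/myorg/networking/private_subnet_ids"
}

module "ecs" {
  source     = "../../modules/ecs"
  vpc_id     = data.aws_ssm_parameter.vpc_id.insecure_value
  subnet_ids = split(",", data.aws_ssm_parameter.private_subnet_ids.insecure_value)
}
```

The trade-off: the networking layer now owns an explicit contract (the parameter names), and the compute layer's IAM role needs only ssm:GetParameter instead of read access to the whole state bucket.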
Drift Detection
# Manual drift detection
terraform plan -detailed-exitcode
# 0 = no changes
# 1 = error
# 2 = changes exist (drift present)
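# Review what a state refresh would record (replaces the deprecated terraform refresh)
terraform plan -refresh-only
# Accept the detected changes into state
terraform apply -refresh-only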
# .github/workflows/drift-detection.yml
name: Drift Detection
on:
  schedule:
    - cron: "0 8 * * 1-5" # 08:00 UTC, Monday to Friday
  workflow_dispatch:
jobs:
  drift-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.10.0"
          terraform_wrapper: false # keep `terraform show -json` output clean for jq
      # Configure cloud credentials here (e.g. aws-actions/configure-aws-credentials)
      - name: Terraform Init
        run: terraform init -input=false -backend-config=backend.hcl
      - name: Check Drift
        run: |
          exit_code=0
          terraform plan -input=false -detailed-exitcode -out=plan.out || exit_code=$?
          if [ "$exit_code" -eq 1 ]; then
            echo "::error::terraform plan failed"
            exit 1
          elif [ "$exit_code" -eq 2 ]; then
            echo "::warning::Infrastructure drift detected!"
            terraform show -json plan.out | jq -r '.resource_changes[] | select(.change.actions != ["no-op"]) | "\(.address): \(.change.actions | join(", "))"'
            exit 1
          fi
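Some drift is expected rather than accidental. For example, if Application Auto Scaling adjusts an ECS service's desired_count, the scheduled check would report it on every run. A sketch of excluding such an attribute from Terraform's comparison; the cluster and task definition references are placeholders assumed to be defined elsewhere.

```hcl
# Sketch: desired_count is owned by autoscaling after creation,
# so Terraform sets it once and then stops comparing it.
resource "aws_ecs_service" "app" {
  name            = "app"                             # placeholder
  cluster         = aws_ecs_cluster.main.id           # assumed to exist elsewhere
  task_definition = aws_ecs_task_definition.app.arn   # assumed to exist elsewhere
  desired_count   = 2

  lifecycle {
    ignore_changes = [desired_count]
  }
}
```

Keep ignore_changes lists short and explicit: every ignored attribute is drift the scheduled check will never report.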
State Operation Commands
# List current state
terraform state list
# Show specific resource state
terraform state show aws_vpc.this
# Move resource (during refactoring)
terraform state mv aws_vpc.this aws_vpc.main
# Remove resource (no longer managed by Terraform)
terraform state rm aws_vpc.this
# Import existing resource
terraform import aws_vpc.this vpc-abc123
# Force unlock (when state lock is stuck)
terraform force-unlock <lock-id>
# Pull remote state to local
terraform state pull > state.json
# Push local state to remote (dangerous: overwrites remote state; back it up first)
terraform state push state.json
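The state mv, import, and rm commands above also have declarative equivalents (moved in Terraform 1.1, import blocks in 1.5, removed in 1.7) that show up in the plan and can be reviewed in a pull request like any other change. A sketch with placeholder addresses and CIDRs:

```hcl
# Declarative form of `terraform state mv aws_vpc.this aws_vpc.main`
moved {
  from = aws_vpc.this
  to   = aws_vpc.main
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Declarative form of `terraform import aws_vpc.legacy vpc-abc123`
import {
  to = aws_vpc.legacy
  id = "vpc-abc123"
}

resource "aws_vpc" "legacy" {
  cidr_block = "10.1.0.0/16" # must match the real VPC, or the plan shows an update
}

# Declarative form of `terraform state rm aws_vpc.old` (Terraform 1.7+);
# aws_vpc.old must no longer appear in the configuration.
removed {
  from = aws_vpc.old

  lifecycle {
    destroy = false # forget the object without destroying it
  }
}
```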
Pattern 3: Workspace Environment Isolation
Multi-environment management is the most basic IaC requirement. dev/staging/prod environments share modules but have different configurations. Terraform Workspace provides lightweight environment isolation, but needs proper variable management to be effective.
Workspace vs Directory Isolation
| Aspect | Workspace Isolation | Directory Isolation |
|---|---|---|
| State | One state file per workspace, same backend configuration | Independent state file per environment |
| Code | Same codebase for all environments | Independent code directory per environment |
| Switching | terraform workspace select | Change directory |
| Best for | Simple environments | Complex environments |
| Example state path | env:/dev/state, env:/prod/state | dev/terraform.tfstate, prod/terraform.tfstate |
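To make the workspace column concrete, here is a minimal sketch, with placeholder values, of a single configuration that selects per-environment settings by terraform.workspace after `terraform workspace select dev`:

```hcl
locals {
  env_settings = {
    dev = {
      vpc_cidr      = "10.0.0.0/16"
      desired_count = 1
    }
    prod = {
      vpc_cidr      = "10.100.0.0/16"
      desired_count = 3
    }
  }

  # Fails at plan time with an "Invalid index" error if the current
  # workspace (e.g. "default") has no entry, which is a useful guard.
  env = local.env_settings[terraform.workspace]
}

module "vpc" {
  source      = "./modules/vpc"
  cidr_block  = local.env.vpc_cidr
  environment = terraform.workspace
}
```

This works while environments differ only in values. Once prod needs different resources, not just different sizes, directory isolation is easier to reason about.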
Recommended Approach: Directory Isolation + Shared Modules
infra/
├── modules/
│ ├── vpc/
│ ├── rds/
│ └── ecs/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── backend.tf
│ └── terraform.tfvars
└── shared/
└── locals.tf
Environment Configuration
# environments/dev/backend.tf
terraform {
backend "s3" {
bucket = "myorg-terraform-state"
key = "dev/app-infra/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
# environments/dev/main.tf
module "vpc" {
source = "../../modules/vpc"
cidr_block = "10.0.0.0/16"
environment = "dev"
public_subnets = ["10.0.1.0/24"]
private_subnets = ["10.0.10.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
}
module "rds" {
source = "../../modules/rds"
environment = "dev"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
engine = "postgres"
engine_version = "16.4"
instance_class = "db.t3.micro"
allocated_storage = 20
database_name = "appdb_dev"
username = "appadmin"
password = var.db_password
}
module "ecs" {
source = "../../modules/ecs"
environment = "dev"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:dev-latest"
desired_count = 1
cpu = 256
memory = 512
}
# environments/prod/main.tf
module "vpc" {
source = "../../modules/vpc"
cidr_block = "10.100.0.0/16"
environment = "prod"
public_subnets = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
private_subnets = ["10.100.10.0/24", "10.100.11.0/24", "10.100.12.0/24"]
enable_nat_gateway = true
single_nat_gateway = false
}
module "rds" {
source = "../../modules/rds"
environment = "prod"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
engine = "postgres"
engine_version = "16.4"
instance_class = "db.r6g.xlarge"
allocated_storage = 500
database_name = "appdb_prod"
username = "appadmin"
password = var.db_password
multi_az = true
}
module "ecs" {
source = "../../modules/ecs"
environment = "prod"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:v2.1.0"
desired_count = 3
cpu = 1024
memory = 2048
}
Variable Management
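# environments/dev/terraform.tfvars
environment = "dev"
aws_region = "us-east-1"
instance_type = "t3.micro"
desired_count = 1
# db_password is intentionally absent: export TF_VAR_db_password instead.
# Never commit secrets to tfvars files.
# environments/prod/terraform.tfvars
environment = "prod"
aws_region = "us-east-1"
instance_type = "r6g.xlarge"
desired_count = 3
# Do NOT set db_password here, not even to "": terraform.tfvars takes precedence
# over TF_VAR_db_password, so an empty value would override the real one
# (and fail the 16-character validation). Supply it from the environment or Vault.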
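Another way to keep the password out of tfvars is to read it at plan time from AWS Secrets Manager. A sketch, assuming a secret with the hypothetical name prod/app/db-password already exists. The value still ends up in the state file, which is one more reason the state bucket must be encrypted and access-controlled.

```hcl
# Sketch: fetch the RDS password from Secrets Manager instead of a variable.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/app/db-password" # hypothetical secret name
}

module "rds" {
  source = "../../modules/rds"
  # ... other arguments as in environments/prod/main.tf ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```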
Variable Validation
# variables.tf
variable "environment" {
description = "Environment name"
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "db_password" {
description = "Database password"
type = string
sensitive = true
validation {
condition = length(var.db_password) >= 16
error_message = "Database password must be at least 16 characters."
}
}
variable "allowed_cidrs" {
description = "List of CIDR blocks allowed to access the application"
type = list(string)
validation {
condition = length(var.allowed_cidrs) > 0
error_message = "At least one CIDR block must be specified."
}
}
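The allowed_cidrs rule above only checks that the list is non-empty. A stricter sketch of the same variable also rejects malformed entries: cidrnetmask() raises an error for anything that is not a valid IPv4 CIDR block, and can() turns that error into false.

```hcl
variable "allowed_cidrs" {
  description = "List of CIDR blocks allowed to access the application"
  type        = list(string)

  validation {
    condition     = length(var.allowed_cidrs) > 0
    error_message = "At least one CIDR block must be specified."
  }

  validation {
    condition     = alltrue([for c in var.allowed_cidrs : can(cidrnetmask(c))])
    error_message = "Every entry must be a valid IPv4 CIDR block, e.g. 10.0.0.0/16."
  }
}
```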
Pattern 4: OpenTofu Migration
In August 2023, HashiCorp changed Terraform's license from the Mozilla Public License 2.0 to the Business Source License 1.1, starting with Terraform 1.6. BSL 1.1 restricts "competitive use": embedding Terraform in products or hosted services that compete with HashiCorp's own offerings. Most enterprises using Terraform internally are unaffected, but the open-source community responded by forking it as OpenTofu, now maintained under the Linux Foundation.
Migration Assessment
| Dimension | Terraform (BSL 1.1) | OpenTofu (MPL 2.0) |
|---|---|---|
| License | BSL 1.1 (competitive use restriction) | MPL 2.0 (fully open source) |
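| CLI Compatibility | Native | Drop-in replacement for Terraform 1.5.x; later Terraform versions need version-specific migration checks |
| Provider Compatibility | Native | Uses the OpenTofu Registry, which mirrors most public providers |
| State File | Native format | Compatible, unless OpenTofu state encryption is enabled |
| Enterprise Support | HashiCorp support | Linux Foundation project; commercial support from third-party vendors |
| New Features | terraform test (1.6+), ephemeral resources (1.10+) | State/plan encryption (1.7+), provider for_each (1.9+) |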
| Registry | Terraform Registry | OpenTofu Registry |
Migration Steps
# Step 1: Install OpenTofu
# macOS
brew install opentofu
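# Linux (the installer requires --install-method; deb shown for Debian/Ubuntu)
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh -o install-opentofu.sh
chmod +x install-opentofu.sh
./install-opentofu.sh --install-method deb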
# Verify version compatibility
tofu version
# OpenTofu v1.9.0
# Step 2: Replace CLI commands
# terraform init → tofu init
# terraform plan → tofu plan
# terraform apply → tofu apply
# terraform destroy → tofu destroy
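# Step 3: Verify compatibility (back up the state before the first tofu apply)
terraform state pull > pre-tofu-backup.tfstate
tofu init
tofu plan
# Success criterion: tofu plan reports "No changes" against the existing state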
# Step 4: Update CI/CD configuration
# Replace all terraform commands with tofu
CI/CD Migration Example
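# .github/workflows/tofu.yml
name: OpenTofu CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  plan:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4
      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: "1.9.0"
      # Configure cloud credentials here (e.g. OIDC) before init
      - name: Tofu Init
        run: tofu init -input=false
        working-directory: environments/${{ matrix.environment }}
      - name: Tofu Plan
        run: tofu plan -input=false -out=plan.out
        working-directory: environments/${{ matrix.environment }}
      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: plan-${{ matrix.environment }}
          path: environments/${{ matrix.environment }}/plan.out
  apply:
    needs: plan
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1 # apply environments one at a time
      matrix:
        environment: [dev, staging, prod]
    environment: ${{ matrix.environment }} # GitHub environment protection gates each apply
    steps:
      - uses: actions/checkout@v4
      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: "1.9.0"
      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: plan-${{ matrix.environment }}
          path: environments/${{ matrix.environment }}
      # A saved plan still needs providers and the backend, so init again
      - name: Tofu Init
        run: tofu init -input=false
        working-directory: environments/${{ matrix.environment }}
      - name: Tofu Apply
        run: tofu apply -input=false plan.out
        working-directory: environments/${{ matrix.environment }}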
OpenTofu Exclusive Feature: Encrypted State
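# OpenTofu 1.7+ supports native state and plan encryption
# (referencing a variable here relies on early evaluation, added in OpenTofu 1.8)
terraform {
  encryption {
    key_provider "pbkdf2" "mykey" {
      passphrase  = var.encryption_passphrase
      key_length  = 32
      iterations  = 600000
      salt_length = 32 # a random salt is generated and stored with the encrypted data
    }
    method "aes_gcm" "myencryption" {
      keys = key_provider.pbkdf2.mykey
    }
    # Lets OpenTofu read the existing unencrypted state during migration;
    # remove this method and the fallback blocks once everything is encrypted.
    method "unencrypted" "migrate" {}
    state {
      method = method.aes_gcm.myencryption
      fallback {
        method = method.unencrypted.migrate
      }
    }
    plan {
      method = method.aes_gcm.myencryption
      fallback {
        method = method.unencrypted.migrate
      }
    }
  }
}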
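A passphrase has to be distributed and rotated by hand. OpenTofu also ships an aws_kms key provider that derives the data key from a KMS key, so access is governed by IAM instead. A sketch for a fresh project, assuming OpenTofu 1.7+ and reusing the placeholder key ARN from the S3 backend example; when migrating existing state, also add an unencrypted fallback method.

```hcl
# Sketch (OpenTofu 1.7+): KMS-backed state and plan encryption
terraform {
  encryption {
    key_provider "aws_kms" "state" {
      kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abc123" # placeholder
      region     = "us-east-1"
      key_spec   = "AES_256"
    }

    method "aes_gcm" "state" {
      keys = key_provider.aws_kms.state
    }

    state {
      method = method.aes_gcm.state
    }

    plan {
      method = method.aes_gcm.state
    }
  }
}
```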
Progressive Migration Strategy
┌──────────────────────────────────────────────────────┐
│ Progressive Migration Roadmap │
│ │
│ Phase 1: Assessment (1-2 weeks) │
│ ├── Inventory all Terraform projects │
│ ├── Check Provider and Module compatibility │
│ └── Define migration priority │
│ │
│ Phase 2: Non-production Migration (2-4 weeks) │
│ ├── Switch dev/staging to OpenTofu │
│ ├── Verify plan output consistency │
│ └── Update CI/CD pipelines │
│ │
│ Phase 3: Production Migration (1-2 weeks) │
│ ├── Switch production to OpenTofu │
│ ├── Enable state encryption │
│ └── Monitor for 1 week to confirm stability │
│ │
│ Phase 4: Cleanup (1 week) │
│ ├── Remove Terraform CLI dependencies │
│ ├── Update documentation and runbooks │
│ └── Complete team training │
└──────────────────────────────────────────────────────┘
Pattern 5: GitOps Integration
The end goal of Terraform IaC is GitOps: a commit triggers a plan, and an approved change is applied automatically. Atlantis is one of the most established open-source tools for this; it runs terraform plan and terraform apply from Pull Request comments and posts the output back to the PR.
Atlantis Architecture
┌──────────────────────────────────────────────────────┐
│ Atlantis GitOps Architecture │
│ │
│ ┌──────────┐ webhook ┌──────────────────┐ │
│ │ GitHub │───────────────▶│ Atlantis │ │
│ │ /GitLab │ │ Server │ │
│ │ │◀───────────────│ │ │
│ │ │ PR Comment │ ┌──────────────┐│ │
│ │ │ (plan/apply) │ │ terraform ││ │
│ └──────────┘ │ │ plan/apply ││ │
│ │ └──────────────┘│ │
│ └────────┬─────────┘ │
│ │ │
│ ┌────────▼─────────┐ │
│ │ AWS / GCP / │ │
│ │ Azure API │ │
│ └──────────────────┘ │
└──────────────────────────────────────────────────────┘
Deploying Atlantis
# atlantis-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlantis
  namespace: atlantis
spec:
  replicas: 1 # Atlantis keeps locks in a local database by default; do not scale out
  selector:
    matchLabels:
      app: atlantis
  template:
    metadata:
      labels:
        app: atlantis
    spec:
      containers:
        - name: atlantis
          image: ghcr.io/runatlantis/atlantis:v0.30.0
          ports:
            - containerPort: 4141
          env:
            - name: ATLANTIS_REPO_ALLOWLIST # required: repositories this server accepts
              value: "github.com/myorg/*"
            - name: ATLANTIS_GH_USER
              value: "myorg-bot"
            - name: ATLANTIS_GH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: atlantis-secrets
                  key: github-token
            - name: ATLANTIS_GH_WEBHOOK_SECRET
              valueFrom:
                secretKeyRef:
                  name: atlantis-secrets
                  key: webhook-secret
            - name: ATLANTIS_PARALLEL_POOL_SIZE
              value: "4"
            # Server-side repo config; replaces the removed --allow-repo-config flag
            - name: ATLANTIS_REPO_CONFIG_JSON
              value: |
                {
                  "repos": [
                    {
                      "id": "/.*/",
                      "apply_requirements": ["approved", "mergeable"],
                      "plan_requirements": [],
                      "import_requirements": [],
                      "allowed_overrides": ["apply_requirements", "workflow"],
                      "allow_custom_workflows": true
                    }
                  ]
                }
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - name: atlantis-data
              mountPath: /home/atlantis
      volumes:
        - name: atlantis-data
          persistentVolumeClaim:
            claimName: atlantis-data
Atlantis Repository Configuration
# atlantis.yaml (repository root)
version: 3
parallel_plan: true # plan affected projects concurrently
projects:
  - name: dev-infra
    dir: environments/dev
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: staging-infra
    dir: environments/staging
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: prod-infra
    dir: environments/prod
    workflow: terraform
    autoplan:
      when_modified: ["../../modules/**/*.tf", "*.tf", "*.tfvars"]
      enabled: true
    apply_requirements:
      - approved
      - mergeable
      - undiverged
workflows:
  terraform:
    plan:
      steps:
        # `value:` is used verbatim (no shell expansion); `command:` output becomes the value.
        # DB_PASSWORD must be present in the Atlantis server's environment.
        - env:
            name: TF_VAR_db_password
            command: 'echo "$DB_PASSWORD"'
        - run: terraform init -input=false
        - run: terraform plan -input=false -out=$PLANFILE -var-file=terraform.tfvars
        - run: terraform show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
        - run: terraform apply $PLANFILE
CI/CD Pipeline (Without Atlantis)
# .github/workflows/terraform-cicd.yml
name: Terraform CI/CD
on:
push:
branches: [main]
paths:
- "environments/**"
- "modules/**"
pull_request:
branches: [main]
paths:
- "environments/**"
- "modules/**"
env:
TF_VERSION: "1.10.0"
AWS_REGION: "us-east-1"
jobs:
detect-changes:
runs-on: ubuntu-latest
outputs:
environments: ${{ steps.changes.outputs.environments }}
steps:
- uses: actions/checkout@v4
- name: Detect changed environments
id: changes
run: |
if [ "${{ github.event_name }}" = "pull_request" ]; then
CHANGED=$(git diff --name-only origin/main...HEAD | grep -oP 'environments/\K[^/]+' | sort -u | jq -R . | jq -s .)
else
CHANGED=$(git diff --name-only HEAD~1 HEAD | grep -oP 'environments/\K[^/]+' | sort -u | jq -R . | jq -s .)
fi
echo "environments=$CHANGED" >> $GITHUB_OUTPUT
plan:
needs: detect-changes
if: needs.detect-changes.outputs.environments != '[]'
runs-on: ubuntu-latest
strategy:
matrix:
environment: ${{ fromJson(needs.detect-changes.outputs.environments) }}
steps:
- uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
aws-region: ${{ env.AWS_REGION }}
- name: Terraform Init
run: terraform init -backend-config=backend.hcl
working-directory: environments/${{ matrix.environment }}
- name: Terraform Plan
run: |
terraform plan -out=plan.out -var-file=terraform.tfvars
terraform show -json plan.out > plan.json
working-directory: environments/${{ matrix.environment }}
- name: Upload Plan Artifact
uses: actions/upload-artifact@v4
with:
name: plan-${{ matrix.environment }}
path: |
environments/${{ matrix.environment }}/plan.out
environments/${{ matrix.environment }}/plan.json
- name: Comment PR with Plan
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const plan = fs.readFileSync('environments/${{ matrix.environment }}/plan.json', 'utf8');
const planObj = JSON.parse(plan);
const changes = (planObj.resource_changes || []).filter(c => c.change.actions.some(a => a !== 'no-op'));
let body = `## Terraform Plan: ${{ matrix.environment }}\n\n`;
body += `| Action | Resource Type | Address |\n|--------|--------------|---------|\n`;
for (const c of changes) {
body += `| ${c.change.actions.join(', ')} | ${c.type} | ${c.address} |\n`;
}
await github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: body
});
apply:
needs: [detect-changes, plan]
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
runs-on: ubuntu-latest
strategy:
matrix:
environment: ${{ fromJson(needs.detect-changes.outputs.environments) }}
environment: ${{ matrix.environment }}
steps:
- uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
aws-region: ${{ env.AWS_REGION }}
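- name: Terraform Init
run: terraform init -backend-config=backend.hcl
working-directory: environments/${{ matrix.environment }}
- name: Download Plan
uses: actions/download-artifact@v4
with:
name: plan-${{ matrix.environment }}
# Without a path the files land in the workspace root, not next to the configuration
path: environments/${{ matrix.environment }}
- name: Terraform Apply
run: terraform apply plan.out
working-directory: environments/${{ matrix.environment }}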
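The real logic in this workflow lives in two small scripts: the shell pipeline that turns changed paths into a JSON list of environments, and the github-script step that turns `terraform show -json` output into a PR table. A minimal Python sketch of the same logic, handy for unit-testing it outside Actions, assuming the `environments/<name>/...` layout used throughout; `changed_environments` and `summarize_plan` are hypothetical names.

```python
import json
import re

# Same extraction as: grep -oP 'environments/\K[^/]+' | sort -u | jq -R . | jq -s -c .
ENV_PATTERN = re.compile(r"environments/([^/]+)")


def changed_environments(paths: list[str]) -> str:
    """Return a compact JSON array of environment names touched by the changed paths."""
    envs = {m.group(1) for path in paths for m in ENV_PATTERN.finditer(path)}
    return json.dumps(sorted(envs), separators=(",", ":"))


def summarize_plan(plan: dict, environment: str) -> str:
    """Render a Markdown table of the resource changes that are not no-ops."""
    # resource_changes may be absent when the plan manages nothing
    changes = [
        rc for rc in plan.get("resource_changes", [])
        if any(action != "no-op" for action in rc["change"]["actions"])
    ]
    lines = [
        f"## Terraform Plan: {environment}",
        "",
        "| Action | Resource Type | Address |",
        "|--------|---------------|---------|",
    ]
    for rc in changes:
        actions = ", ".join(rc["change"]["actions"])
        lines.append(f"| {actions} | {rc['type']} | {rc['address']} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(changed_environments([
        "environments/dev/main.tf",
        "environments/prod/variables.tf",
        "modules/vpc/main.tf",
    ]))  # prints ["dev","prod"]
```

Note that a change under `modules/` alone yields `[]`, so the plan job is skipped; if shared modules should trigger every environment, that rule has to be added explicitly.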
Policy as Code: Sentinel / OPA
# sentinel/require-tags.sentinel
import "tfplan/v2" as tfplan
allResources = filter tfplan.resource_changes as _, rc {
rc.mode == "managed" and
rc.type != "null_resource" and
rc.change.actions != ["delete"]
}
tagsRequired = rule {
all allResources as _, rc {
rc.change.after.tags contains "Environment" and
rc.change.after.tags contains "ManagedBy"
}
}
main = rule {
tagsRequired
}
# opa/require-tags.rego (Rego v1 syntax)
package terraform
import rego.v1
required_tags := {"Environment", "ManagedBy"}
deny contains msg if {
some rc in input.resource_changes
rc.mode == "managed"
rc.change.actions != ["delete"]
# Skip resource types that have no tags argument at all
"tags" in object.keys(rc.change.after)
some tag in required_tags
not has_tag(rc.change.after, tag)
msg := sprintf("Resource %s of type %s missing %s tag", [rc.address, rc.type, tag])
}
# tags = null (or a missing key) counts as untagged
has_tag(after, tag) if {
is_object(after.tags)
after.tags[tag]
}
# Use OPA to check Terraform Plan
terraform plan -out=plan.out
terraform show -json plan.out > plan.json
# Run OPA policy check; --fail-defined exits non-zero when any deny message exists
opa eval --fail-defined --data opa/ --input plan.json "data.terraform.deny[x]"
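Before wiring a policy engine into CI, it can help to prototype the rule against a real `plan.json`. A minimal Python sketch of the same required-tags check, assuming the layout produced by `terraform show -json` (`resource_changes[].change.after.tags`); `REQUIRED_TAGS` and `missing_tags` are hypothetical names, and resources whose `after` object has no `tags` key are skipped on the assumption that their type does not support tagging.

```python
import json
import sys

REQUIRED_TAGS = ("Environment", "ManagedBy")


def missing_tags(plan: dict) -> list[str]:
    """Return a message for each required tag missing from a managed, non-deleted resource."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("mode") != "managed" or rc["change"]["actions"] == ["delete"]:
            continue
        after = rc["change"].get("after") or {}
        if "tags" not in after:
            continue  # assume the resource type has no tags argument
        tags = after["tags"] or {}  # tags = null is treated as "no tags"
        for tag in REQUIRED_TAGS:
            if tag not in tags:
                violations.append(f"{rc['address']} missing {tag} tag")
    return violations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tags.py plan.json  (exits 1 when violations exist)
    with open(sys.argv[1], encoding="utf-8") as fh:
        problems = missing_tags(json.load(fh))
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

Tags whose values are unknown until apply do not appear in `after`, so this sketch (like the policies) can only judge values known at plan time.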
5 Common Pitfalls and Solutions
Pitfall 1: State File Corruption
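Symptom: terraform plan fails with "state snapshot was created by Terraform vX, which is newer than current vY" or with a JSON parse error such as "invalid character".
Cause: The state file was edited by hand, written by a newer Terraform version, damaged by a disk failure (local state), or rolled back to the wrong S3 object version. If only the version error appears, upgrading Terraform is usually enough; the steps below are for genuinely damaged state.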
Solution:
# 1. Restore from S3 version history
aws s3api list-object-versions \
--bucket myorg-terraform-state \
--prefix dev/app-infra/terraform.tfstate
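# Restore to previous version (quote the source: "?" is a shell glob character)
aws s3api copy-object \
--bucket myorg-terraform-state \
--copy-source "myorg-terraform-state/dev/app-infra/terraform.tfstate?versionId=PREVIOUS_VERSION" \
--key dev/app-infra/terraform.tfstate
# With a DynamoDB lock table, Terraform then reports a checksum mismatch;
# update the Digest of the "<bucket>/<key>-md5" item to the value it prints
# 2. Force pull and repair
terraform state pull > state.json
cp state.json state.backup.json
# Manually repair JSON (proceed with caution) and keep "lineage" unchanged:
# push is rejected if the lineage differs or the serial is lower than the remote's
terraform state push state.json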
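Before running `terraform state push` on a hand-repaired file, a quick pre-flight check can catch the mistakes Terraform would reject anyway (a different `lineage`, a lower `serial`) as well as plain JSON damage. A minimal Python sketch, assuming both inputs come from `terraform state pull` (or a restored S3 version) and use state format `version` 4; `check_repaired_state` is a hypothetical name and the checks are not exhaustive.

```python
import json


def check_repaired_state(remote_text: str, repaired_text: str) -> list[str]:
    """Compare a repaired state file with a known-good remote copy; return problems found."""
    try:
        repaired = json.loads(repaired_text)
    except json.JSONDecodeError as exc:
        return [f"repaired state is not valid JSON: {exc}"]
    remote = json.loads(remote_text)

    problems = []
    if repaired.get("version") != 4:
        problems.append(f"unexpected state format version: {repaired.get('version')}")
    if repaired.get("lineage") != remote.get("lineage"):
        problems.append("lineage differs from remote state; push would be rejected")
    if repaired.get("serial", -1) < remote.get("serial", 0):
        problems.append("serial is lower than remote state; push would be rejected")
    if not isinstance(repaired.get("resources"), list):
        problems.append("'resources' is missing or not a list")
    return problems
```

Run it on `state.backup.json` and the edited `state.json` before pushing; an empty list means only that the obvious checks pass, not that the resource data is correct.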
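Pitfall 2: Unpinned Provider Versions Cause CI Failures
Symptom: Local terraform plan works fine, but CI/CD fails with Provider download errors or version incompatibility.
Cause: Locally, .terraform already contains a previously installed Provider; CI runs terraform init from scratch and resolves the newest version that satisfies the constraint. Without a committed .terraform.lock.hcl, the two can silently differ.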
Solution:
# versions.tf - Lock Provider versions
terraform {
required_version = ">= 1.5.0, < 2.0.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.80.0"
}
}
}
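# Record checksums for every platform that runs Terraform (e.g. macOS laptops and Linux CI)
terraform providers lock -platform=linux_amd64 -platform=darwin_arm64
# Commit lock file to Git
git add .terraform.lock.hcl
git commit -m "chore: lock provider versions"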
Pitfall 3: Circular Dependencies
Symptom: terraform plan reports Cycle: module.x, module.y.
Cause: Module A's output depends on Module B's output, and Module B depends on Module A's output.
Solution:
# Wrong: Circular dependency
module "vpc" {
source = "./vpc"
ecs_security_group_id = module.ecs.security_group_id
}
module "ecs" {
source = "./ecs"
subnet_ids = module.vpc.private_subnet_ids
}
# Correct: Split into 3 layers, unidirectional dependencies
# Layer 1: Networking
module "vpc" {
source = "./vpc"
}
# Layer 2: Security Groups
module "security_groups" {
source = "./security-groups"
vpc_id = module.vpc.vpc_id
}
# Layer 3: Compute
module "ecs" {
source = "./ecs"
subnet_ids = module.vpc.private_subnet_ids
security_group_ids = module.security_groups.app_ids
}
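Terraform refuses any configuration whose dependency graph contains a cycle, which is why the fix is to make every edge point one way. A minimal Python sketch of that check as a depth-first search, assuming a hand-written map from each module to the modules whose outputs it reads; the two graphs in the usage note are hypothetical encodings of the wrong and corrected examples above, not something Terraform exports in this form.

```python
from typing import Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    color = {node: WHITE for node in deps}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For example, `{"module.vpc": ["module.ecs"], "module.ecs": ["module.vpc"]}` yields `["module.vpc", "module.ecs", "module.vpc"]`, while the layered version (vpc, then security_groups depending on vpc, then ecs depending on both) yields `None`.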
Pitfall 4: Sensitive Variables Leaked to State File
Symptom: terraform show reveals plaintext passwords; state file contains sensitive information.
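Cause: sensitive = true only redacts values in CLI output; it does not encrypt or omit them in the state file.
Solution: The options below keep secrets out of code, tfvars, and Git history, but any value Terraform passes to a resource, and any data source result, is still written to state in plaintext. Pair them with an encrypted, access-restricted backend (e.g. S3 with SSE-KMS and a tight bucket policy), or let the service own the secret: for RDS, manage_master_user_password = true has RDS generate the password and keep it in Secrets Manager, so it never passes through Terraform.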
# Option 1: Use AWS SSM Parameter Store
data "aws_ssm_parameter" "db_password" {
name = "/app/${var.environment}/db-password"
with_decryption = true
}
module "rds" {
source = "../../modules/rds"
password = data.aws_ssm_parameter.db_password.value
}
# Option 2: Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_creds" {
secret_id = "app/${var.environment}/db-credentials"
}
module "rds" {
source = "../../modules/rds"
password = jsondecode(data.aws_secretsmanager_secret_version.db_creds.secret_string)["password"]
}
# Option 3: Environment variable injection (CI/CD scenario)
# export TF_VAR_db_password=$(vault kv get -field=password secret/app/prod/db)
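A quick way to see the problem for yourself is to scan a pulled state file for attribute names that look like secrets. A minimal Python sketch, assuming state format version 4 (`resources[].instances[].attributes`) as written by `terraform state pull`; the key-name pattern `SUSPICIOUS` is a hypothetical heuristic and will miss secrets stored under generic names, such as the `value` attribute of an `aws_ssm_parameter` data source.

```python
import re

SUSPICIOUS = re.compile(r"password|secret_string|private_key|token", re.IGNORECASE)


def find_plaintext_secrets(state: dict) -> list[str]:
    """List '<address>.<attribute>' for non-empty, secret-looking top-level attributes."""
    hits = []
    for res in state.get("resources", []):
        prefix = f"{res['module']}." if "module" in res else ""  # root resources have no module key
        if res.get("mode") == "data":
            prefix += "data."
        address = f"{prefix}{res['type']}.{res['name']}"
        for inst in res.get("instances", []):
            for key, value in (inst.get("attributes") or {}).items():
                if SUSPICIOUS.search(key) and value:
                    hits.append(f"{address}.{key}")
    return hits
```

Usage: `terraform state pull | python -c "import json, sys; ..."` or load the file with `json.load` and pass the dict in. Any hit means the value is readable by everyone with access to the backend.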
Pitfall 5: Module Refactoring Causes Resource Recreation
Symptom: After renaming a module or moving resources, terraform plan shows resources will be destroyed and recreated.
Cause: Terraform identifies resources by their address (module.vpc.aws_vpc.this). When the address changes, Terraform treats it as a new resource.
Solution:
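# Option 1 (recommended, Terraform >= 1.1): record the rename in code
# 1. Move module files and update the module block
mv modules/vpc modules/networking/vpc
# module "vpc" { source = "../../modules/vpc" }
# → module "networking_vpc" { source = "../../modules/networking/vpc" }
# 2. Declare the address change next to the module block
moved {
from = module.vpc
to = module.networking_vpc
}
# Option 2: move state directly (after changing the code, before the next apply)
terraform state mv module.vpc module.networking_vpc
# Verify
terraform plan # Should report the move with no create/destroy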
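When a refactor touches many resources, spotting the destroy/create pairs by hand is tedious. A minimal Python sketch that scans plan JSON for resources deleted at one address and created at another with the same type, name, and index, and renders candidate resource-level `moved` blocks; matching on type + name is a heuristic assumption (review every suggestion), and `suggest_moves` / `render_moved_blocks` are hypothetical names.

```python
import json


def suggest_moves(plan: dict) -> list[tuple[str, str]]:
    """Pair single delete/create resource changes that share type, name, and index."""
    deleted: dict[tuple[str, str, str], list[str]] = {}
    created: dict[tuple[str, str, str], list[str]] = {}
    for rc in plan.get("resource_changes", []):
        if rc.get("mode") != "managed":
            continue
        # index is an int (count) or str (for_each); json.dumps makes it hashable
        key = (rc["type"], rc["name"], json.dumps(rc.get("index")))
        actions = rc["change"]["actions"]
        if actions == ["delete"]:
            deleted.setdefault(key, []).append(rc["address"])
        elif actions == ["create"]:
            created.setdefault(key, []).append(rc["address"])
    return [
        (olds[0], created[key][0])
        for key, olds in deleted.items()
        if len(olds) == 1 and len(created.get(key, [])) == 1  # skip ambiguous matches
    ]


def render_moved_blocks(moves: list[tuple[str, str]]) -> str:
    """Format (from, to) pairs as HCL moved blocks."""
    return "\n\n".join(f"moved {{\n  from = {old}\n  to   = {new}\n}}" for old, new in moves)
```

When a whole module was renamed, one module-level block (`from = module.vpc`, `to = module.networking_vpc`) covers every resource inside it and is usually simpler than many resource-level blocks.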
10 Common Error Troubleshooting
1. Error: Failed to load plugin
# Clear plugin cache and re-download
rm -rf .terraform/providers
terraform init -upgrade
# Check network proxy
export HTTPS_PROXY=http://proxy.internal:8080
terraform init
2. Error: Error locking state: Error acquiring the state lock
# Check DynamoDB locks
aws dynamodb scan --table-name terraform-locks
# Confirm no other process is running
# If lock is stale, force unlock
terraform force-unlock <lock-id>
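Before running terraform force-unlock, it is worth checking who holds the lock and for how long. The lock item in the DynamoDB table stores an `Info` string containing JSON with fields such as `ID`, `Operation`, `Who`, and `Created`; the scan also returns `<bucket>/<key>-md5` digest items, which are not locks. A minimal Python sketch that estimates lock age, assuming you have already extracted that `Info` string from the scan output; the one-hour threshold is an arbitrary assumption, not a Terraform default.

```python
import json
import re
from datetime import datetime, timedelta, timezone


def lock_age(info_json: str, now: datetime) -> timedelta:
    """Return how long ago the lock described by a Terraform lock Info document was created."""
    created = json.loads(info_json)["Created"]
    # Created is RFC 3339 with up to 9 fractional digits; drop them for datetime parsing
    created = re.sub(r"\.\d+", "", created).replace("Z", "+00:00")
    return now - datetime.fromisoformat(created)


def is_probably_stale(info_json: str, now: datetime,
                      max_age: timedelta = timedelta(hours=1)) -> bool:
    """Heuristic only: an old lock may still belong to a long-running apply."""
    return lock_age(info_json, now) > max_age


if __name__ == "__main__":
    info = '{"ID": "abc", "Operation": "OperationTypeApply", "Created": "2025-01-01T10:00:00.5Z"}'
    print(lock_age(info, datetime.now(timezone.utc)))
```

Even when a lock looks stale, confirm with the person or CI job in `Who` before unlocking; forcing the unlock under a live apply can corrupt state.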
3. Error: Provider produced inconsistent result after apply
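# Terraform reports this as a Provider bug: the provider returned values that differ from the plan
# 1. Update Provider version
terraform init -upgrade
# 2. Re-run terraform apply; if the same attribute keeps producing a diff, ignore it
resource "aws_instance" "app" {
# ... existing arguments ...
lifecycle {
# Use the attribute named in the error message; tags is only an example
ignore_changes = [tags]
}
}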
4. Error: Resource already managed by Terraform
# Raised by terraform import when the target address is already tracked in state
# List resources in state
terraform state list
# Remove the existing entry, then re-run the import if needed
terraform state rm aws_instance.old_resource
5. Error: Module not found
# Clear module cache
rm -rf .terraform/modules
terraform init -upgrade
# Check module source path
# Local module paths are relative to the directory containing the calling module block
module "vpc" {
source = "../../modules/vpc"
}
6. Error: Invalid for_each argument
# Wrong: for_each keys must be known at plan time; this fails if the output depends on values known only after apply
resource "aws_subnet" "private" {
for_each = toset(module.vpc.private_subnet_cidrs)
}
# Correct: Use known values
variable "private_subnets" {
type = list(string)
}
resource "aws_subnet" "private" {
for_each = toset(var.private_subnets)
}
7. Error: Value for unconfigurable attribute
# Wrong: public_ip is computed by AWS (read-only) and cannot be configured
resource "aws_eip" "nat" {
domain = "vpc"
public_ip = "203.0.113.10"
}
# Correct: Check Provider docs, only set writable attributes
resource "aws_eip" "nat" {
domain = "vpc"
}
8. Error: Backend configuration changed
# Re-initialize after backend config change
terraform init -migrate-state
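# If migration fails, migrate manually: pull while the OLD backend is still configured
terraform state pull > state.json
# Modify backend.tf
terraform init -reconfigure # initialize the new backend without attempting migration
terraform state push state.json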
9. Error: Failed to query available provider packages (provider hashicorp/<name> not found)
# Modules that omit required_providers assume the hashicorp/ namespace; declare every provider's source explicitly
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0"
}
}
}
10. Error: Incompatible API version
# Provider version incompatible with Terraform version
# Check compatibility
terraform version
terraform providers
# Update to compatible version
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.80.0"
}
}
}
Advanced Optimization Tips
1. Terraform Cloud / Enterprise
# terraform-cloud.tf
terraform {
cloud {
organization = "myorg"
workspaces {
tags = ["app-infra"]
}
}
}
2. Terraform Test Framework
# tests/integration/main.tftest.hcl
run "create_infrastructure" {
command = apply
module {
source = "../../environments/dev"
}
variables {
db_password = "test-password-12345678"
}
}
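run "validate_endpoints" {
command = apply
# Hypothetical helper module: tests/http/main.tf declares variable "api_endpoint"
# and data "http" "check" { url = var.api_endpoint } (hashicorp/http provider)
module {
source = "./tests/http"
}
variables {
api_endpoint = run.create_infrastructure.api_url
}
assert {
# Compare directly: can() is true whenever the expression evaluates, even for a 500
condition = data.http.check.status_code == 200
error_message = "API endpoint should return 200"
}
}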
3. Cost Estimation with Infracost
# Estimate costs using Infracost
infracost breakdown --path=plan.json \
--format=json \
--out-file=infracost.json
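# Add cost comment to PR (repo, PR number, and token are required;
# PR_NUMBER and GITHUB_TOKEN are assumed to be set in the environment)
infracost comment github --path=infracost.json \
--repo=$GITHUB_REPOSITORY \
--pull-request=$PR_NUMBER \
--github-token=$GITHUB_TOKEN \
--behavior=update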
# .github/workflows/infracost.yml
name: Infracost
on:
pull_request:
paths:
- "environments/**"
- "modules/**"
jobs:
infracost:
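runs-on: ubuntu-latest
permissions:
contents: read
id-token: write # OIDC role assumption
pull-requests: write # allow the cost comment
steps:
- uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_wrapper: false # keep plan.json free of wrapper output
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
aws-region: ${{ vars.AWS_REGION }} # assumes a repository variable
- name: Setup Infracost
uses: infracost/actions/setup@v3
with:
api-key: ${{ secrets.INFRACOST_API_KEY }}
- name: Terraform Plan
run: |
terraform init -backend-config=backend.hcl
terraform plan -out=plan.out -var-file=terraform.tfvars
terraform show -json plan.out > plan.json
working-directory: environments/dev
- name: Infracost Breakdown
run: infracost breakdown --path=plan.json --format=json --out-file=/tmp/infracost.json
working-directory: environments/dev
- name: Infracost Comment
run: |
infracost comment github --path=/tmp/infracost.json \
--repo=${{ github.repository }} \
--pull-request=${{ github.event.pull_request.number }} \
--github-token=${{ github.token }} \
--behavior=update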
4. Module Documentation Auto-Generation
# Install terraform-docs
brew install terraform-docs
# Generate README
terraform-docs markdown table ./modules/vpc > ./modules/vpc/README.md
# .pre-commit-config.yaml
repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
rev: v1.92.0
hooks:
- id: terraform_fmt
- id: terraform_validate
- id: terraform_docs
args:
- '--args=--lockfile=false'
- id: terraform_tflint
- id: terraform_trivy
- id: terraform_checkov
Comparison: Terraform vs OpenTofu vs Pulumi vs CDK
| Dimension | Terraform | OpenTofu | Pulumi | AWS CDK |
|---|---|---|---|---|
| Language | HCL | HCL | TS/Python/Go/C# | TS/Python/Java/C# |
| License | BSL 1.1 | MPL 2.0 | Apache 2.0 | Apache 2.0 |
| Providers | 3000+ | 3000+ | 200+ | AWS-primary |
| State Encryption | Backend-side (e.g. S3 SSE-KMS) | Native client-side encryption | Pulumi Cloud | Managed by CloudFormation |
| Testing | terraform test | tofu test | Mocha/Jest | Jest |
| GitOps | Atlantis | Atlantis | Pulumi Cloud | CDK Pipelines |
| Learning Curve | Low | Low | Low | Medium |
| Multi-cloud | Native | Native | Native | AWS-primary |
| Community | Largest | Growing | Growing | AWS ecosystem |
| Enterprise Support | HashiCorp (IBM) | Commercial vendors (Linux Foundation project) | Pulumi Corp | AWS |
Decision Tree
Is the team familiar with HCL?
├── Yes → Concerned about license compliance?
│ ├── Yes → OpenTofu
│ └── No → Terraform
└── No → Primarily using AWS?
├── Yes → CDK
└── No → Prefer general-purpose programming languages?
├── Yes → Pulumi
└── No → Terraform/OpenTofu (HCL is easy to learn)
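The same decision tree can be written as a function so that each branch is explicit and easy to adjust. A minimal Python sketch; `choose_iac_tool` and its parameter names are hypothetical, and it encodes only the four questions in the tree, not every factor that matters in a real tool choice.

```python
def choose_iac_tool(knows_hcl: bool, license_concern: bool,
                    aws_primary: bool, prefers_general_language: bool) -> str:
    """Walk the decision tree top to bottom and return the suggested tool."""
    if knows_hcl:
        # HCL teams only need to decide on licensing
        return "OpenTofu" if license_concern else "Terraform"
    if aws_primary:
        return "CDK"
    if prefers_general_language:
        return "Pulumi"
    # HCL is easy to learn, so it remains the default
    return "Terraform/OpenTofu"
```

Note that the tree checks HCL familiarity first, so an HCL-fluent team that is all-in on AWS still lands on Terraform or OpenTofu.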
Recommended Online Tools
- JSON Formatter: /en/json/format — Format Terraform state files and plan output
- Base64 Encoder: /en/encode/base64 — Handle Base64 encoded data in Terraform
- Hash Calculator: /en/encode/hash — Calculate config file hashes and verify state file integrity
Summary: The core of Terraform IaC best practices lies in 5 production patterns — module composition design makes code reusable and testable, remote state management ensures data safety and reliability, workspace environment isolation enables multi-environment management, OpenTofu migration resolves license compliance, and GitOps integration automates plan/apply. In 2026, whether you choose Terraform or OpenTofu, HCL remains the most mature choice in the IaC space. Key practices: 3-layer module architecture, S3+DynamoDB remote backend, directory-based environment isolation, Atlantis GitOps, and Policy as Code. IaC is not a one-time project — it's a continuously evolving platform.
Related Articles:
- GitOps with ArgoCD Production Practices — Complete ArgoCD deployment automation guide
- IaC with Pulumi and TypeScript — General-language IaC alternative
- Docker Container Security Hardening — Container 8-layer defense system