Terraform without modules is copy-paste infrastructure. You define the same VPC, the same compute instances, and the same database configurations in every environment — and then spend hours reconciling drift when you change something in one environment but forget another. I learned this the hard way at Commsult Indonesia, where we had three near-identical GCP environments (dev, staging, prod) with manually synchronized configurations. After the third incident where staging and prod drifted apart, I rewrote everything as composable Terraform modules. This guide shows the exact patterns I use.
A Terraform module is a container for multiple resources used together. The key word is 'together' — a module should represent a coherent infrastructure component, not a single resource type or an entire application stack. A module for a VPC makes sense. A module for a 'database with monitoring and alerting and backups configured' makes sense. A module that's just a thin wrapper around google_compute_instance does not — you've added abstraction without adding value. The standard module structure from HashiCorp is clear: main.tf for resource definitions, variables.tf for inputs, outputs.tf for values the caller needs, and a README.md that makes the module externally consumable.
The Don't Repeat Yourself principle from software engineering applies equally to infrastructure code. When you define a GCP Cloud SQL instance in main.tf of your dev environment and separately in staging and prod, you have three sources of truth, and they will diverge. A module collapses those three definitions to one, with environment-specific values passed as variables. The module also enforces constraints: for example, validation blocks in variables.tf can require that the prod environment uses db-custom-4-16384 or larger. Nobody can accidentally provision an undersized database in production.
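One way to express an environment-dependent size rule is a validation block that looks at both variables. This is a minimal sketch, not the module's actual code: it assumes Terraform 1.9 or later (earlier versions only allow a validation condition to reference the variable being validated), and the list of prod-approved sizes is illustrative.

```hcl
# modules/cloud-sql/variables.tf (sketch; requires Terraform >= 1.9)
variable "environment" {
  description = "Deployment environment"
  type        = string
}

variable "machine_type" {
  description = "Cloud SQL machine type"
  type        = string

  validation {
    # Non-prod environments may use any size; prod is limited to the
    # larger tiers. The tier list here is an assumption for illustration.
    condition = (
      var.environment != "prod" ||
      contains(["db-custom-4-16384", "db-custom-8-32768"], var.machine_type)
    )
    error_message = "Prod databases must use db-custom-4-16384 or larger."
  }
}
```

With this in place, a prod root module that passes a smaller tier fails during terraform plan, before any resource is created.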
The most common mistake with Terraform modules is scope creep — making modules too broad. A 'production environment' module that provisions VPCs, databases, compute instances, load balancers, and DNS records in one block is hard to test, hard to version, and hard to reuse. The right scope: a module should manage resources that must always be provisioned together and have strong lifecycle coupling. A GCP VPC and its subnets are tightly coupled — provision them together. A GCP Cloud SQL instance and its IAM bindings are coupled — provision them together. A GCP Load Balancer and the backend services it routes to — probably separate modules composed at the environment level.
From my experience: always output every resource ID and name from your modules, even if you don't use them immediately. In practice, module outputs become the glue between modules — you pass the VPC ID from your networking module into your compute module, and the database connection string from your database module into your application module. If you don't output it initially, you have to modify the module later and update all callers. Over-output from day one.
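The "outputs as glue" idea can be sketched as follows. The vpc module's resource names, the network_id output, and the network_id input on the database module are hypothetical names chosen for illustration; they do not appear in the original modules.

```hcl
# modules/vpc/outputs.tf (sketch; resource names are assumptions)
output "network_id" {
  description = "ID of the VPC network"
  value       = google_compute_network.this.id
}

output "network_name" {
  description = "Name of the VPC network"
  value       = google_compute_network.this.name
}

# environments/prod/main.tf (sketch)
module "network" {
  source      = "../../modules/vpc"
  environment = "prod" # hypothetical input
}

module "database" {
  source        = "../../modules/cloud-sql"
  instance_name = "prod-db-01"
  machine_type  = "db-custom-4-16384"
  environment   = "prod"

  # Hypothetical input: the networking module's output feeds the
  # database module, which also makes Terraform order them correctly.
  network_id = module.network.network_id
}
```

Because the reference to module.network.network_id creates an implicit dependency, no explicit depends_on is needed between the two modules.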
A well-structured Terraform module repository separates root modules (environment definitions that call child modules) from child modules (reusable building blocks). The convention I use at Commsult Indonesia places child modules under modules/ and environment root configs under environments/dev, environments/staging, environments/prod. Each child module has its own README.md, variables.tf with descriptions and validation blocks, outputs.tf, and main.tf. Child modules never contain provider configurations — they inherit from the root. This keeps modules environment-agnostic and truly reusable.
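A sketch of how that split usually looks: the child module declares which providers it needs (and at what version) without configuring them, and the root module holds the single provider block. The project ID, region, and version constraints below are placeholder assumptions.

```hcl
# modules/cloud-sql/versions.tf
# The child module states its requirements but contains no provider block.
terraform {
  required_version = ">= 1.5" # assumption; set to your real minimum

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0" # assumption
    }
  }
}

# environments/prod/providers.tf
# The root module configures the provider; child modules inherit it.
provider "google" {
  project = "my-prod-project" # hypothetical project ID
  region  = "asia-southeast2" # Jakarta
}
```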
```hcl
# modules/cloud-sql/variables.tf
variable "instance_name" {
  description = "Cloud SQL instance name"
  type        = string
}

variable "machine_type" {
  description = "Cloud SQL machine type"
  type        = string
  default     = "db-custom-2-8192"

  validation {
    condition     = contains(["db-custom-2-8192", "db-custom-4-16384", "db-custom-8-32768"], var.machine_type)
    error_message = "Machine type must be one of the approved sizes."
  }
}

variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

# modules/cloud-sql/outputs.tf
output "instance_connection_name" {
  description = "Cloud SQL instance connection name"
  value       = google_sql_database_instance.this.connection_name
}

output "private_ip_address" {
  description = "Cloud SQL private IP address"
  value       = google_sql_database_instance.this.private_ip_address
}

# environments/prod/main.tf
module "database" {
  source        = "../../modules/cloud-sql"
  instance_name = "prod-db-01"
  machine_type  = "db-custom-4-16384"
  environment   = "prod"
}
```

Terraform's validation blocks let you enforce constraints at plan time rather than apply time. Use them aggressively in your modules to catch misconfiguration before any resource is touched. Type constraints in variables.tf define what callers can pass: object types with required keys prevent callers from omitting critical configuration. Combine with default values only for truly optional settings. Required security settings (encryption, backup retention) should have no defaults, to force explicit decisions per environment.
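As a sketch of the "no defaults for security settings" rule, the fragment below adds an object-typed backup variable with no default and wires it into the instance resource. The variable name, the database version, the region, and the resource body are assumptions for illustration; the original module's main.tf is not shown in this article.

```hcl
# modules/cloud-sql/variables.tf (sketch)
variable "backup" {
  description = "Backup settings. No default, so each environment must decide."
  type = object({
    enabled          = bool
    retained_backups = number
  })

  validation {
    condition     = var.backup.retained_backups >= 1
    error_message = "At least one backup must be retained."
  }
}

# modules/cloud-sql/main.tf (sketch)
resource "google_sql_database_instance" "this" {
  name                = var.instance_name
  database_version    = "POSTGRES_15"     # assumption
  region              = "asia-southeast2" # assumption
  deletion_protection = var.environment == "prod"

  settings {
    tier = var.machine_type

    backup_configuration {
      enabled = var.backup.enabled

      backup_retention_settings {
        retained_backups = var.backup.retained_backups
      }
    }
    # A private IP would also need an ip_configuration block with a
    # private_network; it is omitted here to keep the sketch short.
  }
}
```

A caller that omits backup, or leaves out one of its keys, gets an error at plan time instead of silently inheriting a weak default.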
Early in my Terraform adoption I referenced internal modules directly from the Git main branch: source = "git::https://github.com/myorg/tf-modules.git//modules/vpc". With no ref, Terraform fetches the repository's default branch on every fresh terraform init. This seemed convenient until a team member merged a breaking change to main that immediately broke our staging apply. The fix: tag module releases (v1.0.0, v1.1.0) and reference specific versions: source = "git::...?ref=v1.1.0". Breaking changes get a major version bump. All environments pin to specific versions and update deliberately, not automatically.
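A pinned module call might look like the sketch below. It assumes the pinned source is the repository URL quoted above with the ref query parameter appended; module inputs are left out because the vpc module's variables are not shown here.

```hcl
# environments/staging/main.tf (sketch)
module "vpc" {
  # Pinned to a release tag. Upgrading means changing this string in a
  # reviewed pull request, one environment at a time.
  source = "git::https://github.com/myorg/tf-modules.git//modules/vpc?ref=v1.1.0"

  # Module inputs omitted in this sketch.
}
```

After changing the ref, run terraform init -upgrade so the new module version is downloaded before planning.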
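Untested Terraform modules accumulate bugs silently. The minimum testing approach is running terraform validate and terraform plan against a real (but ephemeral) environment for every pull request. For more comprehensive testing, Terratest (a Go-based testing library) lets you write tests that provision real infrastructure, assert on state and outputs, and destroy everything after the test. For modules that manage expensive resources, keep unit-level tests cheap. LocalStack emulates AWS only, and Google's local emulators cover just a few services (Pub/Sub, Firestore, Spanner, and similar), not Cloud SQL or VPC networking, so on GCP the practical options are plan-only checks or Terraform's built-in terraform test command with mocked providers (available from Terraform 1.7). Reserve Terratest for integration tests that run on merge to main.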
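For cheap unit-level checks, Terraform's native test framework is one option that stays in HCL. The sketch below assumes Terraform 1.7 or later (for mock_provider), that the cloud-sql module has no required variables beyond the three shown in this guide, and that its main.tf sets settings.tier from var.machine_type. The file name and test values are hypothetical.

```hcl
# modules/cloud-sql/tests/validation.tftest.hcl (sketch)

# Replace the real provider with a mock so no credentials or cloud
# resources are needed.
mock_provider "google" {}

variables {
  instance_name = "test-db"
  environment   = "dev"
}

run "rejects_unapproved_machine_type" {
  command = plan

  variables {
    machine_type = "db-f1-micro" # not in the approved list
  }

  # The test passes only if the machine_type validation fails.
  expect_failures = [var.machine_type]
}

run "uses_default_machine_type" {
  command = plan

  assert {
    condition     = google_sql_database_instance.this.settings[0].tier == "db-custom-2-8192"
    error_message = "Expected the default tier to be db-custom-2-8192."
  }
}
```

Running terraform test from the module directory picks up files ending in .tftest.hcl in the module root and in its tests/ directory.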
┌─────────────────────────────────────────────────┐
│ Terraform Module Repository Layout │
├─────────────────────────────────────────────────┤
│ tf-modules/ │
│ ├── modules/ (child modules) │
│ │ ├── vpc/ │
│ │ │ ├── main.tf │
│ │ │ ├── variables.tf │
│ │ │ └── outputs.tf │
│ │ ├── cloud-sql/ │
│ │ └── cloud-run/ │
│ └── environments/ (root modules) │
│ ├── dev/ │
│ ├── staging/ │
│ └── prod/ │
└─────────────────────────────────────────────────┘

Modules without proper state management cause race conditions and corruption. Every environment root module should store state remotely with locking enabled. On GCP, use a GCS bucket with object versioning enabled as the backend; the gcs backend handles state locking automatically by writing a lock file next to the state object. Each environment gets its own state file, so dev, staging, and prod state are completely isolated. Cross-environment state references use terraform_remote_state data sources (use sparingly, since this creates tight coupling between environments).
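A minimal sketch of a per-environment GCS backend and a cross-environment lookup follows. The bucket name, prefixes, and the network_id output are hypothetical. Backend blocks cannot use variables, so the values are literals.

```hcl
# environments/prod/backend.tf (sketch)
terraform {
  backend "gcs" {
    bucket = "myorg-tfstate" # hypothetical bucket with versioning enabled
    prefix = "environments/prod"
  }
}

# environments/staging/main.tf (sketch)
# Reads another configuration's outputs. Use sparingly: this couples
# staging to the layout of the other state.
data "terraform_remote_state" "shared_network" {
  backend = "gcs"

  config = {
    bucket = "myorg-tfstate"
    prefix = "environments/shared-network" # hypothetical
  }
}

locals {
  # Only root-level outputs of the other configuration are visible here.
  shared_network_id = data.terraform_remote_state.shared_network.outputs.network_id
}
```

Giving each environment its own prefix (or its own bucket) keeps the state files isolated while reusing the same backend type.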
The Terraform Registry has hundreds of community modules for AWS, GCP, and Azure. I use them for common patterns where I do not have specific requirements — the Google Network module and the Google Cloud SQL module are well-maintained and save significant development time. I write custom internal modules only when community modules do not match our specific requirements (usually around IAM policies, naming conventions, or multi-region configurations specific to Indonesia/Singapore regions). The rule: use community modules as starting points, fork and customize only when necessary, and contribute improvements upstream when you can.
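Consuming a registry module looks much like calling an internal one, with a version constraint in place of a Git ref. The sketch below assumes "the Google Network module" refers to terraform-google-modules/network/google. The version constraint, project ID, names, and CIDR ranges are placeholders, so check the module's registry page for current inputs and versions.

```hcl
# environments/prod/network.tf (sketch)
module "network" {
  source  = "terraform-google-modules/network/google"
  version = "~> 9.0" # assumption; pin to a version you have reviewed

  project_id   = "my-prod-project" # hypothetical
  network_name = "prod-vpc"

  subnets = [
    {
      subnet_name   = "prod-jakarta"
      subnet_ip     = "10.10.0.0/20"
      subnet_region = "asia-southeast2" # Jakarta
    },
    {
      subnet_name   = "prod-singapore"
      subnet_ip     = "10.20.0.0/20"
      subnet_region = "asia-southeast1" # Singapore
    },
  ]
}
```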
Sources & Further Reading