Ansible manages configuration on every server I run at Commsult Indonesia — eight DigitalOcean Droplets and three GCP Compute Engine instances. Without Ansible, every server is a snowflake: slightly different Nginx configs, different versions of Node.js, different firewall rules applied by hand in different orders. Ansible makes servers cattle, not pets — any server can be reprovisioned from scratch in under ten minutes and will be identical to every other server in its role group. This guide covers the patterns that actually work in production, not the tutorial examples that break under real conditions.
Ansible roles are the fundamental unit of modular automation. A role manages a single responsibility: installing Nginx, configuring PostgreSQL, setting up the Node.js application, or configuring the firewall. Do not write a single monolithic playbook that does everything; it becomes very hard to test or reuse. The standard role directory structure includes tasks/main.yml (the task list), handlers/main.yml (restart/reload actions triggered by task changes), templates/ (Jinja2 config file templates), vars/ (role-specific variables), and defaults/ (overridable defaults). The separation between vars/ and defaults/ is important: values in vars/ have high precedence and are not overridden by inventory or play variables (only by higher-precedence sources such as extra vars passed with -e), while values in defaults/ have the lowest precedence and can be overridden by the caller.
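As a minimal sketch of how single-purpose roles compose, a top-level playbook might look like the following. The role names (firewall, nginx, nodejs_app) and the nginx_worker_connections variable are hypothetical and only illustrate the pattern; they are not taken from the setup described here.

```yaml
# site.yml: apply one role per responsibility (all names are illustrative)
- name: Configure web servers
  hosts: web-servers
  roles:
    - role: firewall
    - role: nginx
      vars:
        # Overrides the value declared in roles/nginx/defaults/main.yml
        nginx_worker_connections: 2048
    - role: nodejs_app

# roles/nginx/defaults/main.yml would then carry the fallback value, e.g.:
#   nginx_worker_connections: 1024
```

Because each role stands alone, the same nginx role can be reused in another play or tested in isolation without dragging the rest of the playbook along.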
Every Ansible task must be idempotent: running the playbook twice on the same host should produce the same result as running it once. This means using the correct module for each action: use apt or yum instead of shell: apt-get install; use copy or template instead of shell: echo; use systemd instead of shell: service restart. When you must use the shell or command modules (sometimes unavoidable for complex configurations), use the creates: or removes: parameters so the task is skipped if its effect is already present. Idempotency lets you re-run playbooks safely to correct configuration drift and bring hosts back to baseline.
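The task list below is a small sketch of that idea, assuming a Debian or Ubuntu host. The file paths and the DH-parameter step are illustrative choices, not part of the original setup.

```yaml
# roles/nginx/tasks/main.yml (illustrative excerpt)
- name: Install Nginx with the package module instead of shell
  ansible.builtin.apt:
    name: nginx
    state: present
    update_cache: true
    cache_valid_time: 3600

- name: Render the main config from a template instead of echo
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"

- name: Ensure the service is enabled and running
  ansible.builtin.systemd:
    name: nginx
    state: started
    enabled: true

- name: Generate DH parameters only if the file does not exist yet
  ansible.builtin.command:
    cmd: openssl dhparam -out /etc/nginx/dhparam.pem 2048
    creates: /etc/nginx/dhparam.pem
```

On a second run every task should report "ok" rather than "changed", which is a quick practical check for idempotency.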
A flat inventory file does not scale beyond a handful of servers. Use inventory directories with separate files per environment or per role group. Dynamic inventories are even better for cloud infrastructure: the gcp_compute plugin auto-discovers Compute Engine instances by labels, and the digitalocean plugin discovers Droplets by tags. With dynamic inventory, adding a new server to a tag group automatically includes it in the next playbook run — no manual inventory editing required. On GCP, I use instance labels like role=web-server and env=production to automatically group instances for playbook targeting.
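A dynamic inventory source for GCP could be sketched as below. The project ID and file path are placeholders, and the authentication method and host addressing are assumptions that depend on your environment.

```yaml
# inventories/production/inventory.gcp.yml
# The gcp_compute plugin expects the file name to end in gcp.yml / gcp.yaml
# (or gcp_compute.yml / gcp_compute.yaml).
plugin: google.cloud.gcp_compute
projects:
  - my-gcp-project            # hypothetical project ID
auth_kind: application        # assumes Application Default Credentials
filters:
  - status = RUNNING
keyed_groups:
  # One group per value of the "role" and "env" instance labels
  - key: labels.role
    prefix: role
  - key: labels.env
    prefix: env
hostnames:
  - name
compose:
  # Assumes instances are reached over their external IP
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP
```

With this in place, an instance labeled role=web-server should land in a group named along the lines of role_web_server; running ansible-inventory --graph against the file shows the exact group names the plugin generates.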
From my experience: always run playbooks with --check --diff before applying them to production. Check mode shows the changes that would be made without making them; --diff shows the exact file content differences. I add a CI step that runs ansible-playbook --check --diff --limit staging against our staging inventory on every pull request. If the staging check passes, I apply the change to production manually. In the last year, this caught three configuration errors that would have broken production Nginx.
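The CI step could look roughly like this. The sketch assumes GitHub Actions, which the text does not specify, and a hypothetical inventories/staging path; SSH credentials and the vault password are deliberately left out and would need to be supplied as CI secrets.

```yaml
# .github/workflows/ansible-check.yml (assumes GitHub Actions; adapt to your CI)
name: ansible-check
on:
  pull_request:
jobs:
  dry-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Ansible
        run: pip install ansible
      - name: Dry-run the playbook against staging
        run: >
          ansible-playbook site.yml
          --inventory=inventories/staging
          --limit staging
          --check --diff
```

The job fails the pull request if the dry run errors out, and the --diff output in the job log doubles as a review artifact.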
Never store plaintext secrets in your Ansible repository. Database passwords, API keys, SSL certificate contents, and any credential that grants access to a system must be encrypted. Ansible Vault encrypts individual variables or entire files using AES-256. The vault password can live in a file (excluded from Git via .gitignore) that is passed via --vault-password-file, or it can be retrieved from a secrets manager like HashiCorp Vault or GCP Secret Manager by an executable script passed to the same option. In production, I use a Python script as the vault password provider that fetches the decryption key from GCP Secret Manager at playbook runtime, so no vault password file exists on disk anywhere.
# Encrypt a secret with Ansible Vault
ansible-vault encrypt_string 'db_password_here' --name 'db_password'
# Output stored in group_vars/prod/vault.yml:
# db_password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# ...
# Dry-run the playbook (--check --diff) with the vault password fetched from GCP Secret Manager
ansible-playbook site.yml --vault-password-file=scripts/vault-password-from-gcp.py --inventory=inventories/production --limit=web-servers --check --diff
# Production deployment with rolling update (the batch size comes from serial: 1 in deploy.yml; ansible-playbook has no --serial flag)
ansible-playbook deploy.yml --inventory=inventories/production --tags=deploy
Handlers in Ansible run at the end of a play, not immediately when notified. This prevents the common mistake of restarting Nginx multiple times during a single run when three different tasks all modify Nginx configuration. All three tasks notify the same 'reload nginx' handler, which runs once after all tasks complete. Use reload instead of restart for Nginx and similar services: reload reads the new configuration without dropping existing connections. Use restart only for services that do not support graceful reload (some daemons require a full restart to pick up config changes).
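The notify pattern described above can be sketched as follows. The template and destination file names are illustrative; the two YAML documents stand for two separate files in the role.

```yaml
# roles/nginx/tasks/main.yml
- name: Render the main config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: reload nginx

- name: Render the site config
  ansible.builtin.template:
    src: site.conf.j2
    dest: /etc/nginx/conf.d/site.conf
  notify: reload nginx
---
# roles/nginx/handlers/main.yml
- name: reload nginx
  ansible.builtin.systemd:
    name: nginx
    state: reloaded
```

Even if both templates change in the same run, the handler fires once, and it does not fire at all when neither task reports a change.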
┌─────────────────────────────────────────────────┐
│        Ansible Role Directory Structure         │
├─────────────────────────────────────────────────┤
│ roles/nginx/                                    │
│ ├── tasks/                                      │
│ │   └── main.yml       (task list)              │
│ ├── handlers/                                   │
│ │   └── main.yml       (reload/restart actions) │
│ ├── templates/                                  │
│ │   └── nginx.conf.j2  (Jinja2 config)          │
│ ├── vars/                                       │
│ │   └── main.yml       (high-precedence vars)   │
│ └── defaults/                                   │
│     └── main.yml       (overridable defaults)   │
└─────────────────────────────────────────────────┘
Molecule is the de facto standard testing framework for Ansible roles. It provisions a container or VM, runs your role, and verifies the outcome using Testinfra (Python-based assertions on the system state). A Molecule test for an Nginx role checks that the nginx service is running, that ports 80 and 443 are listening, that the config file contains expected directives, and that the Nginx config passes nginx -t validation. Running Molecule tests in CI (on every role change) catches regressions before they reach production. With the Docker driver, Molecule runs its test instances as containers, which keeps the tests fast and free for CI pipelines.
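A Molecule scenario for such a role might be configured roughly as below. The instance name and container image are assumptions, and in recent Molecule releases the Docker driver is shipped separately (as part of molecule-plugins), so it has to be installed alongside Molecule and Testinfra.

```yaml
# roles/nginx/molecule/default/molecule.yml (sketch)
driver:
  name: docker                 # requires the Docker driver plugin
platforms:
  - name: nginx-test           # hypothetical instance name
    image: geerlingguy/docker-ubuntu2204-ansible:latest   # assumed image
    pre_build_image: true
provisioner:
  name: ansible                # runs converge.yml, which applies the role
verifier:
  name: testinfra              # runs the Python tests in molecule/default/tests/
```

Running molecule test from the role directory then creates the container, converges the role, runs the verifier, and destroys the container. Testing that a systemd-managed service is running inside a container usually needs extra platform settings, which this sketch leaves out.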
I ran our early Ansible playbooks as root via become: yes at the play level, which means every task ran with full system privileges. This is dangerous: a bug in a template task could overwrite /etc/passwd with garbage. The correct approach is to connect as a limited service account (ansible-runner) whose sudo access is restricted via sudoers, and to use become: yes only at the task level when elevation is genuinely needed. Keep in mind that Ansible runs most modules through a generated wrapper script, so per-command sudoers rules generally only fit tasks that invoke those commands directly. Privilege escalation should be the exception, not the default. Additionally, lock down network access: only the Ansible control node should have SSH access to managed nodes on port 22.
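A play that follows this rule could look like the sketch below. The file names and destination paths are hypothetical; only the ansible-runner account name comes from the text above.

```yaml
# Play runs unprivileged by default; only one task escalates
- name: Deploy application files
  hosts: web-servers
  remote_user: ansible-runner
  become: false
  tasks:
    - name: Copy the environment file as the unprivileged user
      ansible.builtin.copy:
        src: app.env                       # hypothetical file
        dest: /home/ansible-runner/app.env
        mode: "0600"

    - name: Install the systemd unit (needs root, so escalate here only)
      ansible.builtin.template:
        src: app.service.j2                # hypothetical template
        dest: /etc/systemd/system/app.service
        mode: "0644"
      become: true
```

A bug in the first task can at worst damage files owned by ansible-runner, while the blast radius of root access is confined to the single task that declares it.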
For zero-downtime application deployments using Ansible, use the serial keyword to apply changes to one server at a time in a rolling fashion. With serial: 1, Ansible deploys to the first web server, runs the health-check tasks defined in the play, and only then moves to the second. If a health check fails, the play stops and the remaining servers are untouched, which limits the blast radius. Combine this with delegate_to: localhost tasks that update a load balancer backend pool to drain the target server before deployment and re-add it after.
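The rolling pattern can be sketched as a single play. The lb-pool helper script, the nodejs_app role name, and the health endpoint on port 3000 are all hypothetical stand-ins for whatever your load balancer and application expose.

```yaml
# deploy.yml (sketch): one host per batch, stop on the first failure
- name: Rolling application deploy
  hosts: web-servers
  serial: 1
  max_fail_percentage: 0
  tags: [deploy]
  tasks:
    - name: Drain this host from the load balancer pool
      ansible.builtin.command:
        cmd: "/usr/local/bin/lb-pool remove {{ inventory_hostname }}"  # hypothetical helper
      delegate_to: localhost
      changed_when: true

    - name: Deploy the application
      ansible.builtin.import_role:
        name: nodejs_app                   # hypothetical role

    - name: Wait until the health endpoint answers 200
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:3000/healthz"  # hypothetical endpoint
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5
      delegate_to: localhost

    - name: Re-add this host to the load balancer pool
      ansible.builtin.command:
        cmd: "/usr/local/bin/lb-pool add {{ inventory_hostname }}"     # hypothetical helper
      delegate_to: localhost
      changed_when: true
```

If the health check exhausts its retries, the host fails before it is re-added to the pool and the play aborts, so the remaining servers keep serving the previous release.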
I use both Ansible and Terraform at Commsult Indonesia, but for different things. Terraform provisions infrastructure: it creates servers, networks, and databases. Ansible configures what's already provisioned: it installs packages, writes config files, and manages services. The overlap is minimal: Ansible can provision cloud resources (via modules for GCP and DigitalOcean), and Terraform can run provisioners. But using each tool for its primary purpose gives you better modularity and cleaner separation of concerns. One important note for Indonesian teams: Ansible's agentless architecture is a major advantage over alternatives like Puppet or Chef. There is nothing to install on managed nodes beyond Python and SSH, which most Linux server images include by default.
Sources & Further Reading