Zero-Downtime Deployments with NGINX Blue-Green on a Single VPS

Photo by Unsplash

Every time you restart a Docker container to deploy a new version, there is a gap — however brief — where your application is unavailable. For low-traffic side projects that gap is tolerable. For production services with real users and SLAs, it is not. I implemented blue-green deployments using plain NGINX and Docker on DigitalOcean Droplets after a client complained about brief outages during our nightly release cycle. The solution requires no Kubernetes, no external load balancer, and no paid tooling — just a bash script, two container slots, and a smart NGINX reload.
Blue-green deployment keeps two identical production environments — called Blue and Green — always available. At any moment, one environment is live (serving user traffic) and the other is idle (ready to receive the next deployment). When you deploy a new version, you bring it up in the idle slot, run health checks, and then flip the NGINX upstream to point at the new slot. The old slot remains running for instant rollback — if anything looks wrong after the flip, you change the upstream back and reload NGINX. The entire cutover takes under one second.
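The slot bookkeeping fits in one function: given the live color, it names the idle color and the host port its container should publish. This is a minimal sketch. `idle_slot` is a hypothetical helper, and it assumes the blue → 3000, green → 3001 mapping that the deploy script in this article uses.

```shell
# Hypothetical helper: given the live color, print the idle color and its host port.
# Assumes the article's fixed mapping: blue -> 3000, green -> 3001.
idle_slot() {
  case "$1" in
    blue)  echo "green 3001" ;;
    green) echo "blue 3000" ;;
    *)     echo "unknown slot: $1" >&2; return 1 ;;
  esac
}
```

For example, `idle_slot blue` prints `green 3001`. A script can split the result with `set -- $(idle_slot "$active")` and then read `$1` and `$2`.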
Replacing a container in place (`docker stop` on the old one, then `docker run` with the new image) creates a downtime window. Even with a fast startup, there are typically 2-10 seconds during which no container is listening on the port, and NGINX returns 502 Bad Gateway for every request in that window. Docker Compose's `depends_on` with health checks helps with startup ordering between services, but the graceful shutdown of the old container and the readiness of the new one are still not atomically synchronized. Blue-green avoids this by never stopping the active container until the new one is confirmed healthy and serving traffic.
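You can see this window directly. Run a probe loop against the site while you restart the container the old way, and count how many requests fail. This is a minimal sketch. `count_failures` is a hypothetical helper that runs any command a fixed number of times and counts the runs that exit non-zero.

```shell
# Hypothetical probe helper: run a command COUNT times, pausing INTERVAL seconds
# between runs, and print how many runs exited non-zero.
# Usage: count_failures COUNT INTERVAL command [args...]
count_failures() {
  local count=$1 interval=$2 failures=0 i=0
  shift 2
  while [ "$i" -lt "$count" ]; do
    "$@" > /dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$failures"
}
```

For example, run `count_failures 100 0.1 curl -sf -o /dev/null --max-time 1 https://myapp.example.com/` in one terminal while you replace the container in another. Any non-zero count is the downtime window. Running the same probe across a blue-green deploy is a quick way to check that the cutover really drops nothing.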
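`nginx -s reload` (which sends `SIGHUP` to the NGINX master process) applies a changed configuration without dropping in-flight connections. The master validates the new configuration and keeps running with the old one if validation fails. Otherwise it starts fresh worker processes that use the new configuration and tells the old workers to stop accepting new connections and exit once their current requests are finished. So when you change the upstream block to point at the new container port and reload, new connections go to the new upstream right away while requests already in progress complete against the old one. This graceful reload is what makes zero-downtime blue-green possible with vanilla NGINX, without any plugins or paid modules.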
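You can watch the handover happen. While old workers finish their in-flight requests, NGINX retitles them, and on Linux they show up in `ps` as `nginx: worker process is shutting down`. The sketch below counts those workers from a process listing read on standard input. `count_draining_workers` is a hypothetical helper. Matching on the process title is an assumption about how your build reports it, not a stable interface.

```shell
# Hypothetical helper: count NGINX workers that are still draining after a reload.
# Reads a process listing (e.g. from `ps -eo args`) on stdin and prints the count.
count_draining_workers() {
  # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps the status 0.
  grep -c 'nginx: worker process is shutting down' || true
}
```

Run `nginx -s reload` and then `ps -eo args | count_draining_workers`. A non-zero count means old workers are still finishing requests, and it drops back to 0 as they exit.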
┌────────────────────────────────────────────────────────────┐
│ Blue-Green Deployment Flow with NGINX                      │
│                                                            │
│                          Internet                          │
│                             │                              │
│                             ▼                              │
│          ┌──────────────────────────────────────┐          │
│          │ NGINX Upstream (upstream.conf)       │          │
│          │                                      │          │
│          │ Step 1: upstream → app-blue:3000     │          │
│          │ Step 2: deploy app-green:3001        │          │
│          │ Step 3: health check green ✓         │          │
│          │ Step 4: upstream → app-green:3001    │          │
│          │ Step 5: reload nginx (0 downtime)    │          │
│          └──────────────────────────────────────┘          │
│                  │                      │                  │
│                  ▼                      ▼                  │
│          ┌───────────────┐      ┌───────────────┐          │
│          │ app-blue      │      │ app-green     │          │
│          │ :3000         │      │ :3001         │          │
│          │ (v1.0)        │      │ (v2.0 new)    │          │
│          │ IDLE after    │      │ LIVE after    │          │
│          │ cutover       │      │ cutover       │          │
│          └───────────────┘      └───────────────┘          │
└────────────────────────────────────────────────────────────┘
Add `proxy_next_upstream error timeout http_502 http_503;` to your NGINX location block. It tells NGINX to retry a request on the next server in the upstream group when the current one cannot be reached, times out, or answers with 502 or 503. A retry needs somewhere to go, so this only helps when the upstream lists more than one server. For example, you can list the new slot as the primary and mark the still-running old slot as `backup` during the transition window. NGINX also does not retry non-idempotent requests such as POST once they have been sent upstream unless you add the `non_idempotent` parameter. This setting therefore narrows the window in which in-flight requests can fail at the moment of the flip, but it does not guarantee that every request is handled transparently.
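To give `proxy_next_upstream` a second server to retry on, the deploy step can write the new slot as the primary and the old slot as a `backup` server. The sketch below shows one way to generate that file. `render_upstream` is a hypothetical helper, and the 127.0.0.1 ports follow the article's blue/green mapping.

```shell
# Hypothetical generator for /etc/nginx/conf.d/upstream.conf with a fallback slot.
# NGINX sends traffic to the live slot; the backup slot only receives requests
# when the live one is unavailable or a failed request is retried on it.
# Usage: render_upstream LIVE_PORT BACKUP_PORT
render_upstream() {
  printf 'upstream app_backend {\n'
  printf '    server 127.0.0.1:%s;\n' "$1"
  printf '    server 127.0.0.1:%s backup;\n' "$2"
  printf '}\n'
}
```

Paired with `proxy_next_upstream error timeout http_502 http_503;` in the `location /` block, the output of `render_upstream 3001 3000` sends traffic to green and retries eligible failures on blue. Write it to the upstream file, then run `nginx -t` followed by `nginx -s reload`. Note that the `backup` parameter cannot be combined with the `hash`, `ip_hash`, or `random` balancing methods.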
The core of the blue-green setup is a bash script that detects which container (blue or green) is currently live, starts the other one with the new image, polls the health check endpoint until it responds with HTTP 200, updates the NGINX upstream file, reloads NGINX gracefully, and takes the previously live container out of rotation. The script is safe to re-run, because each run deploys to whichever slot is idle. If the health check fails, it exits with a non-zero code before NGINX is touched, so the old container keeps serving traffic and a build that never becomes healthy never receives live requests.
The health check endpoint must reflect genuine application readiness, not just HTTP server startup. A good `/health` endpoint in a Node.js/Express app checks database connectivity, cache availability, and any required external service connections. If any dependency is unavailable, the endpoint returns HTTP 503. The deployment script treats that like any other failed probe, keeps polling, and aborts the blue-green swap when its timeout expires. As long as the endpoint actually exercises the affected dependency, a failed database migration or a misconfigured environment variable will therefore keep the bad deployment from ever receiving live traffic.
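The polling side of this can be factored into a reusable helper. This is a sketch. `wait_until` is a hypothetical name, and it relies on the fact that `curl -f` exits non-zero (code 22) for HTTP responses of 400 or above. As a result, a 503 from `/health` is treated the same as a container that is not listening yet.

```shell
# Hypothetical polling helper: run a command up to ATTEMPTS times, one second apart.
# Returns 0 on the first success and 1 if every attempt fails.
# Usage: wait_until ATTEMPTS command [args...]
wait_until() {
  local attempts=$1 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example: `wait_until 30 curl -sf --max-time 2 "http://127.0.0.1:3001/health" || { echo "Health check failed" >&2; exit 1; }`. The `--max-time` bound keeps a hung request from stalling the loop.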
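#!/bin/bash
# blue-green-deploy.sh: zero-downtime deploy script for NGINX + Docker
set -euo pipefail
IMAGE="registry.example.com/myapp"
TAG="${1:-latest}"
NGINX_UPSTREAM="/etc/nginx/conf.d/upstream.conf"
# Detect the active color from the NGINX upstream file rather than `docker ps`:
# after a cutover both containers keep running, and a rollback only edits this file.
# With no upstream file yet (first deploy), treat blue as active and deploy to green.
if [ -f "$NGINX_UPSTREAM" ] && grep -q "127.0.0.1:3001" "$NGINX_UPSTREAM"; then
  ACTIVE="green"; ACTIVE_PORT=3001
  IDLE="blue"; IDLE_PORT=3000
else
  ACTIVE="blue"; ACTIVE_PORT=3000
  IDLE="green"; IDLE_PORT=3001
fi
echo "Active: app-$ACTIVE ($ACTIVE_PORT) → deploying to app-$IDLE ($IDLE_PORT)"
# 1. Pull new image
docker pull "$IMAGE:$TAG"
# 2. Replace whatever occupies the idle slot and start the new version there.
#    Publishing on 127.0.0.1 keeps the app reachable only through NGINX.
docker rm -f "app-$IDLE" 2>/dev/null || true
docker run -d --name "app-$IDLE" --network app-net \
  -p "127.0.0.1:$IDLE_PORT:3000" -e NODE_ENV=production "$IMAGE:$TAG"
# 3. Health-check loop (30 attempts, about 30 s). On failure, remove the new
#    container and exit before NGINX is touched, so app-$ACTIVE stays live.
echo "Waiting for app-$IDLE to become healthy..."
for i in $(seq 1 30); do
  if curl -sf --max-time 2 "http://127.0.0.1:$IDLE_PORT/health" > /dev/null; then
    echo "Healthy after $i attempt(s)"; break
  fi
  if [ "$i" -eq 30 ]; then
    echo "Health check failed; app-$ACTIVE is still live" >&2
    docker rm -f "app-$IDLE" > /dev/null
    exit 1
  fi
  sleep 1
done
# 4. Flip the NGINX upstream; restore the previous file if the config test fails.
#    `set -e` does not stop on a failure inside `nginx -t && nginx -s reload`,
#    so the test result is checked explicitly.
if [ -f "$NGINX_UPSTREAM" ]; then cp "$NGINX_UPSTREAM" "$NGINX_UPSTREAM.bak"; fi
cat > "$NGINX_UPSTREAM" <<EOF
upstream app_backend {
    server 127.0.0.1:$IDLE_PORT;
}
EOF
if ! nginx -t; then
  if [ -f "$NGINX_UPSTREAM.bak" ]; then cp "$NGINX_UPSTREAM.bak" "$NGINX_UPSTREAM"; fi
  echo "nginx -t failed; upstream restored, app-$ACTIVE is still live" >&2
  exit 1
fi
nginx -s reload
echo "Traffic switched to app-$IDLE"
# 5. Do not stop app-$ACTIVE here: it is the instant-rollback target and may still
#    be draining long-lived connections. The next deploy replaces it in step 2;
#    stop it sooner with `docker stop app-$ACTIVE` once the rollback window has passed.
echo "Deployment complete. Active: app-$IDLE (app-$ACTIVE kept for rollback)"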
# /etc/nginx/sites-enabled/myapp.conf
# server {
# listen 80;
# server_name myapp.example.com;
# location / {
# proxy_pass http://app_backend;
# proxy_set_header Host $host;
# proxy_set_header X-Real-IP $remote_addr;
# }
# }
The beauty of blue-green is that rollback is instantaneous and requires no new deployment. Because the old container is still running after the cutover (NGINX simply no longer sends it new connections), rolling back just means pointing the NGINX upstream back at the old port and reloading. Add a `rollback.sh` script that mirrors the deploy script but skips the container startup: it only flips the upstream and reloads. Keep the old container running for at least 30 minutes after a successful deploy before removing it, so you have a clean rollback window.
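The core of such a `rollback.sh` could be a function like the sketch below. It assumes the single-server upstream file that the deploy script writes and the same port mapping (blue on 3000, green on 3001). The function name, its optional file argument, and the pre-flip `curl` check against the old container are illustrative additions, not part of the original setup.

```shell
# Hypothetical core of a rollback.sh: point NGINX back at the other slot and reload.
# Assumes the upstream file holds a single "server 127.0.0.1:<port>;" line and the
# article's port mapping (blue 3000, green 3001).
# Usage: rollback [UPSTREAM_FILE]
rollback() {
  local file=${1:-/etc/nginx/conf.d/upstream.conf} current target
  if grep -q '127.0.0.1:3001' "$file"; then
    current=3001; target=3000
  elif grep -q '127.0.0.1:3000' "$file"; then
    current=3000; target=3001
  else
    echo "no known slot in $file" >&2
    return 1
  fi
  # Illustrative safety check: only roll back onto a container that still answers.
  if ! curl -sf --max-time 2 "http://127.0.0.1:$target/health" > /dev/null; then
    echo "slot on port $target is not healthy; not rolling back" >&2
    return 1
  fi
  cp "$file" "$file.bak"
  printf 'upstream app_backend {\n    server 127.0.0.1:%s;\n}\n' "$target" > "$file"
  if ! nginx -t; then
    cp "$file.bak" "$file"
    echo "nginx -t failed; upstream left on port $current" >&2
    return 1
  fi
  nginx -s reload
  echo "Rolled back: traffic now on port $target"
}
```

A standalone `rollback.sh` can define this function and end with `rollback "$@"`. The health check before the flip matters because rolling back onto a container that has already been stopped would turn a bad deploy into an outage.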
If your application maintains WebSocket connections or uses long polling, the graceful NGINX reload will not forcibly terminate those connections. The old worker processes keep them open to the old upstream until the client disconnects, or until `worker_shutdown_timeout` expires if you have set it. This is usually desirable, but it means the old container may continue handling traffic for minutes after the swap. Do not stop the old container immediately after the NGINX reload. Instead, wait at least 60 seconds (or monitor its active connections with `ss -tnp | grep <port>`) before stopping it, to avoid forcibly closing active WebSocket sessions.
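Instead of a fixed sleep, a cleanup step can wait until the old slot's host port has no established connections left, with an upper bound. The sketch below uses `ss` from iproute2 and counts connections whose destination port is the old slot's port, which is how NGINX's upstream connections appear on the host. The helper names are hypothetical, and the 60-second default mirrors the minimum wait suggested above.

```shell
# Hypothetical helpers: count established TCP connections to a local port, and wait
# (up to a limit) for them to drain before the old container is stopped.
count_established() {
  # ss prints a header line first; tail skips it before counting.
  ss -tn state established "( dport = :$1 )" | tail -n +2 | wc -l | tr -d ' '
}

# Usage: wait_for_drain PORT [MAX_SECONDS]   (returns 1 if connections remain)
wait_for_drain() {
  local port=$1 max=${2:-60} waited=0
  while [ "$(count_established "$port")" -gt 0 ]; do
    if [ "$waited" -ge "$max" ]; then
      echo "port $port still has open connections after ${max}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `wait_for_drain 3000 300 && docker stop app-blue` stops blue as soon as its last connection closes, or leaves it running and reports the timeout after five minutes.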
The deployment script integrates cleanly with GitHub Actions using the `appleboy/ssh-action` marketplace action. Your CI workflow builds and pushes a Docker image on every push to main, then SSHes into the production server and runs `./blue-green-deploy.sh <image-tag>`. Because the script exits with a non-zero code when the health check loop times out, the SSH step fails and the workflow is marked as failed. You can then send a failure notification to Slack or email before any broken code reaches users. Store the SSH private key and server IP in GitHub Actions secrets, and never hard-code them in the workflow file.
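Because the image tag travels from the workflow over SSH into a remote shell command, it is worth rejecting anything that is not a plausible Docker tag before the deploy script uses it. The sketch below shows such a guard. `valid_image_tag` is a hypothetical name. The pattern follows Docker's documented tag rules: ASCII letters, digits, `_`, `.` and `-`, not starting with `.` or `-`, and at most 128 characters.

```shell
# Hypothetical guard for the image tag passed in by CI, e.g. near the top of
# blue-green-deploy.sh. Docker tags: 1-128 characters from [A-Za-z0-9_.-],
# not starting with '.' or '-'.
valid_image_tag() {
  case $1 in
    '' | [.-]* | *[!A-Za-z0-9_.-]*) return 1 ;;
  esac
  [ "${#1}" -le 128 ]
}
```

In the deploy script, `valid_image_tag "$TAG" || { echo "refusing to deploy invalid tag: $TAG" >&2; exit 1; }` stops a malformed or injected tag before any `docker` command runs.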
This blue-green setup runs on a single $12-24/month DigitalOcean Droplet with no additional infrastructure. Running managed Kubernetes on GKE or DOKS for the same zero-downtime guarantee costs more: you pay for worker nodes and usually a cloud load balancer, and GKE adds a cluster management fee of $0.10 per hour (about $73 per month) beyond its free-tier credit. For small-to-medium workloads where you control the full stack, the bash + NGINX + Docker approach is dramatically cheaper and easier to debug. Kubernetes becomes worth the operational complexity when you need horizontal Pod autoscaling, multi-tenant namespace isolation, or cluster-level resource guarantees, none of which a single-service VPS deployment requires.