Why is a simple Docker container restart not enough for zero-downtime deployments?

When you run `docker stop` followed by `docker start`, there is typically a 2–10 second gap where no container is listening on the port. During this window NGINX returns 502 Bad Gateway to any incoming request. Blue-green deployment avoids this by keeping the active container running until the new one is confirmed healthy and serving traffic.

How does NGINX achieve a zero-downtime upstream switch without dropping in-flight requests?

NGINX's `nginx -s reload` command re-reads the configuration and updates upstream targets without dropping active connections, a behaviour called graceful reload. The master process starts new workers with the new configuration, while the old workers stop accepting connections and exit once their in-progress requests finish. After the upstream block is pointed at the new container port and NGINX is reloaded, new connections go to the new slot. Adding `proxy_next_upstream error timeout http_502 http_503` lets NGINX retry a failed request on another server, but only if the upstream block lists more than one server (for example the previous slot marked `backup`); with a single `server` line there is nothing to retry on.

What does the blue-green deployment script actually do, and what happens if the health check fails?

The bash script detects which slot (blue or green) is currently live, starts the other slot with the new image, polls the `/health` endpoint until it returns HTTP 200, updates the NGINX upstream file, gracefully reloads NGINX, and then stops the previously live container. If the health check never returns 200, the script exits with a non-zero code, leaves the old container running, and leaves NGINX unchanged — so no broken deployment ever reaches users.

How does the cost of this blue-green setup compare to using Kubernetes for the same guarantee?

The bash + NGINX + Docker blue-green setup runs on a single $12–24/month DigitalOcean Droplet with no additional infrastructure. The post estimates that managed Kubernetes on GKE or DOKS adds at least $35–75/month in cluster overhead before node costs; exact figures depend on the provider and on options such as a highly available control plane. For small-to-medium single-service workloads, the VPS approach is significantly cheaper and easier to debug, with Kubernetes becoming worth its operational complexity only when you need features like horizontal Pod autoscaling, multi-tenant namespace isolation, or cluster-level resource guarantees.

How should WebSocket or long-poll connections be handled during a blue-green swap?

NGINX's graceful reload does not forcibly close existing WebSocket or long-poll connections; they remain on the old upstream until the client disconnects. This means the old container may continue handling traffic for minutes after the upstream flip. The post recommends waiting at least 60 seconds (or monitoring active connections with `ss -tnp`) before stopping the old container to avoid forcibly terminating live sessions.

Zero-Downtime Deployments with NGINX Blue-Green on a Single VPS

Every time you restart a Docker container to deploy a new version, there is a gap — however brief — where your application is unavailable. For low-traffic side projects that gap is tolerable. For production services with real users and SLAs, it is not. I implemented blue-green deployments using plain NGINX and Docker on DigitalOcean Droplets after a client complained about brief outages during our nightly release cycle. The solution requires no Kubernetes, no external load balancer, and no paid tooling — just a bash script, two container slots, and a smart NGINX reload.

The Blue-Green Concept

Blue-green deployment keeps two identical production environments — called Blue and Green — always available. At any moment, one environment is live (serving user traffic) and the other is idle (ready to receive the next deployment). When you deploy a new version, you bring it up in the idle slot, run health checks, and then flip the NGINX upstream to point at the new slot. The old slot remains running for instant rollback — if anything looks wrong after the flip, you change the upstream back and reload NGINX. The entire cutover takes under one second.

Why Not Just Use Rolling Restarts?

Docker's default restart behavior (`docker stop` + `docker start`) creates a downtime window. Even with a fast startup, there is typically 2-10 seconds where no container is listening on the port. NGINX returns 502 Bad Gateway during this window. Docker Compose's `depends_on` with health checks helps, but the graceful shutdown of the old container and the readiness of the new one are not atomically synchronized. Blue-green avoids this by never stopping the active container until the new one is confirmed healthy and serving traffic.

NGINX as the Traffic Switcher

NGINX's `nginx -s reload` command reloads the configuration file and updates upstream targets without dropping any in-flight connections. When you change the upstream block to point at the new container port and then reload, NGINX finishes serving all active requests on the old upstream before switching new connections to the new one. This behavior — called graceful reload — is the key that makes zero-downtime blue-green possible with vanilla NGINX without any plugins or paid modules.

┌──────────────────────────────────────────────────────────────────┐
│              Blue-Green Deployment Flow with NGINX               │
│                                                                  │
│  Internet                                                        │
│     │                                                            │
│     ▼                                                            │
│  ┌──────────────────────────────────────┐                        │
│  │  NGINX Upstream (upstream.conf)      │                        │
│  │                                      │                        │
│  │  Step 1: upstream → app-blue:3000   │                        │
│  │  Step 2: deploy app-green:3001       │                        │
│  │  Step 3: health check green ✓        │                        │
│  │  Step 4: upstream → app-green:3001  │                        │
│  │  Step 5: reload nginx (0 downtime)   │                        │
│  └──────────────────────────────────────┘                        │
│          │                    │                                  │
│          ▼                    ▼                                  │
│   ┌─────────────┐    ┌─────────────────┐                        │
│   │  app-blue   │    │   app-green     │                        │
│   │  :3000      │    │   :3001         │                        │
│   │  (v1.0)     │    │   (v2.0 new)    │                        │
│   │  IDLE after │    │   LIVE after    │                        │
│   │  cutover    │    │   cutover       │                        │
│   └─────────────┘    └─────────────────┘                        │
└──────────────────────────────────────────────────────────────────┘

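Add `proxy_next_upstream error timeout http_502 http_503` to your NGINX location block. This tells NGINX to retry a request on the next server in the upstream group if the server it tried returns an error, responds with 502 or 503, or times out. The directive only has an effect when the group contains more than one server, so during the transition window the upstream block needs to list both slots, for example the new slot as the primary and the old slot marked `backup`. With both containers running and both listed, a request that fails at the moment of the upstream flip is retried transparently. By default NGINX does not retry non-idempotent requests such as POST once they have been sent to a server.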
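As a minimal sketch of the two-server upstream that gives `proxy_next_upstream` something to retry on, the helper below renders a block with the live slot as primary and the other slot as `backup`. The function name `render_upstream` and its argument order are illustrative assumptions, not part of the original script; the ports match the blue (3000) and green (3001) slots used in this post.

```shell
#!/usr/bin/env bash
# Sketch: render an upstream block in which the newly live slot is the
# primary server and the previous slot stays reachable as a backup.
# render_upstream is a hypothetical helper name.
set -euo pipefail

render_upstream() {
  local live_port="$1" standby_port="$2"
  cat <<EOF
upstream app_backend {
  server 127.0.0.1:${live_port};
  server 127.0.0.1:${standby_port} backup;
}
EOF
}

# Example: green (3001) goes live, blue (3000) remains the retry target.
render_upstream 3001 3000
```

Once the old container has been stopped, the `backup` line points at a closed port, so it is worth rewriting the file with a single `server` line as part of cleanup.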

The Deployment Script

The core of the blue-green setup is a bash script that detects which container is currently live (blue or green), starts the other container with the new image, polls the health check endpoint until it responds with 200, updates the NGINX upstream file, reloads NGINX gracefully, and then stops the previously live container. The script is safe to re-run: each run deploys to whichever slot is idle, and a failed run can simply be repeated. It exits with a non-zero code if the health check fails, leaving the old container running and NGINX unchanged, so no user ever sees a broken deployment.
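Detecting the live slot from the list of running containers becomes ambiguous whenever both containers are up, which is exactly the state during a rollback window. One alternative, sketched here under the assumption that the upstream file is the source of truth and that blue is 3000 and green is 3001, is to read the slot from the port NGINX currently targets. The function name `active_slot` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "blue" or "green" depending on which port the upstream file targets.
# Assumes blue=3000 and green=3001; returns 1 if neither port is the primary.
# The pattern requires ";" directly after the port, so a "backup" line is ignored.
active_slot() {
  local upstream_file="$1"
  if grep -Eq '^[[:space:]]*server[[:space:]]+127\.0\.0\.1:3000;' "$upstream_file"; then
    echo "blue"
  elif grep -Eq '^[[:space:]]*server[[:space:]]+127\.0\.0\.1:3001;' "$upstream_file"; then
    echo "green"
  else
    return 1
  fi
}

# Example:
#   ACTIVE="$(active_slot /etc/nginx/conf.d/upstream.conf)"
```

Returning a non-zero status when neither port is found lets a caller running under `set -e` stop before touching any container.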

Health Check Design

The health check endpoint must reflect genuine application readiness, not just HTTP server startup. A good /health endpoint in a Node.js/Express app checks database connectivity, cache availability, and any required external service connections. If any dependency is unavailable, the health check returns HTTP 503, the deployment script sees the failure, and the blue-green swap is aborted. This means a database migration that fails or a misconfigured environment variable will prevent the bad deployment from ever receiving live traffic.
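On the deploy side, that contract means treating anything other than HTTP 200 as "not ready". The sketch below separates the retry loop from the probe so the loop can be reused and exercised without a running server. The names `wait_until_ok` and `probe_health` are illustrative assumptions, as are the 2-second curl timeout and the example URL.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a probe command up to max_attempts times, sleeping `delay` seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
wait_until_ok() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Probe that succeeds only on HTTP 200; a 503 from a failed dependency check,
# a refused connection, or a timeout all count as "not ready".
probe_health() {
  local url="$1" status
  status="$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' "$url" || true)"
  [[ "$status" == "200" ]]
}

# Example:
#   wait_until_ok 30 1 probe_health "http://localhost:3001/health"
```

Checking the status code explicitly, rather than relying on `curl -f` alone, keeps the "200 means ready" rule visible in the script.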

#!/bin/bash
# blue-green-deploy.sh — Zero-downtime deploy script for NGINX + Docker

set -euo pipefail

IMAGE="registry.example.com/myapp"
TAG="${1:-latest}"
NGINX_UPSTREAM="/etc/nginx/conf.d/upstream.conf"

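# Detect the active color by exact container name (a substring match would
# also hit names such as "myapp-blue-worker"). If both are running, blue wins.
RUNNING="$(docker ps --format '{{.Names}}')"
if grep -qx "app-blue" <<<"$RUNNING"; then
  ACTIVE="blue"; ACTIVE_PORT=3000
  IDLE="green";  IDLE_PORT=3001
else
  ACTIVE="green"; ACTIVE_PORT=3001
  IDLE="blue";    IDLE_PORT=3000
fi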

echo "Active: app-$ACTIVE ($ACTIVE_PORT) → deploying to app-$IDLE ($IDLE_PORT)"

# 1. Pull new image
docker pull "$IMAGE:$TAG"

# 2. Start idle container
docker rm -f "app-$IDLE" 2>/dev/null || true
docker run -d \
  --name "app-$IDLE" \
  --network app-net \
  -p "$IDLE_PORT:3000" \
  -e NODE_ENV=production \
  "$IMAGE:$TAG"

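# 3. Health-check loop (up to 30 attempts, one second apart)
echo "Waiting for app-$IDLE to become healthy..."
for i in $(seq 1 30); do
  # --max-time keeps a hung endpoint from stalling the loop indefinitely
  if curl -sf --max-time 2 "http://localhost:$IDLE_PORT/health" > /dev/null; then
    echo "Healthy after $i attempt(s)"; break
  fi
  if [ "$i" -eq 30 ]; then
    echo "Health check failed; app-$ACTIVE stays live" >&2
    exit 1
  fi
  sleep 1
done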

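# 4. Flip NGINX upstream (keep a copy so a failed config test can be undone)
cp "$NGINX_UPSTREAM" "$NGINX_UPSTREAM.bak" 2>/dev/null || true
cat > "$NGINX_UPSTREAM" <<EOF
upstream app_backend {
  server 127.0.0.1:$IDLE_PORT;
}
EOF

# Test and reload as separate commands. In "nginx -t && nginx -s reload",
# set -e does not abort when nginx -t fails, so the script would carry on
# and stop the live container.
if ! nginx -t; then
  echo "NGINX config test failed; restoring previous upstream" >&2
  if [ -f "$NGINX_UPSTREAM.bak" ]; then mv "$NGINX_UPSTREAM.bak" "$NGINX_UPSTREAM"; fi
  exit 1
fi
nginx -s reload
echo "Traffic switched to app-$IDLE"

# 5. Let in-flight and long-lived connections drain, then stop the old container
sleep "${DRAIN_SECONDS:-60}"
docker stop "app-$ACTIVE" && docker rm "app-$ACTIVE"
echo "Deployment complete. Active: app-$IDLE"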

# /etc/nginx/sites-enabled/myapp.conf
# server {
#   listen 80;
#   server_name myapp.example.com;
#   location / {
#     proxy_pass http://app_backend;
#     proxy_set_header Host $host;
#     proxy_set_header X-Real-IP $remote_addr;
#   }
# }

Rollback Strategy

The beauty of blue-green is that rollback is near-instant and requires no new deployment. As long as the old container is still running after the cutover (just not receiving new connections via NGINX), a rollback is simply pointing the NGINX upstream back at the old port and reloading. Note that the deploy script above stops and removes the old container in step 5, so to get this rollback window you need to defer that step, for example by moving it into a separate cleanup script. Add a `rollback.sh` script that mirrors the deploy script but skips the container startup: it just flips the upstream and reloads. Keep the old container running for at least 30 minutes after a successful deploy before removing it, giving yourself a clean rollback window.
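A `rollback.sh` along those lines might look like the sketch below. It assumes the same upstream file path and port pairing (3000 and 3001) as the deploy script, and it assumes the previous container is still running. The function names `previous_port` and `rollback`, and the `--apply` flag that guards the actual switch, are hypothetical.

```shell
#!/usr/bin/env bash
# rollback.sh (sketch): point NGINX back at the previous slot.
set -euo pipefail

NGINX_UPSTREAM="${NGINX_UPSTREAM:-/etc/nginx/conf.d/upstream.conf}"

# Print the port of the slot that is NOT currently the primary upstream.
previous_port() {
  local upstream_file="$1"
  if grep -q '127\.0\.0\.1:3000;' "$upstream_file"; then
    echo 3001
  else
    echo 3000
  fi
}

rollback() {
  local target
  target="$(previous_port "$NGINX_UPSTREAM")"
  # Refuse to roll back to a slot that does not pass its health check;
  # under set -e a failing curl aborts before the upstream file is touched.
  curl -sf --max-time 2 "http://localhost:${target}/health" > /dev/null
  cat > "$NGINX_UPSTREAM" <<EOF
upstream app_backend {
  server 127.0.0.1:${target};
}
EOF
  nginx -t
  nginx -s reload
  echo "Rolled back to port ${target}"
}

# Only switch traffic when explicitly asked to: ./rollback.sh --apply
if [[ "${1:-}" == "--apply" ]]; then
  rollback
fi
```

Health-checking the target first matters here because a rollback to a container that has already been stopped would replace a questionable release with a guaranteed outage.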

If your application maintains WebSocket connections or long-polling, the graceful NGINX reload will not forcefully terminate those connections — they stay on the old upstream until the client disconnects. This is usually desirable, but it means the old container may continue handling traffic for minutes after the swap. Do not stop the old container immediately after the NGINX reload. Instead, wait at least 60 seconds (or monitor active connections with `ss -tnp | grep <port>`) before stopping the old container to avoid forcibly closing active WebSocket sessions.
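Instead of a fixed sleep, the wait can be driven by the connection count itself. The following is a rough sketch, assuming `ss` from iproute2 is available on the host and that NGINX reaches the old container through its published port; the helper names `count_connections` and `wait_for_drain` and the 300-second limit in the example are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count established TCP connections whose destination port is $1.
# On the host this sees NGINX's side of each proxied connection.
count_connections() {
  local port="$1"
  ss -Htn state established "( dport = :${port} )" | wc -l
}

# Poll a counting command once per second until it reports 0 or `timeout`
# seconds have passed. Returns 0 when drained, 1 on timeout.
wait_for_drain() {
  local timeout="$1"
  shift
  local waited=0 remaining
  while true; do
    remaining="$("$@")"
    if ((remaining == 0)); then
      return 0
    fi
    if ((waited >= timeout)); then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example: wait up to 5 minutes for connections to the old blue slot to end.
#   wait_for_drain 300 count_connections 3000 || echo "still busy" >&2
```

A timeout is still needed because a client that never disconnects would otherwise block the cleanup forever; what to do when it expires (stop anyway or keep waiting) is a policy choice for your service.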

Integrating with GitHub Actions

The deployment script integrates cleanly with GitHub Actions using the `appleboy/ssh-action` marketplace action. Your CI workflow builds and pushes a Docker image on every push to main, then connects to the production server over SSH and runs `./blue-green-deploy.sh <image-tag>`. Because the script exits non-zero when the health check loop times out, the workflow step fails and can trigger a failure notification to Slack or email before any broken code reaches users. Store the SSH private key and server IP in GitHub Actions secrets and never hard-code them in the workflow file.

Cost and Complexity Compared to Kubernetes

This blue-green setup runs on a single $12-24/month DigitalOcean Droplet with no additional infrastructure. A managed Kubernetes cluster on GKE or DOKS offering the same zero-downtime guarantee costs roughly $35-75/month or more in cluster overhead before node costs, depending on the provider and on options such as a highly available control plane. For small-to-medium workloads where you control the full stack, the bash + NGINX + Docker approach is dramatically cheaper and easier to debug. Kubernetes becomes worth the operational complexity when you need horizontal Pod autoscaling, multi-tenant namespace isolation, or cluster-level resource guarantees, none of which a single-service VPS deployment requires.
