Why use least_conn instead of round-robin for WebSocket or long-lived connections?

Round-robin distributes requests evenly in rotation, which works well for short-lived, stateless API calls. For WebSocket connections or long-lived sessions, least_conn is better because it routes each new connection to the server with the fewest active connections, preventing one backend from accumulating a disproportionate load while others sit idle.

How does Nginx's passive health check work, and what are the recommended settings?

Nginx's passive health check marks a backend as unavailable based on observed failures rather than active probing. The max_fails parameter sets how many failed attempts within the fail_timeout window mark the server unavailable, and fail_timeout also sets how long the server then stays out of the pool before being retried. A practical production starting point is max_fails=3 fail_timeout=30s: three failures within 30 seconds pull the server for 30 seconds, after which Nginx lets a live request through to test whether it can rejoin the pool.

What is the safest way to apply Nginx load balancer changes without downtime?

Always run nginx -t first to validate configuration syntax before applying any change — a bad config file causes the reload to fail safely while keeping the old configuration active. Then use nginx -s reload, which forks new worker processes with the updated config while active connections finish normally on the old workers. Test changes on staging first, and consider a canary rollout applying the change to one load balancer in a cluster before all of them.

Why terminate SSL at the load balancer rather than on each backend server?

Terminating SSL at the Nginx load balancer centralises certificate management to a single point, reduces CPU load on backend servers (TLS handshakes are CPU-intensive), and simplifies backend configuration since those servers only handle plain HTTP on the internal network. Certbot with the Nginx plugin can obtain and auto-renew Let's Encrypt certificates directly on the load balancer.

When should you choose HAProxy or a cloud load balancer over Nginx?

Nginx is the right choice at moderate traffic volumes (thousands of requests per minute) because it is familiar, dual-purpose as both a load balancer and web server, and free. HAProxy offers more powerful health check capabilities, including active health checks in its open-source edition, and is worth considering at higher traffic volumes or when you need finer control over TCP and HTTP load balancing. UDP is not a reason to switch: open-source HAProxy does not offer general-purpose UDP load balancing, while Nginx can balance UDP through its stream module. Cloud options like GCP Cloud Load Balancing or Cloudflare are excellent for global traffic distribution with cross-region failover, but add cost and operational complexity that is difficult to justify for regional workloads such as Indonesian SME clients running in the Singapore region.

Nginx Load Balancer in Production: Configuration, Health Checks, and Failover

When I first deployed a multi-server setup for a client at Commsult Indonesia, the load balancer configuration was naive: round-robin across three upstream servers with no health checks, no connection limits, and no timeout tuning. The first time one backend server ran out of memory and its requests started ending in 504s, Nginx dutifully continued sending a third of the traffic to it for several minutes before the monitoring alerted us. Production load balancers need health check logic, graceful failover, and proper upstream configuration. This guide covers what I've learned maintaining Nginx as a load balancer for web APIs serving Jakarta-based clients.

Upstream Group Configuration Fundamentals

The upstream block in Nginx defines the pool of backend servers and how traffic is distributed among them. The default algorithm is round-robin — each request goes to the next server in the list in rotation. For stateless APIs (REST, GraphQL), round-robin works well. For WebSocket connections or long-lived sessions, least_conn is better — it routes each new connection to the server with the fewest active connections, preventing connection accumulation on a single server. The ip_hash directive is available for sticky sessions but should be avoided in modern architectures where session state belongs in Redis, not in memory on a specific server.
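As a rough sketch of the difference, the two upstream blocks below contrast the default rotation with least_conn. The upstream names, addresses, and ports are placeholders chosen for illustration, not taken from a real deployment.

```nginx
# Context: http { ... }. All names and addresses here are illustrative.

# Stateless REST/GraphQL traffic: no balancing directive means round-robin.
upstream rest_backend {
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
}

# WebSocket or other long-lived connections: each new connection goes to
# the server that currently holds the fewest active connections.
upstream ws_backend {
    least_conn;
    server 10.0.1.10:3001;
    server 10.0.1.11:3001;
}
```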

Server Weights and Backup Servers

Nginx upstream servers support weight parameters to shift more or less traffic to specific servers. If one backend has twice the CPU and RAM, set weight=2 to send it twice the traffic. The backup parameter marks a server as a failover — it only receives traffic when all primary servers are unavailable. This is useful for graceful degradation: a backup server running a simplified version of your application handles traffic when the main fleet is down, returning something useful rather than a 502 error. In our setup, we have one Droplet configured as backup that serves a maintenance page when the primary two servers are both unavailable.
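A minimal sketch of weights and a backup server together, assuming a hypothetical pool in which the first machine has twice the resources of the second. The upstream name and addresses are placeholders.

```nginx
# Context: http { ... }. Name and addresses are illustrative.
upstream weighted_backend {
    server 10.0.1.10:3000 weight=2;  # assumed to have twice the CPU and RAM
    server 10.0.1.11:3000;           # default weight=1
    server 10.0.1.12:3000 backup;    # used only when both primaries are unavailable
}
```

Weights are relative, so with these values roughly two of every three requests go to the first server. Giving every server the same weight has the same effect as setting none.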

Timeout Configuration

Nginx's default timeouts are too long for most production APIs. proxy_connect_timeout controls how long Nginx waits to establish a connection to the backend. The default is 60 seconds, which is far too long for a local network connection; set it to 5-10 seconds. proxy_read_timeout is the maximum gap between two successive reads from the backend, not a limit on the whole response. It also defaults to 60 seconds, so a hung backend that sends nothing ties up a connection for a full minute. Tune it to match your actual request processing time plus a safety margin. proxy_send_timeout is the equivalent limit, also 60 seconds by default, on the gap between two successive writes while Nginx sends the request to the backend.
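One way to apply "processing time plus a safety margin" is to keep tight timeouts on the general API location and widen proxy_read_timeout only where an endpoint is known to be slow. The sketch below assumes a hypothetical /api/reports/ path and a 120-second allowance; both are illustrations, not values from my setup.

```nginx
# Context: inside the server { ... } block that proxies to the backends.
# The /api/reports/ path and the 120s figure are hypothetical.

location /api/ {
    proxy_pass            http://api_backend;
    proxy_connect_timeout 5s;    # backends sit on the local network
    proxy_send_timeout    10s;
    proxy_read_timeout    30s;   # typical request time plus margin
}

# The longest matching prefix wins, so report requests land here.
location /api/reports/ {
    proxy_pass            http://api_backend;
    proxy_connect_timeout 5s;
    proxy_send_timeout    10s;
    proxy_read_timeout    120s;  # slow report generation gets a longer allowance
}
```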

From my experience: set proxy_next_upstream error timeout http_500 http_502 http_503 and proxy_next_upstream_tries 2 in the location block that proxies to your upstream. This tells Nginx to automatically retry failed requests on the next upstream server for server errors and timeouts. By default Nginx does not retry non-idempotent requests (POST, LOCK, PATCH) once they have been sent to a backend, so this mainly protects reads unless you add the non_idempotent flag. Combined with proper health checks, this provides automatic failover for transient backend errors without impacting the end user. I've had instances where a Node.js process crashed mid-request and the user never noticed because Nginx retried on a healthy backend within milliseconds.
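A sketch of the retry settings with a time budget added, so that a retry cannot stretch a request indefinitely. The proxy_next_upstream_timeout directive and its 10-second value are my additions for illustration; tune or drop them to suit your latency targets.

```nginx
# Context: inside the server { ... } block.
location /api/ {
    proxy_pass http://api_backend;

    # Retry on connection errors, timeouts, and selected 5xx responses.
    proxy_next_upstream error timeout http_500 http_502 http_503;

    # Cap the number of attempts per request at two.
    proxy_next_upstream_tries 2;

    # Stop passing the request to further servers once 10s have elapsed
    # (illustrative value).
    proxy_next_upstream_timeout 10s;

    # POST, LOCK, and PATCH are not retried once the request has been sent
    # to a backend unless "non_idempotent" is added to proxy_next_upstream.
    # Leave it off unless the backend handlers are safe to repeat.
}
```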

Passive Health Checks with max_fails and fail_timeout

Nginx open-source supports passive health checks: it marks a backend as unavailable based on observed failures. The max_fails parameter sets how many failed attempts within the fail_timeout window cause a server to be marked as unavailable. The fail_timeout parameter is both that counting window and the length of time the server stays unavailable before Nginx tries it again. A production starting point: max_fails=3 fail_timeout=30s. This means three failures (timeouts or connection errors) within 30 seconds mark the server unavailable for 30 seconds, after which Nginx lets a live request through to test if it's recovered. If that request succeeds, the server is restored to the pool; if it fails, the 30-second timeout resets.

# /etc/nginx/conf.d/upstream.conf

upstream api_backend {
    least_conn;

    server 10.0.1.10:3000 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 backup;  # failover server
}

# Rate limiting zone
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 443 ssl http2;  # nginx 1.25.1+ prefers "listen 443 ssl;" plus a separate "http2 on;"
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;
        limit_req_status 429;  # without this, rejected requests get the default 503

        proxy_pass         http://api_backend;
        proxy_connect_timeout 5s;
        proxy_read_timeout    30s;
        proxy_send_timeout    10s;

        proxy_next_upstream error timeout http_500 http_502 http_503;
        proxy_next_upstream_tries 2;

        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

SSL Termination at the Load Balancer

Terminate SSL at the Nginx load balancer and forward HTTP to backend servers on the internal network. This centralizes certificate management, reduces CPU load on backend servers (TLS handshakes are CPU-intensive), and simplifies backend configuration. Use Certbot with the Nginx plugin to obtain and auto-renew Let's Encrypt certificates on the load balancer. Pass the X-Forwarded-For, X-Forwarded-Proto, and X-Real-IP headers from the load balancer to backends so application code can read the real client IP and protocol. Always verify that your application uses these headers correctly — logging the wrong IP or trusting HTTP when HTTPS is required can cause real issues.
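If SSL terminates at the load balancer, plain HTTP on port 80 should normally never reach the backends at all. A common companion to the HTTPS server block, sketched here as an assumption about the setup rather than something taken from it, is a port-80 server that only redirects.

```nginx
# Context: http { ... }. Hypothetical companion to the HTTPS server block:
# plain HTTP is never proxied, only redirected to HTTPS.
server {
    listen 80;
    server_name api.example.com;

    return 301 https://$host$request_uri;
}
```

With this in place, X-Forwarded-Proto on proxied requests is always https, which makes it easier to spot an application that ignores the header.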

┌─────────────────────────────────────────────────────┐
│          Nginx Load Balancer Production Setup        │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Internet → Cloudflare CDN                          │
│                  ↓                                  │
│         Nginx Load Balancer (443 SSL)               │
│         [rate limiting, SSL termination]            │
│              ↓           ↓                          │
│     Backend 1:3000  Backend 2:3000                  │
│     (weight=2)      (weight=2)                      │
│                         ↑                           │
│              Backup :3000 (if both fail)            │
│                                                     │
│  Health: max_fails=3 fail_timeout=30s               │
└─────────────────────────────────────────────────────┘

I once modified an Nginx upstream configuration on a production load balancer to change the load balancing algorithm from round-robin to least_conn. I edited nginx.conf and ran nginx -s reload, which I expected to apply gracefully. What I missed: I had accidentally deleted one of the upstream server entries, so the reload instantly dropped one-third of backend capacity. Neither nginx -t nor nginx -s reload catches this kind of mistake: both only check that the configuration parses and loads, and a pool with one server missing is perfectly valid. Always run nginx -t first (configuration test) to rule out syntax errors, review the diff of what you changed, test changes on staging, and consider using a canary approach where you apply the change to one PoP or one load balancer in a cluster before all of them.

Rate Limiting and Connection Limiting

A load balancer without rate limiting is vulnerable to traffic spikes that overwhelm backends. Nginx's limit_req_zone and limit_req directives implement leaky-bucket rate limiting per key, typically the client IP. A common configuration: 10 requests per second per IP for an API with a burst allowance of 20. With nodelay, requests within the burst allowance are served immediately; requests beyond it are rejected with 503 Service Unavailable by default, so set limit_req_status 429 if you want clients to receive 429 Too Many Requests. For authenticated APIs where rate limiting per user is more appropriate than per IP, use a header-based zone key such as $http_x_user_id, provided the header is set by a trusted layer and cannot be forged by the client. Combine with fail2ban watching the load balancer's logs to block IPs that repeatedly trigger rate limits.
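A sketch of per-user rate limiting combined with a per-IP connection cap. The X-User-Id header name, the zone names, and the limit of 20 connections are assumptions for illustration; the header would need to be set by something you trust, such as an auth gateway.

```nginx
# Context: http { ... }
# Per-user request rate, keyed on a header (X-User-Id is an assumed name).
# Requests without the header have an empty key and are not limited.
limit_req_zone  $http_x_user_id     zone=user_limit:10m  rate=10r/s;

# Per-IP cap on simultaneous connections.
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

# Context: server { ... }
location /api/ {
    limit_req         zone=user_limit burst=20 nodelay;
    limit_req_status  429;             # default is 503

    limit_conn        conn_per_ip 20;  # illustrative ceiling
    limit_conn_status 429;             # default is 503

    proxy_pass http://api_backend;
}
```

Because an empty key is not counted, unauthenticated traffic bypasses the per-user zone entirely; keeping a per-IP limit_req zone alongside it closes that gap.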

Zero-Downtime Reloads and Configuration Testing

Nginx supports zero-downtime configuration reloads via nginx -s reload. The master process reads the new configuration, forks new worker processes with the updated config, and gracefully drains existing connections on the old workers. Active connections complete normally; new connections go to the new workers. This means you can update upstream server lists, change timeout values, or modify SSL certificates without dropping a single connection. The critical prerequisite: always run nginx -t before nginx -s reload to validate configuration syntax. A bad config file causes the reload to fail with the old config remaining active — which is actually safe behavior.
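One caveat for WebSocket or other long-lived connections: after a reload, old worker processes stay alive until their last connection closes, which can take hours. The worker_shutdown_timeout directive puts an upper bound on that. The 60-second value below is an illustrative guess, and the trade-off is that connections still open when the timer expires are closed by Nginx.

```nginx
# Context: main (top level of nginx.conf, outside http { ... }).
# Old workers that are shutting down after a reload close any remaining
# connections once this much time has passed. 60s is illustrative.
worker_shutdown_timeout 60s;
```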

My Take: Nginx vs HAProxy vs Cloud Load Balancers

For the scale I run at Commsult Indonesia (thousands of requests per minute, not millions), Nginx is the right choice: familiar, well-documented, dual-purpose as both a load balancer and a web server, and free. HAProxy has more powerful health check capabilities (including active health checks in open source) and is purpose-built for TCP and HTTP load balancing with a richer feature set, which makes it worth considering for higher traffic volumes. UDP is not a reason to switch: open-source HAProxy does not offer general-purpose UDP load balancing, while Nginx can balance UDP through its stream module. GCP's Cloud Load Balancing and Cloudflare's load balancing are excellent for global traffic distribution with automatic failover across regions, but they add cost and operational complexity that's not justified for Indonesian SME client workloads running in the Singapore region.
