When I first deployed a multi-server setup for a client at Commsult Indonesia, the load balancer configuration was naive: round-robin across three upstream servers with no health checks, no connection limits, and no timeout tuning. The first time one backend server ran out of memory and requests to it started failing with 504s, Nginx dutifully kept sending a third of the traffic to it for several minutes before monitoring alerted us. Production load balancers need health checks, graceful failover, and properly tuned upstream configuration. This guide covers what I've learned maintaining Nginx as a load balancer for web APIs serving Jakarta-based clients.
The upstream block in Nginx defines the pool of backend servers and how traffic is distributed among them. The default algorithm is round-robin — each request goes to the next server in the list in rotation. For stateless APIs (REST, GraphQL), round-robin works well. For WebSocket connections or long-lived sessions, least_conn is better — it routes each new connection to the server with the fewest active connections, preventing connection accumulation on a single server. The ip_hash directive is available for sticky sessions but should be avoided in modern architectures where session state belongs in Redis, not in memory on a specific server.
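To make the three algorithms concrete, here is a minimal sketch of one upstream block per method. The upstream names and the 3001 port are assumptions for illustration; only the directives themselves come from Nginx.

```nginx
# Sketch only: these blocks live in the http context.
# Upstream names and port 3001 are hypothetical.

# Stateless REST/GraphQL: default round-robin, no directive needed.
upstream rest_api {
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
}

# WebSockets or other long-lived connections: pick the server
# with the fewest active connections.
upstream websocket_api {
    least_conn;
    server 10.0.1.10:3001;
    server 10.0.1.11:3001;
}

# Sticky sessions by client address, shown for completeness.
# For IPv4, ip_hash keys on the first three octets of the address.
upstream legacy_sticky {
    ip_hash;
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
}
```

Only one balancing method can be active per upstream block, so services with different traffic patterns usually end up with separate blocks like these.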
Nginx upstream servers support weight parameters to shift more or less traffic to specific servers. If one backend has twice the CPU and RAM, set weight=2 to send it twice the traffic. The backup parameter marks a server as a failover — it only receives traffic when all primary servers are unavailable. This is useful for graceful degradation: a backup server running a simplified version of your application handles traffic when the main fleet is down, returning something useful rather than a 502 error. In our setup, we have one Droplet configured as backup that serves a maintenance page when the primary two servers are both unavailable.
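A small sketch of the weight and backup parameters together, assuming the first server is the larger machine. The addresses and the uneven weights here are illustrative, not a copy of my production pool.

```nginx
# Sketch only: addresses and weights are assumptions.
upstream api_backend {
    server 10.0.1.10:3000 weight=2;  # larger machine: about two of every three requests
    server 10.0.1.11:3000;           # default weight=1
    server 10.0.1.12:3000 backup;    # used only when both servers above are unavailable
}
```

Note that weights are relative: giving every server weight=2 distributes traffic exactly as if no weights were set. The backup parameter also cannot be combined with the ip_hash, hash, or random methods.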
Nginx's default timeouts are too long for most production APIs. proxy_connect_timeout controls how long Nginx waits to establish a connection to the backend. The default is 60 seconds, which is absurd for a local network connection; set it to 5-10 seconds. proxy_read_timeout is the maximum gap between two successive reads from the backend, not a limit on the whole response. It also defaults to 60 seconds, which means Nginx holds a connection open for a minute on a backend that has hung without sending anything. Tune it to match your actual request processing time plus a safety margin. proxy_send_timeout is the equivalent limit between two successive writes while Nginx sends the request to the backend, and it also defaults to 60 seconds.
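One way to apply "processing time plus a safety margin" is to tune timeouts per location rather than globally. The sketch below assumes a hypothetical slow /api/reports/ endpoint and a 120s value chosen purely for illustration; api_backend is an upstream defined elsewhere.

```nginx
# Sketch only: both blocks belong inside a server block.
location /api/ {
    proxy_pass http://api_backend;
    proxy_connect_timeout 5s;    # backends sit on the local network
    proxy_send_timeout    10s;   # max gap between two writes to the backend
    proxy_read_timeout    30s;   # max gap between two reads from the backend
}

# Hypothetical endpoint that legitimately needs longer to respond.
location /api/reports/ {
    proxy_pass http://api_backend;
    proxy_connect_timeout 5s;
    proxy_send_timeout    10s;
    proxy_read_timeout    120s;
}
```

Nginx selects the longest matching prefix, so requests under /api/reports/ get the longer read timeout while everything else under /api/ keeps the tight one.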
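From my experience: set proxy_next_upstream error timeout http_500 http_502 http_503 and proxy_next_upstream_tries 2 in the location block that proxies to your upstream. This tells Nginx to retry a failed request on the next upstream server after connection errors, timeouts, and the listed 5xx responses, with at most two attempts in total counting the first. A retry is only possible while nothing has been sent back to the client, and by default Nginx does not retry non-idempotent methods (POST, LOCK, PATCH) once the request has been sent to a backend, which protects you from duplicated writes. Combined with proper health checks, this provides automatic failover for transient backend errors without impacting the end user. I've had instances where a Node.js process crashed mid-request and the user never noticed because Nginx retried on a healthy backend within milliseconds.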
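The retry settings can be sketched as follows. The proxy_next_upstream_timeout value is my own assumption to bound the total time spent retrying; it is not part of the configuration described above.

```nginx
# Sketch only: belongs inside a server block; api_backend is defined elsewhere.
location /api/ {
    proxy_pass http://api_backend;

    # Retry on connection errors, timeouts, and the listed 5xx responses.
    proxy_next_upstream error timeout http_500 http_502 http_503;

    # Total attempts per request, including the first one.
    proxy_next_upstream_tries 2;

    # Assumed value: stop passing the request to further servers
    # once this much time has elapsed overall.
    proxy_next_upstream_timeout 10s;

    # POST, LOCK, and PATCH are not retried by default. Adding the
    # non_idempotent parameter to proxy_next_upstream would change that,
    # at the risk of executing a write twice.
}
```

Leaving non_idempotent off is the conservative choice unless your write endpoints are safe to repeat, for example through idempotency keys.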
Nginx open-source supports passive health checks: it marks a backend as unavailable based on failures observed on real traffic. The max_fails parameter sets how many failed attempts within the fail_timeout window cause a server to be marked unavailable; the failures do not need to be consecutive. The fail_timeout parameter also sets how long the server then stays unavailable before Nginx tries it again. What counts as a failure is governed by proxy_next_upstream: connection errors and timeouts always count, while 5xx responses count only if they are listed there. The defaults are max_fails=1 and fail_timeout=10s. A production starting point: max_fails=3 fail_timeout=30s. This means three failures within 30 seconds mark the server unavailable for 30 seconds, after which Nginx sends it live traffic again to test whether it has recovered. If the request succeeds, the server is restored to the pool; if it fails, the server is marked unavailable for another 30 seconds.
```nginx
# /etc/nginx/conf.d/upstream.conf
upstream api_backend {
    least_conn;
    server 10.0.1.10:3000 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 backup;  # failover server
}

# Rate limiting zone: 10 requests per second per client IP
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 443 ssl http2;  # nginx 1.25.1+ prefers "listen 443 ssl;" plus "http2 on;"
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;
        limit_req_status 429;  # rejected requests get 429 instead of the default 503

        proxy_pass http://api_backend;

        proxy_connect_timeout 5s;
        proxy_read_timeout    30s;
        proxy_send_timeout    10s;

        proxy_next_upstream error timeout http_500 http_502 http_503;
        proxy_next_upstream_tries 2;

        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Terminate SSL at the Nginx load balancer and forward HTTP to backend servers on the internal network. This centralizes certificate management, reduces CPU load on backend servers (TLS handshakes are CPU-intensive), and simplifies backend configuration. Use Certbot with the Nginx plugin to obtain and auto-renew Let's Encrypt certificates on the load balancer. Pass the X-Forwarded-For, X-Forwarded-Proto, and X-Real-IP headers from the load balancer to backends so application code can read the real client IP and protocol. Always verify that your application uses these headers correctly: logging the wrong IP or trusting HTTP when HTTPS is required can cause real issues.
```
┌─────────────────────────────────────────────────────┐
│        Nginx Load Balancer Production Setup         │
├─────────────────────────────────────────────────────┤
│                                                     │
│              Internet → Cloudflare CDN              │
│                          ↓                          │
│            Nginx Load Balancer (443 SSL)            │
│          [rate limiting, SSL termination]           │
│                ↓                   ↓                │
│         Backend 1:3000       Backend 2:3000         │
│           (weight=2)           (weight=2)           │
│                          ↑                          │
│             Backup :3000 (if both fail)             │
│                                                     │
│        Health: max_fails=3 fail_timeout=30s         │
└─────────────────────────────────────────────────────┘
```

I once modified an Nginx upstream configuration on a production load balancer to change the load balancing algorithm from round-robin to least_conn. I edited nginx.conf and ran nginx -s reload, which I expected to apply gracefully. What I missed: I had accidentally deleted one of the upstream server entries, so the reload instantly dropped one-third of backend capacity. A reload applies whatever the file says, and nginx -t only validates syntax; a configuration with a missing server line is perfectly valid, so neither step catches this kind of mistake. Always run nginx -t first, review the diff of the change before reloading, test changes on staging, and consider a canary approach where you apply the change to one PoP or one load balancer in a cluster before all of them.
A load balancer without rate limiting is vulnerable to traffic spikes that overwhelm backends. Nginx's limit_req_zone and limit_req directives implement leaky-bucket rate limiting keyed on a variable, typically the client IP. A common configuration: 10 requests per second per IP for an API with a burst allowance of 20. With nodelay, requests within the burst are served immediately; requests beyond it are rejected. The rejection status defaults to 503, so set limit_req_status 429 if you want clients to receive 429 Too Many Requests. For authenticated APIs where rate limiting per user is more appropriate than per IP, use a header-based zone key such as $http_x_user_id, but only when a trusted layer sets that header, since clients can otherwise forge it. Combine with fail2ban on the load balancer, where the real client IPs are visible, to block IPs that repeatedly trigger rate limits.
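Per-user limiting can be sketched with a map that falls back to the client address when no user header is present. The X-User-Id header, the $rate_limit_key variable, and the api_user_limit zone name are assumptions for illustration; the header must be set by a layer you trust.

```nginx
# Sketch only. The map and limit_req_zone go in the http context.
# X-User-Id is a hypothetical header; $rate_limit_key is a made-up variable name.
map $http_x_user_id $rate_limit_key {
    ""      $binary_remote_addr;  # no user id: limit per client address
    default $http_x_user_id;      # otherwise: limit per user
}

limit_req_zone $rate_limit_key zone=api_user_limit:10m rate=10r/s;

# The location goes inside a server block; api_backend is defined elsewhere.
location /api/ {
    limit_req zone=api_user_limit burst=20 nodelay;
    limit_req_status 429;  # the default rejection status is 503
    proxy_pass http://api_backend;
}
```

Requests with an empty key are not counted by limit_req at all, which is why the fallback to the client address matters: without it, anonymous traffic would bypass the limit entirely.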
Nginx supports zero-downtime configuration reloads via nginx -s reload. The master process reads the new configuration, forks new worker processes with the updated config, and gracefully drains existing connections on the old workers. Active connections complete normally; new connections go to the new workers. This means you can update upstream server lists, change timeout values, or modify SSL certificates without dropping a single connection. The critical prerequisite: always run nginx -t before nginx -s reload to validate configuration syntax. A bad config file causes the reload to fail with the old config remaining active — which is actually safe behavior.
For the scale I run at Commsult Indonesia (thousands of requests per minute, not millions), Nginx is the right choice: familiar, well-documented, dual-purpose as both a load balancer and a web server, and free. HAProxy has more powerful health check capabilities (including active health checks in open source) and is purpose-built for load balancing with a richer feature set, so it is worth considering for higher traffic volumes or when passive checks are not enough. GCP's Cloud Load Balancing and Cloudflare's load balancing are excellent for global traffic distribution with automatic failover across regions, but they add cost and operational complexity that is not justified for Indonesian SME client workloads running in the Singapore region.