Rate limiting is one of those features that feels optional until the day someone discovers your endpoint and sends 10,000 requests in 60 seconds, taking down your database. NestJS's @nestjs/throttler module makes implementing rate limiting straightforward, but the default in-memory storage breaks when you scale to multiple instances. Redis-backed rate limiting is the production-correct approach. I've implemented this on NestJS APIs serving ERP and SaaS applications where multiple server instances run behind a load balancer.
The default @nestjs/throttler uses in-memory storage: a Map that tracks request counts per client IP per TTL window. This works perfectly for a single-instance deployment. The moment you add a second instance behind a load balancer, instance A has no knowledge of requests hitting instance B. With a limit of 100, a client can send 100 requests to instance A and another 100 to instance B before either instance rejects anything. The effective rate limit doubles, and it keeps growing with every instance you add. Redis solves this by providing shared state across all instances.
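The bypass is easy to reproduce without NestJS at all. The following is a toy simulation, not the throttler's actual implementation: `hit` and `countAllowed` are hypothetical helpers, each Map stands in for one instance's in-memory storage, and the load balancer is assumed to be plain round-robin.

```typescript
// Toy model: each "instance" counts requests in its own Map,
// the way in-memory throttler storage does.
type Counter = Map<string, number>;

// Record one request and report whether it is still within the limit.
function hit(store: Counter, clientIp: string, limit: number): boolean {
  const count = (store.get(clientIp) ?? 0) + 1;
  store.set(clientIp, count);
  return count <= limit; // false would be a 429 in the real guard
}

// Send `total` requests, alternating across the given stores (round-robin).
function countAllowed(stores: Counter[], clientIp: string, total: number, limit: number): number {
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    const store = stores[i % stores.length];
    if (store !== undefined && hit(store, clientIp, limit)) allowed++;
  }
  return allowed;
}

const LIMIT = 100;
const CLIENT = "203.0.113.7";

// Two instances, each with a private counter.
const perInstanceAllowed = countAllowed([new Map(), new Map()], CLIENT, 400, LIMIT);

// Two instances sharing one counter, which is what Redis provides.
const sharedStore: Counter = new Map();
const sharedAllowed = countAllowed([sharedStore, sharedStore], CLIENT, 400, LIMIT);

console.log(perInstanceAllowed); // 200
console.log(sharedAllowed);      // 100
```

With private counters the client gets through twice as often as the configured limit allows; with a shared counter the limit holds regardless of which instance serves the request.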
Install the required packages: @nestjs/throttler and @nest-lab/throttler-storage-redis. The throttler-storage-redis package implements the ThrottlerStorage interface using Redis as the backend. Configure ThrottlerModule in your AppModule with the Redis storage provider. The ThrottlerStorageRedisService accepts your Redis connection config (host, port, password). Register ThrottlerGuard globally using the APP_GUARD token in your providers array so every route is protected by default.
In-Memory Throttler (single instance, breaks at scale)
────────────────────────────────────────────────────────
Client ──► Instance A (Map: {clientIP: requestCount})
  • Works fine for 1 instance
  • Client lands on Instance B → separate counter starts at zero → bypass!

Redis Throttler (multi-instance, production correct)
────────────────────────────────────────────────────────
Client ──► Instance A ──► Redis INCR("throttle:clientIP:ttl")
Client ──► Instance B ──► Redis INCR("throttle:clientIP:ttl") ← same key!
Client ──► Instance C ──► Redis INCR("throttle:clientIP:ttl") ← same key!
All instances share one counter → correct rate limiting at any scale

Request flow:
Client Request
  │
  ▼
ThrottlerGuard.canActivate()
  │
  ├── Get tracker (IP or userId)
  ├── Redis INCR(key) + set TTL if new
  ├── If count > limit → 429 Too Many Requests + Retry-After header
  └── If count <= limit → continue to route handler

Three-tier config (what I use in production):
  /auth/*   → 5 req / 60s   (brute-force protection)
  /api/*    → 100 req / 60s (standard API)
  /export/* → 2 req / 300s  (heavy operation)

From implementing rate limiting on an ERP NestJS API: use multiple throttle configurations for different endpoint types. A public authentication endpoint (login, password reset) gets aggressive limits: 5 requests per minute. A general API endpoint gets relaxed limits: 100 requests per minute. A bulk export endpoint gets very strict limits: 2 requests per 5 minutes. Use the @Throttle() decorator on specific controllers or routes to override the global config. This gives you surgical control without duplicating the global guard setup.
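To see what the three tiers add up to, here is a plain lookup sketch that is independent of NestJS decorators. `TIERS`, `tierFor`, and `maxPerHour` are hypothetical names introduced for illustration; the prefixes and numbers are the ones from the table above.

```typescript
interface Tier {
  name: string;
  limit: number; // max requests per window
  ttlMs: number; // window length in milliseconds
}

// Path prefix → tier, using the three tiers described above.
const TIERS: Array<{ prefix: string; tier: Tier }> = [
  { prefix: "/auth/", tier: { name: "short", limit: 5, ttlMs: 60000 } },
  { prefix: "/export/", tier: { name: "long", limit: 2, ttlMs: 300000 } },
  { prefix: "/api/", tier: { name: "medium", limit: 100, ttlMs: 60000 } },
];

// Return the tier whose prefix matches the request path, if any.
function tierFor(path: string): Tier | undefined {
  return TIERS.find((entry) => path.startsWith(entry.prefix))?.tier;
}

// Upper bound on requests one client can make in an hour under a tier.
function maxPerHour(tier: Tier): number {
  return Math.floor(3600000 / tier.ttlMs) * tier.limit;
}

for (const path of ["/auth/login", "/api/orders", "/export/invoices"]) {
  const tier = tierFor(path);
  if (tier) console.log(path, tier.name, maxPerHour(tier));
}
// /auth/login short 300
// /api/orders medium 6000
// /export/invoices long 24
```

The hourly ceilings make the intent of each tier visible: a login endpoint still allows 300 attempts per hour per client, which may or may not be strict enough for your threat model.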
The ThrottlerModule configuration in AppModule: import ThrottlerModule with an array of throttle configurations (each has a name, ttl in milliseconds, and limit). The storage option points to your ThrottlerStorageRedisService. Register APP_GUARD with ThrottlerGuard in the providers array. Every route in your application now enforces the rate limit. Keep in mind that every throttler in the array is evaluated on every route, so a route that should obey only one tier has to skip the others by name. Endpoints that should bypass rate limiting (health checks, internal monitoring) use the @SkipThrottle() decorator, listing the throttler names to skip when the throttlers are named.
# Install packages
npm install --save @nestjs/throttler @nest-lab/throttler-storage-redis ioredis
// app.module.ts
import { Module } from '@nestjs/common'
import { APP_GUARD } from '@nestjs/core'
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler'
import { ThrottlerStorageRedisService } from '@nest-lab/throttler-storage-redis'

@Module({
  imports: [
    ThrottlerModule.forRoot({
      // Every throttler listed here is checked on every route unless skipped by name
      throttlers: [
        { name: 'short', ttl: 60000, limit: 5 },    // 5 req/min (auth routes)
        { name: 'medium', ttl: 60000, limit: 100 }, // 100 req/min (default)
        { name: 'long', ttl: 300000, limit: 2 },    // 2 req/5min (heavy ops)
      ],
      // Counters live in Redis so all instances enforce the same limits
      storage: new ThrottlerStorageRedisService({
        host: process.env.REDIS_HOST,
        port: parseInt(process.env.REDIS_PORT ?? '6379', 10),
        password: process.env.REDIS_PASSWORD,
      }),
    }),
  ],
  providers: [
    { provide: APP_GUARD, useClass: ThrottlerGuard }, // global guard
  ],
})
export class AppModule {}
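// Override per controller: apply only the strict tier to auth routes
@Controller('auth')
@SkipThrottle({ medium: true, long: true })    // otherwise the other named tiers also apply here
@Throttle({ short: { limit: 5, ttl: 60000 } }) // strict for auth
export class AuthController {
  @Post('login')
  login(@Body() dto: LoginDto) { ... }
}

// Skip for health checks. With named throttlers, list each name to skip;
// a bare @SkipThrottle() only targets the throttler named 'default'.
@Get('health')
@SkipThrottle({ short: true, medium: true, long: true })
health() { return { status: 'ok' } }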
// Custom tracker: use userId instead of IP for authenticated routes.
// Register this class as APP_GUARD in place of ThrottlerGuard.
@Injectable()
export class UserThrottlerGuard extends ThrottlerGuard {
  protected async getTracker(req: Record<string, any>): Promise<string> {
    // req.user is only set if the auth guard has already run for this request;
    // `sub` is the subject claim of the JWT payload
    return req.user?.sub ?? req.ip // fall back to IP if not authenticated
  }
}
// Custom exception filter: add Retry-After header (assumes the Express adapter)
import { ArgumentsHost, Catch, ExceptionFilter } from '@nestjs/common'
import { ThrottlerException } from '@nestjs/throttler'
import { Response } from 'express'

@Catch(ThrottlerException)
export class ThrottlerExceptionFilter implements ExceptionFilter {
  catch(exception: ThrottlerException, host: ArgumentsHost) {
    const ctx = host.switchToHttp()
    const response = ctx.getResponse<Response>()
    response
      .status(429)
      .setHeader('Retry-After', '60') // hard-coded; should match the ttl of the tier that fired
      .json({
        statusCode: 429,
        message: 'Too Many Requests',
        retryAfter: 60,
      })
  }
}

The default throttler keys requests by IP address. For authenticated APIs, you want to key by user ID, because one IP can represent a company's entire team. Override ThrottlerGuard's getTracker() method to return the user ID from the JWT payload if authenticated, falling back to IP for unauthenticated requests. This avoids the scenario where 20 legitimate users at the same company (same IP via NAT) trip rate limits designed for individual users.
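The NAT scenario can be checked with a few lines of plain TypeScript. This is a sketch of the tracker choice only, outside any guard: `ThrottleRequest` and `trackerFor` are hypothetical names, and the request shape (an `ip` plus an optional `user.sub` from the JWT payload) is an assumption about what your auth layer attaches.

```typescript
// Minimal request shape assumed for illustration.
interface ThrottleRequest {
  ip: string;
  user?: { sub: string }; // present only on authenticated requests
}

// Same rule as the custom guard: user ID if authenticated, else IP.
function trackerFor(req: ThrottleRequest): string {
  return req.user?.sub ?? req.ip;
}

// Twenty authenticated users behind one office NAT address.
const officeIp = "198.51.100.20";
const team: ThrottleRequest[] = Array.from({ length: 20 }, (_, i) => ({
  ip: officeIp,
  user: { sub: `user-${i + 1}` },
}));

// Number of distinct rate-limit buckets under each keying strategy.
const bucketsByIp = new Set(team.map((req) => req.ip)).size;
const bucketsByTracker = new Set(team.map((req) => trackerFor(req))).size;
const anonymousTracker = trackerFor({ ip: officeIp });

console.log(bucketsByIp);      // 1
console.log(bucketsByTracker); // 20
console.log(anonymousTracker); // 198.51.100.20
```

Keyed by IP, all twenty users drain a single bucket; keyed by user ID, each gets their own, while anonymous traffic from the same address still shares one.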
When a request is rejected with 429 Too Many Requests, always include the Retry-After header, which tells the client when it can retry. Without it, clients typically implement exponential backoff from scratch or retry immediately (worsening the problem). NestJS throttler adds X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers to successful responses automatically. For rejected requests, set Retry-After in your ThrottlerExceptionFilter, ideally to the time remaining in the window that was exceeded rather than a fixed value. Good rate limiting is transparent to legitimate users and informative when limits are hit.
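A fixed Retry-After of 60 overstates the wait late in a one-minute window and understates it for a five-minute one. A minimal sketch of computing the value from the window's expiry instead, assuming you know when the current window ends (`retryAfterSeconds` and `tooManyRequestsBody` are hypothetical helpers, and the body shape mirrors the filter above):

```typescript
// Whole seconds until the window expires, never less than 1 so that
// clients do not retry in the same instant.
function retryAfterSeconds(windowExpiresAtMs: number, nowMs: number): number {
  const remainingMs = Math.max(0, windowExpiresAtMs - nowMs);
  return Math.max(1, Math.ceil(remainingMs / 1000));
}

// Body for a 429 response; the header value is the same number as a string.
function tooManyRequestsBody(windowExpiresAtMs: number, nowMs: number) {
  const retryAfter = retryAfterSeconds(windowExpiresAtMs, nowMs);
  return {
    headers: { "Retry-After": String(retryAfter) },
    body: { statusCode: 429, message: "Too Many Requests", retryAfter },
  };
}

// Window opened at t=0 with a 60s ttl; the request is rejected at t=12.5s.
const rejected = tooManyRequestsBody(60000, 12500);
console.log(rejected.headers["Retry-After"]); // "48"
console.log(rejected.body.retryAfter);        // 48
```

Rounding up matters here: telling a client to retry after 47 seconds when 47.5 remain would earn it another 429.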
@nest-lab/throttler-storage-redis stores throttle state in Redis keys with the format throttle:<tracker>:<ttl>. Each key has an automatic TTL matching your throttle window. When the TTL expires, the key is deleted and the count resets. Redis's atomic INCR operation ensures concurrent requests from the same client don't race past the limit. The keys are lightweight — a throttle record is a single counter value. Even a high-traffic API with 10,000 concurrent users generates only 10,000 Redis keys with sub-millisecond read/write operations.
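The counter-plus-expiry mechanism described above is a fixed-window counter. The sketch below models it with an in-memory stand-in so it runs without a Redis server; `FakeRedis` and `allow` are hypothetical, the clock is passed in explicitly for testability, and this is not the storage package's actual code. Against a real Redis, the increment and the expiry should be issued as one atomic step (for example in a script or transaction) so a key can never be left without a TTL.

```typescript
// In-memory stand-in for the two Redis operations used: INCR and PEXPIRE.
class FakeRedis {
  private data = new Map<string, { value: number; expiresAt: number | null }>();

  // Increment the counter at `key`, treating an expired key as missing.
  incr(key: string, nowMs: number): number {
    const entry = this.data.get(key);
    if (!entry || (entry.expiresAt !== null && entry.expiresAt <= nowMs)) {
      this.data.set(key, { value: 1, expiresAt: null });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }

  // Set the key to expire `ttlMs` after `nowMs`.
  pexpire(key: string, ttlMs: number, nowMs: number): void {
    const entry = this.data.get(key);
    if (entry) entry.expiresAt = nowMs + ttlMs;
  }
}

// Fixed-window check: count the hit, start the window on the first hit,
// and allow the request while the count stays within the limit.
function allow(redis: FakeRedis, tracker: string, limit: number, ttlMs: number, nowMs: number): boolean {
  const key = `throttle:${tracker}:${ttlMs}`;
  const count = redis.incr(key, nowMs);
  if (count === 1) redis.pexpire(key, ttlMs, nowMs);
  return count <= limit;
}

const redis = new FakeRedis();
const results = [0, 100, 200, 1000].map((t) => allow(redis, "203.0.113.7", 2, 1000, t));
console.log(results); // [ true, true, false, true ]
```

The third request is rejected because it is the third hit inside the one-second window; the fourth arrives after the key has expired and starts a fresh window.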
For any NestJS API running on more than one instance, Redis-backed throttling with @nestjs/throttler + @nest-lab/throttler-storage-redis is my standard. The setup takes less than an hour and provides meaningful protection against abuse. I configure three throttle tiers: strict for auth endpoints, standard for the regular API, and very strict for heavy operations such as bulk exports. I override the tracker to use user ID for authenticated routes. I add a custom exception filter to enrich 429 responses with human-readable retry information. This combination has prevented abuse incidents on every API I've deployed it to.
Rate limiting is one layer. Production API protection needs multiple: (1) Rate limiting (NestJS Throttler + Redis) — prevents request floods. (2) Input validation (class-validator + class-transformer pipes) — prevents malformed inputs. (3) Authentication guards (Passport JWT) — prevents unauthenticated access. (4) Authorization guards (CASL or custom) — prevents accessing resources you don't own. (5) SQL injection prevention (Prisma's parameterized queries). (6) Network-level DDoS protection (Cloudflare or equivalent). Rate limiting handles the traffic layer; the other layers handle the application layer.