What are the three states of a circuit breaker and how do they work?

A circuit breaker cycles through Closed, Open, and Half-Open states. In Closed state, requests flow normally and failures are counted; once failures exceed the configured threshold within a time window, the breaker trips to Open. In Open state, all requests are immediately rejected without calling the failing service; after a configured wait duration, the breaker moves to Half-Open, where a limited number of test requests are allowed through to check if the downstream service has recovered.

Which libraries does the post recommend for implementing circuit breakers in NestJS?

The post recommends two options: nestjs-resilience4j, which wraps the Java Resilience4j library's concepts in a NestJS-native API and provides the most complete implementation, and cockatiel, a lightweight TypeScript-native library that offers Circuit Breaker, Retry, and Timeout policies that compose well together. The choice depends on how much weight you want to add and whether you prefer a Java-ecosystem-inspired API or a TypeScript-first approach.

What are the key configuration numbers for a circuit breaker and how should they be tuned?

Three numbers matter most: the failure rate threshold (the percentage of failures that trips the breaker, typically 50%), the wait duration in Open state (how long before attempting Half-Open, typically 10–30 seconds), and the number of permitted calls in Half-Open state (how many test requests to allow, typically 3–5). The post stresses that these should be derived from measurement of your service's actual failure rate and response time distribution — set too sensitively and transient errors trip the breaker unnecessarily; set too leniently and cascading failures propagate before the breaker activates.

How do retries interact with circuit breakers, and what pitfall should you watch for?

The recommended composition is Timeout → Retry → Circuit Breaker, so timeouts and retries handle transient errors before the circuit breaker counts them. The key pitfall is that retries amplify failure signals: if each call is attempted 3 times, every attempt is counted by the breaker, and the failure threshold is 5, then three failed requests generate 9 failure events. You must tune your retry count and circuit breaker threshold together to avoid tripping the breaker on what is actually a small number of real failures.

What fallback strategies are available when a circuit breaker rejects a request?

The post outlines four options based on business criticality: return a cached response (the last successful result, good for read-only data); return a safe default such as an empty list or zero balance; return a degraded response with partial data from an alternative source; or return a structured error response such as a 503 with a Retry-After header. The post explicitly warns against silent success returns that hide the failure from the caller, as these make debugging much harder.

Circuit Breaker Pattern in NestJS: Building Resilient Microservices

Without circuit breakers, cascading failures can take down your entire platform in under 90 seconds. Michael Nygard popularized the circuit breaker pattern in his book Release It! (first edition 2007, second edition 2018) to prevent exactly this scenario. When Service A depends on Service B and B starts timing out, A's threads pile up waiting for responses and exhaust its connection pool. A then fails, and the failure cascades to Service C, which depends on A. The circuit breaker pattern breaks this chain by detecting when a dependency is struggling and short-circuiting calls to it, returning an immediate error or a cached fallback rather than waiting for a timeout.

How the Circuit Breaker Works

The circuit breaker is a state machine with three states.

Closed (normal operation): requests flow through and failures are counted. If failures exceed a threshold within a time window, the breaker trips to Open.

Open (failure mode): all requests are immediately rejected with an error, and no calls are made to the failing service. After a configured wait duration, the breaker moves to Half-Open.

Half-Open (probe): a limited number of test requests are allowed through. If they succeed, the breaker closes. If they fail, it returns to Open.

This self-healing behavior is what makes the circuit breaker valuable: it automatically recovers when the downstream service recovers.
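The three states can be sketched as a small hand-rolled class. This is an illustration rather than any library's implementation: SimpleCircuitBreaker, its option names, and the injectable clock are all hypothetical, and it counts consecutive failures instead of failures within a time window to keep the sketch short.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

interface BreakerOptions {
  failureThreshold: number; // consecutive failures that trip the breaker
  openDurationMs: number;   // how long to stay open before probing
  probeSuccesses: number;   // successful probes required to close again
}

class SimpleCircuitBreaker {
  private state: CircuitState = 'closed';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private readonly options: BreakerOptions,
    private readonly now: () => number = () => Date.now(), // injectable for tests
  ) {}

  /** Current state; an open breaker becomes half-open once the wait elapses. */
  getState(): CircuitState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.options.openDurationMs) {
      this.state = 'half-open';
      this.successes = 0;
    }
    return this.state;
  }

  /** False while open: the caller should fail fast or use a fallback. */
  allowRequest(): boolean {
    return this.getState() !== 'open';
  }

  recordSuccess(): void {
    if (this.getState() === 'half-open') {
      this.successes += 1;
      if (this.successes >= this.options.probeSuccesses) {
        this.state = 'closed';
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success resets the consecutive-failure count
    }
  }

  recordFailure(): void {
    const state = this.getState();
    if (state === 'open') return; // late result from a call that was already in flight
    if (state === 'half-open') {
      this.trip(); // a failed probe sends the breaker back to open
      return;
    }
    this.failures += 1;
    if (this.failures >= this.options.failureThreshold) this.trip();
  }

  private trip(): void {
    this.state = 'open';
    this.openedAt = this.now();
    this.failures = 0;
  }
}
```

A caller checks allowRequest() before contacting the dependency and reports the outcome with recordSuccess() or recordFailure(). The sketch does not cap how many probes run concurrently in Half-Open, which a production breaker would do.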

Configuring Thresholds

Circuit breaker configuration requires three key numbers: failure rate threshold (percentage of failures that trips the breaker — typically 50%), wait duration in open state (how long to wait before trying half-open — typically 10-30 seconds), and permitted calls in half-open state (how many test requests to allow — typically 3-5). Getting these numbers right for your workload requires measurement: check your service's normal failure rate and response time distribution before setting thresholds. Too sensitive, and transient errors trip the breaker unnecessarily. Too lenient, and cascading failures propagate before the breaker activates.
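To make the threshold concrete, here is a minimal sketch of a failure-rate check over a sliding time window. The first three fields mirror the numbers above; windowMs and minimumCalls are assumptions added so that a handful of early failures cannot trip the breaker, and the class name FailureRateWindow is hypothetical.

```typescript
interface ThresholdConfig {
  failureRateThreshold: number;     // fraction of failed calls that trips the breaker (0.5 = 50%)
  waitDurationInOpenMs: number;     // time spent in Open before probing
  permittedCallsInHalfOpen: number; // probe requests allowed in Half-Open
  windowMs: number;                 // assumption: length of the measurement window
  minimumCalls: number;             // assumption: calls needed before the rate is trusted
}

const config: ThresholdConfig = {
  failureRateThreshold: 0.5,
  waitDurationInOpenMs: 30_000,
  permittedCallsInHalfOpen: 5,
  windowMs: 10_000,
  minimumCalls: 10,
};

class FailureRateWindow {
  private samples: Array<{ at: number; failed: boolean }> = [];

  constructor(
    private readonly cfg: ThresholdConfig,
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** Record the outcome of one call. */
  record(failed: boolean): void {
    this.samples.push({ at: this.now(), failed });
  }

  /** Failure rate over the calls that are still inside the window. */
  failureRate(): number {
    this.prune();
    if (this.samples.length === 0) return 0;
    const failed = this.samples.filter((s) => s.failed).length;
    return failed / this.samples.length;
  }

  /** True when enough calls were seen and the rate exceeds the threshold. */
  shouldTrip(): boolean {
    const rate = this.failureRate(); // also drops expired samples
    return this.samples.length >= this.cfg.minimumCalls && rate > this.cfg.failureRateThreshold;
  }

  private prune(): void {
    const cutoff = this.now() - this.cfg.windowMs;
    this.samples = this.samples.filter((s) => s.at > cutoff);
  }
}
```

Replaying recorded production outcomes through a window like this is one way to see how often a candidate threshold would have tripped before you deploy it.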

Circuit Breaker State Machine

                 Failure rate > 50%
  ┌──────────┐ ───────────────────► ┌──────────┐
  │  CLOSED  │                      │   OPEN   │
  │(normal)  │ ◄─────────────────── │(rejected)│
  └──────────┘  Success in half-open└────┬─────┘
       │                                 │
       │ Requests pass through           │ All requests rejected
       │ Failures counted                │ No calls to downstream
       │                                 │
       │                          After 30s timeout
       │                                 │
       │                          ┌──────▼──────┐
       │ ◄─ 5 test requests ok ── │  HALF-OPEN  │
       │    2 fail → back to Open │  (probing)  │
                                  └─────────────┘

  Cascading Failure WITHOUT Circuit Breaker:
  Service A times out (30s) → thread pool exhausted
  → Service B calls A → B times out → B thread pool exhausted
  → Service C calls B → C fails
  ENTIRE PLATFORM DOWN in < 90 seconds

  WITH Circuit Breaker:
  A fails → CB opens → B gets immediate error → B uses fallback
  → C continues normally with cached/degraded response

From building the multi-level approval workflow in Commsult's ERP: wrap external service calls (email provider, PDF generator, payment gateway) with circuit breakers, but not internal service calls that share the same database. The circuit breaker is for protecting against network-level failures to external or remote dependencies. Internal module calls that fail are typically programming errors — they should surface immediately as errors, not be circuit-broken.

Implementing Circuit Breakers in NestJS

The most complete circuit breaker implementation for NestJS comes from nestjs-resilience4j, which wraps the Java Resilience4j library's concepts in a NestJS-native API. Alternatively, the cockatiel library provides a lightweight TypeScript-native implementation with Circuit Breaker, Retry, and Timeout policies that compose well. For a NestJS service calling an external HTTP API, wrap the call with a CircuitBreaker policy — the library handles state tracking, error counting, and timeout logic. Expose the circuit state via a health check endpoint so your load balancer can route around an open-circuit service instance.

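// Using 'cockatiel', a lightweight TypeScript resilience library
import { Controller, Get, Injectable, Logger } from '@nestjs/common';
import {
  circuitBreaker,
  CircuitState,
  ConsecutiveBreaker,
  ExponentialBackoff,
  handleAll,
  retry,
  timeout,
  TimeoutStrategy,
  wrap,
} from 'cockatiel';

// Policies are built with factory functions; handleAll treats every thrown error as a failure.
const emailCircuitBreaker = circuitBreaker(handleAll, {
  halfOpenAfter: 30_000,              // 30s in Open before probing
  breaker: new ConsecutiveBreaker(5), // open after 5 consecutive failures
});

const retryPolicy = retry(handleAll, {
  maxAttempts: 3,
  backoff: new ExponentialBackoff(),
});

// Aggressive rejects after 5s even if the callee ignores the abort signal;
// Cooperative only helps when the wrapped call honors that signal.
const timeoutPolicy = timeout(5_000, TimeoutStrategy.Aggressive);

// email.service.ts: wrapping an external SMTP API
@Injectable()
export class EmailService {
  private readonly logger = new Logger(EmailService.name);

  // wrap() lists the outermost policy first: Circuit Breaker → Retry → Timeout
  private readonly policy = wrap(emailCircuitBreaker, retryPolicy, timeoutPolicy);

  // smtpClient, templateService and retryQueue are injected via the constructor (omitted here)

  async sendInvoice(data: InvoiceEmailData): Promise<void> {
    try {
      await this.policy.execute(() =>
        this.smtpClient.send({
          to: data.recipientEmail,
          subject: 'Invoice Ready',
          html: this.templateService.render('invoice', data),
        })
      );
    } catch (error) {
      // Circuit is open OR all retries failed
      // Log and queue for later retry via BullMQ
      this.logger.error('Email send failed, queuing for retry', { error, data });
      await this.retryQueue.add('email:retry', data, { delay: 60_000 });
    }
  }
}

// Health endpoint: expose circuit state
@Controller()
export class CircuitHealthController {
  @Get('/health/circuits')
  getCircuitHealth() {
    // breaker.state is a numeric CircuitState enum, so map it to its name.
    // pdfCircuitBreaker and paymentCircuitBreaker are assumed to be defined like emailCircuitBreaker.
    return {
      emailService: CircuitState[emailCircuitBreaker.state], // 'Closed' | 'Open' | 'HalfOpen' | 'Isolated'
      pdfService: CircuitState[pdfCircuitBreaker.state],
      paymentGateway: CircuitState[paymentCircuitBreaker.state],
    };
  }
}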

Combining with Retry and Timeout

Circuit breakers work best in combination with retry policies and timeouts. The typical composition is Timeout (fail fast if a response takes too long) → Retry (retry on transient errors) → Circuit Breaker (trip if too many calls still fail after retrying). In code, wrap inner policies with outer ones: circuitBreaker.execute(() => retry.execute(() => timeout.execute(() => apiCall()))). Be careful with the retry and circuit breaker interaction, because retries can amplify failure signals. With the breaker outermost, as above, it records one failure per call once the retries are exhausted, although the downstream still receives every attempt. If the order is reversed so that each attempt passes through the breaker, the amplification is direct: with 3 attempts per call and a failure threshold of 5, three failed requests count as 9 failure events. Tune your retry count and circuit breaker threshold together.
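The amplification arithmetic can be checked with a small synchronous simulation. This is a sketch with hypothetical helpers (CountingBreaker, withRetry, countBreakerFailures), not cockatiel code: the "breaker" only counts the failures it observes, and the dependency always fails.

```typescript
type Call<T> = () => T;

// Stand-in for a breaker: it only counts the failures that pass through it.
class CountingBreaker {
  failuresSeen = 0;

  execute<T>(call: Call<T>): T {
    try {
      return call();
    } catch (error) {
      this.failuresSeen += 1;
      throw error;
    }
  }
}

// Runs the call up to `attempts` times and rethrows the last error.
function withRetry<T>(attempts: number, call: Call<T>): T {
  let lastError: unknown = new Error('no attempts made');
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return call();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

const alwaysFails = (): string => {
  throw new Error('downstream unavailable');
};

type Order = 'retry-wraps-breaker' | 'breaker-wraps-retry';

// Returns how many failures the breaker records for a batch of failing requests.
function countBreakerFailures(requests: number, attempts: number, order: Order): number {
  const breaker = new CountingBreaker();
  for (let r = 0; r < requests; r++) {
    try {
      if (order === 'retry-wraps-breaker') {
        withRetry(attempts, () => breaker.execute(alwaysFails)); // every attempt is counted
      } else {
        breaker.execute(() => withRetry(attempts, alwaysFails)); // one failure per request
      }
    } catch {
      // the request failed either way; only the breaker's count matters here
    }
  }
  return breaker.failuresSeen;
}

console.log(countBreakerFailures(3, 3, 'retry-wraps-breaker')); // 9
console.log(countBreakerFailures(3, 3, 'breaker-wraps-retry')); // 3
```

With a threshold of 5, the first ordering trips the breaker after only two failing requests, while the second needs five.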

An open circuit breaker means something is wrong downstream — but it also means your service is silently dropping requests. Without monitoring, you'll discover an open circuit breaker when a user complains their request 'just returns an error instantly.' Instrument every circuit breaker with metrics: track state transitions (closed→open, open→half-open→closed), rejection count, and the error rate that triggered the trip. Alert when a circuit breaker transitions to Open state. In NestJS, expose circuit breaker state via a /health endpoint that your monitoring system polls.

Fallback Strategies

When the circuit breaker rejects a request, what should your service return? The fallback strategy depends on the business criticality of the downstream call. Options:

(1) Cached response: return the last successful response from a cache (good for read-only data that doesn't change rapidly).
(2) Default response: return a safe default (empty list, zero balance).
(3) Degraded response: return partial data from a different, available source.
(4) Error response: return a structured error with context (503 Service Unavailable with Retry-After header).

Avoid silent success returns (returning an empty success that hides the failure from the caller).
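One way to encode that choice is a helper that prefers a cached value, then a safe default, and otherwise returns a structured 503. This is a minimal sketch: buildFallback, its option names, and the X-Degraded header are hypothetical, and option (3), fetching partial data from an alternative source, is left out because it depends on your domain.

```typescript
interface FallbackResponse<T> {
  status: number;
  body: T | { error: string };
  headers: Record<string, string>;
}

interface FallbackOptions<T> {
  cached?: T;                // last successful response, if one is available
  safeDefault?: T;           // e.g. an empty list
  retryAfterSeconds: number; // hint for clients when nothing else is available
}

function buildFallback<T>(options: FallbackOptions<T>): FallbackResponse<T> {
  if (options.cached !== undefined) {
    // Mark the response so callers can tell it is stale, not a silent success.
    return { status: 200, body: options.cached, headers: { 'X-Degraded': 'cached' } };
  }
  if (options.safeDefault !== undefined) {
    return { status: 200, body: options.safeDefault, headers: { 'X-Degraded': 'default' } };
  }
  return {
    status: 503,
    body: { error: 'Dependency unavailable, circuit open' },
    headers: { 'Retry-After': String(options.retryAfterSeconds) },
  };
}
```

Tagging cached and default responses keeps the failure visible to the caller, which is the point of the warning against silent success returns.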

Circuit Breakers in the API Gateway

One powerful deployment is putting circuit breakers in your API gateway: the gateway tracks the health of each downstream service and trips the breaker when a service's error rate spikes. This gives you protection at the perimeter without requiring every service to implement its own breakers. Kong offers this through its upstream health checks, where passive checks act as a circuit breaker. Managed gateways such as AWS API Gateway may not provide one out of the box, so check your gateway's documentation before relying on it. In a NestJS gateway, implement it using an interceptor that tracks response codes and response times per upstream service, tripping a per-service circuit when thresholds are crossed. This way, a degraded service doesn't affect your gateway's overall response time.
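The per-service bookkeeping such an interceptor needs can be sketched as a plain class, independent of NestJS. The names here (UpstreamCircuitRegistry, isOpen, recordResponse) are hypothetical, and the sketch simplifies the description above: it trips on consecutive 5xx responses, ignores response times, and lets traffic resume after the open period without a separate Half-Open phase.

```typescript
interface UpstreamCircuit {
  consecutiveFailures: number;
  openUntil: number; // epoch ms until which requests are rejected
}

class UpstreamCircuitRegistry {
  private readonly circuits = new Map<string, UpstreamCircuit>();

  constructor(
    private readonly failureThreshold: number,
    private readonly openDurationMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** True while the named upstream should be short-circuited. */
  isOpen(upstream: string): boolean {
    const circuit = this.circuits.get(upstream);
    return circuit !== undefined && this.now() < circuit.openUntil;
  }

  /** Feed every upstream response status code into the registry. */
  recordResponse(upstream: string, statusCode: number): void {
    const circuit = this.circuits.get(upstream) ?? { consecutiveFailures: 0, openUntil: 0 };
    if (statusCode >= 500) {
      circuit.consecutiveFailures += 1;
      if (circuit.consecutiveFailures >= this.failureThreshold) {
        circuit.openUntil = this.now() + this.openDurationMs;
        circuit.consecutiveFailures = 0;
      }
    } else {
      circuit.consecutiveFailures = 0; // any non-5xx response resets the streak
    }
    this.circuits.set(upstream, circuit);
  }
}
```

An interceptor would call isOpen() before proxying a request and recordResponse() once the upstream replies, keyed by whatever name identifies the upstream in your routing table.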

Testing Your Circuit Breakers

Circuit breakers are not useful if you can't verify they work. Write integration tests that inject failures into the downstream dependency (mock the HTTP call to throw errors) and verify: (1) the breaker transitions to Open after the threshold; (2) requests are rejected immediately in Open state; (3) the breaker transitions to Half-Open after the wait duration; (4) the breaker closes after successful test requests in Half-Open. Also chaos test in staging — use a tool like Chaos Monkey or simple firewall rules to make a dependency unavailable and verify your circuit breakers activate and your system degrades gracefully rather than failing completely.
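The four checks can be written against a fake clock so the test never has to sleep. The sketch below is framework-free and uses a deliberately tiny stand-in breaker (makeBreaker, hypothetical) so that it runs on its own. In a real suite you would point the same checks at your actual breaker and use your test runner's assertions and fake timers.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

interface BreakerUnderTest {
  state(): BreakerState;
  call(fn: () => void): void; // throws if rejected or if fn throws
}

// Stand-in breaker so the checks can run; it closes after one successful probe.
function makeBreaker(threshold: number, waitMs: number, now: () => number): BreakerUnderTest {
  let current: BreakerState = 'closed';
  let failures = 0;
  let openedAt = 0;
  const state = (): BreakerState => {
    if (current === 'open' && now() - openedAt >= waitMs) current = 'half-open';
    return current;
  };
  return {
    state,
    call(fn: () => void): void {
      if (state() === 'open') throw new Error('rejected: circuit open');
      try {
        fn();
        current = 'closed';
        failures = 0;
      } catch (error) {
        failures += 1;
        if (current === 'half-open' || failures >= threshold) {
          current = 'open';
          openedAt = now();
          failures = 0;
        }
        throw error;
      }
    },
  };
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(`Check failed: ${message}`);
}

function isRejected(breaker: BreakerUnderTest, fn: () => void): boolean {
  try {
    breaker.call(fn);
    return false;
  } catch {
    return true;
  }
}

function verifyLifecycle(): void {
  let clock = 0; // fake time in ms
  let downstreamCalls = 0;
  const breaker = makeBreaker(3, 10_000, () => clock);
  const failing = (): void => { throw new Error('injected failure'); };
  const healthy = (): void => { downstreamCalls += 1; };

  // (1) Opens after the threshold of injected failures.
  for (let i = 0; i < 3; i++) isRejected(breaker, failing);
  check(breaker.state() === 'open', 'opens after 3 failures');

  // (2) Rejects immediately while open: the downstream is never invoked.
  check(isRejected(breaker, healthy) && downstreamCalls === 0, 'rejects without calling downstream');

  // (3) Moves to half-open once the wait duration has elapsed.
  clock += 10_000;
  check(breaker.state() === 'half-open', 'half-open after the wait');

  // (4) Closes after a successful probe.
  breaker.call(healthy);
  check(breaker.state() === 'closed' && downstreamCalls === 1, 'closes after a successful probe');
}

verifyLifecycle();
```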

Sources & Further Reading

Martin Fowler — Circuit Breaker (bliki) — https://martinfowler.com/bliki/CircuitBreaker.html
Michael Nygard — Release It! Design and Deploy Production-Ready Software (Pragmatic Bookshelf, 2018)
AWS — Circuit Breaker Pattern — https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/circuit-breaker.html
