What actually happens when you set --memory=512m in Docker — is it a soft limit or a hard limit?

Setting --memory=512m writes a hard limit into the memory.max file in the container's cgroup v2 directory. Docker does not enforce this limit itself; the Linux kernel does. When memory usage reaches the cap, the kernel first attempts to reclaim page cache, and if that is not enough, the OOM killer is invoked and your process is terminated with exit code 137.

Why does a container sometimes get OOM-killed even though the application heap looks fine?

Because memory.max in cgroup v2 counts all memory charged to the cgroup, including page cache, not just the process heap. Clean cache is reclaimed before anything is killed, but the heap is only one part of the total: thread stacks, native buffers, tmpfs and shared memory, and dirty pages still waiting on writeback all sit on top of it and cannot be dropped quickly. Together they can cross the limit and fire the OOM killer even though application-level memory metrics appear normal. Checking memory.stat inside the cgroup is the right first step when a limit fires unexpectedly.

Why does hitting a CPU limit throttle the container instead of killing it, unlike memory limits?

Memory is binary — a page either exists or it does not — so exceeding memory.max triggers the OOM killer. CPU time is a continuous resource that can be metered across scheduling periods, so the kernel simply makes the container wait when it exceeds its quota. The --cpus flag sets a bandwidth quota expressed as milliseconds of CPU time allowed per 100-millisecond window, and overuse shows up as throttling rather than as process termination.

How should you size memory limits for Node.js or JVM workloads to avoid OOM kills?

The recommended approach from the post is to first run the service without limits under realistic load and record the peak memory usage, then set --memory to roughly 1.5 times that peak. The runtime heap should be configured to about 75 percent of the container limit (--max-old-space-size for Node.js, -XX:MaxRAMPercentage for the JVM) to leave headroom for thread stacks, native buffers, and page cache inside the same cgroup.

How can you detect CPU throttling before it causes user-facing latency problems?

The cpu.stat file inside the cgroup exposes nr_throttled and throttled_usec counters, and cAdvisor publishes container_cpu_cfs_throttled_periods_total to Prometheus. The post recommends alerting on throttling counters rather than on average CPU utilization, because a service can run at 40 percent average CPU and still suffer poor p99 latency if it regularly bursts into the quota ceiling on request spikes.

What cgroups v2 Really Does When You Set Docker Memory Limits

The first time one of my containers got OOM-killed in production, the logs told me nothing. The app just vanished, Docker restarted it, and the only evidence was a single line in dmesg about the kernel sacrificing a process. I had set --memory=512m believing it was a polite suggestion to Docker. It is not. It is a kernel-enforced hard limit written into a file, and the enforcement mechanism — control groups v2 — is worth understanding before it teaches you the hard way.

This post unpacks what actually happens on a modern Linux host when you set Docker memory and CPU limits: which cgroup files get written, what the kernel does at each threshold, why CPU limits throttle while memory limits kill, and how to size limits for Node.js and JVM workloads so the runtime and the kernel stop fighting each other. Everything here applies to any recent distro running cgroups v2, the default on Fedora since 2019 and on Ubuntu and Debian since 2021.

Limits Are Just Files: Seeing Through Docker's Abstraction

Control groups are the kernel's mechanism for partitioning resources — CPU time, memory, IO — among process trees. Docker does not implement resource limiting itself; when you pass --memory or --cpus, the daemon creates a cgroup for your container and writes your numbers into the corresponding interface files. The kernel does the rest.

You can watch this happen. Start a container with limits and read its cgroup directly:

# every docker run flag becomes a file in the cgroup tree
docker run -d --name api --memory=512m --cpus=1.5 myapp:latest

# full container ID; the paths below assume the systemd cgroup driver
# (with the cgroupfs driver, look under /sys/fs/cgroup/docker/$CID/ instead)
CID=$(docker inspect -f '{{.Id}}' api)
cat /sys/fs/cgroup/system.slice/docker-$CID.scope/memory.max
# 536870912        <- your --memory=512m, in bytes

cat /sys/fs/cgroup/system.slice/docker-$CID.scope/cpu.max
# 150000 100000    <- your --cpus=1.5: quota and period in microseconds (150ms per 100ms window)

That cpu.max line is the most useful mental model in this whole topic: --cpus=1.5 means the container's processes may consume at most 150 milliseconds of CPU time in every 100-millisecond window, across all cores combined. There is no core pinning involved — it is a bandwidth quota enforced by the scheduler, which is why the effects show up as throttling rather than as missing CPUs.
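The arithmetic behind that line is easy to reproduce. The helper below is a minimal sketch, not something Docker ships: cpus_to_cpu_max is a name made up for illustration, and it assumes the default 100000-microsecond period shown above.

```shell
# cpus_to_cpu_max CPUS [PERIOD_USEC]
# Hypothetical helper: prints the "quota period" pair expected in cpu.max
# for a given --cpus value. Assumes the default 100000-microsecond period.
cpus_to_cpu_max() {
  awk -v cpus="$1" -v period="${2:-100000}" \
    'BEGIN { printf "%d %d\n", cpus * period + 0.5, period }'
}
```

Calling cpus_to_cpu_max 1.5 prints the same pair as the cat above, which makes it a quick sanity check when a container seems to have a different quota than the one you asked for.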

Memory: max, high, and Where the OOM Killer Lives

Cgroups v2 gives memory control several distinct knobs, and Docker's flags map onto them directly:

Docker flag	cgroup v2 file	Kernel behavior at the threshold
--memory	memory.max	Hard cap. The kernel first tries to reclaim pages; if usage cannot be brought down, the OOM killer is invoked inside the cgroup and your process dies with exit code 137.
(no direct flag)	memory.high	Soft ceiling. Processes are throttled and put under heavy reclaim pressure, but never OOM-killed. Orchestrators use this for graceful degradation before the hard limit.
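--memory-swap	memory.swap.max	Caps swap. The Docker flag is the combined RAM-plus-swap total, and the difference between it and --memory is what lands in memory.swap.max. Setting --memory-swap equal to --memory disables swap for the container, which is usually what you want for latency-sensitive services.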
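The --memory-swap row trips people up because the flag and the file do not hold the same number. As a rough sketch of the conversion as I understand it (swap_max_bytes is an illustrative name, and the function only mirrors the arithmetic, it does not talk to Docker):

```shell
# swap_max_bytes MEMORY_BYTES MEMORY_SWAP_BYTES
# Hypothetical helper: the value expected in memory.swap.max for a given
# pair of --memory and --memory-swap settings, both in bytes.
# Assumption: --memory-swap is a RAM+swap total, and -1 means unlimited swap.
swap_max_bytes() {
  mem=$1
  memswap=$2
  if [ "$memswap" = "-1" ]; then
    echo max
  else
    echo $(( memswap - mem ))
  fi
}
```

With both flags at 512m the result is 0, which is the "swap disabled" case from the table.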

The crucial subtlety: memory.max counts page cache, not just your process heap. A container that reads large files can show high memory usage that is actually reclaimable cache, and conversely your app can be OOM-killed while its heap looks fine because anonymous memory plus cache crossed the line together. When a limit seems to fire too early, check memory.stat inside the cgroup before blaming your app.
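A quick way to do that check is to pull the three counters that matter out of memory.stat. This is a sketch rather than a full diagnostic: mem_breakdown is a made-up name, and it reads only the anon, file, and shmem fields.

```shell
# mem_breakdown FILE
# Hypothetical helper: prints anon, file and shmem from a cgroup v2
# memory.stat file, converted from bytes to whole MiB.
# Note: the kernel counts shmem (tmpfs, shared memory) inside "file",
# but unlike ordinary page cache it cannot simply be dropped.
mem_breakdown() {
  awk '$1 == "anon" || $1 == "file" || $1 == "shmem" {
         printf "%s %d MiB\n", $1, $2 / 1048576
       }' "$1"
}
```

On the host, point it at the container's memory.stat under the same cgroup directory used earlier. A large file number with a small shmem number is mostly reclaimable cache; a large anon number means the process itself really is that big.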

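Exit code 137 (128 plus signal 9, SIGKILL) with no application error is the OOM killer's signature. Confirm with docker inspect, where State.OOMKilled will be true, and resist the temptation of --oom-kill-disable. On cgroup v1 hosts, disabling the killer on a memory-capped container does not free memory; it leaves the container's tasks hung waiting for memory instead of restarting it cleanly. On cgroup v2 hosts Docker discards the flag with a warning, because v2 has no equivalent of v1's memory.oom_control switch.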
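The cgroup keeps its own tally as well, in the oom_kill line of memory.events, which survives for as long as the cgroup exists. A minimal sketch for reading it (oom_kill_count is an illustrative name):

```shell
# oom_kill_count FILE
# Hypothetical helper: prints the number of processes the OOM killer has
# terminated in this cgroup, from the oom_kill line of a cgroup v2
# memory.events file.
oom_kill_count() {
  awk '$1 == "oom_kill" { print $2 }' "$1"
}

# From the Docker side, the equivalent check for an exited container is:
#   docker inspect -f '{{.State.OOMKilled}}' api
```

A count above zero on a container that is still running usually means a child process was killed while PID 1 survived, which docker inspect alone will not show you.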

CPU: Why Limits Throttle Instead of Kill

Memory is non-negotiable — a page either exists or it does not — so the kernel kills. CPU is time-sliced, so the kernel can simply make you wait. Three Docker flags cover the practical space:

--cpus is the quota: a hard bandwidth cap per scheduling period, the right tool for guaranteeing one noisy container cannot starve a VPS.
--cpu-shares is proportional weight, and it only matters under contention. A container with 2048 shares gets twice the CPU of a neighbor at the default 1024 when both want everything; on an idle host, both run uncapped.
--cpuset-cpus pins to specific cores — rarely worth it on small hosts, occasionally useful to isolate a latency-critical process from everything else.

Throttling is visible and measurable: cpu.stat inside the cgroup reports nr_throttled and throttled_usec counters. A web service can sit at 40 percent average CPU and still have terrible p99 latency because it bursts into its quota ceiling on every request spike and spends the rest of each period frozen.

Rule of thumb from my own dashboards: for latency-sensitive services, alert on throttling counters, not on CPU usage. cAdvisor exposes container_cpu_cfs_throttled_periods_total to Prometheus; sustained throttling above a few percent of periods means the quota is too tight even if average utilization looks comfortable.
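Without Prometheus, the same ratio can be read straight from cpu.stat, which also carries an nr_periods counter alongside nr_throttled. The function below is a sketch under that assumption (throttle_pct is a name chosen for illustration):

```shell
# throttle_pct FILE
# Hypothetical helper: percentage of scheduling periods in which the
# cgroup was throttled, computed from nr_throttled / nr_periods in a
# cgroup v2 cpu.stat file. Prints 0.0 when no periods have elapsed.
throttle_pct() {
  awk '$1 == "nr_periods"   { periods = $2 }
       $1 == "nr_throttled" { throttled = $2 }
       END {
         if (periods > 0) printf "%.1f\n", 100 * throttled / periods
         else print "0.0"
       }' "$1"
}
```

Both counters are cumulative since the cgroup was created, so a single reading is a lifetime average. For anything resembling an alert, sample twice and compare the deltas.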

What Changed from cgroups v1 — and Why You Should Care

If your runbooks or Stack Overflow answers date from the v1 era, three differences matter operationally:

One unified hierarchy instead of separate trees per controller. A container is one cgroup with all controllers attached, not five cgroups that can disagree — debugging got dramatically saner.
Hierarchical enforcement is strict: a child cgroup can never exceed its parent's limits. This is what makes nested limits — a systemd slice capping all of Docker, with per-container limits inside — actually trustworthy.
The pressure stall information (PSI) files — memory.pressure, cpu.pressure, io.pressure — exist per cgroup, giving you early-warning signals v1 never had. Rising memory pressure predicts an OOM kill before it happens.
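The pressure files share one small text format: a "some" line and a "full" line, each carrying avg10, avg60, and avg300 percentages plus a running total. As a sketch of how to pull out the ten-second average (psi_avg10 is an illustrative name, and the layout is the one described in the kernel's PSI documentation):

```shell
# psi_avg10 FILE [some|full]
# Hypothetical helper: prints the avg10 value (percent of the last 10
# seconds spent stalled) from a PSI file such as memory.pressure.
# Defaults to the "some" line, meaning at least one task was stalled.
psi_avg10() {
  awk -v kind="${2:-some}" \
    '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' "$1"
}
```

A memory.pressure avg10 that climbs steadily while usage sits near memory.max is the early warning worth watching for.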

Making Runtimes Respect the Limit

The kernel enforces the cap, but your runtime allocates against assumptions. Older runtimes read host memory and size their heaps accordingly — a JVM on a 32 GB host inside a 1 GB container would happily plan a multi-gigabyte heap, then die at 137. The fix is telling the runtime the truth:

# Node.js: heap limit must fit inside memory.max
docker run -d --memory=512m \
  -e NODE_OPTIONS="--max-old-space-size=384" myapp

# JVM: let it read the cgroup instead of guessing
docker run -d --memory=1g \
  -e JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75" myservice

JVMs from version 10 onward are container-aware by default, but reading cgroup v2 limits specifically requires JDK 15 or a patched older release (11.0.16+, 8u372+); anything earlier falls back to host memory on a v2 host. Even on a current JVM the percentage is still worth setting explicitly: the gap between heap limit and memory.max must hold thread stacks, metaspace, native buffers, and page cache. My defaults are 75 percent of the container limit for both Node's max-old-space-size and the JVM's MaxRAMPercentage, narrowing only after profiling.
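Hard-coding 384 next to 512m works until someone changes one number and forgets the other. One alternative, sketched here with a made-up helper name, is to derive the heap size from memory.max in the container's entrypoint. It assumes the container sees its own cgroup at /sys/fs/cgroup, which is what I see on cgroup v2 hosts but is worth verifying on yours.

```shell
# heap_mb_from_limit LIMIT [PERCENT]
# Hypothetical helper: given the contents of memory.max (bytes, or the
# literal "max" when no limit is set), prints PERCENT of it in MiB.
# Returns non-zero for "max" so callers can leave runtime defaults alone.
heap_mb_from_limit() {
  limit=$1
  pct=${2:-75}
  if [ "$limit" = "max" ]; then
    return 1
  fi
  echo $(( limit * pct / 100 / 1048576 ))
}

# Possible use in an entrypoint script (path is an assumption):
#   if heap=$(heap_mb_from_limit "$(cat /sys/fs/cgroup/memory.max)"); then
#     export NODE_OPTIONS="--max-old-space-size=$heap"
#   fi
```

For the 512m container above this yields 384, the same value as the hand-written flag.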

A Sizing Procedure That Survives Production

How I set limits for a new service, in order:

Run it unlimited in staging under realistic load and record the high-water mark from docker stats or Prometheus, not a single spot check.
Set --memory to roughly 1.5 times the observed peak, and set the runtime heap to about 75 percent of that limit.
Set --cpus to cover the p99 burst, not the average — quota throttling on a web service hurts users at exactly the busiest moments.
Add Prometheus alerts on container_oom_events_total and throttling counters from day one, so limit failures page you instead of surprising you.
Re-measure quarterly. Memory footprints drift upward with every dependency bump, and the limit that fit in January OOMs in June.
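The arithmetic in the second step is simple enough to script, which keeps the two numbers from drifting apart. A minimal sketch, with size_limits as an illustrative name and the 1.5x and 75 percent factors taken from the list above:

```shell
# size_limits PEAK_MIB
# Hypothetical helper: from an observed peak in MiB, prints a --memory
# value at 1.5x the peak and a runtime heap at 75 percent of that limit.
# Integer arithmetic, so results round down to whole MiB.
size_limits() {
  peak=$1
  limit=$(( peak * 3 / 2 ))
  heap=$(( limit * 75 / 100 ))
  echo "--memory=${limit}m heap=${heap}m"
}
```

A service that peaked at 400 MiB in staging comes out at a 600m limit with a 450 MiB heap. Treat the output as a starting point to revisit after the first week of production data, not as a final answer.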

On Docker Swarm the same kernel mechanics apply through the deploy.resources block: limits become the cgroup caps described here, and reservations drive the scheduler's placement decisions. Setting reservations honestly is what stops Swarm from packing three memory-hungry services onto one 4 GB node and letting cgroups referee the resulting fight.

The Takeaway

Docker resource flags stop being mysterious once you see them as what they are: numbers in cgroup v2 interface files, enforced by a kernel with very predictable rules. Memory crosses memory.max and something dies; CPU crosses cpu.max and something waits. Size the limits from measurements, make your runtime agree with the kernel about how much memory exists, and alert on OOM kills and throttling rather than on raw usage. Containers without limits are not generous — they are just deferring the negotiation to the worst possible moment.
