Latency vs throughput

Latency is the time for one request to complete; throughput is how many requests complete per unit time. The two are related, but they are measured separately and improving one does not automatically improve the other.

Why it matters

You design for both, and they often trade off. Batching and queuing raise throughput but add latency; replicating for low latency can cap throughput per node. SLOs are usually written on tail latency (p99/p99.9), not the average, because at scale the slow tail is what users actually hit. A request that fans out to 100 services waits for the slowest of the 100, so if each service stalls on 1 call in 100, roughly 63% of fan-out requests hit at least one stall (1 − 0.99^100 ≈ 0.63, assuming independent stalls). The rare stall becomes the common case.
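The fan-out arithmetic can be checked with a short sketch. It assumes stalls are independent across services and that each call is slow with probability 0.01; the function name `p_any_slow` is illustrative, not taken from any library.

```python
def p_any_slow(fanout: int, p_slow: float) -> float:
    """Probability that at least one of `fanout` independent calls is slow."""
    return 1.0 - (1.0 - p_slow) ** fanout


# Assumed per-call stall probability of 1% (a p99 stall).
for n in (1, 10, 100):
    print(n, round(p_any_slow(n, 0.01), 3))
# 1 0.01
# 10 0.096
# 100 0.634
```

Real services often stall together (shared hosts, shared dependencies), so the independence assumption gives a rough estimate rather than an exact figure.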

How it works

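Think of a pipe: latency is its length, throughput is its width. Little’s Law ties them together for a system in steady state: L = λ × W, where L is the average number of requests in flight, λ is the arrival rate, and W is the average latency. A useful corollary: with concurrency bounded at L, throughput cannot exceed L / W, so lower latency directly buys higher throughput.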
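As a minimal sketch of that corollary, the snippet below rearranges Little's Law to λ = L / W. The pool size of 100 and the latencies are made-up numbers, and `max_throughput` is a hypothetical helper name.

```python
def max_throughput(concurrency: int, latency_ms: float) -> float:
    """Little's Law rearranged: requests/s = in-flight requests / latency."""
    return concurrency * 1000 / latency_ms


# Assumed: a pool that allows at most 100 requests in flight.
print(max_throughput(100, 50))  # 2000.0 requests/s at 50 ms each
print(max_throughput(100, 25))  # 4000.0 requests/s at 25 ms each
```

Halving latency doubles the throughput ceiling without adding any concurrency, which is why latency work often pays off twice.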

Report percentiles (p50/p95/p99/p99.9), never just the mean. Occasional GC pauses blur into the mean, which then describes no real request, but they show up honestly at the tail.
Batching/pipelining amortizes fixed cost (round trips, fsync) across many items → throughput up, per-item latency up.
A queue (message-queues) decouples producer/consumer rate; under overload it must shed or apply back-pressure or latency grows unbounded.
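To illustrate the point about percentiles, here is a small sketch on synthetic data: 980 requests at 10 ms and 20 stalled at 500 ms. The data and the nearest-rank `percentile` helper are assumptions for illustration; production systems usually use a histogram library instead of sorting raw samples.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies: 98% fast, 2% stalled (e.g. by a pause).
latencies_ms = [10] * 980 + [500] * 20

print("mean", statistics.mean(latencies_ms))
for p in (50, 95, 99, 99.9):
    print(f"p{p}", percentile(latencies_ms, p))
```

The mean lands near 20 ms, a latency no request actually experienced, while p50 and p95 stay at 10 ms and p99 and p99.9 jump to 500 ms.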

Lever	Latency	Throughput
Batch writes	worse	better
Add replica/shard	same/better	better
Bigger queue	worse	hides spikes

A rough latency ladder worth memorizing: L1 cache hit ~1 ns, main memory ~100 ns, SSD read ~100 µs, intra-DC round trip ~0.5 ms, cross-continent round trip ~150 ms.

Example

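A logging service must absorb 1M events/s. Writing each event individually costs one fsync per event (~1 ms each), which caps a single writer at about 1,000 events/s, so it collapses. Batching 10K events per flush raises that ceiling to roughly 10M events/s if a flush still takes about 1 ms, comfortably past 1M/s, while adding only ~10 ms of buffering latency (the time for 10K events to arrive at 1M/s). It is a deliberate latency-for-throughput trade that users never notice on a log.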
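The numbers in this example can be reproduced with a back-of-the-envelope sketch. It assumes a single writer that flushes one batch at a time and that a flush costs about 1 ms regardless of batch size, which is a simplification (larger batches write more bytes). All names here are hypothetical.

```python
FLUSHES_PER_S = 1_000        # one ~1 ms fsync at a time => ~1,000 flushes/s
ARRIVAL_PER_S = 1_000_000    # target load from the example


def serial_throughput(batch_size: int) -> int:
    """Events/s a single writer sustains when each flush carries `batch_size` events."""
    return batch_size * FLUSHES_PER_S


def fill_time_ms(batch_size: int) -> float:
    """Worst-case wait for the first event in a batch while the batch fills."""
    return batch_size * 1000 / ARRIVAL_PER_S


for batch in (1, 1_000, 10_000):
    print(batch, serial_throughput(batch), fill_time_ms(batch))
# 1 1000 0.001
# 1000 1000000 1.0
# 10000 10000000 10.0
```

Under these assumptions a batch of 1,000 only just keeps up with the load, so the larger batch buys headroom at the cost of a few more milliseconds of buffering.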

Pitfalls

Tuning the mean — a great average can hide a brutal p99.9.
Unbounded queues: they trade a fast failure for slowly rising latency until OOM; bound them and apply back-pressure.
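Ignoring coordinated omission: a load tool that waits for a stalled response before sending the next request never issues the requests that would have landed in the stall, so it under-reports tail latency.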
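The bounded-queue advice can be sketched with the standard library's `queue.Queue`. The bound of 3 and the `offer` helper are illustrative assumptions; a real service would size the bound from its latency budget.

```python
import queue


def offer(q: queue.Queue, item) -> bool:
    """Try to enqueue without blocking; return False (shed the item) if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


q = queue.Queue(maxsize=3)  # hypothetical small bound for the demo
results = [offer(q, i) for i in range(5)]
print(results)    # [True, True, True, False, False]
print(q.qsize())  # 3
```

Shedding like this fails fast under overload. To apply back-pressure instead, the producer can call `q.put(item, timeout=...)`, which blocks until space frees up and raises `queue.Full` if the timeout expires.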
