The "Stop-the-World" Problem: How I Rebuilt My Database's Core to Kill 200ms Latency Spikes
By Pranit Garje | github.com/pranitgarje
There’s a particular kind of failure that doesn’t announce itself. It doesn’t throw an exception or crash your server. It just makes your system feel slow — but only sometimes. Unpredictably. At the worst possible moment.
That’s the failure I hit when building Aphelo, my from-scratch, Redis-like in-memory database written in C++.
And the cause was hiding inside the most fundamental operation a hash table performs: resizing.
The Seductive Lie of O(1)
When I first replaced std::map with a hash table, everything felt right. Point lookups dropped from O(log N) to O(1). Throughput numbers looked great on paper.
But there’s a clause buried in that O(1) guarantee that most tutorials skim over: it holds only until the load factor is exceeded.
Every hash table has a threshold — a ratio of entries to bucket slots. Once you cross it, the table must resize: typically doubling its bucket array and rehashing every single existing key into the new layout. On a small table with a few thousand entries, this is invisible. On a production cache holding millions of keys, this is an O(N) operation executing on your hot path, blocking every single request until it finishes.
For a seed-stage startup running a customer-facing API, that pause is the difference between a smooth experience and a timed-out request. It’s not a theoretical concern. It’s a 200ms freeze that kills user trust.
The Architecture That Fixes It: Progressive Rehashing
The solution I implemented in Aphelo is called progressive rehashing, and the key insight is deceptively simple: don’t do all the work at once.
Instead of one catastrophic O(N) migration, Aphelo’s HMap manager holds two hash tables simultaneously — newer and older. When the load factor is exceeded, a resize is triggered but nothing is moved yet. The expensive migration work is then distributed across subsequent operations.
Every GET, SET, or DEL command that comes in does a small fixed chunk of migration work alongside its normal job — moving a constant number of buckets (say, 128) from older into newer. Over hundreds of subsequent requests, the migration completes gradually, invisibly, in the background.
┌─────────────────────────────────────────────────────────────┐
│ HMap │
│ │
│ ┌─────────────────────┐ ┌─────────────────────────┐ │
│ │ newer │ │ older │ │
│ │ (growing) │◄───│ (draining, chunk/tick) │ │
│ │ │ │ │ │
│ │ [■][■][■][■][■] │ │ [■][■][·][·][·] │ │
│ │ [■][■][■][■][■] │ │ (buckets migrate ──►) │ │
│ └─────────────────────┘ └─────────────────────────┘ │
│ │
│ All WRITES ──────────────► newer │
│ All READS ── check newer first, then older │
│ Once older is empty ── free it. Resize complete. │
└─────────────────────────────────────────────────────────────┘
Every GET/SET/DEL migrates ~128 buckets as a side-effect.
The O(N) cost is amortized invisibly across O(N) operations.Lookups during migration check both tables. Writes always go to newer. Once older is empty, it’s freed and the cycle is complete.
The cost of O(N) rehashing hasn’t disappeared — it’s been amortized across O(N) future operations, each paying a tiny, fixed, predictable fee.
Why This Matters Beyond Databases
The “stop-the-world” problem isn’t unique to hash tables. It’s a pattern that appears everywhere in systems design:
Garbage collectors that pause all threads to collect memory
Log compaction in databases that freezes writes
Rebalancing in distributed systems that stalls new requests
In every case, the fix follows the same principle Aphelo uses: convert a single large blocking operation into many tiny incremental ones. Trade peak cost for flat, predictable cost.
When you’re building infrastructure — whether it’s a cache, a message queue, or a storage engine — tail latency (P99, P999) is the only number that matters to the user. Median throughput is a vanity metric. The worst-case experience defines your product’s reputation.
The Result
After implementing progressive rehashing, Aphelo’s SET throughput hit 172,716 requests per second on consumer hardware with no latency spikes during resizing. The hash table resizes are now completely invisible to connected clients.
That’s not just a benchmark. It’s proof that architectural discipline at the data structure level produces compounding returns at the system level.
Design Principle: Avoid “Stop-the-World” events at all costs. If a task is O(N), don’t do it on the hot path — amortize it.
Pranit Garje is a software builder who built Aphelo — a high-performance, Redis-like in-memory database from scratch in C++. Explore the full source at github.com/pranitgarje.
If you're building pre-seed or seed-stage infrastructure and care about tail latency, I’d love to chat. Check out Aphelo on GitHub.
Next in the series: The Context-Switching Tax — why threads don’t scale and how a single-threaded event loop outperforms them.
