Module 5 — Making It Survive Load — FM India

Your app is correct. Now make it survive being popular. This module applies System Design and the distributed rate limiter deep dive. We add three things, each targeting a different failure under load: a rate limiter (abuse), a cache (hot reads), and a background job (bursty writes).

Goal

A rate limiter on write endpoints, backed by Redis so it works across multiple instances.
A cache on the hot read (GET /s/:publicId for public snippets), with correct invalidation.
The view-count increment moved off the request path into a background job.

Step 0: Add Redis

docker run --name snippets-redis -p 6379:6379 -d redis:7

Redis is the right tool here for the reason the data chapter gives: it's an in-memory store, so the central counter and the cache are sub-millisecond operations, and it's shared across all your app instances, which is the whole point.

Step 1: A distributed rate limiter

A per-process in-memory counter is useless the moment you run two instances — each sees only its own slice of a client's traffic, so the real limit is N × your_limit. The deep dive's lesson: the counter must be central, and the increment must be atomic.
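To make the N × your_limit claim concrete, here is a small simulation. It is a sketch only: the function names, the user key, and the round-robin load balancing are assumptions for illustration, and plain Maps stand in for both the per-process counters and the central one.

```typescript
// Hypothetical simulation: names and numbers are illustrative, not from the module.
type Counter = Map<string, number>;

// Count one request against a counter; return true if it is within the limit.
function allow(counter: Counter, key: string, limit: number): boolean {
  const count = (counter.get(key) ?? 0) + 1;
  counter.set(key, count);
  return count <= limit;
}

// Send `requests` requests from one client, round-robin across `instances` app instances.
// With `shared = true` every instance uses the same counter (the Redis case);
// otherwise each instance keeps its own in-memory counter.
function simulate(requests: number, instances: number, limit: number, shared: boolean): number {
  const central: Counter = new Map();
  const local: Counter[] = Array.from({ length: instances }, () => new Map<string, number>());
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const counter = shared ? central : local[i % instances];
    if (allow(counter, "user:42", limit)) allowed++;
  }
  return allowed;
}

const perProcess = simulate(500, 3, 60, false); // each of the 3 instances allows 60
const centralised = simulate(500, 3, 60, true); // one shared counter allows 60 in total
console.log({ perProcess, centralised });       // { perProcess: 180, centralised: 60 }
```

With three instances and a limit of 60, the per-process version lets 180 requests through in the window; the shared counter holds the line at 60.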

Key by client identity + window
ratelimit:{userId}:{minute}. One key per user per time window.
Increment atomically, set expiry on first hit
INCR returns the new count atomically — no check-then-set race even under concurrent requests. On the first increment, set the key to expire at the window's end so it cleans itself up.
Reject over the limit
If the count exceeds the limit, return 429 Too Many Requests with a Retry-After header.

async function rateLimit(userId: number, limit = 60) {
  const key = `ratelimit:${userId}:${Math.floor(Date.now() / 60000)}`;
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, 60); // window TTL on first hit
  if (count > limit) throw new HttpError(429, "rate_limited");
}
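The function above throws the 429 but doesn't show where the Retry-After value comes from. With a fixed one-minute window, a reasonable value is the number of seconds until the current window ends. The sketch below computes it; `windowKey` and `retryAfterSeconds` are hypothetical helper names, and how you attach the header depends on your `HttpError` and error handler, which aren't shown here.

```typescript
const WINDOW_MS = 60_000;

// Same key scheme as above: one key per user per one-minute window.
function windowKey(userId: number, nowMs: number): string {
  return `ratelimit:${userId}:${Math.floor(nowMs / WINDOW_MS)}`;
}

// Whole seconds until the current fixed window ends, for the Retry-After header.
function retryAfterSeconds(nowMs: number): number {
  const windowEnd = (Math.floor(nowMs / WINDOW_MS) + 1) * WINDOW_MS;
  return Math.ceil((windowEnd - nowMs) / 1000);
}

// Example: 45.5 seconds into window number 100.
const sampleNow = 100 * WINDOW_MS + 45_500;
console.log(windowKey(7, sampleNow));      // ratelimit:7:100
console.log(retryAfterSeconds(sampleNow)); // 15
```

Rounding up means a client that waits exactly Retry-After seconds lands in the next window rather than just before it.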

Why INCR and not GET-then-SET

The naive limiter reads the count, adds one, writes it back — and two concurrent requests both read the same value, both write count+1, and the real count is wrong, letting a burst through. INCR does read-modify-write as one atomic server-side operation, so the count is always right no matter how many requests race. This is the same "claim it atomically, don't check-then-act" idea as idempotency keys, wearing a different hat.
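The race is easier to see with the interleaving written out step by step. This is an illustration, not real Redis: `FakeStore` is a hypothetical in-memory stand-in, and the two "requests" are interleaved by hand to show the ordering that concurrency can produce.

```typescript
// In-memory stand-in for Redis; a hypothetical helper for illustration only.
class FakeStore {
  private data = new Map<string, number>();
  get(key: string): number { return this.data.get(key) ?? 0; }
  set(key: string, value: number): void { this.data.set(key, value); }
  // Mirrors INCR: read, add one, write, and return the new value as one step.
  incr(key: string): number {
    const next = this.get(key) + 1;
    this.set(key, next);
    return next;
  }
}

// Naive limiter: both requests read before either writes.
const naiveStore = new FakeStore();
const readA = naiveStore.get("k"); // request A reads 0
const readB = naiveStore.get("k"); // request B reads 0
naiveStore.set("k", readA + 1);    // A writes 1
naiveStore.set("k", readB + 1);    // B also writes 1, so one request goes uncounted
const naiveCount = naiveStore.get("k");

// Atomic increment: there is no gap between read and write to interleave into.
const atomicStore = new FakeStore();
const countA = atomicStore.incr("k");
const countB = atomicStore.incr("k");

console.log({ naiveCount, countA, countB }); // { naiveCount: 1, countA: 1, countB: 2 }
```

Two requests arrived in both cases, but only the atomic version recorded two.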

Decision: Fail open if Redis is down, for this service.

The deep dive's fail-open vs fail-closed question, decided for our context: if Redis is unreachable, do we block all writes (fail closed, safe but the rate limiter outage becomes a full outage) or allow them unlimited (fail open, available but briefly unprotected)? For a snippets service, a brief loss of rate limiting is far less bad than a total write outage, so we fail open and alert. A payments system would choose the opposite. The point is that this is a deliberate decision tied to what the service can tolerate, not a default.
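One way to express the fail-open decision in code is sketched below. The assumptions: `CounterStore` is a hypothetical minimal interface rather than a real client's type, `onStoreError` is a placeholder for whatever alerting you use, and the function returns a verdict instead of throwing, so that the catch block can never swallow a genuine 429.

```typescript
// Hypothetical minimal interface; a real Redis client exposes far more than this.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

type Verdict = "allowed" | "limited" | "allowed_fail_open";

async function checkLimit(
  store: CounterStore,
  key: string,
  limit: number,
  onStoreError: (err: unknown) => void, // e.g. emit a metric or raise an alert
): Promise<Verdict> {
  try {
    const count = await store.incr(key);
    if (count === 1) await store.expire(key, 60);
    return count > limit ? "limited" : "allowed";
  } catch (err) {
    // Store unreachable: let the request through, but make the outage visible.
    onStoreError(err);
    return "allowed_fail_open";
  }
}

// A store that always fails, standing in for an unreachable Redis.
const downStore: CounterStore = {
  incr: async () => { throw new Error("ECONNREFUSED"); },
  expire: async () => 0,
};

const alerts: unknown[] = [];
const verdictPromise = checkLimit(downStore, "ratelimit:7:100", 60, (e) => alerts.push(e));
verdictPromise.then((verdict) => console.log(verdict, alerts.length)); // allowed_fail_open 1
```

The caller maps "limited" to a 429 and treats the other two verdicts as a pass. A fail-closed service would return "limited" from the catch block instead. In practice the client also wants a short timeout, otherwise an unreachable Redis shows up as slow requests rather than errors.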

Step 2: Cache the hot read

When a snippet gets shared widely, GET /s/:publicId is hit thousands of times for the same unchanging data, each time querying Postgres. That's wasteful and it's how a viral link takes down your database. Cache it.

app.get("/s/:publicId", async (c) => {
  const id = c.req.param("publicId");
  const cacheKey = `snippet:${id}`;

  const cached = await redis.get(cacheKey);
  if (cached) {
    enqueueViewIncrement(id);              // still count the view (Step 3)
    return c.json(JSON.parse(cached));
  }

  const snippet = await getByPublicId(id);
  if (!snippet || !snippet.is_public) return c.json({ error: "not_found" }, 404);

  const response = toResponse(snippet);
  await redis.set(cacheKey, JSON.stringify(response), "EX", 300); // 5-min TTL
  enqueueViewIncrement(id);
  return c.json(response);
});

Cache invalidation: the part that causes bugs

A cache that's never invalidated serves stale data. When a snippet is edited or deleted (or switched from public to private, since a cache hit above returns without re-checking is_public), you must delete its cache key (redis.del(`snippet:${id}`)) in the same handler, or readers keep seeing the old version for up to the TTL. The TTL is your safety net for things you forget to invalidate (and for cache entries on instances you can't reach), not your primary strategy. The two-line rule: cache on read, invalidate on write. We only cache public snippets; private ones are per-user and not hot, so they're not worth the invalidation risk.
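Here is the cache-on-read, invalidate-on-write rule as a minimal sketch. Plain Maps stand in for Redis and Postgres so it runs on its own; `readSnippet` and `updateSnippet` are hypothetical names, and in the real handlers these calls would be async and the cached entry would also carry the TTL.

```typescript
// Hypothetical in-memory stand-ins for Redis and Postgres, for illustration only.
type Row = { body: string; is_public: boolean };
const cacheStore = new Map<string, string>();
const table = new Map<string, Row>();

const keyFor = (publicId: string): string => `snippet:${publicId}`;

// Read path: cache on read (public snippets only).
function readSnippet(publicId: string): { body: string } | null {
  const hit = cacheStore.get(keyFor(publicId));
  if (hit !== undefined) return JSON.parse(hit) as { body: string };
  const row = table.get(publicId);
  if (!row || !row.is_public) return null;
  const response = { body: row.body };
  cacheStore.set(keyFor(publicId), JSON.stringify(response));
  return response;
}

// Write path: change the source of truth first, then drop the cached copy.
function updateSnippet(publicId: string, patch: Partial<Row>): void {
  const row = table.get(publicId);
  if (!row) return;
  table.set(publicId, { ...row, ...patch });
  cacheStore.delete(keyFor(publicId)); // invalidate on every write, visibility changes included
}

table.set("abc", { body: "v1", is_public: true });
const firstRead = readSnippet("abc");        // miss: fills the cache with v1
updateSnippet("abc", { body: "v2" });
const afterEdit = readSnippet("abc");        // miss again: sees v2, not a stale v1
updateSnippet("abc", { is_public: false });
const afterPrivate = readSnippet("abc");     // null: the cached public copy is gone
console.log(firstRead, afterEdit, afterPrivate);
```

Deleting the key rather than overwriting it keeps the write path simple: the next read repopulates the cache from the database, so there is only one place that builds the cached response.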

Step 3: Move the write off the request path

Notice the view counter. Incrementing view_count on every read means every read is now also a write to Postgres — and a hot snippet generates a storm of writes to one row, which contend with each other and slow the read the user is waiting on. The fix: the user's read shouldn't wait on the counter at all. Increment it asynchronously.

Synchronous (what we're fixing)

Every GET does an UPDATE snippets SET view_count = view_count + 1. Reads block on a write to a single hot row; under load those writes serialize and everyone waits.

Asynchronous (the fix)

The GET returns immediately and drops a "saw a view" message on a queue. A background worker batches them — e.g. flush counts every few seconds — and does one UPDATE per snippet per batch instead of one per view. The read path never touches the counter write.

For this build, a lightweight queue is enough: push view events to a Redis list (or increment a Redis counter views:{id}), and run a small worker on an interval that drains them into Postgres in a batch.

// fire-and-forget on the read path
function enqueueViewIncrement(publicId: string) {
  redis.incr(`views:${publicId}`).catch(() => {}); // best-effort, never blocks the read
}

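// background worker, every 10s: drain the Redis counters into Postgres, one UPDATE per snippet
async function flushViews() {
  // KEYS walks the whole keyspace and blocks Redis while it does. That's acceptable
  // for this small build; prefer incremental SCAN once the keyspace grows.
  const keys = await redis.keys("views:*");
  for (const key of keys) {
    const n = Number(await redis.getdel(key));   // read + clear atomically (GETDEL needs Redis 6.2+)
    if (n > 0) await pool.query(
      `update snippets set view_count = view_count + $1 where public_id = $2`,
      [n, key.slice("views:".length)]
    );
  }
}

// run the worker on an interval; log failures instead of crashing the process
setInterval(() => { flushViews().catch(console.error); }, 10_000);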
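If you take the Redis list option instead of the counter, the worker has to do the batching itself: drain the list, collapse the events into one count per snippet, then issue one UPDATE per snippet. The sketch below shows only the collapsing step, on a plain array; `countViews` is a hypothetical helper name and the event list is made up.

```typescript
// Collapse a drained list of view events into one count per snippet.
function countViews(events: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const publicId of events) {
    counts.set(publicId, (counts.get(publicId) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical batch: six views across two snippets.
const drained = ["abc", "abc", "xyz", "abc", "xyz", "abc"];
const updates = Array.from(countViews(drained)); // one [publicId, n] pair per snippet
console.log(updates); // abc -> 4, xyz -> 2
```

Six view events become two UPDATE statements. The counter approach used above gets the same effect for free, because INCR does the collapsing inside Redis.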

At-least-once, and why approximate counts are fine here

This is the queue model from the idempotency dive: the worker might run twice or miss a flush on a crash, so counts are approximately right, not exact. In the sketch above specifically, GETDEL clears the counter before the UPDATE runs, so a crash between the two drops that batch rather than double-counting it. For a view counter that's completely acceptable: nobody cares if it says 9,998 vs 10,000. Decide per metric: a view count can be approximate and async; a money balance (the UPI ledger dive) cannot, and must be synchronous and exact. Knowing which is which is the senior judgment.

Acceptance check

# rate limit: hammer create past the limit → eventually 429
for i in $(seq 1 70); do
  curl -s -o /dev/null -w "%{http_code}\n" -b jar.txt \
    -XPOST localhost:3000/snippets -H "Idempotency-Key: $(uuidgen)" \
    -H "Content-Type: application/json" -d '{"body":"x"}'
done | sort | uniq -c           # you should see some 201s and then 429s

# cache: GET a public snippet twice; the second is served from Redis
# (log a cache hit/miss to see it). Then edit it → next GET reflects the change.

# views: GET a snippet several times; within ~10s view_count climbs via the worker,
# and the GET latency didn't include the counter write.

You're done when writes get rate-limited to 429 past the threshold, a hot public read is served from cache and correctly reflects an edit (invalidation works), and the view count rises via the background worker without slowing the read. Commit it.

What you just internalised

Surviving load is three separate disciplines, each cheap: a central, atomic rate limiter protects you from abuse and runaway clients; a cache with invalidate-on-write keeps hot reads off the database; and moving bursty or slow writes onto a queue keeps the request path fast and lets you batch. The cross-cutting skill is matching each choice to what the data can tolerate — approximate-and-async for views, exact-and-sync for money.