Surviving the 10 AM Ticket Rush

Every morning, IRCTC's Tatkal window opens (10:00 AM for AC classes, 11:00 AM for sleeper) and millions of people try to grab a few thousand last-minute seats in the same minute. This is the mirror image of live streaming. There, everyone reads the same data, and caching saves you. Here, everyone writes to the same tiny set of rows, and caching can't help with that write, because the one thing you must guarantee is that two people never walk away with the same seat.

This isn't a hypothetical. IRCTC has publicly reported peaks above 30,000 confirmed bookings in a single minute, with a stated upgrade target of around 1.5 lakh (150,000) bookings/min and 40 lakh (4 million) enquiries/min, roughly ten times its earlier capacity. The numbers are eye-watering precisely because demand far outstrips the seats, so the system's real job is to fail most people quickly and correctly while protecting the lucky few.

What we're building

Functional · what it does

Search trains and see live seat availability
Hold a seat while the user pays
Confirm the booking on successful payment
Release the hold if payment fails or times out
Show a fair queue position during the rush

Non-functional · what it must survive

Never oversell a seat, ever
Survive millions of attempts in a 60-second window
A failed payment must not lose the user's money
Stay responsive (or honestly say "busy") under load
Be fair: first come should mean first served

The defining constraint is correctness under contention. A streaming glitch annoys someone. A double-booked seat is a person standing in a train with a valid ticket and no seat, followed by a refund fight. Correctness wins over throughput here, every time.

The shape of the load

The cliff · ~10:00:00 · near-zero, then everything
Attempts / min · Millions · for thousands of seats
Demand : supply · 1000:1+ · most users will fail
The bottleneck · Hot rows · one train, one quota

Two facts shape everything. First, the contention is concentrated on a handful of rows: one popular train's seat inventory. Second, the vast majority of attempts must fail, because there simply aren't enough seats. A good design fails the losers fast and cheaply, and protects the few winners' transactions absolutely.

The core problem: don't oversell

Naively, booking looks like "read available count, if greater than zero subtract one." That's a classic race. Two requests both read "1 seat left," both think they won, both subtract, and now you've sold the same seat twice.

There are two correct ways to handle this, and you should know both.

Pessimistic locking

Lock the row before touching it: SELECT ... FOR UPDATE. Other writers wait. Reliable and simple to reason about, but the lock serialises everyone through one row, which limits throughput and can pile up waiters under a rush.

Optimistic / atomic update

Don't lock ahead of time. Do the decrement as one conditional statement and let the database's row lock be the gate: UPDATE inventory SET available = available - 1 WHERE id = ? AND available >= 1. If it updates zero rows, you lost. Short locks, high throughput.

That conditional UPDATE is the single most important line in the whole system. The WHERE available >= 1 guard makes overselling physically impossible: the database holds a brief row lock for each statement, so the checks are serialised and only available-many of them can succeed. Here's the idea in code, including the gap where a naive version goes wrong.

Hand out limited seats without overselling
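
A minimal sketch of both versions, using Python's built-in sqlite3 with an in-memory database as a stand-in for the real inventory store. The table layout and the helper names (make_inventory, read_available, naive_write, atomic_book) are illustrative assumptions, not IRCTC's schema. The naive half interleaves two requests by hand so the race happens every time instead of occasionally.

```python
import sqlite3

def make_inventory(seats: int) -> sqlite3.Connection:
    """Illustrative schema: one row holding the seats left for one train and quota."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, available INTEGER)")
    conn.execute("INSERT INTO inventory (id, available) VALUES (1, ?)", (seats,))
    conn.commit()
    return conn

def read_available(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT available FROM inventory WHERE id = 1").fetchone()[0]

def naive_write(conn: sqlite3.Connection, seen: int) -> bool:
    """Naive step two: trust a count that was read earlier and write back seen - 1."""
    if seen <= 0:
        return False
    conn.execute("UPDATE inventory SET available = ? WHERE id = 1", (seen - 1,))
    conn.commit()
    return True

def atomic_book(conn: sqlite3.Connection) -> bool:
    """Check and decrement in one statement; zero rows updated means no seat."""
    cur = conn.execute(
        "UPDATE inventory SET available = available - 1 WHERE id = ? AND available >= 1",
        (1,),
    )
    conn.commit()
    return cur.rowcount == 1

# Naive: both requests read before either writes, so both "win" the last seat.
db = make_inventory(seats=1)
seen_a = read_available(db)
seen_b = read_available(db)
naive_winners = [naive_write(db, seen_a), naive_write(db, seen_b)]

# Atomic: the guard in the WHERE clause lets exactly one request through.
db = make_inventory(seats=1)
atomic_winners = [atomic_book(db), atomic_book(db)]

print("naive:", naive_winners)    # two winners for one seat
print("atomic:", atomic_winners)  # one winner, one clean failure
```

The gap in the naive version is the time between read_available and naive_write. The atomic version has no such gap, because the check and the write are the same statement.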

Let the database be the referee

Don't try to coordinate seat allocation in your application code with counters in memory across many servers. That's a distributed-consensus problem you'll get wrong. Push the decision into a single atomic statement on a single row and let the database's locking do the hard part. The database is built for exactly this.

Hold, then confirm: the two-phase booking

A user needs time to pay, but you can't give away the seat for free while they fumble with a payment PIN, and you can't let them hold it forever. So booking is split into two phases, a hold and a confirmation, with the slow payment step in between:

Hold
The atomic decrement reserves a seat and creates a hold with a short expiry (a few minutes). The seat is now neither available nor confirmed.
Pay
The user completes payment. This is a separate, slow, external step that can fail, time out, or succeed after the user gave up.
Confirm or release
On success, the hold becomes a confirmed booking. On failure or expiry, a reaper releases the hold and returns the seat to the pool.

The expiring hold is what keeps the system honest. A user who abandons checkout doesn't lock a seat forever, and a seat is never given to two people because it's removed from the available pool the instant it's held. The reaper that releases expired holds is not optional; it's the safety valve that prevents inventory from slowly leaking away into dead holds.

The expiry is also why many real systems put the hold layer in Redis rather than the main database: an atomic SET key value NX EX 300 (or a small Lua script) claims a seat and stamps a five-minute TTL in one operation, and the TTL handles release automatically if no reaper gets there first. The durable confirmation still lands in the database, but the hot, short-lived hold lives in the fast store.
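
To show what that one operation buys you without needing a Redis server, here is a tiny in-memory stand-in that mimics the semantics of SET key value NX EX. It is not a Redis client: the class TinyTTLStore, the injected clock and the key layout are all assumptions for illustration, and unlike Redis it is not safe for concurrent callers.

```python
import time

class TinyTTLStore:
    """In-memory stand-in for the one Redis behaviour used here (not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def set_nx_ex(self, key: str, value: str, ttl_seconds: float) -> bool:
        """Mimics SET key value NX EX ttl: succeeds only if the key is absent or expired."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # someone else holds this seat and the hold is still live
        self._data[key] = (value, now + ttl_seconds)
        return True

# A fake clock makes the five-minute expiry testable without waiting.
fake_now = [0.0]
store = TinyTTLStore(clock=lambda: fake_now[0])
key = "hold:TRAIN123:2025-01-15:S1-34"  # hypothetical key layout: train, date, seat

first = store.set_nx_ex(key, "user-asha", 300)   # True: seat claimed
second = store.set_nx_ex(key, "user-ravi", 300)  # False: already held
fake_now[0] = 301.0                              # the hold expires unpaid
third = store.set_nx_ex(key, "user-ravi", 300)   # True: seat is claimable again
```

Against a real Redis, the redis-py equivalent of set_nx_ex would be a call such as r.set(key, value, nx=True, ex=300), where the server itself guarantees that only one caller wins.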

Payments must be idempotent

Payment is the riskiest step because it's slow and external. The user's network drops after they pay but before they see confirmation. They retry. Now you risk charging twice or, worse, taking their money and confirming nothing.

The fix is the idempotency key from the API chapter: every booking attempt carries a unique key, the payment is recorded against it, and a retry with the same key returns the original result instead of charging again. Pair this with a clear state machine for each booking (held → paid → confirmed, or held → expired → released) so that no matter how many times a request is retried or how it interleaves with the reaper, the booking lands in exactly one valid state.
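
A small sketch of both ideas together. PaymentLedger is a hypothetical in-memory stand-in for a payment service, and the key "attempt-7f3a" and amount are made up. The transition table encodes only the two paths named above.

```python
from dataclasses import dataclass, field

# Allowed moves in the booking state machine described above.
TRANSITIONS = {
    "held": {"paid", "expired"},
    "paid": {"confirmed"},
    "expired": {"released"},
    "confirmed": set(),
    "released": set(),
}

@dataclass
class Booking:
    state: str = "held"

    def move(self, new_state: str) -> bool:
        """Apply a transition only if it is legal from the current state."""
        if new_state not in TRANSITIONS[self.state]:
            return False
        self.state = new_state
        return True

@dataclass
class PaymentLedger:
    """Hypothetical stand-in for a payment service keyed by idempotency key."""
    results: dict = field(default_factory=dict)
    charges_made: int = 0

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self.results:
            return self.results[idempotency_key]  # retry: replay the result, no new charge
        self.charges_made += 1
        result = {"status": "succeeded", "amount": amount}
        self.results[idempotency_key] = result
        return result

ledger = PaymentLedger()
booking = Booking()

first = ledger.charge("attempt-7f3a", 1500)
retry = ledger.charge("attempt-7f3a", 1500)  # network dropped, client retried
booking.move("paid")
reaper_won = booking.move("expired")         # reaper arrives late and is rejected
booking.move("confirmed")
```

In a real system the "have I seen this key?" check and the insert must themselves be atomic, for example through a unique constraint on the key, or two simultaneous retries could both slip through.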

The money-but-no-seat failure

The nightmare case: payment succeeds but confirmation fails (the app crashed, the hold expired in between). The user is charged with no ticket. You must reconcile this. Either confirm-after-pay is driven by a durable workflow that retries until the booking is confirmed or the payment is refunded, or a reconciliation job continuously matches successful payments against confirmed bookings and auto-refunds the orphans. Never leave it to chance.
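
The reconciliation job is, at its core, a set difference. A minimal sketch, assuming payments and bookings can be listed as plain records. The field names payment_id and booking_id and the refund callback are hypothetical.

```python
def find_orphans(successful_payments, confirmed_booking_ids):
    """Return payments whose booking never reached 'confirmed'."""
    confirmed = set(confirmed_booking_ids)
    return [p for p in successful_payments if p["booking_id"] not in confirmed]

def reconcile(successful_payments, confirmed_booking_ids, refund):
    """Refund every orphan; `refund` is a caller-supplied function, assumed idempotent."""
    orphans = find_orphans(successful_payments, confirmed_booking_ids)
    for payment in orphans:
        refund(payment["payment_id"])
    return [p["payment_id"] for p in orphans]

payments = [
    {"payment_id": "pay-1", "booking_id": "bk-1"},
    {"payment_id": "pay-2", "booking_id": "bk-2"},  # charged, never confirmed
]
refunded = []
result = reconcile(payments, ["bk-1"], refunded.append)
print(result)  # the orphaned payment ids that were refunded
```

A production job would likely also wait out a grace period before calling a payment orphaned, since a confirmation may still be in flight when the job runs.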

Handling the 10:00 AM cliff

Correctness handles double-booking. It does nothing for the stampede of millions of requests hitting your servers in one second. For that you need to control how many requests even reach the booking logic.

The answer is a virtual waiting room. When the rush hits, you don't try to serve everyone at once. You admit users into the actual booking system at a controlled rate and hold the rest in a queue with a visible position. This does three things: it protects the database from a load it can't survive, it makes failure honest ("you're number 40,000 in line") instead of a spinning page, and it preserves fairness by admitting roughly in arrival order.

The crucial detail is that the waiting room runs on separate, cheap infrastructure from the booking core. Its only job is to hold a huge crowd on a lightweight page and trickle them through, so the origin and database never see the full mob. Commercial waiting rooms admit only a few hundred shoppers into the real system at a time, pacing the outflow so checkout and payment errors stay low. Two queue styles exist: strict first-in-first-out for continuous traffic, and randomised ("raffle") admission for a scheduled drop, so that hammering refresh at the exact opening second gives no advantage.
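
Both queue styles fit in one small class. This is a sketch under assumptions, not how any commercial waiting room is built: WaitingRoom, its tick-based admit method and the admit_per_tick value are all illustrative.

```python
import random
from collections import deque

class WaitingRoom:
    def __init__(self, admit_per_tick: int):
        self.admit_per_tick = admit_per_tick
        self.queue = deque()

    def join(self, user_id: str) -> int:
        """Add a user and return their 1-based position in line."""
        self.queue.append(user_id)
        return len(self.queue)

    def shuffle_for_drop(self, seed=None) -> None:
        """Raffle mode: randomise everyone who arrived before the opening time."""
        users = list(self.queue)
        random.Random(seed).shuffle(users)
        self.queue = deque(users)

    def admit(self) -> list:
        """Called once per tick: release at most admit_per_tick users to the booking core."""
        n = min(self.admit_per_tick, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]

# Strict first-in-first-out: five arrive, two are admitted per tick.
room = WaitingRoom(admit_per_tick=2)
positions = [room.join(f"user-{i}") for i in range(5)]
batches = [room.admit(), room.admit(), room.admit()]

# Raffle: same crowd, but arrival order before the drop confers no advantage.
raffle = WaitingRoom(admit_per_tick=5)
for i in range(5):
    raffle.join(f"user-{i}")
raffle.shuffle_for_drop(seed=1)
raffle_order = raffle.admit()
```

The booking core only ever sees what admit returns, so its load is capped by admit_per_tick no matter how many people join.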

What happens without one: a real cautionary tale

A major concert-ticketing platform once opened a high-demand presale having provisioned for about 1.5 million verified buyers. Around 14 million people (plus bots) showed up and the system took roughly 3.5 billion requests — several times its prior peak. The site froze. The lesson isn't "buy more servers"; it's that when demand can be 10x your estimate, the only safe move is a gate in front that admits a survivable rate and is honest with everyone else.

Decision: Put a queue in front and admit users at a sustainable rate.

Letting all the load hit the database means it falls over and nobody gets a seat, including the people who would have won. A queue that admits, say, a few thousand users a second keeps the core system inside its safe operating range, so the seats that exist actually get sold. The cost is that most users wait and then learn they didn't get a seat. That's the honest truth of 1000:1 demand, surfaced instead of hidden.

You also push as much rejection as far forward as possible. If a train is sold out, that fact can be cached and served from the edge, so "sold out" requests never reach the booking core at all. Only requests that might actually succeed should spend the expensive resource, which is a transaction on a hot inventory row.
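
One way to picture that early rejection is a short-lived "sold out" flag checked before anything touches the booking core. The names SoldOutCache, handle_booking and try_hold are hypothetical, and the five-second TTL is an assumption, kept short because expired holds can put seats back in the pool.

```python
import time

class SoldOutCache:
    """Hypothetical edge-side cache of 'this train and date is sold out' flags."""

    def __init__(self, ttl_seconds: float = 5.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._sold_out_until = {}

    def mark_sold_out(self, train_key: str) -> None:
        self._sold_out_until[train_key] = self._clock() + self._ttl

    def is_sold_out(self, train_key: str) -> bool:
        until = self._sold_out_until.get(train_key)
        return until is not None and until > self._clock()

def handle_booking(train_key, cache, try_hold):
    """Reject from the cache when possible; only otherwise touch the hot row."""
    if cache.is_sold_out(train_key):
        return "sold_out (served from cache)"
    if try_hold(train_key):
        return "held"
    cache.mark_sold_out(train_key)
    return "sold_out"

core_calls = []

def try_hold(train_key):
    """Stand-in for the real atomic decrement; here the train has no seats left."""
    core_calls.append(train_key)
    return False

cache = SoldOutCache(ttl_seconds=5.0)
responses = [handle_booking("TRAIN123:2025-01-15", cache, try_hold) for _ in range(3)]
print(responses, "core calls:", len(core_calls))
```

Only the first of the three requests reaches try_hold. The other two are answered from the flag.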

Half the crowd is robots

There's an adversary most tutorials skip: bots. Scalpers and agents script the booking flow to grab seats in bulk the instant the window opens. Operators have reported that in the first five minutes of a rush, up to half of all login attempts are automated, and that over a year one platform deactivated on the order of 30 million suspicious accounts and blocked tens of billions of bot requests. Every bot that gets through both steals a seat from a real person and burns a transaction on your hottest rows.

The defences stack up, cheapest first: a CDN / edge layer that blocks known data-centre IPs and obvious abuse before it reaches you, rate limits per account and per IP, proof-of-work or CAPTCHA challenges at the gate, and — the big one — strong identity at signup (phone/government-ID with OTP), which makes creating thousands of throwaway accounts expensive. The goal isn't to win outright; it's to make automated bulk-booking costly enough that humans get a fair shot.
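
Of those layers, the per-account and per-IP rate limit is the easiest to sketch. Below is a token bucket per key. The class names, the rate of one request per second and the burst of three are illustrative assumptions, not values any operator has published.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        """Refill for the time elapsed, then spend one token if one is available."""
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

class PerKeyLimiter:
    """One bucket per account id or IP address (the key is whatever you pass in)."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self._args = (rate_per_sec, capacity, clock)
        self._buckets = {}

    def allow(self, key: str) -> bool:
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = self._buckets[key] = TokenBucket(*self._args)
        return bucket.allow()

# A fake clock keeps the example deterministic.
fake_now = [0.0]
limiter = PerKeyLimiter(rate_per_sec=1.0, capacity=3.0, clock=lambda: fake_now[0])

bot_burst = [limiter.allow("ip:bot") for _ in range(5)]  # hammering in the same instant
human = limiter.allow("ip:human")                        # unaffected by the bot's bucket
fake_now[0] = 2.0
bot_later = limiter.allow("ip:bot")                      # tokens refill over time
```

The bucket lets a normal user retry a couple of times, while a script firing dozens of requests in the opening second burns through its allowance almost immediately.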

Where the data lives

The inventory rows are the hottest, most contended data in the system. A few design notes:

Shard by train and date. Different trains are independent, so their inventory can live on different shards. This spreads the contention across the fleet instead of concentrating every train on one database. The contention within one popular train is irreducible, but at least one hot train doesn't slow bookings for every other.
Keep the hot transaction tiny. The hold transaction touches one inventory row and one holds row, and does nothing else (no emails, no analytics, no third-party calls). Everything non-essential happens afterward, off a queue. A short transaction holds its locks briefly, which is exactly what you want when thousands are queued behind it.
Read availability from a cache, write through the database. The "seats available" number shown during browsing can come from a slightly stale cache; correctness is enforced only at the moment of the atomic decrement. Don't make every availability check a hot-row read.
Watch the connection pool, not just the database. A large commerce platform that rebuilt its reservation system for a flash-sale load found the real bottleneck wasn't the database's raw speed — it was connection-pool exhaustion, thousands of requests all waiting for a free database connection. Their fix leaned on a fixed pool, one row per sellable unit, and SELECT ... FOR UPDATE SKIP LOCKED so workers grab different free units instead of all queueing on the same locked row. The lesson: under extreme contention, the limit you hit first is often the number of in-flight connections, so the queue out front is what keeps that number sane.
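
For the first note, sharding by train and date, the routing function can be very small. This sketch assumes simple modulo sharding over a fixed shard count. The function name shard_for, the train numbers and the count of 16 are made up for illustration.

```python
import hashlib

def shard_for(train_no: str, journey_date: str, shard_count: int) -> int:
    """Map (train, date) to a shard using a hash that is stable across processes."""
    key = f"{train_no}:{journey_date}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

SHARDS = 16  # assumed fleet size

a = shard_for("TRAIN123", "2025-01-15", SHARDS)
b = shard_for("TRAIN123", "2025-01-15", SHARDS)  # same train and date: same shard
c = shard_for("TRAIN456", "2025-01-15", SHARDS)  # a different train routes independently
print(a, b, c)
```

Python's built-in hash() is randomised per process for strings, which is why the sketch uses hashlib instead. Plain modulo also reshuffles most keys when the shard count changes, so a lookup table or consistent hashing is a common alternative when the fleet is expected to grow.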

The one idea to take away

Booking is the opposite of streaming. You can't cache the write, so you make the write as small, atomic, and serialised as possible, and you protect it with a queue so only a survivable number of requests reach it. Correctness lives in one conditional UPDATE; everything else (holds, payments, queueing) exists to feed that one statement safely.

Test yourself

Questions · say the answer out loud before you read it. If you can't, the chapter isn't done.

Q: Two users see '1 seat left' and both click book. How do you guarantee only one succeeds?

Make the decrement atomic and conditional: UPDATE inventory SET available = available - 1 WHERE id = ? AND available >= 1. The database holds a brief row lock per statement, so the two updates are serialised, and only the one that runs while available >= 1 succeeds. The loser's update affects zero rows and is told the seat is gone. Never read-then-write in application code.

Q: Why hold a seat instead of booking it directly on click?

Because payment is slow and can fail. A hold removes the seat from the available pool immediately (so it can't be double-sold) but doesn't confirm it until payment succeeds. A short expiry returns abandoned holds to the pool. Direct booking on click would either give seats to people who never pay or block the seat forever on a failed payment.

Q: A user pays, their network drops, and they retry. How do you avoid double-charging?

Idempotency keys. Each attempt carries a unique key; the payment is recorded against it, and a retry with the same key returns the original result instead of charging again. Combined with a booking state machine, the operation lands in exactly one valid state no matter how many times it's retried.

Q: Payment succeeded but confirmation failed. The user is charged with no ticket. Now what?

You reconcile. Either a durable workflow retries confirm-after-pay until the booking is confirmed or the payment is refunded, or a reconciliation job continuously matches successful payments against confirmed bookings and auto-refunds orphans. This case is guaranteed to happen at scale, so it must be handled by design, not hope.

Q: Millions of requests hit at 10:00:00. How do you keep the database alive?

A virtual waiting room in front. Admit users into the booking core at a sustainable rate and queue the rest with a visible position. This caps the load on the hot inventory rows so the system stays inside its safe range, keeps failure honest, and preserves rough first-come fairness. Also cache 'sold out' so doomed requests never reach the core.

Q: Pessimistic vs optimistic locking for seat inventory?

Pessimistic (SELECT ... FOR UPDATE) locks the row up front; reliable but serialises everyone and piles up waiters under a rush. Optimistic (a conditional atomic UPDATE) holds only a brief per-statement lock and fails losers immediately, giving much higher throughput. For high-contention inventory, the atomic conditional update is usually the better fit.

Q: How do you stop one popular train from slowing bookings for every other train?

Shard inventory by train and date so independent trains live on independent databases. The contention within one hot train is irreducible (everyone wants the same seats), but sharding keeps that hot spot from dragging down unrelated bookings. Also keep the hot transaction tiny so locks are held briefly.