Why We Put Work in a Queue

A user clicks "Sign up." Your server, in the same request, creates their account, resizes the photo they uploaded, sends a welcome email, and pings an analytics service. Only then does it respond. The user stares at a spinner for eight seconds, because they're waiting for four things when they only cared about one: their account existing.

Worse, if the email provider is having a slow day, the signup is slow — even though emailing has nothing to do with whether the account got created. And if the email service is down, the request errors out, and the user can't sign up at all. Unrelated work has been chained to the one thing that mattered, so a problem in any link breaks the whole chain.

This is synchronous work: the user waits for every step, in order, before getting an answer. The fix is to notice that most of those steps don't need to happen before you answer. They just need to happen eventually.

The idea: hand the slow work to a queue

Split the work in two. Do the essential part now (create the account, respond "you're in"). For everything else, write a little note that says "resize this photo" or "send this email" and drop it into a queue — a list of pending jobs that some other process will pick up and run.

Figure: Web server (create account, reply now) -> Queue ("send welcome email") -> Worker (sends it later). The web server does the essential bit and drops a note in the queue. A separate worker picks it up later. The user never waits for it.

The piece that drops notes in is the producer. The queue holds them. A separate process, the worker (or consumer), pulls notes off and does the actual work. The user's request returns the instant the account exists; the email goes out a second later, handled by the worker, and the user never knew or cared.
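The three roles can be sketched in a few lines. This is a minimal illustration, assuming an in-process `queue.Queue` and a thread standing in for the worker; a real setup would use a separate queue service and separate worker processes. The names `sign_up`, `worker`, and `sent` are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()   # the queue: holds pending notes
sent = []              # stands in for "emails actually sent"

def sign_up(email):
    """Producer: do the essential work, drop a note, return without waiting."""
    account = {"email": email}                         # essential: create the account
    jobs.put({"type": "welcome_email", "to": email})   # non-essential: leave a note
    return account

def worker():
    """Consumer: pull notes off the queue and do the slow work."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value: shut the worker down
            jobs.task_done()
            break
        sent.append(job["to"])   # pretend to send the email
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

account = sign_up("asha@example.com")  # does not wait for the email
jobs.join()                            # demo only: wait until the queue is drained
jobs.put(None)                         # stop the worker
t.join()
```

Note that `sign_up` never calls the email code directly. It only knows about the queue, which is the decoupling described next.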

The deeper thing the queue gives you is decoupling. The producer doesn't know or care who reads the note, how many workers there are, or whether they're even running right now. It drops the note and moves on. The email service could be down for an hour and the producer wouldn't notice. That separation, in both who does the work and when it gets done, is what everything below is built on.

This split buys you several things at once:

Fast responses: The user waits only for the essential work, not the slow extras.
Resilience: If the email service is down, the note just sits in the queue and the worker retries later. The signup still succeeded.
Smoothing out spikes: If a million people sign up at once, a million notes pile up in the queue and the workers chew through them at a steady pace. The queue absorbs the burst instead of crushing the email service.
Independent scaling: Photo resizing is slow and CPU-heavy? Run twenty resize workers without touching your web servers.

Going from synchronous to asynchronous ("do it eventually, not right now") is one of the highest-leverage moves in backend design. And, like every move in this series, it hands you a fresh set of problems in exchange.

The new rules of living asynchronously

Once work leaves the tidy world of a single request and goes into a queue, three things stop being guaranteed for free.

Rule 1: jobs run at-least-once, so plan for duplicates

You'd love each note to be processed exactly once. In practice that's almost impossible to guarantee cheaply across a network. Here's the trap: a worker pulls a note, sends the email successfully, and then crashes before it can tell the queue "done." The queue, having never heard "done," assumes the job failed and hands it to another worker. The email gets sent twice.

So real queues promise at-least-once delivery: every job runs at least once, and occasionally more than once. You don't fight this; you design for it.
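The crash-before-"done" trap is easy to reproduce with a toy queue. The sketch below assumes a simplified model in which a job stays "in flight" until it is acknowledged and is redelivered otherwise; `AtLeastOnceQueue` and its methods are hypothetical, not any real queue's API.

```python
from collections import deque

class AtLeastOnceQueue:
    """Toy queue: a job stays in flight until acked; unacked jobs are redelivered."""

    def __init__(self):
        self.pending = deque()
        self.in_flight = {}

    def put(self, job_id, payload):
        self.pending.append((job_id, payload))

    def get(self):
        job_id, payload = self.pending.popleft()
        self.in_flight[job_id] = payload
        return job_id, payload

    def ack(self, job_id):
        del self.in_flight[job_id]

    def requeue_unacked(self):
        """Stand-in for what a queue does when it gives up waiting for "done"."""
        for job_id, payload in self.in_flight.items():
            self.pending.append((job_id, payload))
        self.in_flight.clear()

emails_sent = []
q = AtLeastOnceQueue()
q.put("job-1", "welcome asha@example.com")

# Worker A does the work, then crashes before acknowledging.
job_id, payload = q.get()
emails_sent.append(payload)
# (crash here: q.ack(job_id) never runs)

q.requeue_unacked()  # the queue never heard "done", so it redelivers

# Worker B picks up the same job and completes it.
job_id, payload = q.get()
emails_sent.append(payload)
q.ack(job_id)

print(emails_sent)  # the same email appears twice
```

Nothing in this run is a bug in the queue. Redelivering is the only safe choice when it cannot tell "crashed before working" apart from "crashed after working".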

This is why workers must be idempotent

Idempotent means: running the job twice has the same effect as running it once. "Charge the customer ₹500" is dangerous to retry — do it twice and you've taken ₹1000. "Set the order's status to paid" is safe — set it twice and it's still just paid. You make jobs idempotent by giving each one an ID and recording which IDs you've already handled, so a repeat is recognised and skipped. Any work that goes through a queue has to survive being run more than once, because sooner or later it will be.
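The "record which IDs you've handled" approach might look like the following sketch. It assumes each job carries a unique `id` field and uses an in-memory set as the record; both are illustrative choices, and `handle` and `charge` are hypothetical names.

```python
processed_ids = set()   # assumption: in production this would be a durable store
total_charged = 0

def charge(amount):
    """Stand-in for the dangerous, non-idempotent operation."""
    global total_charged
    total_charged += amount

def handle(job):
    """Apply the job's effect once per job ID, however often it is delivered."""
    if job["id"] in processed_ids:
        return "skipped"            # duplicate delivery: recognised and ignored
    charge(job["amount"])
    processed_ids.add(job["id"])
    return "done"

job = {"id": "charge-42", "amount": 500}
first = handle(job)    # "done"
second = handle(job)   # "skipped"
```

One caveat about this sketch: the charge and the ID record are two separate steps, so a crash between them would still allow a repeat. Real systems usually close that gap by writing both in a single database transaction, or by passing the job ID to the downstream service as a deduplication key.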

Rule 2: order isn't guaranteed, and guaranteeing it is expensive

If you have one queue and several workers pulling from it in parallel, the note dropped first isn't necessarily finished first — a fast worker might finish a later note before a slow worker finishes an earlier one. For independent jobs (two unrelated welcome emails) nobody cares. But sometimes order matters: "create the user" must happen before "send the user their first notification."

Decision: Use many parallel workers for throughput; accept that strict ordering costs you that parallelism.

Processing jobs in parallel is how a queue chews through a backlog fast. But strict ordering means jobs can't overtake each other, which usually means processing related jobs one at a time, in sequence, and giving up the parallelism. The common compromise is ordering only within a group: all jobs for the same user go to the same partition (a sub-queue whose jobs are handled one at a time, in the order they arrived) and stay ordered relative to each other, while different users' jobs still run in parallel. You pay the ordering cost only where it's actually needed.
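Routing by group usually comes down to hashing a key. Here is a minimal sketch, assuming four partitions and the user ID as the grouping key; `partition_for` and `NUM_PARTITIONS` are hypothetical names. It uses `zlib.crc32` rather than Python's built-in `hash`, because `hash` on strings varies between processes and the routing has to be stable.

```python
import zlib

NUM_PARTITIONS = 4  # assumption: chosen arbitrarily for the example

def partition_for(user_id: str) -> int:
    """Stable hash, so one user's jobs always land in the same partition."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS

partitions = [[] for _ in range(NUM_PARTITIONS)]

incoming = [
    ("user-1", "create the user"),
    ("user-2", "create the user"),
    ("user-1", "send first notification"),
    ("user-2", "send first notification"),
]
for user_id, task in incoming:
    partitions[partition_for(user_id)].append((user_id, task))

# Within a partition, each user's jobs keep their original relative order.
user_1_tasks = [
    task
    for uid, task in partitions[partition_for("user-1")]
    if uid == "user-1"
]
```

Each partition would then be drained by a single worker at a time. Two users may share a partition, which costs some parallelism but never breaks per-user ordering.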

When order doesn't matter, don't ask for it — it's expensive. When it does, scope it as narrowly as you can.

Rule 3: a job can fail forever, so catch the poison

A note might describe work that can never succeed — a malformed photo, an email to an address that doesn't exist. A naive queue retries it, fails, retries, fails, forever, blocking everything behind it. This is a poison message.

The standard handling: give each job a retry limit. After, say, five failures, move it out of the main queue into a dead-letter queue — a side list of jobs that gave up. Now the bad job is out of the way, the rest of the work flows, and a human (or an alert) can look at the dead-letter queue later to see what's broken. Failures get quarantined instead of clogging the pipe.
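A retry limit with a dead-letter queue can be sketched as below. This assumes each job tracks its own `attempts` count and that a failed job goes to the back of the main queue; the limit of five matches the example above, and `process` is a hypothetical handler that always fails on one kind of input.

```python
from collections import deque

MAX_ATTEMPTS = 5

def process(job):
    """Hypothetical handler: can never succeed on a malformed photo."""
    if job["payload"] == "malformed-photo":
        raise ValueError("cannot decode image")

def drain(main_queue, dead_letters):
    """Run every job; after MAX_ATTEMPTS failures, quarantine it."""
    completed = []
    while main_queue:
        job = main_queue.popleft()
        try:
            process(job)
            completed.append(job["id"])
        except Exception as exc:
            job["attempts"] += 1
            job["last_error"] = str(exc)       # keep context for whoever inspects it
            if job["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(job)       # out of the way of healthy work
            else:
                main_queue.append(job)         # retry later, behind other jobs
    return completed

main_queue = deque([
    {"id": "a", "payload": "malformed-photo", "attempts": 0},
    {"id": "b", "payload": "good-photo", "attempts": 0},
])
dead_letters = []
completed = drain(main_queue, dead_letters)
```

After the run, job `b` has completed and job `a` sits in `dead_letters` with its attempt count and last error attached. Real workers typically also wait between retries so that a temporarily failing dependency has time to recover; that delay is left out here to keep the sketch short.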

Don't reach for it too early

A queue adds real moving parts: the queue service itself, worker processes, monitoring for backlogs, dead-letter handling. For a small app, doing the work inline in the request is simpler and perfectly fine.

When a queue earns its keep

Introduce a queue when work is genuinely slow (image processing, video transcoding, calling a sluggish third party), when it can be retried safely later (emails, notifications, syncing to another system), or when bursts would otherwise overwhelm something downstream. If the work is fast and the user needs its result in the response, keep it synchronous. The queue is for "must happen, but not right now."

Where this leaves you

The arc, in the order the pain forces it:

A request does slow, unrelated work before responding: The user waits for things they don't care about, and a failure in any of them breaks the whole request.
Move the non-essential work to a queue: Do the essential bit, drop a note, respond now. A worker handles the rest later. Faster, more resilient, absorbs spikes.
Jobs can now run more than once: Delivery is at-least-once, so make every worker idempotent. Running a job twice must equal running it once.
Order and permanent failures need handling: Demand ordering only where it's truly needed (and only within a group), and send repeatedly-failing jobs to a dead-letter queue so they don't block the rest.

The one idea to take away

A queue lets you answer the user now and do the rest soon, which is the difference between a snappy, sturdy app and a slow, brittle one. The price of "soon" is that you've left the safe single-request world: jobs can repeat, arrive out of order, and fail forever. Design for all three — idempotent workers, ordering only where needed, a dead-letter queue for the hopeless — and async work becomes a superpower instead of a source of mystery bugs.

Test yourself

Questions: say each answer out loud before you read it. If you can't, the chapter isn't done.

Q: What problem does moving work into a queue actually solve?

It stops the user from waiting on slow work they don't care about. Instead of doing everything inside the request (creating the account, resizing the photo, sending the email) before responding, the server does only the essential part, drops the rest as notes in a queue, and responds immediately. A separate worker handles the slow work later. This makes responses fast, keeps a failing dependency from breaking the main action, and lets a burst pile up in the queue instead of overwhelming a downstream service.

Q: What does 'at-least-once' delivery mean, and why can't queues just promise exactly-once?

At-least-once means every job is guaranteed to run, but might occasionally run more than once. Exactly-once is extremely hard to guarantee cheaply because a worker can do the work and then crash before acknowledging it; the queue, never hearing 'done', re-delivers the job to another worker. Rather than pay the high cost of true exactly-once, systems accept at-least-once and make the work safe to repeat.

Q: Why must queue workers be idempotent? Give a safe and an unsafe example.

Because at-least-once delivery means a job can run twice, and the second run must not cause harm. 'Charge the customer ₹500' is unsafe — running it twice charges ₹1000. 'Set the order status to paid' is safe — running it twice still leaves it paid. You make non-idempotent work safe by tagging each job with an ID and recording handled IDs, so a duplicate is recognised and skipped.

Q: Your queue has many parallel workers but one feature needs jobs processed in order. How do you handle it?

Don't force the whole queue to be ordered — that throws away the parallelism that makes it fast. Instead scope ordering to the group that needs it: route all jobs for the same user (or order, or entity) to the same partition so they stay in sequence relative to each other, while jobs for different users still run in parallel. You pay the ordering cost only where it's actually required.

Q: What is a poison message and what is a dead-letter queue for?

A poison message is a job that can never succeed (a corrupt file, an invalid address). Without handling, a queue retries it forever, blocking everything behind it. The fix is a retry limit: after a few failures the job is moved to a dead-letter queue — a side list of jobs that gave up — so the rest of the work keeps flowing and someone can inspect the failures later instead of letting them clog the pipe.