Databases, in depthDatabaseshard18 min read

Transactions, Isolation, and the Bugs They Prevent

Isolation levels aren't trivia. Each one is defined by a specific concurrency bug it prevents, and choosing the wrong level is how correct-looking code silently corrupts data under load.

Everyone can recite ACID. Far fewer can say what actually goes wrong when two transactions touch the same data at the same time, which is the only reason isolation levels exist. This deep dive is about the concrete bugs, the anomalies, and which isolation level stops which one. Get this wrong and you ship code that passes every test and corrupts data the first busy afternoon.

What a transaction promises (ACID, briefly)

A transaction groups statements so they all succeed or all fail together.

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
  UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT; -- or ROLLBACK

The guarantees: Atomicity (all or nothing), Consistency (constraints hold across the change), Isolation (concurrent transactions don't see each other's half-done work), and Durability (once committed, it survives a crash). The first and last are relatively simple. Isolation is where the depth is, because perfect isolation would mean no two transactions ever run at once, which would be unbearably slow. So databases offer levels, each trading a bit of isolation for concurrency.

The anomalies isolation levels prevent

Isolation levels are best understood backwards: by the bug each one rules out.

  1. Dirty read

    You read data another transaction wrote but hasn't committed. If it rolls back, you acted on data that never existed.

  2. Non-repeatable read

    You read a row twice in one transaction and get different values, because another transaction committed a change in between.

  3. Phantom read

    You run the same query twice and the second time new rows appear, because another transaction inserted rows matching your filter.

  4. Lost update

    Two transactions read the same value, both modify it, and one write silently overwrites the other. The classic balance-corruption bug.

  5. Write skew

    Two transactions each read a shared state, each individually make a valid decision, but together they violate an invariant neither could see the other breaking.

The levels, and what each stops

SQL defines four isolation levels. Higher levels prevent more anomalies and allow less concurrency. Read top to bottom as "each level also stops everything above it, plus one more thing."

Read Uncommittedstops nothing extra · Postgres treats it as Read Committed
Read Committed (Postgres default)stops dirty reads
Repeatable Readalso stops non-repeatable reads + phantoms
Serializablealso stops write skew · behaves as if one-at-a-time
Each level stops every anomaly the ones above it stop, and then one more.

Two Postgres specifics worth noting: it never actually allows dirty reads (Read Uncommitted behaves like Read Committed), and its Repeatable Read uses a true snapshot that also prevents phantom reads, going beyond the strict SQL definition.

Read Committed (the default) means each statement sees data committed at the moment that statement began. Two reads of the same row in one transaction can return different values. Fine for most web requests, where each request is short.

Repeatable Read gives the whole transaction one frozen snapshot taken at its start. Every read sees the same data, which is what you want for a report or any multi-step calculation that must be internally consistent.

Serializable makes transactions behave as if they ran one at a time, end to end. It's the only level that prevents write skew. Postgres implements it by watching for dangerous read/write patterns and aborting a transaction that would have produced a non-serializable result, so your code must be ready to retry.

How MVCC makes this cheap

From the storage deep dive: Postgres keeps multiple versions of each row (MVCC). Isolation levels are built on snapshots over those versions. Read Committed takes a fresh snapshot per statement; Repeatable Read and Serializable take one snapshot per transaction. Because readers look at a snapshot of committed versions, readers never block writers and writers never block readers for ordinary reads. The cost you pay for this elsewhere is the dead tuples and vacuuming covered in the storage deep dive.

The lost update, and how to actually prevent it

The lost update deserves special attention because application code causes it constantly. The pattern is innocent: read a value, change it in your code, write it back. Between the read and the write, another request does the same thing, and one update vanishes.

Watch an update get lost, then fix it atomicallyrun · edit · saved to you
Loading editor…

There are three correct fixes, in rough order of preference:

Do it in one atomic statement

Don't read-modify-write in app code. Let the database compute: UPDATE accounts SET balance = balance + 100 WHERE id = ?. The row lock during the statement serialises concurrent updates, so nothing is lost. Simplest and fastest when the new value is a function of the old.

Optimistic locking

Read a version number with the row, and write only if it hasn't changed: UPDATE ... SET ..., version = version + 1 WHERE id = ? AND version = ?. If it updates zero rows, someone beat you, so you retry. Great when conflicts are rare and the update is complex.

The third is pessimistic locking: SELECT ... FOR UPDATE grabs the row's lock up front so other writers wait until you commit. Reliable, but it serialises everyone through the locked rows and can pile up waiters, so reach for it when conflicts are frequent and the work between read and write is unavoidable.

Deadlocks

A deadlock is two transactions each holding a lock the other needs: T1 locked row A and wants B, T2 locked B and wants A. Neither can proceed. Postgres detects the cycle and kills one transaction with a deadlock error.

The fix is almost always consistent lock ordering: if every code path acquires locks in the same order (say, rows sorted by ID ascending), a cycle can't form. When you transfer between two accounts, lock them in ID order regardless of which is sender and which is receiver. Beyond that, keep transactions short (less time holding locks) and be ready to retry the loser of a deadlock, since a retry usually succeeds once the other transaction is gone.

Serializable and optimistic locking both require retry logic

Both Serializable isolation and optimistic locking work by detecting a conflict and failing one transaction, rather than blocking up front. That means your application must catch the serialization-failure or zero-rows-updated case and retry the transaction. Code written assuming "it committed, therefore it succeeded" will silently drop work under these schemes. The retry loop isn't optional; it's part of the design.

The one idea to take away

Isolation levels are defined by the concurrency bug each one prevents: dirty read, non-repeatable read, phantom, lost update, write skew. Read Committed (the default) is right for short web requests; step up to Repeatable Read for multi-step reads that must be consistent, and Serializable when you need protection from write skew. Prevent lost updates with one atomic statement rather than read-modify-write in app code, order your locks consistently to avoid deadlocks, and always retry the transactions that conflict-detection schemes abort.

Test yourself

Questions· say the answer out loud before you open it. If you can't, the chapter isn't done.

QWhat does the Isolation in ACID actually protect against, and why is it the hard part?+

It stops concurrent transactions from seeing each other's half-finished work and from corrupting shared data. It's the hard part because perfect isolation would forbid any concurrency (unbearably slow), so databases offer levels that each trade some isolation for throughput, and choosing wrong silently corrupts data under load.

QName the anomalies and pair each with the level that prevents it.+

Dirty read (prevented by Read Committed and above), non-repeatable read and phantom (prevented by Repeatable Read in Postgres), lost update (prevented by atomic writes / optimistic locking, or Serializable), and write skew (only prevented by Serializable). Each level is essentially defined by the set of anomalies it rules out.

QRead Committed vs Repeatable Read, concretely?+

Read Committed (Postgres default) takes a fresh snapshot per statement, so two reads of the same row in one transaction can differ. Repeatable Read takes one snapshot at transaction start, so every read sees the same data. Use Repeatable Read for reports or multi-step calculations that must be internally consistent.

QWhat is write skew and which level prevents it?+

Two transactions each read shared state, each make an individually valid decision, but together they break an invariant neither could see the other breaking (e.g. both doctors go off-call because each saw the other on-call). Only Serializable prevents it, because lower levels don't detect the hidden read/write conflict between the two transactions.

QYour code reads a balance, adds to it, and writes it back. Two requests run at once. What breaks and how do you fix it?+

A lost update: both read the same balance, both write based on the stale value, and one update is overwritten. Fix it by doing the change in one atomic statement (UPDATE ... SET balance = balance + ?), or with optimistic locking (a version check that retries on conflict), or pessimistic SELECT ... FOR UPDATE. Never read-modify-write across a gap in app code.

QHow do you prevent deadlocks?+

Acquire locks in a consistent order across all code paths (e.g. rows sorted by ID), so a lock cycle can't form. Also keep transactions short to hold locks briefly, and retry the transaction Postgres aborts when a deadlock is detected, since a retry usually succeeds once the other transaction has finished.

QWhy do Serializable isolation and optimistic locking both need retry logic?+

Because both work by detecting a conflict and aborting one transaction rather than blocking up front. Serializable raises a serialization failure; optimistic locking updates zero rows when the version changed. Your app must catch that and retry, or it silently drops the conflicting work. The retry loop is part of the design, not an afterthought.

Before you leave — how confident are you with this?

Your honest rating shapes when you'll see this again. No grades, no shame.

More deep dives

Comments

to join the discussion.

Loading comments…