This is the feature that makes people say "I didn't know SQL could do that." Window functions answer questions that GROUP BY can't, because they compute a result across a set of rows while still returning every individual row. Rankings, running totals, "compared to last month," per-group top-N: all window functions.
The problem GROUP BY can't solve
Say you want each order listed with a running total of revenue over time. GROUP BY is no help: grouping by date would collapse your orders into one row per date, but you want every order to stay visible, each annotated with the total so far. You need to keep the rows and add a computed column that looks across them. That's a window function.
The shape: function() OVER (...)
Every window function is a function followed by OVER (...). The OVER clause defines the "window": which rows this calculation looks at, and in what order. Let's start with ranking. Here we rank customers by total spend, highest first.
Press Run to execute your query.
RANK() OVER (ORDER BY s.total DESC) says: order the rows by total descending, and assign each a rank. Crucially, every customer still appears as their own row, now with a spend_rank column. (A plain GROUP BY would have summarised them away; the window keeps them.)
ROW_NUMBER vs RANK vs DENSE_RANK
Three similar ranking functions, differing only on ties. ROW_NUMBER() gives every row a distinct number (1, 2, 3, 4) even on ties. RANK() gives ties the same number, then skips (1, 2, 2, 4). DENSE_RANK() gives ties the same number without skipping (1, 2, 2, 3). Pick based on whether ties should share a rank and whether you want gaps after them.
Running totals: an aggregate as a window
Put an aggregate like SUM() in front of OVER (ORDER BY ...) and it becomes a running total: each row gets the sum of itself plus all earlier rows in the window's order.
Press Run to execute your query.
Every order still appears, and running_total climbs as you go down by date. This is the difference that trips people up: SUM(total) with GROUP BY gives one total; SUM(total) OVER (ORDER BY ...) gives a running total on every row. Same function, completely different behaviour, decided by OVER.
PARTITION BY: a separate window per group
PARTITION BY splits the rows into groups and restarts the window calculation within each. It's like GROUP BY for the window, but again without collapsing rows. Here we rank products by price within each category, so each category gets its own #1.
Press Run to execute your query.
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) numbers products 1, 2, 3… restarting at 1 for each category. This is the standard way to get "top N per group" (wrap it in a CTE and filter WHERE price_rank <= 3).
LAG and LEAD: reach to other rows
LAG() looks at a previous row and LEAD() at a following one, in the window's order. This is how you compute change over time without joining the table to itself.
Press Run to execute your query.
Each order now shows the previous order's total and the difference. The first row's prev_total is empty (∅) because there's no earlier row to look back to. LAG/LEAD turn awkward self-joins into one readable column, perfect for "vs the previous period" calculations.
The mental model that makes windows click
GROUP BY answers "give me one summary row per group." A window function answers "keep every row, and add a column computed by looking across a set of related rows." The OVER (...) clause defines that set: PARTITION BY chooses which rows belong together, ORDER BY orders them (which is what makes running totals, rankings, and LAG/LEAD meaningful). Once you see "keep the rows, add a cross-row column," every window function fits the same pattern.
That completes the advanced SQL module. You can now express ranking, running totals, top-N-per-group, and period-over-period change, the queries that separate comfortable SQL users from beginners. From here, the database profiles cover the popular databases, and the data-layer deep dives take Postgres internals all the way down.
Test yourself
Questions· say the answer out loud before you open it. If you can't, the chapter isn't done.
QWhat can a window function do that GROUP BY can't?+
Compute a value across a set of related rows while keeping every individual row, instead of collapsing them into one summary row per group. That's what rankings, running totals, and row-to-row comparisons need: each row stays visible but gains a column computed by looking across its neighbours.
QWhat does the OVER clause define?+
The window: which rows the function looks at and in what order. PARTITION BY chooses which rows are grouped together for the calculation, and ORDER BY orders them within each partition, which is what makes running totals, rankings, and LAG/LEAD meaningful.
QROW_NUMBER vs RANK vs DENSE_RANK?+
They differ on ties. ROW_NUMBER always gives distinct numbers (1,2,3,4). RANK gives ties the same number then skips (1,2,2,4). DENSE_RANK gives ties the same number without skipping (1,2,2,3). Choose based on whether ties should share a rank and whether you want a gap after them.
QHow do you get a running total with a window function?+
Use an aggregate with OVER and an ORDER BY, like SUM(total) OVER (ORDER BY ordered_at). Each row then gets the sum of itself plus all earlier rows in that order. Without OVER (i.e. with GROUP BY) you'd get a single total; with OVER you get a running one on every row.
QWhat are LAG and LEAD used for?+
LAG looks at a previous row and LEAD at a following row, in the window's order, so you can compute change over time (this row minus the previous one) without joining the table to itself. The first row's LAG is null because there's no earlier row to reference.
Comments
Loading comments…