Module 02Databasesmedium16 min

Module 2 — The Data Layer

The schema is the most expensive thing to get wrong, because everything else is built on its shape. Spend the time here: model it properly, constrain it at the database, and reach the database through a pool, not a fresh connection per request.

Your skeleton runs but forgets everything the moment a request ends. This module gives it a memory: a Postgres database, a real schema, and pooled access. We lean on The Data Layer chapter and the Postgres storage deep dive — and we put the indexes in now, because retrofitting them under load is how slow-query incidents start.

Goal

  • A users and a snippets table with sensible keys, types, and constraints.
  • The schema applied as a migration (a versioned, repeatable change), not hand-typed into a console.
  • A pooled Postgres client the app uses for every query.
  • A repository module with the first queries (create, get-by-public-id), each backed by an index.

Step 1: Run Postgres

Locally, the easiest real Postgres is a container:

docker run --name snippets-db -e POSTGRES_PASSWORD=dev \
  -e POSTGRES_DB=snippets -p 5432:5432 -d postgres:16

Set DATABASE_URL=postgres://postgres:dev@localhost:5432/snippets in a .env file (and load it in dev). Your Module 1 config validation now has something to validate.

Step 2: Model the schema

Two tables. The decisions here matter more than the syntax:

create table users (
  id          bigint generated always as identity primary key,
  email       text not null unique,
  password_hash text not null,
  created_at  timestamptz not null default now()
);

create table snippets (
  id          bigint generated always as identity primary key,
  public_id   text not null unique,          -- the short URL id, e.g. "aB3x9"
  owner_id    bigint not null references users(id),
  title       text,
  body        text not null,
  language    text not null default 'text',
  is_public   boolean not null default false,
  view_count  bigint not null default 0,
  created_at  timestamptz not null default now()
);

Why two IDs (id and public_id)

The integer id is the internal primary key — compact, fast to index and join on, and never exposed. The public_id is the random short string in the URL. Keeping them separate means your URLs don't leak how many snippets exist (an auto-increment in the URL tells everyone you're on snippet #4,012) and aren't enumerable by an attacker incrementing a number — which ties straight back to the IDOR discussion in the authorization deep dive.

Notice what the database is enforcing for you: email is unique (no duplicate accounts, guaranteed even under a race), owner_id references users(id) (you can't orphan a snippet), not null where a value is mandatory. These are constraints you never have to re-check in application code, because the database refuses the bad write.

Step 3: Make it a migration

A migration is a versioned SQL change that runs the same way in dev, CI, and production. Use a migration tool (node-pg-migrate, Drizzle, Prisma Migrate — pick one) so the schema's history is in version control and reproducible.

  1. Write the change as an up migration

    The create table statements above go in a migration file, numbered/timestamped so order is deterministic.

  2. Apply it with a command, never by hand

    npm run migrate up. The tool records which migrations have run in a tracking table, so running it again is a no-op — it's idempotent.

  3. Commit the migration file

    Now CI and production build the exact same schema you have locally, from the same files.

Why never hand-edit production schema

A schema change typed directly into a production console exists nowhere in version control, can't be reproduced in CI, and will silently differ from what the next developer's local database has — until a query that works locally fails in production for "no reason." Migrations make the schema a build artifact: same input, same output, everywhere. The testing deep dive's migration-disaster story is exactly what skipping this causes.

Step 4: Connect through a pool

Opening a fresh database connection per request is slow (a TCP + auth handshake every time) and dangerous (Postgres has a hard connection limit; a traffic spike opens thousands and the database falls over). A connection pool keeps a small set of connections open and lends them out, exactly as the chapter describes.

// src/db.ts
import { Pool } from "pg";
import { config } from "./config.js";

export const pool = new Pool({
  connectionString: config.databaseUrl,
  max: 10, // tune to your DB's limit ÷ number of app instances
});

export async function shutdownDb() {
  await pool.end();
}

Call shutdownDb() from your Module 1 graceful-shutdown drain, before process.exit — so a deploy closes connections cleanly instead of leaving them dangling on the database.

Pool sizing is a system property, not a per-app guess

If your database allows 100 connections and you run 20 app instances each with a pool max of 10, that's 200 connections fighting over 100 — and you'll see "too many clients" errors under load. The pool size times the number of instances must fit under the database's limit. This is the kind of cross-cutting number that only a full-stack engineer watching the whole system catches.

Step 5: First queries, each on an index

// src/repos/snippets.ts
import { pool } from "../db.js";

export async function insertSnippet(s: {
  publicId: string; ownerId: number; title: string | null;
  body: string; language: string; isPublic: boolean;
}) {
  const { rows } = await pool.query(
    `insert into snippets (public_id, owner_id, title, body, language, is_public)
     values ($1,$2,$3,$4,$5,$6) returning *`,
    [s.publicId, s.ownerId, s.title, s.body, s.language, s.isPublic]
  );
  return rows[0];
}

export async function getByPublicId(publicId: string) {
  const { rows } = await pool.query(
    `select * from snippets where public_id = $1`, [publicId]
  );
  return rows[0] ?? null;
}

getByPublicId filters on public_id, and we declared that unique, which gives it an index for free — so this lookup is an index hit, not a full table scan, even at a million rows. We'll add the index for listing a user's snippets in Module 3 when we build that query, because the right index depends on exactly how you query.

Always parameterise — never string-concatenate SQL

Every value above goes through $1, $2, … placeholders, never string interpolation into the query text. This is the single defense against SQL injection: the database treats parameters as data, never as SQL to execute. A query built with `... where public_id = '${input}'` is a vulnerability, full stop.

Acceptance check

Write a tiny throwaway script (or a psql session) that inserts a snippet and reads it back by public_id:

npm run migrate up      # schema applies cleanly
# then run a script that calls insertSnippet(...) and getByPublicId(...)
# → you get the same row back, with an auto-assigned id and created_at.

You're done when migrations apply from scratch, an insert returns a row with a generated id, and getByPublicId reads it back through the pool. Commit it.

What you just internalised

The data layer is where senior shows: a well-keyed schema with the database enforcing your invariants, changes shipped as reproducible migrations, and access through a sized pool that fits the whole system's connection budget. Get this right and the API layer above it is almost boring — which is exactly what you want.

Before you leave — how confident are you with this?

Your honest rating shapes when you'll see this again. No grades, no shame.

Comments

to join the discussion.

Loading comments…