When people say "database," they could mean any of a dozen quite different things. The good news: they cluster into a handful of families, and once you know what each family is shaped for, choosing becomes easy. Let's walk the menu.
The big split: relational vs the rest
Historically there were relational (SQL) databases, and then everything else got lumped under "NoSQL." That label is unhelpful (it just means "not relational"), so we'll talk about the actual shapes instead. But the first fork in the road is real:
Relational (SQL)
Data lives in tables of rows and columns, with a fixed shape (a schema), and tables link to each other by keys. You query with SQL. Strong on consistency, relationships, and complex queries. The default for most apps, and right far more often than people assume.
Non-relational (NoSQL)
A family of databases that drop one or more relational rules to gain something else: flexible shape, raw speed, huge scale, or a data model that fits a specific job (documents, graphs, key-value pairs). Each one trades away generality for being excellent at one thing. The trade is often about consistency: relational databases give you ACID transactions (a group of writes either happens completely or not at all, survives a crash once committed, and is visible to every later read), while many distributed NoSQL systems default to "eventual consistency" instead, meaning a write may take a moment to show up on every replica in exchange for staying fast and available at scale.
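To make "eventual consistency" concrete, here is a toy sketch in Python. It models no real database: the Primary and Replica classes and the replicate function are invented for illustration, and replication is triggered by hand so the stale window is visible.

```python
from collections import deque


class Primary:
    """Accepts writes and queues them for later delivery to a replica."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Serves reads from its own copy, which may lag behind the primary."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    """Apply every queued write to the replica, in order."""
    while primary.pending:
        key, value = primary.pending.popleft()
        replica.data[key] = value


primary, replica = Primary(), Replica()
primary.write("cart:42", ["book"])

stale = replica.read("cart:42")  # None: the replica has not caught up yet
replicate(primary, replica)
fresh = replica.read("cart:42")  # ["book"]: the replica has converged
```

The write succeeds immediately on the primary, but a reader hitting the replica sees nothing until replication runs. That gap is the "moment" the trade-off refers to.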
The families, one by one
1. Relational — the workhorse
Tables, rows, columns, schema, SQL. You define that a customer has an id, name, and city; an order belongs to a customer. The database enforces that shape and lets you ask rich questions across linked tables (joins). Best when your data has clear structure and relationships, and when correctness matters (anything with money, inventory, or accounts).
Popular names: PostgreSQL, MySQL / MariaDB, SQLite, Microsoft SQL Server, Oracle.
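The customer and order example above can be sketched with Python's built-in sqlite3 module, so it runs without a server. The table and column names (customers, orders, total_cents) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", "London"), (2, "Linus", "Helsinki")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 2500), (11, 1, 4000), (12, 2, 1200)])

# A join across the two linked tables: total spent per customer.
rows = conn.execute("""
    SELECT c.name, c.city, SUM(o.total_cents) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY spent DESC
""").fetchall()

# The schema rejects an order that points at a customer who does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 999, 500)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here rows holds one tuple per customer with their summed order totals, and rejected ends up True because the foreign key stops the orphan order. That enforcement is what "the database enforces that shape" means in practice.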
2. Document — flexible blobs of JSON
Instead of rigid tables, you store documents: JSON-like objects that can nest and can differ from each other. No fixed schema, so each record can have different fields. Great when your data is naturally a self-contained object (a product with wildly varying attributes, a CMS page) and you mostly fetch one whole document at a time.
Popular names: MongoDB, Couchbase, Firestore, DynamoDB (also key-value).
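As a rough picture of the document model, the sketch below uses plain Python dicts in place of a real document database. The products collection, its ids, and the get_document helper are hypothetical; the point is only that each document carries its own fields and is fetched whole.

```python
import json

# A tiny in-memory "collection": document id -> JSON-like dict.
# Each document has a different set of fields, and one of them nests.
products = {
    "p1": {"name": "Trail shoe", "sizes": [41, 42, 43], "waterproof": True},
    "p2": {"name": "E-book", "format": "epub", "pages": 320},
    "p3": {"name": "Desk lamp", "specs": {"watts": 9, "socket": "E27"}},
}


def get_document(collection, doc_id):
    """Fetch one whole document by id, the access pattern documents favour."""
    return collection[doc_id]


lamp = get_document(products, "p3")
socket = lamp["specs"]["socket"]  # reach into a nested field

# The field names differ from document to document; no schema forces them to match.
field_sets = {doc_id: sorted(doc) for doc_id, doc in products.items()}

# Documents serialise naturally to JSON for storage or transport.
serialized = json.dumps(lamp, sort_keys=True)
```

A relational design for the same catalog would need either many nullable columns or extra tables per product type, which is the friction the document model avoids.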
3. Key-value — the dictionary
The simplest model: a key maps to a value, like a giant hash map. Blisteringly fast lookups by key, and not much else. Perfect for caching, sessions, rate-limit counters, and anything you fetch by a single known key. The popular caches (Redis, Memcached) keep data in memory, so reads are typically sub-millisecond.
Popular names: Redis, Memcached, DynamoDB, etcd.
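A minimal sketch of the caching, session, and rate-limit uses, written as a dict with expiry times. TTLCache and its methods are invented for this example and are not the API of Redis or any other product; the clock is injectable so expiry can be shown without sleeping.

```python
import time


class TTLCache:
    """A dict-backed key-value store where each entry can expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return default
        return value

    def incr(self, key, ttl_seconds=None):
        """Increment a counter, creating it (with an optional TTL) if absent."""
        current = self.get(key)
        if current is None:
            self.set(key, 1, ttl_seconds)
            return 1
        _, expires_at = self._items[key]
        self._items[key] = (current + 1, expires_at)  # keep the original expiry
        return current + 1


now = [0.0]  # fake clock so the example is deterministic
cache = TTLCache(clock=lambda: now[0])

cache.set("session:abc", {"user_id": 7}, ttl_seconds=30)
hits = [cache.incr("rate:1.2.3.4", ttl_seconds=60) for _ in range(3)]

now[0] = 31.0  # 31 seconds later the session has expired, the counter has not
expired_session = cache.get("session:abc")
counter = cache.get("rate:1.2.3.4")
```

Every operation takes a single known key. There is no way to ask "which sessions belong to user 7" without scanning everything, which is the "not much else" part of the trade.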
4. Wide-column — tables for enormous scale
Looks table-ish but is built to spread across hundreds of machines and swallow huge write volumes, trading away some query flexibility and strict consistency. Reach for it when you have truly massive, write-heavy data (sensor streams, event logs at internet scale) and a single relational box can't keep up.
Popular names: Cassandra, ScyllaDB, HBase, Google Bigtable.
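One way to picture the scale-for-flexibility trade is to route each row to a machine by hashing a partition key. The sketch below is a deliberate simplification: NODE_COUNT, the sensor ids, and the hash-modulo placement are assumptions for illustration, and real wide-column systems use more elaborate placement and replication.

```python
import hashlib
from collections import defaultdict

NODE_COUNT = 4  # hypothetical cluster size


def node_for(partition_key):
    """Pick a node from a stable hash of the partition key."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


# One store per node: partition key -> list of (timestamp, value) rows.
nodes = [defaultdict(list) for _ in range(NODE_COUNT)]


def write(sensor_id, timestamp, value):
    """Append a row to its partition; only one node is touched per write."""
    nodes[node_for(sensor_id)][sensor_id].append((timestamp, value))


def read_partition(sensor_id):
    """Cheap query: the partition key says exactly which node to ask."""
    return sorted(nodes[node_for(sensor_id)].get(sensor_id, []))


def scan_all(predicate):
    """Expensive query: without a partition key, every node must be scanned."""
    return [row
            for node in nodes
            for rows in node.values()
            for row in rows
            if predicate(row)]


write("sensor-a", 2, 20.5)
write("sensor-a", 1, 20.1)
write("sensor-b", 1, 18.0)

readings = read_partition("sensor-a")     # touches one node
hot = scan_all(lambda row: row[1] > 20.0)  # touches all nodes
```

Writes and key-based reads each land on a single node, so adding nodes adds capacity. Any question that does not start from the partition key falls back to scanning the whole cluster, which is the query flexibility being traded away.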
5. Graph — relationships are the point
When the connections between things matter more than the things, a graph database stores nodes and the edges between them, and makes "friends of friends of friends" or "shortest path" queries fast. Great for social networks, recommendations, and fraud rings, where relational joins would get painful.
Popular names: Neo4j, Amazon Neptune, Dgraph.
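The two query shapes mentioned above, "friends of friends" and "shortest path", can be sketched with breadth-first search over an adjacency mapping. The friends data and both function names are made up for the example; a graph database runs this kind of traversal natively over stored edges.

```python
from collections import deque

# Undirected friendship graph as an adjacency mapping (hypothetical data).
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee"},
    "dee": {"ben", "cy", "eli"},
    "eli": {"dee"},
}


def within_hops(graph, start, max_hops):
    """Return {node: hop distance} for nodes reachable in at most max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


friends_of_friends = within_hops(friends, "ana", 2)
path = shortest_path(friends, "ana", "eli")
```

In SQL, each extra hop typically means another self-join on a friendships table. Here an extra hop is just one more pass of the loop.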
6. Search — find text, ranked by relevance
Built to take a pile of text and answer "which documents best match these words," with ranking, typo tolerance, and faceting. It's what powers search bars and log exploration. You usually run it alongside your main database, not instead of it.
Popular names: Elasticsearch, OpenSearch, Meilisearch, Typesense.
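At the core of this family is an inverted index: a map from each word to the documents that contain it. The sketch below builds one and ranks matches by raw term counts. The sample documents are invented, and the scoring is far simpler than what production engines use; typo tolerance and faceting are left out entirely.

```python
import re
from collections import Counter, defaultdict

docs = {
    1: "Postgres full-text search basics",
    2: "Caching with Redis: sessions and counters",
    3: "Search relevance: ranking search results well",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: term -> {doc_id: number of occurrences}
index = defaultdict(Counter)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term][doc_id] += 1


def search(query):
    """Score documents by summed term counts, highest score first."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


results = search("search ranking")
no_match = search("mongodb")
```

Document 3 outranks document 1 for "search ranking" because it contains "search" twice plus "ranking". Results come back ordered by relevance rather than as an unordered set of matching rows, and that ordering is the defining feature of this family.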
7. Time-series — data stamped with time
Optimised for data that's mostly "a measurement at a moment": metrics, prices, IoT readings. Hugely efficient at appending and at "average per minute over the last day" style queries. If your dominant axis is time, this family earns its place.
Popular names: TimescaleDB, InfluxDB, Prometheus, ClickHouse (also analytics).
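The "average per minute" query can be sketched as bucketing timestamped points and averaging each bucket. The readings below are hypothetical, and a time-series database would do this over compressed, time-ordered storage rather than a Python list.

```python
from collections import defaultdict
from datetime import datetime, timezone

# (timestamp, value) points; hypothetical CPU readings.
points = [
    (datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), 40.0),
    (datetime(2024, 1, 1, 12, 0, 35, tzinfo=timezone.utc), 60.0),
    (datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), 30.0),
    (datetime(2024, 1, 1, 12, 3, 0, tzinfo=timezone.utc), 90.0),
]


def average_per_minute(samples):
    """Group samples into one-minute buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        buckets[minute].append(value)
    return {minute: sum(vals) / len(vals)
            for minute, vals in sorted(buckets.items())}


averages = average_per_minute(points)
```

The result has one entry per minute that contains data, and minutes with no readings (12:02 here) simply do not appear. Data arrives in time order and is queried by time window, which is the access pattern this family is optimised for.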
8. Vector — search by meaning
The newest family, powering AI features. It stores embeddings (lists of numbers representing meaning) and finds the ones most similar to a query, which is how "find documents about this idea" works in RAG. Often this is just an extension on a database you already run.
Popular names: pgvector (on Postgres), Pinecone, Weaviate, Qdrant, Milvus.
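"Most similar" usually means a distance or similarity measure between vectors; cosine similarity is a common choice. The sketch below does a brute-force search over three made-up, three-dimensional embeddings. Real embeddings come from a model and have hundreds or thousands of dimensions, and vector databases typically use approximate indexes instead of comparing against every vector.

```python
import math

# Hypothetical 3-dimensional embeddings keyed by the text they represent.
embeddings = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recovering account access": [0.8, 0.3, 0.1],
    "quarterly revenue report": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, k=2):
    """Brute-force top-k texts by cosine similarity to the query vector."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# A query vector that points roughly the same way as the two account-related texts.
results = nearest([1.0, 0.2, 0.0])
```

The two account-related texts come back first even though they share no words with each other. This is what "search by meaning" buys you over keyword matching.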
A quick map of when to use what
The advice that saves most teams pain
Start relational (Postgres), and add a second database only when a real need appears. Postgres alone handles structured data, JSON documents, full-text search, and (with the pgvector extension) vector search, so you can go a remarkably long way with one boring, reliable system. Reach for Redis when you need a cache, and for the specialised families only when their specific job (massive scale, graph traversal, ranked search) actually shows up. Most "we need MongoDB" moments are really "we haven't learned Postgres yet."
Why we'll focus on SQL and Postgres
For the rest of this course we'll teach SQL against PostgreSQL, for three reasons: it's the most common shape you'll meet, the skills transfer to MySQL and SQLite with only minor dialect differences, and we can run real Postgres right in your browser so you actually do it instead of just reading. Later lessons profile the other popular databases so you'll recognise each and know when to reach for it.
Next: your first real queries, executed live.
Test yourself
Questions: say the answer out loud before you read it. If you can't, the chapter isn't done.
Q: What does 'relational' actually mean?
Data lives in tables of rows and columns with a defined shape (schema), and tables link to each other by keys, so you can query across them with SQL. It's strong on structure, relationships, and correctness, which is why it's the default for most application data.
Q: When would you choose a document database over relational?
When your data is naturally a self-contained object with a flexible or varying shape, and you mostly read or write one whole document at a time, so a fixed schema and joins would just get in the way. A product catalog with wildly different attributes per item is a classic fit.
Q: What is a key-value store good for?
Fast lookups by a single known key: caching, user sessions, rate-limit counters. It's essentially a giant hash map, often held in memory, so lookups are typically sub-millisecond, but it can't answer rich queries. Redis is the popular example.
Q: What problem does a vector database solve?
Searching by meaning rather than exact words. It stores embeddings (numeric representations of meaning) and finds the closest ones to a query, which powers AI retrieval (RAG). Often it's just an extension like pgvector on a database you already run.
Q: What's the safe default, and why?
Start with a relational database, specifically Postgres, and add another only when a concrete need appears. Postgres handles structured data, JSON, full-text search, and vectors (via the pgvector extension), so one reliable system goes a long way. Add Redis for caching and specialised databases only when their specific strength is actually required.