Data Model Families

Key-value, document, wide-column, graph, and time-series — the shape of each model and the access patterns it was built to serve.

9 min read · Level 3/5 · #system-design #databases #nosql

What you'll learn

Identify the five core NoSQL data models by shape
Match an access pattern to the model built for it
Recognize where each model breaks down

Last lesson lumped everything that isn’t relational under “NoSQL.” That’s too coarse to design with. There are really five distinct data models, and each one is a different answer to the question “what shape is your data, and how will you read it?” Pick the model that matches your dominant access pattern, and the queries become cheap. Pick the wrong one and you’ll fight the store forever.

Key-value: the dictionary at scale

The simplest model: an opaque value stored under a unique key. It’s a distributed Map. You get(key) and set(key, value) — that’s essentially the whole API. Redis, DynamoDB (in its base mode), and Memcached live here.

Built for: blazing-fast lookups by exact key — sessions, caches, feature flags, counters, rate-limit buckets.
Breaks down when: you need to query by value (“find all sessions for user X”) or scan ranges. The store can’t see inside the value, so there’s nothing to filter on.
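
To make that limitation concrete, here is a minimal sketch of the key-value contract. `KeyValueStore` and `sessionsForUser` are hypothetical names used only for illustration, not the client API of Redis, DynamoDB, or Memcached. Values are serialized on write to mimic the store treating them as opaque.

```javascript
// Toy in-memory key-value store; a stand-in for illustration, not a real client API.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  set(key, value) {
    // Values are stored as opaque strings: the store never inspects them.
    this.data.set(key, JSON.stringify(value));
  }

  get(key) {
    const raw = this.data.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

const kv = new KeyValueStore();
kv.set('session:abc', { userId: 'u_123', createdAt: 1 });
kv.set('session:def', { userId: 'u_456', createdAt: 2 });
kv.set('session:ghi', { userId: 'u_123', createdAt: 3 });

// Fast path: exact-key lookup.
const session = kv.get('session:abc');

// Slow path: "all sessions for user X" has no key to look up,
// so the caller must scan and decode every value.
function sessionsForUser(store, userId) {
  const matches = [];
  for (const [key, raw] of store.data) {
    if (JSON.parse(raw).userId === userId) matches.push(key);
  }
  return matches;
}

const adaSessions = sessionsForUser(kv, 'u_123');
```

A common workaround is to maintain a second key, for example `user:u_123:sessions`, that lists the session keys, at the cost of keeping both entries in sync.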

[Figure: Data Model Families architecture diagram]

Document: the self-contained object

A document store keeps structured, queryable JSON-like documents, each one a complete object. MongoDB is the archetype. Unlike key-value, the store understands the document’s fields, so you can index and query on them.

Built for: entities you read and write as a whole — a user profile with nested address and preferences, a product with variants, a blog post with comments embedded.
Breaks down when: data is highly relational and you find yourself joining documents in application code, or when embedded arrays grow unbounded (a post with a million comments doesn’t belong inside one document).

The design rule: embed what you read together, reference what you read apart.
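
Because the store understands fields, it can index them. The sketch below is a toy collection with a single secondary index on a nested field; `Collection`, `findByIndexedField`, and `getPath` are hypothetical names for illustration and do not reflect MongoDB's actual API.

```javascript
// Toy document collection with one secondary index; hypothetical, not MongoDB's API.
class Collection {
  constructor(indexedField) {
    this.docs = new Map();            // _id -> document
    this.indexedField = indexedField; // dotted path, e.g. 'address.city'
    this.index = new Map();           // field value -> Set of _id
  }

  insert(doc) {
    this.docs.set(doc._id, doc);
    const value = getPath(doc, this.indexedField);
    if (!this.index.has(value)) this.index.set(value, new Set());
    this.index.get(value).add(doc._id);
  }

  // Look up documents by the indexed field without scanning the collection.
  findByIndexedField(value) {
    const ids = this.index.get(value) ?? new Set();
    return [...ids].map((id) => this.docs.get(id));
  }
}

// Resolve a dotted path such as 'address.city' inside a nested document.
function getPath(doc, path) {
  return path
    .split('.')
    .reduce((node, part) => (node == null ? undefined : node[part]), doc);
}

const users = new Collection('address.city');
users.insert({ _id: 'u_123', name: 'Ada', address: { city: 'London', zip: 'EC1' } });
users.insert({ _id: 'u_456', name: 'Grace', address: { city: 'New York', zip: '10001' } });
users.insert({ _id: 'u_789', name: 'Alan', address: { city: 'London', zip: 'NW1' } });

const londoners = users.findByIndexedField('London').map((u) => u.name);
```

Each document still comes back whole, nested address included, which is the point of the model: one read returns the entire entity.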

Wide-column: rows of sparse columns

Wide-column stores (Cassandra, HBase, ScyllaDB, Bigtable) look table-ish but aren’t relational. Each row is identified by a partition key and holds a sparse, potentially huge set of columns. Data is spread across nodes by partition key, and tables are modeled so that each query reads from a single partition. That keeps reads predictable and lets writes scale to enormous throughput across hundreds of nodes.

Built for: write-heavy, known-query workloads at massive scale — chat message history, event logs, IoT readings, time-bucketed feeds.
Breaks down when: your queries are ad-hoc. You must design the table around the query in advance; there are no flexible joins, and adding a new access pattern often means a whole new table holding a copy of the data.
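
A rough in-memory sketch of that layout, assuming a chat-history table partitioned by conversation and ordered by send time. `WideColumnTable`, `insert`, and `range` are hypothetical names, not the API of Cassandra or any other store. It shows the two properties the section describes: a read touches one partition, and rows can carry different columns.

```javascript
// Toy wide-column table: partition key -> rows kept sorted by clustering key.
class WideColumnTable {
  constructor() {
    this.partitions = new Map();
  }

  insert(partitionKey, clusteringKey, columns) {
    if (!this.partitions.has(partitionKey)) this.partitions.set(partitionKey, []);
    const rows = this.partitions.get(partitionKey);

    // Binary search for the insert position so rows stay sorted.
    let lo = 0;
    let hi = rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (rows[mid].clusteringKey < clusteringKey) lo = mid + 1;
      else hi = mid;
    }
    // Rows are sparse: each one stores only the columns it was given.
    rows.splice(lo, 0, { clusteringKey, ...columns });
  }

  // Range read inside a single partition, covering [from, to).
  range(partitionKey, from, to) {
    const rows = this.partitions.get(partitionKey) ?? [];
    return rows.filter((r) => r.clusteringKey >= from && r.clusteringKey < to);
  }
}

const messages = new WideColumnTable();
messages.insert('chat_1', 300, { body: 'c' });
messages.insert('chat_1', 100, { body: 'a' });
messages.insert('chat_1', 200, { body: 'b', attachment: 'x.png' });
messages.insert('chat_2', 150, { body: 'z' });

const recent = messages.range('chat_1', 100, 300);
```

Note what is missing: there is no method for "all messages with an attachment" across conversations. Serving that query would mean a second table keyed for it, which is the duplication described above.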

Graph: relationships are the data

A graph database (Neo4j, Neptune) stores nodes and edges as first-class citizens. A relational query for “friends of friends of friends” needs one self-join per hop, and its cost climbs steeply with depth. A graph store follows edges directly, so the work is roughly proportional to the number of edges visited rather than to the size of the whole dataset.

Built for: deeply connected data where the relationships are the question — social graphs, recommendation engines, fraud rings, permission hierarchies, knowledge graphs.
Breaks down when: your data isn’t actually graphy. Most apps have a few foreign keys, not a traversal problem; a graph DB there is overkill.
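
The traversal idea can be sketched with a plain adjacency list and a breadth-first search. `addFriendship` and `withinHops` are hypothetical helpers for illustration; a real graph database exposes this through its own query language. Each hop only touches the neighbours of the current frontier.

```javascript
// Adjacency list: node -> Set of neighbouring nodes.
const edges = new Map();

// Record an undirected friendship between two people.
function addFriendship(a, b) {
  for (const [from, to] of [[a, b], [b, a]]) {
    if (!edges.has(from)) edges.set(from, new Set());
    edges.get(from).add(to);
  }
}

// Breadth-first search: everyone reachable within maxHops, with their distance.
function withinHops(start, maxHops) {
  const distance = new Map([[start, 0]]);
  let frontier = [start];

  for (let hop = 1; hop <= maxHops; hop++) {
    const next = [];
    for (const node of frontier) {
      for (const neighbour of edges.get(node) ?? []) {
        if (!distance.has(neighbour)) {
          distance.set(neighbour, hop);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }

  distance.delete(start); // the starting node is not its own friend
  return distance;
}

addFriendship('ada', 'bob');
addFriendship('bob', 'cy');
addFriendship('cy', 'dee');
addFriendship('dee', 'eve');

const reachable = withinHops('ada', 3);
```

Here `reachable` holds bob, cy, and dee at distances 1, 2, and 3, while eve is four hops away and is excluded.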

Time-series: append-only, timestamped

A time-series database (InfluxDB, TimescaleDB, Prometheus) is specialized for timestamped points that mostly arrive in order and are rarely updated. It optimizes hard for high-rate appends, time-windowed queries, and downsampling old data.

Built for: metrics, sensor data, financial ticks, application telemetry.
Breaks down when: you need random updates to old points or relational queries — it’s tuned for the append-and-aggregate shape, not for mutation.
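
The append-and-aggregate shape can be sketched in a few functions. Everything here (`append`, `queryWindow`, `downsample`, integer timestamps in seconds) is an assumption made for illustration and not the API of InfluxDB, TimescaleDB, or Prometheus. Because points arrive in time order, a window query is a binary search plus a slice.

```javascript
// Points are only ever appended, in timestamp order.
const points = [];

function append(timestamp, value) {
  points.push({ timestamp, value });
}

// Index of the first point whose timestamp is >= ts.
function lowerBound(ts) {
  let lo = 0;
  let hi = points.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (points[mid].timestamp < ts) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Time-windowed read covering [from, to).
function queryWindow(from, to) {
  return points.slice(lowerBound(from), lowerBound(to));
}

// Downsample: replace raw points with one average per fixed-size bucket.
function downsample(series, bucketSize) {
  const buckets = new Map();
  for (const { timestamp, value } of series) {
    const start = Math.floor(timestamp / bucketSize) * bucketSize;
    const bucket = buckets.get(start) ?? { sum: 0, count: 0 };
    bucket.sum += value;
    bucket.count += 1;
    buckets.set(start, bucket);
  }
  return [...buckets].map(([start, { sum, count }]) => ({
    timestamp: start,
    value: sum / count,
  }));
}

append(0, 10);
append(10, 20);
append(20, 30);
append(60, 40);
append(70, 60);
append(130, 5);

const windowed = queryWindow(10, 70);
const perMinute = downsample(points, 60);
```

Updating an old point would break the sorted-append assumption that makes `queryWindow` cheap, which mirrors why these stores are weak at mutation.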

Side by side

Model	Read by	Killer use case	Weak at
Key-value	exact key	cache, session, counter	querying by value
Document	key or indexed field	user profile, product, CMS	many-to-many joins
Wide-column	partition key + range	chat history, event logs	ad-hoc queries
Graph	traversal	social graph, recommendations	loosely connected data
Time-series	time window	metrics, telemetry	updates to old data

The JavaScript angle

Because JavaScript objects are already documents, the document model maps onto Node with almost zero friction — what you hold in memory is what you store. But that frictionlessness is a trap: it’s easy to embed an unbounded array because the language doesn’t push back.

Embed what you read together; reference what grows (script.js)

// ✅ Embed: address + prefs are read with the user, bounded in size.
const user = {
  _id: 'u_123',
  name: 'Ada',
  address: { city: 'London', zip: 'EC1' },
  preferences: { theme: 'dark', locale: 'en-GB' },
};

// ❌ Embed-everything: comments grow without bound, so the document balloons
//    and every read of the post drags megabytes of comments along with it.
const post = {
  _id: 'p_9',
  title: 'Hello',
  comments: [ /* ...could be 1,000,000 entries... */ ],
};

// ✅ Reference instead: comments live in their own collection, keyed by post.
const post2 = { _id: 'p_9', title: 'Hello' };
// comments: { _id, postId: 'p_9', body, ... }  ← queried by postId, paginated
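
One way the referenced layout might be read is a page at a time, using the last comment seen as a cursor. This is a sketch over an in-memory array; `commentsPage` and the sample comments are hypothetical, and a real store would serve the `postId` lookup from an index rather than a filter.

```javascript
// Comments live in their own collection, each pointing back to its post.
const comments = [
  { _id: 'c_1', postId: 'p_9', body: 'First' },
  { _id: 'c_2', postId: 'p_7', body: 'Other post' },
  { _id: 'c_3', postId: 'p_9', body: 'Second' },
  { _id: 'c_4', postId: 'p_9', body: 'Third' },
];

// Return up to `limit` comments for a post, starting after the comment `afterId`.
// Pass null as afterId to get the first page. An unknown afterId also restarts
// from the first page, because findIndex returns -1.
function commentsPage(postId, afterId, limit) {
  const forPost = comments.filter((c) => c.postId === postId);
  const start =
    afterId === null ? 0 : forPost.findIndex((c) => c._id === afterId) + 1;
  const items = forPost.slice(start, start + limit);
  const hasMore = items.length > 0 && start + limit < forPost.length;
  return { items, nextCursor: hasMore ? items[items.length - 1]._id : null };
}

const page1 = commentsPage('p_9', null, 2);
const page2 = commentsPage('p_9', page1.nextCursor, 2);
```

The post document stays small no matter how many comments accumulate, and each read pulls only the page it needs.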


The model you choose is a bet on your access pattern. Once that pattern is set, the next question is how the store finds a row quickly — which brings us to indexes.