An Index Trades Write Cost and Space for Read Speed
Indexing
What an index is and what it costs, B-tree vs LSM-tree, composite and covering indexes, and when a full table scan is actually fine.
What you'll learn
- Explain what an index is and the write/space cost it adds
- Contrast B-tree and LSM-tree storage engines
- Design composite and covering indexes for a query
An index is a separate data structure the database maintains so it can find
rows without scanning the whole table. It’s the book’s index versus reading
every page. Without one, finding a user by email means reading every row — an
O(n) scan. With one, it’s a handful of pointer hops — roughly O(log n).
That speedup isn’t free, and understanding the bill is what separates “add an index” as a reflex from adding the right index on purpose.
What an index costs
Every index is a copy of some columns, kept sorted, that the database must update on every write. So:
- Writes get slower. Insert one row, and the DB updates the table and every index on it. Ten indexes means eleven structures touched per insert — this is write amplification.
- Space grows. Each index is extra on-disk data; heavily-indexed tables can have indexes larger than the table itself.
- The optimizer has more to consider. Too many overlapping indexes can even confuse the query planner.
The rule of thumb: index the columns you filter, join, and sort on — and nothing else. An unused index is pure cost with no benefit.
B-tree vs LSM-tree
Two storage-engine families dominate, and the difference explains why some databases are read-optimized and others write-optimized.
A B-tree (Postgres, MySQL/InnoDB) keeps data in a balanced tree of fixed-size pages, updated in place. Lookups are a few page reads — excellent, predictable read latency. The cost is that random in-place writes mean random disk I/O, and writes can be slower under heavy load.
An LSM-tree (log-structured merge-tree — Cassandra, RocksDB, LevelDB, the engine under many wide-column and key-value stores) buffers writes in memory, then flushes them as sorted, immutable files appended to disk. Writes become fast sequential appends (recall from the latency lesson: sequential I/O crushes random I/O). The catch: a read may have to check several files, and a background compaction process must continually merge them.
| B-tree | LSM-tree | |
|---|---|---|
| Writes | in-place, random I/O | sequential appends — fast |
| Reads | predictable, few hops | may scan several files |
| Write amplification | lower per write | higher (compaction) |
| Best for | read-heavy, OLTP | write-heavy ingestion |
| Examples | Postgres, InnoDB | Cassandra, RocksDB |
This is why write-heavy stores reach for LSM-trees: turning random writes into sequential appends is the single biggest throughput lever available on the write path.
Composite and covering indexes
A composite index covers multiple columns in order — and order matters. An
index on (status, created_at) accelerates queries that filter on status
(optionally also ranging on created_at), but does not help a query that
filters on created_at alone. Think of a phone book sorted by last name then
first name: useless for finding everyone named “Ada.”
A covering index includes every column a query needs, so the database answers the query from the index without touching the table at all — an index-only scan. It’s the fastest a query can go short of a cache hit.
-- Hot query: most recent open orders for a user, just id + total.
SELECT id, total
FROM orders
WHERE user_id = $1 AND status = 'open'
ORDER BY created_at DESC
LIMIT 20;
-- Composite + covering: filter cols first, sort col next, payload INCLUDEd.
-- The DB walks straight to this user's open orders, already sorted,
-- and reads id/total right out of the index — zero table lookups.
CREATE INDEX idx_orders_user_open
ON orders (user_id, status, created_at DESC)
INCLUDE (id, total); When a full scan is fine
Indexes aren’t always the answer. A full scan is the right call when:
- The table is small — scanning 500 rows is faster than the index indirection.
- The query touches most of the table anyway (a report over 80% of rows).
- It runs rarely — a nightly batch job doesn’t justify a write-path tax.
The query planner often chooses a scan in these cases even if an index exists,
because it estimates the scan is cheaper. Trust EXPLAIN over your instinct.
The JavaScript angle
Node engineers meet indexing most viscerally through an ORM, where a single innocent line of JS generates a query whose performance lives or dies by whether an index exists:
// Prisma: looks like an in-memory filter, but it's SQL hitting disk.
const orders = await prisma.order.findMany({
where: { userId, status: 'open' }, // ← needs an index on (userId, status)
orderBy: { createdAt: 'desc' }, // ← ...ideally including createdAt
take: 20,
});
// With the composite index: a few milliseconds.
// Without it: a full-table scan that gets slower every day the table grows. The takeaway for the JS dev: the ORM hides the index, not the cost. Read the
queries your code emits, run EXPLAIN ANALYZE, and index the access patterns
that run millions of times a day.
Indexes make a single node fast. The next problem is surviving when that node dies and serving reads from more than one machine — replication.