Scaling Up vs Scaling Out

Vertical scaling (a bigger machine) vs horizontal scaling (more machines): when each wins, the hard ceiling of scaling up, and why scaling out demands statelessness.

7 min read · Level 2/5 · #system-design #scalability #horizontal

What you'll learn

Distinguish vertical from horizontal scaling and their cost curves
Recognize the hard ceiling and single-point-of-failure of scaling up
Explain why horizontal scaling requires stateless services

When traffic outgrows one server, you have exactly two moves. You can make the server bigger — more CPU, more RAM, faster disks — or you can add more servers and spread the load. The first is vertical scaling (scale up); the second is horizontal scaling (scale out). Almost every scaling decision in this track is some flavor of choosing between, or combining, these two.

Scaling up: one bigger box

Vertical scaling means upgrading the machine you already have. A t3.medium becomes a c6i.16xlarge; 2 vCPUs become 64; 4 GB of RAM becomes 128. Your code doesn’t change at all — the same single process just has more resources to chew through.

This is the path of least resistance, and that’s exactly why it’s seductive:

No architecture change. No load balancer, no distributed state, no rethinking. You change an instance type and restart.
No new failure modes. A single process on a single box has no network partitions, no clock skew, no “which replica is authoritative” questions.
It’s often enough. A surprising number of real services run happily on one fat machine plus a database. Don’t distribute what you don’t have to.

But scaling up has a hard ceiling. There is a biggest machine money can buy, and once you’re on it, you’re out of road. Worse, that one box is a single point of failure: when it reboots or crashes, everything is down. And the cost curve turns brutal at the top end. Cloud pricing is often close to linear within one instance family, but once you need the very largest sizes or specialized hardware, doubling capacity can more than double the price, because top-end hardware commands a premium.

Scaling out: more boxes

Horizontal scaling means running many identical instances and dividing traffic among them. Instead of one 64-core monster, you run sixteen 4-core machines behind a load balancer. Need more capacity? Add a seventeenth.
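The sizing arithmetic behind "add a seventeenth" can be sketched roughly as follows. The helper name `instancesNeeded`, the per-instance throughput, and the target utilization are all hypothetical placeholders rather than measurements; real figures would come from load testing your own service.

```javascript
// Hypothetical sizing helper: how many identical instances cover a peak load?
// Every input is an assumption to be replaced with load-test measurements.
function instancesNeeded(peakRps, perInstanceRps, targetUtilization = 0.7) {
  if (peakRps < 0 || perInstanceRps <= 0) {
    throw new RangeError('peakRps must be >= 0 and perInstanceRps > 0');
  }
  if (targetUtilization <= 0 || targetUtilization > 1) {
    throw new RangeError('targetUtilization must be in (0, 1]');
  }
  // Run each instance below its limit so the fleet keeps some headroom.
  const usableRps = perInstanceRps * targetUtilization;
  // Keep at least one instance, even at zero load.
  return Math.max(1, Math.ceil(peakRps / usableRps));
}

// Illustrative numbers only: 500 req/s per instance, run at 70% utilization.
const before = instancesNeeded(5000, 500); // peak of 5,000 req/s
const after = instancesNeeded(6000, 500);  // peak grows to 6,000 req/s
console.log({ before, after });
```

The point of the sketch is that growth in load maps to a change in instance count, not to a new machine type. That only holds while every instance is interchangeable.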

This is how every system at real scale works, because it removes the ceiling:

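No hard upper bound. You can keep adding commodity machines long after the biggest single box would have run out; the practical limit becomes whatever the instances share, such as the database. Google doesn’t run on one enormous computer.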
Redundancy comes for free. As you saw in the availability lesson, two cheap servers in parallel beat one expensive server for uptime. If one instance dies, the others absorb its traffic.
Cheaper per unit of capacity. Commodity hardware has a flatter, friendlier cost curve than top-of-the-line single machines.
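The redundancy claim above can be checked with a little arithmetic. This is a minimal sketch that assumes instance failures are independent and that the load balancer itself never fails; both are idealizations. The availability figures are illustrative, not measurements, and `parallelAvailability` is a name chosen for this example.

```javascript
// Availability of n redundant instances, assuming independent failures.
// Shared power, network, or a bad deploy would break this assumption.
function parallelAvailability(singleAvailability, n) {
  // The system is down only when every instance is down at the same time.
  const allDown = (1 - singleAvailability) ** n;
  return 1 - allDown;
}

// Illustrative figures only.
const oneBigBox = 0.999;                             // one "three nines" server
const twoCheapBoxes = parallelAvailability(0.99, 2); // two "two nines" servers
console.log({ oneBigBox, twoCheapBoxes });
```

Under these assumptions, two 99% servers come out at about 99.99%, ahead of a single 99.9% server.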

The catch — and it’s the whole ballgame — is that horizontal scaling only works if any instance can handle any request. The moment a request needs a specific server (because that server is holding the user’s session, or their WebSocket connection, or some in-memory cache), you can no longer freely spread load. That property is called statelessness, and it’s the price of admission to scaling out. We devote a whole lesson to it shortly.
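To make the "any instance can handle any request" property concrete, here is a small simulation. It is a sketch, not a real server: the two "instances" are plain objects, and `sharedStore` is an ordinary `Map` standing in for a hypothetical external store such as Redis or a database. The factory names are invented for this example.

```javascript
// Stateful: each instance keeps sessions in its own memory.
function makeStatefulInstance() {
  const sessions = new Map(); // private to this instance
  return {
    login(sessionId, user) { sessions.set(sessionId, user); },
    whoAmI(sessionId) { return sessions.get(sessionId) ?? null; },
  };
}

// Stateless: instances hold no session data and defer to a shared store.
// `sharedStore` is an in-process Map used as a stand-in; a real external
// store would be reached over the network.
function makeStatelessInstance(sharedStore) {
  return {
    login(sessionId, user) { sharedStore.set(sessionId, user); },
    whoAmI(sessionId) { return sharedStore.get(sessionId) ?? null; },
  };
}

// The login lands on instance A; the next request lands on instance B.
const a1 = makeStatefulInstance();
const b1 = makeStatefulInstance();
a1.login('s1', 'ada');
const statefulResult = b1.whoAmI('s1'); // null: B never saw the login

const store = new Map();
const a2 = makeStatelessInstance(store);
const b2 = makeStatelessInstance(store);
a2.login('s1', 'ada');
const statelessResult = b2.whoAmI('s1'); // 'ada': any instance can answer

console.log({ statefulResult, statelessResult });
```

In the stateful version the second request only succeeds if it is routed back to instance A, which is exactly the constraint that stops you from spreading load freely.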

[Figure: Scaling Up vs Scaling Out, architecture diagram]

The tradeoff, side by side

Aspect	Scale up (vertical)	Scale out (horizontal)
How	Bigger machine	More machines
Ceiling	Hard — biggest box exists	Effectively none
Code changes	None	Must be stateless
Failure	Single point of failure	Survives node loss
Cost curve	Steep at the top end	Flatter, commodity
Best for	Early stage, DBs, simplicity	Web/API tiers at scale

The honest answer to “which one?” is both, in order. Scale up first because it’s free engineering-wise — squeeze the single box until it’s genuinely the bottleneck. Then, before you hit the ceiling, do the work to scale out. Many systems keep the database vertically scaled (it’s hard to distribute) while the stateless app tier scales horizontally.

The JavaScript angle: one Node process uses one core

Here’s a wrinkle that makes horizontal thinking matter even on a single box for Node engineers: a Node process runs your JavaScript on one thread, so CPU-bound work tops out at roughly one core. Buy a 32-core machine and a naive Node app uses about 1/32 of its CPU. Vertically scaling the box does almost nothing until you also run more processes.
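The 1/32 figure is simple arithmetic, sketched below. This is a deliberate simplification: it treats each process as using at most one core for JavaScript and ignores libuv's thread pool, worker threads, and the operating system. `maxCpuShare` is a hypothetical helper named for this example.

```javascript
// Upper bound on the share of a machine's CPU that JavaScript execution can
// use when each process runs its JavaScript on a single thread.
function maxCpuShare(processCount, coreCount) {
  // Processes beyond the core count add no CPU capacity.
  return Math.min(processCount, coreCount) / coreCount;
}

const naive = maxCpuShare(1, 32);      // one process on a 32-core box
const clustered = maxCpuShare(32, 32); // one process per core
console.log({ naive, clustered });     // { naive: 0.03125, clustered: 1 }
```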

The fix is to scale out within the machine first, using the built-in cluster module (or a process manager like PM2) to fork one worker per core:

Use every core: one worker per CPU (script.js)

import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';
import http from 'node:http';

if (cluster.isPrimary) {
  const cores = availableParallelism();      // e.g. 8 on an 8-core box
  console.log(`Forking ${cores} workers`);
  for (let i = 0; i < cores; i++) cluster.fork();

  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died — restarting`);
    cluster.fork();                          // keep the pool full
  });
} else {
  // Each worker is a full Node process sharing the same port.
  http.createServer((req, res) => res.end(`Served by ${process.pid}`))
      .listen(3000);
}


Notice what cluster forces on you: those eight workers are separate processes that share nothing in memory. A session stored in worker 3’s RAM is invisible to worker 7. So the single-box clustering trap is the same trap as multi-box horizontal scaling — and the same solution (externalize state) fixes both. Node makes you confront statelessness early, which is a gift.

Now that you have many boxes (or many workers), something has to decide which one handles each request. That’s the load balancer — next.