Think about the last time you used a website that felt slow. Maybe a button took a second to respond, or an image seemed to load in chunks. As developers, we often build and test our applications in a perfect bubble—our powerful computers, on a fast office network. But that’s not how the world experiences our work.
Real users face a different reality. They might be on a shaky mobile connection on a train, using an older phone, or located halfway across the world from our servers. The only way to truly know how our application performs is to measure it from their perspective, as it happens. This practice is what we call monitoring real user performance.
Synthetic tests, which run scripts in a simulated environment, are like a car’s safety test in a lab. They are crucial and repeatable. Real user monitoring, or RUM, is like having sensors on millions of cars driving real roads in all weather conditions. It tells you what’s actually happening.
I remember working on a feature that was blazing fast in all our internal tests. We launched it confidently. A week later, our analytics showed a puzzling dip in conversions for a specific user flow. Our lab tests couldn’t explain it. Only when we looked at real performance data did we see the issue: for users on certain mobile devices, a key interactive element was taking nearly four seconds to become responsive. It was a problem that only existed in the wild.
The core of this monitoring is a set of metrics known as Core Web Vitals. These are signals Google has identified as critical to a good user experience. Let’s talk about them in plain terms.
Largest Contentful Paint (LCP) asks: “When does the main content of the page visibly load?” It marks the moment the largest image or text block appears on screen. A good LCP should happen within 2.5 seconds. Users get frustrated staring at a blank or partial screen.
First Input Delay (FID) measures: “How long does the page take to respond when I first try to use it?” You click a button or a link—does the browser react immediately, or is there a noticeable lag? This delay should be less than 100 milliseconds. A laggy page feels broken. (Google has since replaced FID in the Core Web Vitals with Interaction to Next Paint, or INP, which considers every interaction rather than only the first, but the principle is identical.)
Cumulative Layout Shift (CLS) tracks: “Does stuff jump around while I’m trying to read or click?” Nothing is more annoying than aiming for a “Submit” button only to have an image load above it and push the page down, causing you to click an ad instead. This measures visual stability.
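Under the hood, each individual shift is scored as the impact fraction (how much of the viewport the moved content occupies) multiplied by the distance fraction (how far it moved, relative to the viewport size), and CLS is the sum of those scores. A small sketch of that arithmetic, with made-up fractions:

```javascript
// Score for a single layout shift, as defined by the Layout Instability spec:
// impactFraction: the share of the viewport affected by moved elements
// distanceFraction: how far they moved, relative to the viewport size
function layoutShiftScore(impactFraction, distanceFraction) {
  return impactFraction * distanceFraction;
}

// A late-loading ad pushes content filling 75% of the viewport down
// by a quarter of the viewport height:
console.log(layoutShiftScore(0.75, 0.25)); // → 0.1875, well above the 0.1 "good" threshold
```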
The browser gives us tools to measure these directly. Here’s a basic way to start collecting this data.
// Listen for the important performance moments
let clsScore = 0; // individual layout shifts add up to the CLS score
const performanceWatcher = new PerformanceObserver((list) => {
for (const entry of list.getEntries()) {
if (entry.entryType === 'largest-contentful-paint') {
console.log('Main content loaded at:', entry.startTime, 'ms');
sendToAnalytics('lcp', entry.startTime);
}
if (entry.entryType === 'first-input') {
const delay = entry.processingStart - entry.startTime;
console.log('First input delay was:', delay, 'ms');
sendToAnalytics('fid', delay);
}
// Shifts caused by recent user input don't count toward CLS
if (entry.entryType === 'layout-shift' && !entry.hadRecentInput) {
clsScore += entry.value;
console.log('Layout shift score so far:', clsScore);
sendToAnalytics('cls', clsScore);
}
}
});
// Start watching for these specific types of events
performanceWatcher.observe({
entryTypes: ['largest-contentful-paint', 'first-input', 'layout-shift']
});
// A simple function to send data to your backend
function sendToAnalytics(metricName, value) {
const data = {
name: metricName,
value: value,
url: window.location.href,
userAgent: navigator.userAgent, // Tells us about browser and device
timestamp: new Date().toISOString()
};
// Use sendBeacon for reliable sending, even during page unload.
// Wrapping the payload in a Blob sets the Content-Type header so the
// server can parse it as JSON (a bare string would arrive as text/plain).
navigator.sendBeacon('/api/performance', new Blob([JSON.stringify(data)], { type: 'application/json' }));
}
This is a great start, but it’s just the foundation. Real user monitoring is more than just these three metrics. It’s about understanding the complete journey. How long did the entire page take to load? Did any images or scripts fail to load? Were there any JavaScript errors that broke a feature for that user?
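Those page-level questions are answered by the Navigation Timing API. Here is a sketch of how you might condense a navigation entry into a few durations worth storing; `summarizeNavigation` and the analytics call are illustrative names, not part of any library:

```javascript
// Condense a PerformanceNavigationTiming entry into a few durations worth
// storing. It accepts any object with the same fields, which keeps it testable.
function summarizeNavigation(nav) {
  return {
    ttfb: nav.responseStart - nav.requestStart,             // time to first byte
    domReady: nav.domContentLoadedEventEnd - nav.startTime, // DOM parsed and ready
    fullLoad: nav.loadEventEnd - nav.startTime              // load event finished
  };
}

// In the browser, after the load event has fired:
// const [nav] = performance.getEntriesByType('navigation');
// sendToAnalytics('navigation', summarizeNavigation(nav));

// With a mocked entry:
console.log(summarizeNavigation({
  startTime: 0, requestStart: 10, responseStart: 210,
  domContentLoadedEventEnd: 900, loadEventEnd: 1500
})); // → { ttfb: 200, domReady: 900, fullLoad: 1500 }
```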
We also have to be good citizens. Collecting data from every single page view from every user can overwhelm our systems and raise privacy concerns. We need to be smart about it. We can sample the data, collect it in batches, and always anonymize it. Here is a more complete example that considers these factors.
class PerformanceTracker {
constructor(sampleRate = 0.1) {
// Only track data for, say, 10% of user sessions
this.isActive = Math.random() < sampleRate;
this.eventQueue = [];
if (this.isActive) {
this.startWatching();
// Flush anything still queued when the page is hidden or closed,
// so partial batches aren't silently lost
window.addEventListener('pagehide', () => this.sendQueueNow());
}
}
sendQueueNow() {
if (this.eventQueue.length > 0) {
this.sendToServer(this.eventQueue.splice(0, this.eventQueue.length));
}
}
startWatching() {
// 1. Watch Core Web Vitals, normalizing each entry into a named metric
// (entry.name is not the metric name; entryType is what distinguishes them)
const vitalNames = { 'largest-contentful-paint': 'LCP', 'first-input': 'FID', 'layout-shift': 'CLS' };
const vitalsObserver = new PerformanceObserver((list) => {
list.getEntries().forEach((entry) => this.recordEvent('vital', {
name: vitalNames[entry.entryType],
// first-input reports a delay; layout-shift a score; LCP a paint time
value: entry.entryType === 'first-input'
? entry.processingStart - entry.startTime
: (entry.value !== undefined ? entry.value : entry.startTime)
}));
});
vitalsObserver.observe({ entryTypes: ['largest-contentful-paint', 'first-input', 'layout-shift'] });
// 2. Watch overall page-load timing (the Navigation Timing entry)
const navObserver = new PerformanceObserver((list) => {
list.getEntries().forEach(this.recordEvent.bind(this, 'navigation'));
});
navObserver.observe({ entryTypes: ['navigation'] });
// 3. Watch resource loads (images, scripts, CSS)
const resObserver = new PerformanceObserver((list) => {
list.getEntries().forEach(this.recordEvent.bind(this, 'resource'));
});
resObserver.observe({ entryTypes: ['resource'] });
// 4. Catch JavaScript errors
window.addEventListener('error', (event) => {
this.recordEvent('error', {
message: event.message,
source: event.filename,
line: event.lineno
});
});
// 5. Catch unhandled promise rejections
window.addEventListener('unhandledrejection', (event) => {
this.recordEvent('promise_error', { reason: event.reason });
});
}
recordEvent(type, data) {
// Add context to every event
const event = {
type,
data,
timestamp: Date.now(),
path: window.location.pathname,
connection: navigator.connection ? navigator.connection.effectiveType : 'unknown',
// Create a unique ID for this page visit, not the user
visitId: sessionStorage.getItem('perfVisitId') || this.createVisitId()
};
this.eventQueue.push(event);
this.maybeSendQueue();
}
createVisitId() {
const id = 'visit_' + Math.random().toString(36).substring(2);
sessionStorage.setItem('perfVisitId', id);
return id;
}
maybeSendQueue() {
// Send batches of 5 events to reduce network requests
if (this.eventQueue.length >= 5) {
const batch = this.eventQueue.splice(0, 5);
this.sendToServer(batch);
}
}
sendToServer(batch) {
// keepalive lets the request finish even if the page closes mid-send,
// so we don't need a separate (and duplicate-prone) unload handler here
fetch('/api/performance/batch', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ events: batch }),
keepalive: true
});
}
}
// Start the tracker for 10% of sessions
const tracker = new PerformanceTracker(0.1);
This class does a lot. It samples users, batches data, and collects a rich picture: vitals, page loads, resource timings, and errors. It even captures the effective connection type (values like ‘4g’ or ‘slow-2g’) and uses a “visitId” to group events from a single session without identifying the person.
Now, all this data is flowing from the browser. But data alone is just noise. We need to store it, analyze it, and make it tell a story. This is where the backend comes in. We need a system that can handle this stream of events, store them efficiently, and let us query them in useful ways.
Let’s build a simple backend service using Node.js. A dedicated time-series database is a natural fit for this kind of data, but plain PostgreSQL handles moderate volumes well, so we’ll use that.
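Before the server code, we need somewhere to put the data. Here is one possible schema, with columns matching the INSERT statements in the handler; the types and indexes are a reasonable starting point, not a prescription:

```sql
-- A minimal schema sketch for the two tables the API writes to.
CREATE TABLE web_vitals (
  id            BIGSERIAL PRIMARY KEY,
  visit_id      TEXT NOT NULL,
  path          TEXT NOT NULL,
  metric_name   TEXT NOT NULL,
  metric_value  NUMERIC NOT NULL,
  connection    TEXT,
  created_at    TIMESTAMPTZ NOT NULL
);

CREATE TABLE error_logs (
  id             BIGSERIAL PRIMARY KEY,
  visit_id       TEXT NOT NULL,
  path           TEXT NOT NULL,
  error_message  TEXT,
  source_file    TEXT,
  line_number    INTEGER,
  created_at     TIMESTAMPTZ NOT NULL
);

-- The analytical queries filter by metric and time range, so index on both.
CREATE INDEX idx_web_vitals_metric_time ON web_vitals (metric_name, created_at);
```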
// server.js - Our backend API endpoint
const express = require('express');
const { Pool } = require('pg');
const app = express();
app.use(express.json());
// Set up connection to the database
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
});
// Endpoint to receive batched performance events
app.post('/api/performance/batch', async (req, res) => {
const events = req.body.events;
// Reject malformed payloads before touching the database
if (!Array.isArray(events)) {
return res.status(400).send({ error: 'Expected a JSON body with an events array' });
}
const client = await pool.connect();
try {
await client.query('BEGIN'); // Start a database transaction
for (const event of events) {
// Store different event types in appropriate tables
if (event.type === 'vital') {
await client.query(
`INSERT INTO web_vitals
(visit_id, path, metric_name, metric_value, connection, created_at)
VALUES ($1, $2, $3, $4, $5, $6)`,
[
event.visitId,
event.path,
event.data.name,
event.data.value || event.data.startTime,
event.connection,
new Date(event.timestamp)
]
);
}
if (event.type === 'error') {
await client.query(
`INSERT INTO error_logs
(visit_id, path, error_message, source_file, line_number, created_at)
VALUES ($1, $2, $3, $4, $5, $6)`,
[
event.visitId,
event.path,
event.data.message,
event.data.source,
event.data.line,
new Date(event.timestamp)
]
);
}
// Similar blocks for 'navigation', 'resource', 'promise_error'
}
await client.query('COMMIT'); // Save all inserts
res.status(200).send({ status: 'ok', received: events.length });
} catch (error) {
await client.query('ROLLBACK'); // Undo on error
console.error('Failed to store performance batch:', error);
res.status(500).send({ error: 'Storage failed' });
} finally {
client.release(); // Always return the connection to the pool
}
});
// A crucial analytical endpoint: Get performance trends
app.get('/api/performance/summary', async (req, res) => {
const { metric, from, to, group } = req.query;
// Example: metric='LCP', from='2023-10-01', to='2023-10-07', group='day'
let groupByClause;
if (group === 'hour') {
groupByClause = "DATE_TRUNC('hour', created_at)";
} else {
groupByClause = "DATE_TRUNC('day', created_at)"; // Default to daily
}
const queryText = `
SELECT
${groupByClause} as time_period,
COUNT(*) as samples,
ROUND(AVG(metric_value)::numeric, 2) as average,
-- PERCENTILE_CONT returns double precision; cast so ROUND(x, 2) applies
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY metric_value))::numeric, 2) as p50,
ROUND((PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY metric_value))::numeric, 2) as p90
FROM web_vitals
WHERE metric_name = $1
AND created_at >= $2::timestamptz
AND created_at <= $3::timestamptz
GROUP BY time_period
ORDER BY time_period;
`;
try {
const result = await pool.query(queryText, [metric, from, to]);
res.json(result.rows);
} catch (error) {
console.error('Query failed:', error);
res.status(500).send({ error: 'Analysis failed' });
}
});
// Another useful endpoint: Compare performance by user condition
app.get('/api/performance/breakdown', async (req, res) => {
const { metric } = req.query;
const queryText = `
SELECT
connection,
COUNT(*) as samples,
ROUND(AVG(metric_value)::numeric, 2) as avg_value,
ROUND((PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY metric_value))::numeric, 2) as p90_value
FROM web_vitals
WHERE metric_name = $1
AND created_at > NOW() - INTERVAL '7 days'
GROUP BY connection
ORDER BY samples DESC;
`;
try {
const result = await pool.query(queryText, [metric]);
// This might show that users on 'slow-2g' have an average LCP of 8500ms,
// while users on '4g' average 2200ms.
res.json(result.rows);
} catch (error) {
console.error('Breakdown query failed:', error);
res.status(500).send({ error: 'Breakdown failed' });
}
});
app.listen(3000, () => console.log('RUM backend listening on port 3000'));
With these endpoints, we’re not just dumping data into a black hole. We can now ask meaningful questions of our data. What is the 90th percentile LCP for our checkout page over the last week? How does FID differ between Chrome and Safari users? Are errors spiking after our last deployment?
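A quick word on why these queries report percentiles rather than just averages: a single very slow outlier can drag the mean far from what a typical user experiences, while p50 and p90 show what the median user and the slowest tenth actually see. The PERCENTILE_CONT calls use linear interpolation, which is easy to reproduce; here is a sketch with made-up LCP samples:

```javascript
// Linear-interpolation percentile, the same method PostgreSQL's
// PERCENTILE_CONT uses: sort the values, then interpolate between
// the two samples that bracket the requested rank.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = p * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  const weight = rank - lower;
  return sorted[lower] * (1 - weight) + sorted[upper] * weight;
}

// Nine fast page loads and one very slow one:
const lcpSamples = [1200, 1300, 1250, 1400, 1350, 1280, 1320, 1290, 1310, 9000];
const avg = lcpSamples.reduce((a, b) => a + b, 0) / lcpSamples.length;

console.log(avg);                          // → 2070: the mean looks alarming
console.log(percentile(lcpSamples, 0.5));  // → 1305: the median user is fine
console.log(percentile(lcpSamples, 0.9));  // ≈ 2160: the slow tail shows up
```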
The final piece is turning this data into information you can see and act upon. This doesn’t need to be a fancy commercial dashboard. Sometimes a simple internal webpage that fetches from these APIs is the best start.
<!-- A simple, functional dashboard -->
<!DOCTYPE html>
<html>
<head>
<title>Performance Dashboard</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
<h1>LCP Trends (Last 7 Days)</h1>
<canvas id="lcpChart" width="800" height="400"></canvas>
<h2>Performance by Connection Type</h2>
<table id="connectionTable">
<thead><tr><th>Connection</th><th>Samples</th><th>Avg LCP</th><th>90th %ile</th></tr></thead>
<tbody></tbody>
</table>
<script>
async function loadData() {
// Fetch trend data
const trendRes = await fetch('/api/performance/summary?metric=LCP&from=2023-10-01&to=2023-10-07&group=day');
const trendData = await trendRes.json();
// Fetch breakdown data
const breakdownRes = await fetch('/api/performance/breakdown?metric=LCP');
const breakdownData = await breakdownRes.json();
renderChart(trendData);
renderTable(breakdownData);
}
function renderChart(data) {
const ctx = document.getElementById('lcpChart').getContext('2d');
new Chart(ctx, {
type: 'line',
data: {
labels: data.map(row => new Date(row.time_period).toLocaleDateString()),
datasets: [{
label: 'Average LCP (ms)',
data: data.map(row => row.average),
borderColor: 'rgb(75, 192, 192)',
}, {
label: '90th Percentile LCP (ms)',
data: data.map(row => row.p90),
borderColor: 'rgb(255, 99, 132)',
}]
},
options: { responsive: false }
});
}
function renderTable(data) {
const tbody = document.querySelector('#connectionTable tbody');
tbody.innerHTML = data.map(row => `
<tr>
<td>${row.connection || 'unknown'}</td>
<td>${row.samples}</td>
<td>${row.avg_value}ms</td>
<td>${row.p90_value}ms</td>
</tr>
`).join('');
}
loadData();
</script>
</body>
</html>
This dashboard is basic, but it shows the power you now have. You can see trends. You can identify that users on poor connections suffer much worse performance. This is the insight that drives action. Maybe you need to implement a more aggressive loading strategy for critical images on slow connections. Perhaps you discover that a third-party script loaded on every page is the primary cause of input delay.
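That more aggressive loading strategy can be driven by the same `navigator.connection` signal the tracker already collects. A hypothetical sketch (the tiers and thresholds here are invented for illustration, not a standard API):

```javascript
// Pick an image quality tier from the Network Information API's
// effectiveType. navigator.connection is not available in every
// browser, so callers should pass it in and we guard against undefined.
function imageQualityFor(connection) {
  const type = connection ? connection.effectiveType : 'unknown';
  if (type === 'slow-2g' || type === '2g') return 'low';
  if (type === '3g') return 'medium';
  return 'high'; // '4g'-class connections, or browsers without the API
}

console.log(imageQualityFor({ effectiveType: '2g' })); // → 'low'
console.log(imageQualityFor(undefined));               // → 'high'

// In the browser, this could select a src variant:
// img.src = `/images/hero-${imageQualityFor(navigator.connection)}.jpg`;
```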
The goal of all this work is to close the loop. You write code, you deploy it, and then you listen to what the real world tells you about its performance. It transforms performance from a theoretical concern into a measurable, manageable aspect of your product.
I’ve found that once teams start seeing this data, their perspective changes. Arguments about optimization priorities are settled with charts. The impact of a new library or a changed API endpoint becomes clear. You stop guessing about what to fix next. The data shows you.
Start simple. Measure your Core Web Vitals for a small percentage of users. Store that data somewhere you can query it. Look at it once a week. You will be surprised by what you learn. From there, you can grow the sophistication of your tracking, your analysis, and your alerts. The most important step is to start listening to your real users. They are the ultimate test, and they are already using your application. All you have to do is pay attention.