Database query pagination is essential for high-performing web applications dealing with large datasets. When implemented correctly, pagination improves application responsiveness, reduces server load, and enhances user experience by presenting data in manageable chunks. In this article, I’ll cover various pagination strategies, their implementation details, performance considerations, and real-world examples.
Understanding Pagination Fundamentals
Pagination divides large datasets into smaller, discrete pages that users can navigate sequentially. While the concept seems straightforward, implementing efficient pagination for high-scale applications requires careful consideration of database behavior, performance implications, and user experience.
The most common pagination approaches include:
- Offset-based pagination
- Cursor-based pagination
- Keyset pagination
- Page number pagination
Each method has specific use cases, advantages, and limitations that developers must understand to make informed implementation decisions.
Offset-Based Pagination: The Traditional Approach
Offset-based pagination is the most straightforward approach, using LIMIT and OFFSET SQL clauses to retrieve specific data segments.
SELECT * FROM products
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;
This retrieves items 41-60 from the products table. The corresponding API endpoint might look like:
app.get('/api/products', async (req, res) => {
const page = parseInt(req.query.page) || 1;
const limit = parseInt(req.query.limit) || 20;
const offset = (page - 1) * limit;
try {
const products = await db('products')
.orderBy('created_at', 'desc')
.limit(limit)
.offset(offset);
const totalCount = await db('products').count('id as count').first();
res.json({
data: products,
pagination: {
currentPage: page,
itemsPerPage: limit,
totalItems: totalCount.count,
totalPages: Math.ceil(totalCount.count / limit)
}
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
While simple to implement, offset pagination suffers from performance degradation as the offset increases. The database must scan and discard all rows before the offset position, making it inefficient for large datasets.
Cursor-Based Pagination: The Performance Champion
Cursor-based pagination uses a pointer (cursor) to indicate where to start fetching the next set of results. This approach is significantly more efficient for large datasets as it doesn’t require the database to scan through discarded rows.
app.get('/api/products', async (req, res) => {
const limit = parseInt(req.query.limit) || 20;
const cursor = req.query.cursor;
try {
let query = db('products')
.orderBy('created_at', 'desc')
.limit(limit + 1);
if (cursor) {
const decodedCursor = Buffer.from(cursor, 'base64').toString('ascii');
const [cursorField, cursorValue] = decodedCursor.split(':');
query = query.where(cursorField, '<', cursorValue);
}
const results = await query;
// Check if there are more results
const hasNextPage = results.length > limit;
if (hasNextPage) {
results.pop();
}
// Create the next cursor
let nextCursor = null;
if (hasNextPage && results.length > 0) {
const lastItem = results[results.length - 1];
nextCursor = Buffer.from(`created_at:${lastItem.created_at}`).toString('base64');
}
res.json({
data: results,
pagination: {
hasNextPage,
nextCursor
}
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
This implementation uses a base64-encoded string containing the field and value to paginate by. The cursor approach provides:
- Consistent performance regardless of dataset size
- Protection against duplicate or missing items when data changes
- Better handling of real-time data changes
Keyset Pagination: Balancing Performance and Flexibility
Keyset pagination (also called seek pagination) is a form of cursor-based pagination that uses the values of the sorted columns from the last row of the current page to fetch the next page.
-- First page
SELECT * FROM products
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Next page (assuming last item has created_at='2023-10-15T14:30:00Z' and id=1542)
SELECT * FROM products
WHERE
(created_at < '2023-10-15T14:30:00Z') OR
(created_at = '2023-10-15T14:30:00Z' AND id < 1542)
ORDER BY created_at DESC, id DESC
LIMIT 20;
The implementation in a Node.js API might look like:
app.get('/api/products', async (req, res) => {
const limit = parseInt(req.query.limit) || 20;
const { createdAt, id } = req.query; // Last seen values
try {
let query = db('products')
.orderBy('created_at', 'desc')
.orderBy('id', 'desc') // Secondary sort for uniqueness
.limit(limit);
// Apply keyset condition if we have previous values
if (createdAt && id) {
query = query.where(function() {
this.where('created_at', '<', createdAt)
.orWhere(function() {
this.where('created_at', '=', createdAt)
.andWhere('id', '<', id);
});
});
}
const results = await query;
// Generate values for the next page
const nextPageValues = results.length > 0 ? {
createdAt: results[results.length - 1].created_at,
id: results[results.length - 1].id
} : null;
res.json({
data: results,
pagination: {
hasNextPage: results.length === limit,
nextPageValues
}
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
Keyset pagination provides excellent performance while supporting complex sorting scenarios. The inclusion of a unique identifier (like id) ensures consistent pagination even when multiple rows have identical values in the primary sort column.
Handling Dynamic Sorting with Pagination
Real-world applications often need to support user-controlled sorting along with pagination. This requires dynamic query building based on the sort parameters.
app.get('/api/products', async (req, res) => {
const limit = parseInt(req.query.limit) || 20;
const sortField = req.query.sortField || 'created_at';
const sortDirection = req.query.sortDirection === 'asc' ? 'asc' : 'desc';
const cursor = req.query.cursor;
try {
let query = db('products')
.orderBy(sortField, sortDirection)
.orderBy('id', 'desc') // Secondary sort for uniqueness
.limit(limit + 1);
if (cursor) {
const decodedCursor = JSON.parse(Buffer.from(cursor, 'base64').toString('ascii'));
const operator = sortDirection === 'asc' ? '>' : '<';
query = query.where(function() {
this.where(sortField, operator, decodedCursor[sortField])
.orWhere(function() {
this.where(sortField, '=', decodedCursor[sortField])
.andWhere('id', '<', decodedCursor.id);
});
});
}
const results = await query;
// Check if there are more results
const hasNextPage = results.length > limit;
if (hasNextPage) {
results.pop();
}
// Create the next cursor
let nextCursor = null;
if (hasNextPage && results.length > 0) {
const lastItem = results[results.length - 1];
const cursorData = {
id: lastItem.id,
[sortField]: lastItem[sortField]
};
nextCursor = Buffer.from(JSON.stringify(cursorData)).toString('base64');
}
res.json({
data: results,
pagination: {
hasNextPage,
nextCursor
}
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
This implementation supports changing the sort field and direction while maintaining efficient pagination.
Implementing Pagination in GraphQL
GraphQL APIs benefit from a standardized pagination approach. The Relay Connection specification provides a consistent pattern using edges, nodes, and page info.
const { GraphQLObjectType, GraphQLString, GraphQLList, GraphQLInt, GraphQLBoolean } = require('graphql');
const PageInfoType = new GraphQLObjectType({
name: 'PageInfo',
fields: {
hasNextPage: { type: GraphQLBoolean },
hasPreviousPage: { type: GraphQLBoolean },
startCursor: { type: GraphQLString },
endCursor: { type: GraphQLString }
}
});
const ProductEdgeType = new GraphQLObjectType({
name: 'ProductEdge',
fields: {
cursor: { type: GraphQLString },
node: { type: ProductType }
}
});
const ProductConnectionType = new GraphQLObjectType({
name: 'ProductConnection',
fields: {
edges: { type: new GraphQLList(ProductEdgeType) },
pageInfo: { type: PageInfoType },
totalCount: { type: GraphQLInt }
}
});
// Query implementation
const QueryType = new GraphQLObjectType({
name: 'Query',
fields: {
products: {
type: ProductConnectionType,
args: {
first: { type: GraphQLInt },
after: { type: GraphQLString }
},
resolve: async (_, args) => {
const limit = args.first || 20;
let query = db('products')
.orderBy('created_at', 'desc')
.limit(limit + 1);
if (args.after) {
const cursor = Buffer.from(args.after, 'base64').toString('ascii');
const [timestamp] = cursor.split(':');
query = query.where('created_at', '<', timestamp);
}
const products = await query;
const hasNextPage = products.length > limit;
if (hasNextPage) {
products.pop();
}
const edges = products.map(product => {
const cursor = Buffer.from(`${product.created_at}:${product.id}`).toString('base64');
return {
cursor,
node: product
};
});
const pageInfo = {
hasNextPage,
hasPreviousPage: !!args.after,
startCursor: edges.length > 0 ? edges[0].cursor : null,
endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null
};
const totalCount = await db('products').count('id as count').first();
return {
edges,
pageInfo,
totalCount: totalCount.count
};
}
}
}
});
This GraphQL implementation follows the Connection pattern, providing a consistent interface for clients to request paginated data.
Optimizing Pagination for Performance
Beyond the basic implementation, several database-level optimizations can significantly improve pagination performance:
1. Proper Indexing
Creating the right indexes is crucial for pagination performance. For offset-based pagination, ensure the sorted columns are indexed:
CREATE INDEX idx_products_created_at ON products(created_at);
For keyset pagination with multiple columns, create a composite index:
CREATE INDEX idx_products_pagination ON products(created_at, id);
2. Avoiding COUNT(*) for Total Pages
The total count query can become expensive for large tables. Some alternatives include:
- Approximate counts from database statistics
- Cached counts that are periodically updated
- Estimated counts based on table size and row length
// Using PostgreSQL's estimated count
const getEstimatedCount = async () => {
const result = await db.raw(`
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'products';
`);
return result.rows[0].estimate;
};
3. Using Covering Indexes
A covering index includes all columns needed by a query, avoiding the need to access the table:
CREATE INDEX idx_products_pagination_covering ON products(created_at, id, name, price, status);
This works well when the paginated view only displays a subset of columns.
Handling Data Consistency During Pagination
When data changes between pagination requests, users might see duplicate or missing items. Let’s address this challenge:
Maintaining Consistency with Immutable Cursors
Using a cursor that includes both the sort value and a unique identifier ensures consistency:
function generateCursor(item) {
const cursorData = {
id: item.id,
createdAt: item.created_at
};
return Buffer.from(JSON.stringify(cursorData)).toString('base64');
}
function decodeCursor(cursor) {
if (!cursor) return null;
return JSON.parse(Buffer.from(cursor, 'base64').toString('ascii'));
}
Implementing “Point-in-Time” Pagination
For applications with frequent data changes, consider implementing a point-in-time view:
app.get('/api/products', async (req, res) => {
const limit = parseInt(req.query.limit) || 20;
const cursor = req.query.cursor;
const paginationToken = req.query.paginationToken || Date.now().toString();
// Use a temporary table or view with a snapshot of data
if (!cursor) {
await db.raw(`
CREATE TEMPORARY TABLE temp_products_${paginationToken} AS
SELECT * FROM products ORDER BY created_at DESC
`);
}
let query = db(`temp_products_${paginationToken}`)
.limit(limit + 1);
if (cursor) {
const offset = parseInt(cursor);
query = query.offset(offset);
}
const results = await query;
// Determine if there are more pages
const hasNextPage = results.length > limit;
if (hasNextPage) results.pop();
const nextCursor = cursor ? parseInt(cursor) + limit : limit;
res.json({
data: results,
pagination: {
hasNextPage,
nextCursor: hasNextPage ? nextCursor.toString() : null,
paginationToken
}
});
});
This approach creates a consistent view of the data for the duration of the pagination session.
Implementing Pagination in Different Frameworks
React Frontend Pagination
import React, { useState, useEffect } from 'react';
import axios from 'axios';
function ProductList() {
const [products, setProducts] = useState([]);
const [loading, setLoading] = useState(false);
const [cursor, setCursor] = useState(null);
const [hasNextPage, setHasNextPage] = useState(true);
const loadProducts = async (newCursor = null) => {
setLoading(true);
try {
const response = await axios.get('/api/products', {
params: { limit: 20, cursor: newCursor }
});
if (newCursor) {
setProducts(prev => [...prev, ...response.data.data]);
} else {
setProducts(response.data.data);
}
setCursor(response.data.pagination.nextCursor);
setHasNextPage(response.data.pagination.hasNextPage);
} catch (error) {
console.error('Error loading products:', error);
} finally {
setLoading(false);
}
};
useEffect(() => {
loadProducts();
}, []);
return (
<div>
<h1>Products</h1>
<div className="product-grid">
{products.map(product => (
<div key={product.id} className="product-card">
<h3>{product.name}</h3>
<p>${product.price}</p>
</div>
))}
</div>
{hasNextPage && (
<button
onClick={() => loadProducts(cursor)}
disabled={loading}
>
{loading ? 'Loading...' : 'Load More'}
</button>
)}
</div>
);
}
export default ProductList;
Laravel Implementation
// ProductController.php
public function index(Request $request)
{
$limit = $request->input('limit', 20);
$cursor = $request->input('cursor');
$query = Product::orderBy('created_at', 'desc')
->limit($limit + 1);
if ($cursor) {
$decodedCursor = json_decode(base64_decode($cursor), true);
$createdAt = $decodedCursor['created_at'];
$id = $decodedCursor['id'];
$query->where(function($q) use ($createdAt, $id) {
$q->where('created_at', '<', $createdAt)
->orWhere(function($q2) use ($createdAt, $id) {
$q2->where('created_at', '=', $createdAt)
->where('id', '<', $id);
});
});
}
$products = $query->get();
$hasNextPage = $products->count() > $limit;
if ($hasNextPage) {
$products->pop();
}
$nextCursor = null;
if ($hasNextPage && $products->count() > 0) {
$lastProduct = $products->last();
$cursorData = [
'created_at' => $lastProduct->created_at,
'id' => $lastProduct->id
];
$nextCursor = base64_encode(json_encode($cursorData));
}
return response()->json([
'data' => $products,
'pagination' => [
'hasNextPage' => $hasNextPage,
'nextCursor' => $nextCursor
]
]);
}
Real-world Considerations
When implementing pagination in production applications, consider these additional factors:
Cache Optimization
Caching paginated results can significantly improve performance for frequently accessed pages:
const getCachedProducts = async (cursor, limit) => {
const cacheKey = `products:${cursor || 'first'}:${limit}`;
// Try to get from cache first
const cachedResult = await redis.get(cacheKey);
if (cachedResult) {
return JSON.parse(cachedResult);
}
// If not in cache, fetch from database
const results = await fetchProductsFromDb(cursor, limit);
// Cache the results with expiration
await redis.set(cacheKey, JSON.stringify(results), 'EX', 300); // 5 minutes
return results;
};
Handling Deleted Items
When items can be deleted during pagination, your implementation must account for this:
app.get('/api/products', async (req, res) => {
const limit = parseInt(req.query.limit) || 20;
const cursor = req.query.cursor;
let query = db('products')
.where('deleted_at', null) // Filter out deleted items
.orderBy('created_at', 'desc')
.limit(limit + 5); // Fetch extra items to account for deletions
if (cursor) {
const decodedCursor = decodeCursor(cursor);
query = query.where('created_at', '<', decodedCursor.createdAt);
}
let results = await query;
// Limit to the requested number of items
const hasNextPage = results.length > limit;
results = results.slice(0, limit);
res.json({
data: results,
pagination: {
hasNextPage,
nextCursor: hasNextPage ? generateCursor(results[results.length - 1]) : null
}
});
});
This implementation fetches extra items to ensure we can return a full page even if some items were deleted.
Conclusion
Effective pagination is crucial for high-performance web applications dealing with large datasets. By implementing the right pagination strategy for your specific needs, you can provide a smooth user experience while maintaining application performance.
Cursor-based pagination offers the best performance characteristics for most applications, especially those with large datasets or frequent updates. Keyset pagination provides a good balance of performance and flexibility when you need complex sorting options.
Remember to consider database-specific optimizations, caching strategies, and data consistency requirements when implementing pagination in your applications. With the right approach, you’ll create a responsive and reliable user experience even when dealing with millions of records.