Offset vs Cursor

Pagination may seem like an implementation detail, but in systems with millions of records, it becomes a critical architectural decision. In APIs handling large data volumes, the pagination strategy directly impacts performance, consistency, computational cost, and scalability.

In this article, I analyze three common approaches:

  1. Page Number Pagination
  2. Offset Pagination
  3. Cursor Pagination (Keyset Pagination)

The tested scenario:

  • Java + Spring Boot API (language and framework are irrelevant for this analysis)
  • PostgreSQL
  • Table with 10 million records
  • Queries ordered by updated_at (timestamp), id
  • Composite index (updated_at, id)
  • Incremental resync operations between systems, returning hundreds of thousands of paginated records

I included a benchmark using EXPLAIN ANALYZE for deeper inspection.

Let’s begin.


Page Number Pagination

What it is

The client sends:

GET /resources?page=5&size=100

Internally, the backend converts it to:

SELECT *
FROM resource
ORDER BY updated_at
LIMIT 100 OFFSET 400;

That is: offset = (page - 1) * size
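
The conversion can be sketched as a tiny helper (illustrative only; the class and method names are made up):

```java
public class PageToOffset {
    // Converts a 1-based page number and page size into a SQL OFFSET
    static int toOffset(int page, int size) {
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // page=5, size=100 → OFFSET 400, matching the query above
        System.out.println(toOffset(5, 100)); // 400
    }
}
```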


Complexity, Performance and Cost for PostgreSQL

Even with an index on updated_at:

CREATE INDEX idx_resource_updated_at ON resource(updated_at);

PostgreSQL must:

  1. Traverse N index entries
  2. Discard the first records according to OFFSET
  3. Return the next records within the LIMIT

If you request page=5000&size=100, the resulting offset is 499,900: the database must traverse ~500,000 records just to return 100.

Approximate complexity

  • Time ≈ O(n) relative to the offset
  • The larger the page number, the worse the performance

With 10 million records:

  • Deep pages become extremely slow
  • CPU and I/O usage may increase significantly

But when does it still make sense?

✔ User interfaces ✔ Administrative dashboards ✔ Small or mostly static datasets

Avoid this approach in the following scenarios:

❌ Large-scale resync ❌ Massive exports ❌ Continuous streams


Offset Pagination (Explicit)

In practice, this is the same as page number pagination, but the API exposes:

GET /resources?offset=400000&limit=100

SQL

SELECT *
FROM resource
ORDER BY updated_at
LIMIT 100 OFFSET 400000;

Real difference?

From the database perspective: none.

The difference exists only at the API contract level.

This approach presents the same problems as the previous one:

  • Large OFFSET → large scan
  • Linearly increasing cost

Shift Problems

An additional issue with the previous two approaches is caused by possible ordering shifts.

Imagine the ordering is based on updated_at and you fetch two pages of 10 items each, totaling 20 items.

You fetch the first page with items 1 through 10. Before fetching the second page, item 8 (or any item from the first page) is updated.

Item 8 has its updated_at changed to a more recent value, moving it to the end of the list, now at position 20.

The problem is that item 11, which should have been the first item of the second page, is now shifted to position 10. When you fetch the second page, the original item 11 is not there, because it was moved to the previous page.

In this case, you will never fetch the original item 11 (now at position 10).

This has the potential to generate severe inconsistencies in resync operations.
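
The skip can be reproduced with a small in-memory simulation of OFFSET paging (a sketch; the record type and the numbers mirror the example above):

```java
import java.util.*;

public class ShiftDemo {
    record Rec(int id, long updatedAt) {}

    // Emulates SELECT ... ORDER BY updated_at LIMIT :limit OFFSET :offset
    static List<Rec> page(List<Rec> table, int offset, int limit) {
        return table.stream()
                .sorted(Comparator.comparingLong(Rec::updatedAt))
                .skip(offset)
                .limit(limit)
                .toList();
    }

    static Set<Integer> seenIdsAfterShift() {
        List<Rec> table = new ArrayList<>();
        for (int i = 1; i <= 20; i++) table.add(new Rec(i, i)); // updated_at = id

        Set<Integer> seen = new HashSet<>();
        page(table, 0, 10).forEach(r -> seen.add(r.id()));  // page 1: ids 1..10

        // Item 8 is updated between the two fetches and moves to the end
        table.replaceAll(r -> r.id() == 8 ? new Rec(8, 100) : r);

        page(table, 10, 10).forEach(r -> seen.add(r.id())); // page 2: ids 12..20 and 8
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenIdsAfterShift().contains(11)); // false: item 11 was never fetched
    }
}
```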


Cursor Pagination (Keyset Pagination)

What it is

Instead of skipping N records, you use the last ordering value as the continuation point.

Example:

GET /resources?cursor=2025-02-01T10:15:33Z&limit=100

SQL

SELECT *
FROM resource
WHERE updated_at > :cursor
ORDER BY updated_at ASC
LIMIT 100;

However, there is a hidden issue with this approach: if ordering is based only on updated_at, there is a risk that items updated within the same microsecond (for example, during bulk updates) may be unintentionally skipped.

Some strategies to avoid this problem:

Use composite ordering, including a field such as ID:

Recommended index for optimal performance:

CREATE INDEX idx_resource_updated_at_id
ON resource(updated_at, id);

SELECT *
FROM resource
WHERE (updated_at, id) > (:updated_at, :id)
ORDER BY updated_at, id
LIMIT 100;

In this scenario, two fields are required to generate a composite cursor (:updated_at, :id).

If you prefer to hide this complexity from the API consumer, you can encapsulate both values into a single token using Base64 or even wrap them in a UUID.
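
A minimal sketch of such an opaque token, assuming a simple updatedAt|id payload encoded as URL-safe Base64 (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CursorCodec {
    // Packs the composite cursor (updated_at, id) into one opaque token
    static String encode(String updatedAt, long id) {
        String raw = updatedAt + "|" + id;
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Unpacks the token back into [updated_at, id]
    static String[] decode(String token) {
        String raw = new String(Base64.getUrlDecoder().decode(token),
                                StandardCharsets.UTF_8);
        return raw.split("\\|", 2);
    }

    public static void main(String[] args) {
        String token = encode("2025-09-02T11:34:49.908717Z", 3736661L);
        String[] parts = decode(token);
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

The consumer only ever sees the token; the API decodes it back into the two WHERE-clause parameters.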

Another strategy would be to replace the simple updated_at timestamp with an update_id (update identifier) field of type ULID. This type of field maintains a sortable temporal component while resolving conflicts with a random component. However, this approach is only valid if the ULID is generated on every update, semantically replacing updated_at as the ordering criterion.


Performance aspects of the Cursor approach

With a proper index:

  • Index range scan
  • Complexity ≈ O(log n) to locate the starting point
  • Then only sequential index reads

Performance remains practically constant for deep pagination, since the dominant cost becomes O(log n), which grows very slowly.


Cursor and Resync Between Systems

This is the ideal scenario for cursor-based pagination.

Typical flow:

  1. System B stores the last (updated_at, id)

  2. Calls the API:

    GET /resources?cursor=lastSeen
    
  3. Processes the batch

  4. Updates the checkpoint

  5. Continues

This guarantees:

✔ No skipped records ✔ No duplicated records ✔ Scales to millions ✔ Works with mutable data
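
The loop above can be sketched against an in-memory stand-in for the API (everything here — fetchPage, Checkpoint, the data — is illustrative, not a real client):

```java
import java.util.*;

public class ResyncLoop {
    record Resource(long updatedAt, long id) {}
    record Checkpoint(long updatedAt, long id) {}

    // In-memory stand-in for GET /resources?cursor=...&limit=...
    // using the composite keyset comparison (updated_at, id) > (:updated_at, :id)
    static List<Resource> fetchPage(List<Resource> table, Checkpoint cp, int limit) {
        return table.stream()
                .filter(r -> r.updatedAt() > cp.updatedAt()
                        || (r.updatedAt() == cp.updatedAt() && r.id() > cp.id()))
                .sorted(Comparator.comparingLong(Resource::updatedAt)
                        .thenComparingLong(Resource::id))
                .limit(limit)
                .toList();
    }

    static int resync(List<Resource> table, int pageSize) {
        Checkpoint cp = new Checkpoint(0, 0); // 1. load last checkpoint
        int processed = 0;
        while (true) {
            List<Resource> batch = fetchPage(table, cp, pageSize); // 2. call the API
            if (batch.isEmpty()) break;
            processed += batch.size();                             // 3. process the batch
            Resource last = batch.get(batch.size() - 1);
            cp = new Checkpoint(last.updatedAt(), last.id());      // 4. update checkpoint
        }                                                          // 5. continue
        return processed;
    }

    public static void main(String[] args) {
        List<Resource> table = new ArrayList<>();
        // Duplicate timestamps on purpose: the id tiebreaker prevents skips
        for (long i = 1; i <= 250; i++) table.add(new Resource(i / 3, i));
        System.out.println(resync(table, 100)); // 250: nothing skipped or duplicated
    }
}
```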


Consistency and Concurrency

Offset

Problem:

Page 1 → records 1–100
Page 2 → records 101–200

If a record between 1–100 is updated and moved in the sort order:

  • It may reappear
  • It may create gaps

Cursor

Cursor-based query:

WHERE updated_at > last_seen

It does not depend on position.

It depends only on the ordered value.

Much more robust.


I/O and Memory Impact

Large OFFSET

  • Large index traversal
  • Massive discard of rows
  • Increased buffer usage
  • Potential read amplification

Cursor

  • Pure range scan
  • Minimal reads
  • Better cache usage

Practical Test

The OFFSET Problem

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, updated_at
FROM resource
ORDER BY updated_at, id
LIMIT 100 OFFSET 5000000;

Limit
  -> Index Only Scan using idx_resource_updated_at_id
     rows=5000100
     Heap Fetches: 0
     Buffers: shared hit=3502223

Execution Time: 711 ms

What actually happened?

Even with:

  • Perfect index
  • Index-only scan
  • No heap fetch
  • Data already cached

The database had to:

  • Traverse 5,000,100 records
  • Discard 5,000,000
  • Return only 100

Buffers accessed: 3,502,223 pages

This means millions of internal index structure accesses.

OFFSET is position-based

When executing:

SELECT id, updated_at
FROM resource
ORDER BY updated_at, id
LIMIT 100 OFFSET 5000000;

PostgreSQL must:

  1. Traverse the ordered index
  2. Advance through 5,000,000 entries
  3. Discard them
  4. Return the next 100

In other words, the database processed millions of entries to deliver only 100 records.

If OFFSET doubles, execution time roughly doubles.

Even in the best possible scenario (index-only, no disk I/O), the cost remains high.


Cursor (Keyset Pagination)

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, updated_at
FROM resource
WHERE (updated_at, id) > ('2025-09-02 11:34:49.908717', 3736661)
ORDER BY updated_at, id
LIMIT 100;

Limit
  -> Index Only Scan using idx_resource_updated_at_id
     Index Cond: (ROW(updated_at, id) > (...))
     Heap Fetches: 0
     Buffers: shared hit=71 read=1

Execution Time: 0.057 ms

What happened here?

PostgreSQL:

  1. Navigated the B-tree to locate the correct leaf (cost O(log n))
  2. Located the exact starting point
  3. Read only 100 records

Buffers accessed: 72 pages

Strategy       Buffers
OFFSET 5M      3,502,223
Cursor         72

Difference: ~48,000x fewer pages accessed

Cursor uses an indexed range scan: T(n) ≈ O(log n + k), where:

  • n = table size
  • k = page size

Execution time remains practically constant relative to pagination depth, even with 10 million records.


Final Comparison

Criteria                OFFSET     Cursor
Complexity              O(n)       O(log n + k)
Buffers (5M)            3.5M       72
Execution Time          711 ms     0.057 ms
Depth Scaling           Linear     Logarithmic
Stable Under Mutation   No         Yes
Ideal for UI            Yes        Partial
Ideal for Resync        No         Yes

Conclusion

In a dataset with 10 million records:

  • OFFSET scans millions of entries
  • Cursor starts exactly at the required point
  • The difference can exceed 10,000x
  • OFFSET cost is structural, not circumstantial

For systems that perform:

  • Resync
  • Massive exports
  • Service integration
  • Incremental processing

Keyset Pagination is an architectural requirement, not merely an optimization.


OFFSET is not inefficient due to poor implementation; it is structurally linear because of how B-tree indexes work.

When does OFFSET still make sense?

✔ Administrative interfaces ✔ Small tables ✔ Mostly static datasets ✔ When jumping to arbitrary pages is required

But not in large-scale systems.


Empirical results make it clear:

OFFSET degrades linearly even in the best possible scenario. Cursor maintains stable performance relative to pagination depth, even with millions of records.

If you are designing an API backed by PostgreSQL and expect growth beyond a few million rows, the pagination strategy must be decided from the beginning.


That's all folks...

Artus