1 of 34

The Hidden Engine Behind Your Transactions

Mastering MVCC in PostgreSQL

Nikhil Hegde, Stryv.ai

HYDERABAD POSTGRES DAYS 2025

2 of 34

Agenda

    • How PostgreSQL Stores Your Data : (Pages, tuples, and hidden system columns)
    • Isolation Levels : (Avoiding deadlocks and subtle bugs)
    • Transactions & Visibility Rules : (xmin/xmax, snapshots, and real-time reads)
    • MVCC – The Heart of Concurrency : (Why multi-versioning beats locking)
    • The Hidden Cost: Table Bloat : (Why your disk fills up faster than expected)
    • Vacuum & Autovacuum Internals : (How cleanup really works and why tuning matters)
    • Wrap-Up & Q/A : (Key takeaways + open floor for questions)

3 of 34

Overall Architecture of PostgreSQL

4 of 34

What Happens When You Do an INSERT?

    • When you create a table, PostgreSQL creates a file in the database’s tablespace.

5 of 34

Pages, Blocks, and the Shared Buffer

    • By default, each page is 8KB
    • In PostgreSQL, the fundamental unit of storage is the page (block).
    • A table (or index) file is simply a collection of these 8KB pages.

6 of 34

    • PostgreSQL consults the Free Space Map (FSM) to find a page with enough free space.
    • A WAL (Write-Ahead Log) entry is generated to ensure durability.
    • WAL is immediately flushed to disk on commit, but the actual page may remain dirty in memory.
    • Background writer and checkpointer eventually flush dirty pages to disk for persistence.

What Happens When You Do an INSERT?

    • That page is loaded into Shared Buffers (if not already cached).
    • The new row (tuple) is inserted into the page in memory.

7 of 34

Write-Ahead Log (WAL)

    • Every change to data generates a WAL record.
    • Each WAL record has a unique LSN (Log Sequence Number) and metadata (e.g., CRC-32 checksum, block references, etc.).
    • WAL records are first written to the WAL buffers in memory.
    • The CRC-32 checksum ensures WAL records are validated and not corrupted.
    • On COMMIT, WAL is flushed to disk (guaranteeing durability before acknowledging the transaction).

8 of 34

ID

NAME

BALANCE

CREATED_BY

DELETED_BY

LOCATION

1

AMAN

500

750

0

(3,1)

2

BAMAN

200

751

0

(3,2)

System Columns

    • CREATED_BY → The transaction ID that created the tuple.
    • DELETED_BY → The transaction ID that deleted (or invalidated) the tuple.
    • A value of 0 in DELETED_BY means the tuple is still active (not deleted yet).
    • LOCATION → The physical address of the row , in the form (page#, offset#)

9 of 34

ID

NAME

BALANCE

CREATED_BY

DELETED_BY

LOCATION

1

AMAN

500

750

0

(3,1)

2

BAMAN

200

751

0

(3,2)

System Columns

TX_ID 763

UPDATE accounts SET balance = balance - 100 WHERE id = 1

UPDATE accounts SET balance = balance + 100 WHERE id = 2

COMMITTED

10 of 34

ID

NAME

BALANCE

CREATED_BY

DELETED_BY

LOCATION

1

AMAN

500

750

763

(3,1)

2

BAMAN

200

751

763

(3,2)

1

AMAN

400

763

0

(3,3)

2

BAMAN

300

763

0

(3,4)

    • The CREATED_BY column is called xmin
    • The DELETED_BY column is called xmax
    • The LOCATION column is called ctid

System Columns

11 of 34

What happens when you do a DELETE

    • The row is not physically removed instead, its XMAX column is updated with the deleting transaction ID.

12 of 34

What happens when you do an UPDATE

    • An UPDATE = Soft DELETE + INSERT (old row’s xmax is set, and a new row version is inserted ).

13 of 34

14 of 34

CLOG (Commit Log)

    • CLOG is a file that stores the status of each transaction.
    • Possible statuses:
      • In progress
      • Committed
      • Aborted
    • Physically stored in the pg_xact/ directory inside PostgreSQL’s data directory.

15 of 34

Visibility and Access Control

16 of 34

    • A dirty read happens when a transaction reads uncommitted data from another transaction.

Visibility and Access Control

    • In PostgreSQL, dirty reads are not possible, because the default isolation level is READ COMMITTED.
    • PostgreSQL enforces this using MVCC → visibility rules with xmin, xmax, and the CLOG (transaction status).

17 of 34

Isolation Levels

    • READ COMMITTED
    • Every SELECT sees only data committed at the moment the statement starts.
    • A new snapshot is taken per statement.
    • If you run the same query twice in one transaction, you may see different results if new commits happened in between.

18 of 34

Isolation Levels

2.REPEATABLE READ

    • All statements in the same transaction see data as it was when the transaction started.
    • A single snapshot is taken once per transaction.
    • If you query twice in the same transaction, you’ll always see the same version of rows, even if other transactions commit changes in between.

19 of 34

Isolation Levels

3. SERIALIZABLE

    • Results must be as if all transactions ran one at a time, in some serial order.
    • Database not only provides a snapshot, it also tracks your reads and writes.
    • If your transaction + another’s cannot coexist in a valid serial sequence, PostgreSQL will abort one transaction to prevent anomalies.

20 of 34

Actual Meaning of MVCC

READERS DON’T BLOCK WRITERS, AND WRITERS DON’T BLOCK READERS

    • MVCC (Multi-Version Concurrency Control) provides transaction isolation without heavy locking.
    • MVCC hides uncommitted data from other transactions (preventing dirty reads).
    • PostgreSQL maintains multiple versions of a row, along with visibility information and metadata (e.g., transaction IDs, snapshots).

21 of 34

Indexing

    • Creating an index generates a separate structure

22 of 34

Updating an Index – NON-HOT Update

    • New Heap Tuple: A fresh row version is inserted into the heap (new CTID).
    • Old Tuple Obsolete: The old tuple is marked as dead (with deleted_by XID).
    • Index Maintenance:
      • Every index on the table must insert a new entry pointing to the new CTID.
      • Old index entries are not removed immediately.
    • Impact:
      • Extra index writes → slower updates.
      • Stale index entries stay until VACUUM cleans them → index bloat risk.

23 of 34

HOT (Heap-Only Tuple) Update

    • No Index Changes: Index entries are untouched.
    • New Heap Tuple: New row version is created in the same heap page as the old one.
    • HOT Chain: Old tuple links to the new one; indexes always resolve to the latest visible version.
    • Vacuum Cleanup: Dead tuples in the HOT chain are reclaimed by regular VACUUM.
    • Impact:
      • Avoids extra index writes → faster updates.
      • Prevents index bloat from updates.

24 of 34

Updating an Index

25 of 34

26 of 34

HOT Chains: Why They Can’t Cross Pages?

27 of 34

Vaccum

    • We require 2 pages just to store 5 tuples

28 of 34

    • VACUUM: Reclaims storage by removing dead tuples, keeping tables lean & fast.
    • By default, PostgreSQL’s autovacuum triggers when dead tuples are roughly 20% of a table’s live rows (plus a small constant threshold).

TRIGGER_POINT = AUTOVACUUM_VACUUM_THRESHOLD

+ AUTOVACUUM_VACUUM_SCALE_FACTOR × RELTUPLES

    • autovacuum_vacuum_threshold (default: 50) → absolute minimum number of dead tuples before autovacuum even considers running.
    • autovacuum_vacuum_scale_factor (default: 0.2) → 20% of the total rows in the table.
    • reltuples → PostgreSQL’s estimate of live rows from pg_class statistics.

Vaccum

29 of 34

30 of 34

Vaccum FULL

    • Rewrites the entire table and indexes into a new physical file.
    • Removes all dead tuples (not just marking them for reuse).

31 of 34

    • tuples will be marked as reusable not cleared

Vaccum

32 of 34

    • Removes dead tuples completely (rewrites table + indexes).

VACUUM FULL

    • Releases space back to the OS (regular VACUUM only marks space reusable).
    • Manual operation → blocks the table with an exclusive lock.
    • Use case → rare, only when there’s severe bloat.

33 of 34

References

1. PostgreSQL Global Development Group.PostgreSQL Documentation (https://www.postgresql.org/docs/current/index.html).

2. Alleti, Rohan Reddy. Internals of MVCC in Postgres: Hidden Costs of Updates vs Inserts (https://medium.com/@rohanjnr44/internals-of-mvcc-in-postgres-hidden-costs-of-updates-vs-inserts-381eadd35844)

3. Nasser, Hussein. PostgreSQL YouTube Playlist.(https://youtube.com/playlist?list=PLQnljOFTspQWGrOqslniFlRcwxyY94cjj&si=Qa5ouj08WAhZk8-t)

4. Wolever, David. Postgres, MVCC, and You (or, Why COUNT() is Slow) (https://www.youtube.com/watch?v=GtQueJe6xRQ).

5. Pavlo, Andy. (2023). The Part of PostgreSQL We Hate the Most. Carnegie Mellon Database Group Blog. (https://www.cs.cmu.edu/~pavlo/blog/2023/04/the-part-of-postgresql-we-hate-the-most.html)

34 of 34

Thank you!

Any Questions ?

📌 Connect With Me:

linkedin.com/in/nikhilhegde989

www.nikhilhegde.com