1 of 46

MongoDB Caching Internals

2 of 46

Hey, I am Uddeshya 👋

SDE-2 @ GTF (GoTo Finance), OSS Enthusiast, Occasional database tinkerer, Manchester United Masochist

Twt - @uds5501

Github - @uds5501

Email - singhuddeshyaofficial@gmail.com

3 of 46

Agenda (Broadly)

  • Honorary mention of mongoDB
  • Introduction to wiredTiger.
  • Glimpse of WiredTiger architecture for a write transaction.
  • How does the cache fill up?
  • Deep diving in cache eviction strategy.

4 of 46

This talk is not… ❌

  • An endorsement of MongoDB
    • Choice of databases is very subjective, choose yours as per your use case.
  • An exhaustive walkthrough of MongoDB features.
  • A guide on mongoDB best practices.

5 of 46

MongoDB

Famous NoSQL database with highly flexible data modelling support

  • Supports Transactions.
  • Loosely supports ACID properties.
  • Supports sharding and fault tolerance.
  • Amazing database for side projects [personal opinion]

6 of 46

General databases?

7 of 46

WiredTiger Storage Engine

  • Default mongoDB on-disk storage engine since MongoDB 3.2
  • Document level concurrency.
  • Snapshot and Checkpoint durability.
  • Supports Journaling.
  • Supports on-disk compression algorithms.
  • Good ol' plug and play.

The Good

8 of 46

WiredTiger Storage Engine

  • Can't pin documents in cache.
  • No separation for reads and writes in cache.
  • It doesn't allocate cache on a per-database or per-collection level.

The Bad

9 of 46

Architecture Overview

10 of 46

Act - 1

Gateways to engine

11 of 46

APIs

12 of 46

Act - 2

The building blocks

13 of 46

Schema

  • Defines data format for storage.
  • Supports both row storage and column storage.
  • key,value pairs.
  • ranging from size 512 B - 4GB in form of raw byte arrays.

14 of 46

15 of 46

16 of 46

Cursors

  • Basic tool for in-memory manipulation of objects.
  • Iterate / Get / Set / Update.

17 of 46

Metadata

  • Tracks files, indexes, history files for a user's database.

18 of 46

dHandles

  • Generic data structures to point to the storage data structures like B Trees / LSM Trees / Bloom filters / Metadata pages etc.

19 of 46

Act - 3

In Memory Storage

20 of 46

B Trees

21 of 46

Act - 4

Writing something?

22 of 46

Transactions!

  • Backbone of document level concurrency.
  • A single thread supports a single transaction.
  • 3 levels of isolation:
    • Snapshot isolation
    • read committed isolation
    • read uncommitted isolation.

23 of 46

Snapshots

  • Copy of state at the start of a transaction.
  • Stores
    • Maximum transaction ID
      • t_id >= max is invisible.
    • Concurrent transaction IDs.
      • not committed, hence invisible.
    • Minimum Transaction IDs
      • committed, hence visible.

24 of 46

25 of 46

Cache

  • Holds copies of recently accessed and modified data.
  • Cache loads btree pages into memory as required.
  • The cache size is generally fixed.
    • 50% of VM memory.
    • can be configured on runtime.

26 of 46

Block Manager

  • Responsible for reading and writing data from the disk.

27 of 46

  • Page header
    • metadata of page it was created from.
  • Block header
    • size
    • checksum
    • flags
    • padding

28 of 46

Tracking the blocks

29 of 46

  • O(logN) average inserts, updates and deletes.
  • Similar performance to AVL trees but with simpler implementation.
  • Higher memory overhead.
  • 2 sets of skip list required at each checkpointing process.

Why use skiplists?

30 of 46

31 of 46

32 of 46

33 of 46

  • First fit
    • Used for root pages.
    • root pages are created for elements with no on-disk footprint.
  • Best fit
    • Used by default for other pages.
    • Helps reduce disk fragmentation

When to use what?

34 of 46

Enough of disk, back to memory.

Caching internals!

35 of 46

Cache struction tracks

  • Clean bytes
  • Dirty bytes
  • Update bytes
  • Flags
  • Eviction progress
  • Eviction slots
  • Eviction queues

36 of 46

Eviction dance

37 of 46

How are pages selected for eviction?

38 of 46

39 of 46

How many slots are available?

40 of 46

Where do I evict from?

41 of 46

Which page do I select to evict?

42 of 46

But what if cache grows too large?

43 of 46

Application threads pulling up.

  • Application threads start evicting pages when
    • overall cache is >= 95% filled.
    • cache is >= 20% filled with dirty bytes.
    • cache is >= 10% filled with update bytes.

44 of 46

45 of 46

Optimization conclusions

  • Block manager:
    • Skip lists usage
    • When to use first fit vs best fit.
  • Eviction process:
    • Early exits from cache walks when using eviction threads.
    • Keeping a small thread pool for eviction.
    • Not interrupting application threads' transaction duties.
    • Prioritizing cache eviction only when in dire state.

46 of 46

Arigato!