1 of 61

B-Trees (and 2-3 and 2-3-4 Trees)


Lecture 17 (Data Structures 3)

CS61B, Spring 2025 @ UC Berkeley

Slides credit: Josh Hug

[Figure: a 2-3 tree and a BST, each containing the keys 1 through 7]

2 of 61

BST Height, Big O vs. Worst Case Big Theta

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

3 of 61

Big Theta vs. Big O

Common student question: If we have big theta, why do we also have big O?

The answer is the same as: If we have =, why do we also have ≤?

Let’s dig into this point, and also get a better feel for BSTs as we go.

4 of 61

BST Tree Height

Let’s start today by carefully discussing the height of binary search trees.

Trees range from best-case “bushy” to worst-case “spindly”.

  • Difference is dramatic!

[Figure: example bushy and spindly BSTs, with their heights labeled in the diagram]

5 of 61

Tree Height: http://yellkey.com/article

Height varies dramatically between “bushy” and “spindly” trees.

Let H(N) be the height of a tree with N nodes. Give H(N) in Big-Theta notation for “bushy” and “spindly” trees, respectively:

  1. Θ(log(N)), Θ(log(N))
  2. Θ(log(N)), Θ(N)
  3. Θ(N), Θ(log(N))
  4. Θ(N), Θ(N)


6 of 61

Tree Height

Height varies dramatically between “bushy” and “spindly” trees.

[Figure: the same bushy and spindly trees as before; the spindly tree has H = Θ(N), while the bushy tree has H = Θ(log(N))]

Performance of operations on spindly trees can be just as bad as a linked list!

  • Example: contains(“z”) would take linear time.

7 of 61

Statements about Tree Height: http://yellkey.com/network

Which of these statements are true?

  1. Worst case BST height is Θ(N).
  2. BST height is O(N).
  3. BST height is O(N²).

8 of 61

Statements about Tree Height

Which of these statements are true?

  • Worst case BST height is Θ(N). "Is equal to linear."
  • BST height is O(N). "Is less than or equal to linear."
  • BST height is O(N²). "Is less than or equal to quadratic."

All are true!

  • A worst case (spindly tree) has a height that grows exactly linearly - Θ(N).
  • All BSTs have a height that grows linearly or better - O(N).
  • All BSTs have a height that grows quadratically or better - O(N²).
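One way (not from the slides) to write those three statements formally, with h(T) denoting the height of a BST T on N nodes:

```latex
% "Worst case BST height is Theta(N)": the worst height on N nodes is pinned between two linear functions.
\exists\, c_1, c_2 > 0:\quad c_1 N \;\le\; \max_{T \text{ a BST on } N \text{ nodes}} h(T) \;\le\; c_2 N \quad \text{for all sufficiently large } N.

% "BST height is O(N)": every BST on N nodes satisfies, for some fixed c > 0,
h(T) \;\le\; c\,N.

% "BST height is O(N^2)": a weaker, but still true, upper bound.
h(T) \;\le\; c\,N^{2}.
```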

9 of 61

Statements about Tree Height: http://yellkey.com/head

Which of these statements is more informative?

  • Worst case BST height is Θ(N).
  • BST height is O(N).
  • They are equally informative.

10 of 61

Statements about Tree Height

Which of these statements is more informative?

  • Worst case BST height is Θ(N).
  • BST height is O(N).
  • They are equally informative.

Saying that the worst case has order of growth N is more informative than saying the height is O(N).

Let’s see an analogy.

11 of 61

Question: http://yellkey.com/compare

Which statement gives you more information about a hotel?

  1. The most expensive room in the hotel is $639 per night.
  2. Every room in the hotel is less than or equal to $639 per night.

12 of 61

Question

Which statement gives you more information about a hotel?

  • The most expensive room in the hotel is $639 per night.
  • Every room in the hotel is less than or equal to $639 per night.

Most expensive room: $639/nt

All rooms <= $639/nt

13 of 61

BST Height

BST height is all four of these:

  • O(N).
  • Θ(log N) in the best case (“bushy”).
  • Θ(N) in the worst case (“spindly”).
  • O(N²).

The middle two statements are more informative.

  • Big O is NOT mathematically the same thing as “worst case”.
    • e.g. BST heights are O(N²), but are not quadratic in the worst case.
    • … but Big O is often used as shorthand for “worst case”.

14 of 61

The Usefulness of Big O

Big O is still a useful idea:

  • Allows us to make simple blanket statements, e.g. we can just say “binary search is O(log N)” instead of “binary search is Θ(log N) in the worst case”.
  • Sometimes we don’t know the exact runtime, so we use O to give an upper bound.
    • Example: Runtime for finding the shortest route that goes to all world cities is O(2^N)*. There might be a faster way, but nobody knows one yet.
  • It is easier to write proofs for Big O than for Big Theta, e.g. when finding the runtime of mergesort you can round the number of items up to the next power of 2 (see the A-level study guide problems for the Asymptotics2 lecture). A little beyond the scope of our course.

*: Under certain assumptions and constraints not listed.

15 of 61

BST Height

BST height is both of these:

  • Θ(log N) in the best case (“bushy”).
  • Θ(N) in the worst case (“spindly”).

Let’s now turn to understanding the performance of BST operations.

16 of 61

Worst Case Performance

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

17 of 61

Height and Depth

Height and average depth are important properties of BSTs.

  • The “depth” of a node is how far it is from the root, e.g. depth(g) = 2.

[Figure: a BST of 14 nodes, with each level labeled by its depth (0 through 4); the tree has height 4]

  • The “height” of a tree is the depth of its deepest leaf, e.g. height(T) = 4.
  • The “average depth” of a tree is the average of the depths of its nodes.
    • (0·1 + 1·2 + 2·4 + 3·6 + 4·1)/(1 + 2 + 4 + 6 + 1) = 32/14 ≈ 2.29 (see the sketch below).
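Here is a minimal Java sketch of how height and average depth can be computed for a plain BST; the Node class and field names are assumptions made just for this illustration, not the course's reference implementation.

```java
public class DepthDemo {
    // Minimal BST node used only for this illustration.
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Height = depth of the deepest node; a single node has height 0.
    static int height(Node root) {
        if (root == null) { return -1; }
        return 1 + Math.max(height(root.left), height(root.right));
    }

    // Average depth = (sum of all node depths) / (number of nodes).
    static double averageDepth(Node root) {
        long[] sumAndCount = new long[2];   // [sum of depths, node count]
        accumulate(root, 0, sumAndCount);
        return (double) sumAndCount[0] / sumAndCount[1];
    }

    private static void accumulate(Node node, int depth, long[] acc) {
        if (node == null) { return; }
        acc[0] += depth;
        acc[1] += 1;
        accumulate(node.left, depth + 1, acc);
        accumulate(node.right, depth + 1, acc);
    }
}
```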

18 of 61

Height, Depth and Runtime

Height and average depth determine runtimes for BST operations.

  • The “height” of a tree determines the worst case runtime to find a node.
    • Example: Worst case is contains(s), which requires 5 comparisons (height + 1).
  • The “average depth” determines the average case runtime to find a node.
    • Example: Average case is about 3.29 comparisons (average depth + 1).


19 of 61

BSTs in Practice

Suppose we want to build a BST out of the numbers 1, 2, 3, 4, 5, 6, 7.

Give an example of a sequence of add operations that results in:

  • A spindly tree.
  • A bushy tree.


20 of 61

BSTs in Practice

Give an example of a sequence of add operations that results in:

  • A spindly tree.
    • add(1), add(2), add(3), add(4), add(5), add(6), add(7)
  • A bushy tree.
    • add(4), add(2), add(1), add(3), add(6), add(5), add(7)

[Figure: the spindly tree (height 6, average depth 3) and the bushy tree (height 2, average depth 1.43)]
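The two insertion orders above are easy to try out. The following minimal Java sketch (the Node class and add method are assumptions for this example, not the course's BSTMap) builds both trees and prints their heights.

```java
import java.util.List;

public class InsertOrderDemo {
    // Minimal BST used only to illustrate how insertion order affects shape.
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node add(Node root, int key) {
        if (root == null) { return new Node(key); }
        if (key < root.key) { root.left = add(root.left, key); }
        else if (key > root.key) { root.right = add(root.right, key); }
        return root;
    }

    static int height(Node root) {
        return root == null ? -1 : 1 + Math.max(height(root.left), height(root.right));
    }

    public static void main(String[] args) {
        Node spindly = null;
        for (int k : List.of(1, 2, 3, 4, 5, 6, 7)) { spindly = add(spindly, k); }

        Node bushy = null;
        for (int k : List.of(4, 2, 1, 3, 6, 5, 7)) { bushy = add(bushy, k); }

        System.out.println(height(spindly)); // 6
        System.out.println(height(bushy));   // 2
    }
}
```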

21 of 61

Important Question: What about Real World BSTs?

BSTs have:

  • Worst case Θ(N) height.
  • Best case Θ(log N) height.

… but what about trees that you’d build during real world applications?

One way to approximate this is to consider randomized BSTs.

22 of 61

Simulation: Trees Built from Random Inserts

Video courtesy of Kevin Wayne (Princeton University)

Nice Property. Random trees have Θ(log N) average depth and height.

  • In other words: Random trees are bushy, not spindly.

23 of 61

Randomized Trees: Mathematical Analysis

Average Depth. If N distinct keys are inserted into a BST, the expected average depth is ~ 2 ln N = Θ(log N).

  • Thus, average runtime for contains operation is Θ(log N) on a tree built with random inserts.
  • Will discuss this proof briefly closer to the end of this course.

Tree Height. If N distinct keys are inserted in random order, expected tree height is ~ 4.311 ln N (see Reed, 2003).

  • Thus, worst case runtime for contains operation is Θ(log N) on a tree built with random inserts.
  • Proof is well beyond the scope of the course (and is 27 pages long!).

Note: ~ is the same thing as Big Theta, but you don’t drop the multiplicative constants.
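One way to sanity-check the ~2 ln N figure is a quick simulation. The sketch below is entirely illustrative (class and method names are my own): it builds BSTs from shuffled keys and compares the measured average depth to 2 ln N.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RandomTreeExperiment {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node add(Node root, int key) {
        if (root == null) { return new Node(key); }
        if (key < root.key) { root.left = add(root.left, key); }
        else if (key > root.key) { root.right = add(root.right, key); }
        return root;
    }

    // Sum of depths of all nodes in the subtree rooted at node.
    static long depthSum(Node node, int depth) {
        if (node == null) { return 0; }
        return depth + depthSum(node.left, depth + 1) + depthSum(node.right, depth + 1);
    }

    public static void main(String[] args) {
        int n = 10_000;
        int trials = 20;
        double total = 0;
        for (int t = 0; t < trials; t += 1) {
            List<Integer> keys = new ArrayList<>();
            for (int i = 0; i < n; i += 1) { keys.add(i); }
            Collections.shuffle(keys);          // random insertion order
            Node root = null;
            for (int k : keys) { root = add(root, k); }
            total += (double) depthSum(root, 0) / n;
        }
        System.out.println("measured average depth: " + total / trials);
        System.out.println("2 ln N:                 " + 2 * Math.log(n));
    }
}
```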

24 of 61

Important Question: What about Real World BSTs?

BSTs have:

  • Worst case Θ(N) height.
  • Best case Θ(log N) height.
  • Θ(log N) height if constructed via random inserts.

In real world applications we expect both insertion and deletion.

  • See (IMO really interesting!) extra slides for more on simulations of trees including deletion.
  • Can show that random trees including deletion are still Θ(log N) height (if you randomly pick between predecessor and successor when deleting).

25 of 61

Good News and Bad News

Good news: BSTs have great performance if we insert items randomly. Performance is Θ(log N) per operation.

Bad News: We can’t always insert our items in a random order. Why?

[Figure: a BST that has grown a long spindly branch because new keys keep arriving in increasing order]

26 of 61

Good News and Bad News

Good news: BSTs have great performance if we insert items randomly. Performance is Θ(log N) per operation.

Bad News: We can’t always insert our items in a random order. Why?

  • Data comes in over time, don’t have all at once.
    • Example: Storing dates of events.
      • add(“01-Jan-2019, 10:31:00”)
      • add(“01-Jan-2019, 18:51:00”)
      • add(“02-Jan-2019, 00:05:00”)
      • add(“02-Jan-2019, 23:10:00”)

In this lecture, we’ll do something totally different.


27 of 61

Splitting Juicy Nodes

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

28 of 61

Avoiding Imbalance

The problem is adding new leaves at the bottom.

Crazy idea: Never add new leaves at the bottom.

  • Tree can never get imbalanced.

[Figure: a BST with root 13, plus question marks showing incoming keys that have nowhere to go if we refuse to add new leaves]

Q: What do we do with incoming keys?

29 of 61

Avoiding Imbalance through Overstuffing

The problem is adding new leaves at the bottom.

Avoid new leaves by “overstuffing” the leaf nodes.

  • “Overstuffed tree” always has balanced height, because leaf depths never change.
    • Height is just max(depth).

[Figure: the same tree shown three times as keys arrive: the leaf 16 grows to 16 17, then 16 17 18]

30 of 61

Avoiding Imbalance through Overstuffing

Overstuffed trees are logically consistent but very weird data structures.

  • contains(18):
    • Is 18 > 13? Yes, go right.
    • Is 18 > 15? Yes, go right.
    • Is 16 = 18? No.
    • Is 17 = 18? No.
    • Is 18 = 18? Yes! Found it.

Q: What is the problem with this idea?

[Figure: two overstuffed trees, one whose leaf holds 16 17 18 19 and one whose leaf holds 16 17 18 19 20 21 22 23 24]

31 of 61

Revising Our Overstuffed Tree Approach

Height is balanced, but we have a new problem:

  • Leaf nodes can get too juicy.

Solution?


32 of 61

Revising Our Overstuffed Tree Approach: Moving Items Up

Height is balanced, but we have a new problem:

  • Leaf nodes can get too juicy.

Solution?

  • Set a limit L on the number of items, say L=3.
  • If any node has more than L items, give an item to parent.
    • Which one? Let’s say (arbitrarily) the left-middle.

Q: What’s the problem now?

  • 16 is to the right of 17.

[Figure: before and after moving 17 up: the parent 15 becomes 15 17, and the leaf 16 17 18 19 becomes 16 18 19, leaving 16 on the wrong side of 17]

33 of 61

Revising Our Overstuffed Tree Approach

Height is balanced, but we have a new problem:

  • Leaf nodes can get too juicy.

Solution?

  • Set a limit L on the number of items, say L=3.
  • If any node has more than L items, give an item to parent.
    • Which one? Let’s say (arbitrarily) the left-middle.

Challenge for you:

  • How can we tweak this idea to make it work?


34 of 61

Revising Our Overstuffed Tree Approach: Node Splitting

Height is balanced, but we have a new problem:

  • Leaf nodes can get too juicy.

Solution?

  • Set a limit L on the number of items, say L=3.
  • If any node has more than L items, give an item to parent.
    • Pulling item out of full node splits it into left and right.
    • Parent node now has three children!

[Figure: pulling 17 up splits the leaf: the parent becomes 15 17 and now has three children, 14, 16, and 18 19]
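To make the splitting step concrete, here is a minimal Java sketch of splitting one overstuffed leaf, under an assumed node layout (a sorted list of keys plus a list of children); this is illustrative only, and a real B-tree also has to split internal nodes and the root, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    static final int L = 3;   // max items per node, as in today's examples

    // Hypothetical node layout: sorted keys, and one more child than keys for non-leaves.
    static class Node {
        List<Integer> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>();   // empty for leaves
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Split an overstuffed LEAF child of `parent`: push one key up and
    // divide the remaining keys into a left and a right node.
    static void splitLeaf(Node parent, int childIndex) {
        Node child = parent.children.get(childIndex);
        assert child.keys.size() > L;                 // only called on overstuffed leaves
        int mid = (child.keys.size() - 1) / 2;        // the "left-middle" item, as in the slides
        int promoted = child.keys.get(mid);

        Node left = new Node();
        Node right = new Node();
        left.keys.addAll(child.keys.subList(0, mid));
        right.keys.addAll(child.keys.subList(mid + 1, child.keys.size()));

        // Parent gains one key and one extra child; keys stay sorted because
        // everything in `child` sits between parent.keys[childIndex - 1] and parent.keys[childIndex].
        parent.keys.add(childIndex, promoted);
        parent.children.set(childIndex, left);
        parent.children.add(childIndex + 1, right);
    }
}
```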

35 of 61

Revising Our Overstuffed Tree Approach: Node Splitting

This is a logically consistent and not so weird data structure.

  • contains(18):
    • 18 > 13, so go right
    • 18 > 15, so compare vs. 17
    • 18 > 17, so go right

Examining a node costs us O(L) compares, but that’s OK since L is constant.
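As an illustration of the O(L)-compares-per-node point, a contains method over the same assumed node layout might look like the sketch below (again, an illustrative sketch, not the course's reference implementation).

```java
import java.util.ArrayList;
import java.util.List;

public class BTreeContainsSketch {
    // Same hypothetical layout as the split sketch: sorted keys, and for
    // internal nodes exactly keys.size() + 1 children.
    static class Node {
        List<Integer> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>();
        boolean isLeaf() { return children.isEmpty(); }
    }

    static boolean contains(Node node, int key) {
        if (node == null) { return false; }
        int i = 0;
        // Scan this node's keys: at most L compares, a constant.
        while (i < node.keys.size() && key > node.keys.get(i)) { i += 1; }
        if (i < node.keys.size() && key == node.keys.get(i)) { return true; }
        if (node.isLeaf()) { return false; }
        // The i-th child holds the keys between keys[i - 1] and keys[i].
        return contains(node.children.get(i), key);
    }
}
```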

What if a non-leaf node gets too full? Can we split that?

  • Sure, we’ll do this in a few slides, but first...


36 of 61

add Understanding Check

  • Suppose we add 20, 21:
  • Q: If our cap is at most L=3 items per node, draw post-split tree:

[Figure: the tree before and after adding 20 and 21; the leaf 18 19 becomes 18 19 20 21]

37 of 61

add Understanding Check

  • Suppose we add 20, 21:
  • Q: If our cap is at most L=3 items per node, draw post-split tree:

[Figure: after the split, 19 moves up, giving the parent 15 17 19 with children 14, 16, 18, and 20 21]

38 of 61

Chain Reaction Splitting

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

39 of 61

add: Chain Reaction

Suppose we add 25, 26:

[Figure: adding 25 and 26 overstuffs the leaf into 20 21 25 26; 21 moves up, overstuffing the parent into 15 17 19 21; 17 then moves up into the root, which becomes 13 17 with children 5, 15, and 19 21]

40 of 61

What Happens If The Root Is Too Full?

Challenge: Draw the tree after the root is split.

[Figure: a sequence of adds that keeps splitting leaves until the root itself becomes the overstuffed node 13 17 21 23]

41 of 61

What Happens If The Root Is Too Full?

Challenge: Draw the tree after the root is split.

[Figure: splitting the overstuffed root 13 17 21 23 promotes 17 into a new root with children 13 and 21 23; every leaf moves down one level]

42 of 61

B-Tree Terminology

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

43 of 61

Perfect Balance

Observation: Splitting-trees have perfect balance.

  • If we split the root, every node gets pushed down by exactly one level.
  • If we split a leaf node or internal node, the height doesn’t change.

We will soon prove: All operations have guaranteed O(log N) time.

  • More details soon.


44 of 61

The Real Name for Splitting Trees is “B Trees”

Splitting tree is a better name, but I didn’t invent them, so we’re stuck with their real name: B-trees.

  • B-trees of order L=3 (like we used today) are also called 2-3-4 trees or 2-4 trees.
    • “2-3-4” refers to the number of children that a node can have, e.g. a 2-3-4 tree node may have 2, 3, or 4 children.
  • B-trees of order L=2 are also called 2-3 trees.

The origin of "B-tree" has never been explained by the authors. As we shall see, "balanced," "broad," or "bushy" might apply. Others suggest that the "B" stands for Boeing. Because of his contributions, however, it seems appropriate to think of B-trees as "Bayer"-trees.

- Douglas Comer (The Ubiquitous B-Tree)

45 of 61

A note on Terminology

B-Trees are most popular in two specific contexts:

  • Small L (L=2 or L=3):
    • Used as a conceptually simple balanced search tree (as today).
  • L is very large (say thousands).
    • Used in practice for databases and filesystems (i.e. systems with very large records).

2-3-4 a.k.a. 2-4 Tree (L=3): Max 3 items per node. Max 4 non-null children per node.

[Figure: an example 2-3-4 tree]

2-3 Tree (L=2): Max 2 items per node. Max 3 non-null children per node.

[Figure: an example 2-3 tree]

46 of 61

Invariants

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

47 of 61

Exercise: Recorded Video Viewers Only

Add the numbers 1, 2, 3, 4, 5, 6, then 7 (in that order) into a regular BST.

Then try adding 1, 2, 3, 4, 5, 6, then 7 (in that order) into a 2-3 tree (L=2).

  • For L=2, pass the middle item up.

48 of 61

Exercise

Add the numbers 1, 2, 3, 4, 5, 6, then 7 (in that order) into a regular BST.

Resulting BST is spindly.

[Figure: the resulting spindly BST]

49 of 61

Exercise

Add 1, 2, 3, 4, 5, 6, 7 (in that order) into a 2-3 tree. For L=2, pass the middle item up.

Adding 1: the tree is the single node [1].

Adding 2: the node becomes [1 2].

Adding 3: the node becomes [1 2 3], which is too full, so we split. Before the split: [1 2 3]. After the split: 2 becomes the root, with children [1] and [3].

50 of 61

Exercise

Add 1, 2, 3, 4, 5, 6, 7 (in that order) into a 2-3 tree. For L=2, pass the middle item up.

After adding 3: root [2], with children [1] and [3].

Adding 4: the right child becomes [3 4].

Adding 5: the right child becomes [3 4 5], which is too full, so we split. Before the split: [3 4 5]. After the split: 4 moves up, giving root [2 4] with children [1], [3], and [5].

51 of 61

Exercise

Add 1, 2, 3, 4, 5, 6, 7 (in that order) into a 2-3 tree. For L=2, pass the middle item up.

After adding 5: root [2 4], with children [1], [3], and [5].

Adding 6: the rightmost child becomes [5 6].

Adding 7: the rightmost child becomes [5 6 7], which is too full, so we split. After one split: 6 moves up, giving root [2 4 6] with children [1], [3], [5], and [7]. The root is now too full, so we split again. After two splits: 4 moves up to a new root, with children [2] (over [1] and [3]) and [6] (over [5] and [7]).

52 of 61

Exercise

Add the numbers 1, 2, 3, 4, 5, 6, then 7 (in that order) into a regular BST.

Then try adding 1, 2, 3, 4, 5, 6, then 7 (in that order) into a 2-3 tree.

  • Interactive demo: https://tinyurl.com/balanceYD or this link.
  • In this demo “max-degree” means the maximum number of children, i.e. 3.

[Figure: the spindly BST from inserting 1–7 in order, next to the 2-3 tree with root 4; in the 2-3 tree all leaves are at depth 2]

53 of 61

Exercise 2: Recorded Video Viewers Only

Find an order such that if you add the items 1, 2, 3, 4, 5, 6, and 7 in that order, the resulting 2-3 tree has height 1.

54 of 61

Exercise 2

Find an order such that if you add the items 1, 2, 3, 4, 5, 6, and 7 in that order, the resulting 2-3 tree has height 1.

  • One possible answer: 2, 3, 4, 5, 6, 1, 7

Not sure why? Make sure to see https://tinyurl.com/balanceYD.

No matter the insertion order you choose, the resulting B-Tree is always bushy!

  • May vary in height a little bit, but overall guaranteed to be bushy.

[Figure: the resulting 2-3 tree has root 3 5 with children 1 2, 4, and 6 7; all leaves are at depth 1]

55 of 61

B-Tree Invariants

Because of the way B-Trees are constructed, we get two nice invariants:

  • All leaves must be the same distance from the root.
  • A non-leaf node with k items must have exactly k+1 children.
  • Example: The tree given below is impossible.
    • Leaves ([1] and [5 6 7]) are a different distance from the root.
    • Non-leaf node [2 3] has two items but only one child; it should have three children.

We have not proven these invariants rigorously, but try thinking them through.

[Figure: an impossible tree: root 4 with children 2 3 and 5 6 7, where the non-leaf node 2 3 has only a single child, 1]
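To make the invariants concrete, here is a small hypothetical Java checker over the same sketch node layout used earlier (sorted keys plus a child list); it is illustrative only, not CS61B's B-tree code.

```java
import java.util.ArrayList;
import java.util.List;

public class BTreeInvariantCheck {
    static class Node {
        List<Integer> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>();
        boolean isLeaf() { return children.isEmpty(); }
    }

    // True if every non-leaf node with k items has exactly k + 1 children
    // and all leaves sit at the same depth.
    static boolean satisfiesInvariants(Node root) {
        if (root == null) { return true; }
        return leafDepth(root, 0) != BROKEN;
    }

    private static final int BROKEN = -2;

    // Returns the common depth of all leaves below `node`, or BROKEN if an invariant fails.
    private static int leafDepth(Node node, int depth) {
        if (node.isLeaf()) { return depth; }
        if (node.children.size() != node.keys.size() + 1) { return BROKEN; }
        int expected = leafDepth(node.children.get(0), depth + 1);
        if (expected == BROKEN) { return BROKEN; }
        for (int i = 1; i < node.children.size(); i += 1) {
            if (leafDepth(node.children.get(i), depth + 1) != expected) { return BROKEN; }
        }
        return expected;
    }
}
```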

56 of 61

B-Tree Invariants

Because of the way B-Trees are constructed, we get two nice invariants:

  • All leaves must be the same distance from the root.
  • A non-leaf node with k items must have exactly k+1 children.

These invariants guarantee that our trees will be bushy.

57 of 61

Worst Case Performance

Lecture 17, CS61B, Spring 2025

Binary Search Trees

  • BST Height, Big O vs. Worst Case Big Theta
  • Worst Case Performance

B-Trees

  • Splitting Juicy Nodes
  • Chain Reaction Splitting
  • B-Tree Terminology
  • Invariants
  • Worst Case Performance

58 of 61

Height of a B-Tree with Limit L

L: Max number of items per node.

Height: between ~log_{L+1}(N) and ~log_2(N).

  • The largest possible height occurs when every node has just 1 item.
  • The smallest possible height occurs when every node has L items.
  • Overall height is therefore Θ(log N). (A short derivation sketch appears at the end of this slide.)

[Figure: two example trees with L = 2: a best-case tree with N = 26 items and H = 2, whose height grows like log_3(N), and a near-worst-case tree with N = 8 items and H = 2, whose height grows like log_2(N)]
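A short sketch (my own restatement of the bullet points above) of where the two bounds come from: count how many items a tree of height H can hold at the two extremes.

```latex
% Fullest case: every node holds L items and has L + 1 children.
N_{\max} \;=\; L\sum_{i=0}^{H}(L+1)^{i} \;=\; (L+1)^{H+1} - 1
\quad\Longrightarrow\quad H \;\approx\; \log_{L+1}(N).

% Sparsest case: every node holds 1 item and has 2 children.
N_{\min} \;=\; \sum_{i=0}^{H} 2^{i} \;=\; 2^{H+1} - 1
\quad\Longrightarrow\quad H \;\approx\; \log_{2}(N).

% Either way, H = \Theta(\log N).
```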

59 of 61

Runtime for contains

Runtime for contains:

  • Worst case number of nodes to inspect: H + 1
  • Worst case number of items to inspect per node: L
  • Overall runtime: O(HL)

Since H = Θ(log N), overall runtime is O(L log N).

  • Since L is a constant, runtime is therefore O(log N).

60 of 61

Runtime for add

Runtime for add:

  • Worst case number of nodes to inspect: H + 1
  • Worst case number of items to inspect per node: L
  • Worst case number of split operations: H + 1
  • Overall runtime: O(HL)

Since H = Θ(log N), overall runtime is O(L log N).

  • Since L is a constant, runtime is therefore O(log N).

Bottom line: contains and add are both O(log N).

61 of 61

Summary

BSTs have best case height Θ(log N), and worst case height Θ(N).

  • Big O is not the same thing as worst case!

B-Trees are a modification of the binary search tree that avoids Θ(N) worst case.

  • Nodes may contain between 1 and L items.
  • contains works almost exactly like a normal BST.
  • add works by adding items to existing leaf nodes.
    • If nodes are too full, they split.
  • Resulting tree has perfect balance. Runtime for operations is O(log N).
  • Have not discussed deletion. See extra slides if you’re curious.
  • Have not discussed how splitting works if L > 3 (see some other class).
  • B-trees are more complex, but they can efficiently handle ANY insertion order.

Extra slides cover deletion. Not in scope for our class.