1 of 29

Let’s buiId Indexes

1

2 of 29

Example [Reminder]

2

CName_Index

PriceDate_Index

Index contains search values + Block #: e.g., DB block number.

In general, “pointer” to where the record is stored (e.g., RAM page, DB block number or even machine + DB block)
Index is conceptually a table. In practice, implemented very efficiently (see how soon)

Can have multiple indexes to support multiple search keys

Block #

How?

3 of 29

IO Aware

For big files (bigger than RAM), we need efficient data structures that work with HDD/SSD IO systems

An IO aware algorithm! (and data structures)

4 of 29

How to build?

How is data organized?

Is data in Row or Column store?
Is data sorted or not?

How do we organize search values?

4

5 of 29

Recall

Data Layout

5

Company(CName, StockPrice, Date, Country)

Row1

Row3

Row5

Row8

Col3

Page

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

Page

Page … n (RAM/DIsk)

Page

CName

Page

Date

Page

Price

Company

Row based storage

(aka Row Store)

Column based storage

(aka Column Store)

Logical Table

6 of 29

Index on row store

6

Query: Search for cname with specific price?

⇒ If ‘price’ is an indexed column, query will be fast.

⇒ ‘Price’ is search key(SKey). Values in price column are search values .

(e.g. ‘price’ == 19?)

Page

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

Index for Price

where is 19?

where is 33?

(e.g, disk block 0x45AB)

Note: search key does not mean it’s unique. It’s what are you searching for.

(vs Primary KEYs in SQL that are unique)

7 of 29

Our 1st Hashing index

7

21

22

27

28

30

33

35

37

19

11

Option2:

Sorted

19

22

27

28

30

33

35

37

11

21

If sorted, can do better. Maintain locations only of smallest SValue in each block. (e.g., 11, 27, 33)

Index for Price

How it works in practice?

1. Schema designer picks a column to keep data sorted by (e.g., price). Index for that column is cheap.

2. For other columns, index will be bigger (e.g., CName)

where is 19?

where is 33?

Hash

Function

SKey = <19>?

0x7c6D

Goal of Index: where is location of each SValue

19

For simplicity, we’ll only show the SValues for Price index

Example row: <’goog’, price=19, date=Oct 1, ...> as <price=19> or <19>

Option1:

Unsorted

If unsorted, maintain locations of all SValues in each block.

8 of 29

Index on row store

Query: Search for cname with specific price?

Page

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

Index for Price

where is 19?

where is 33?

(e.g, block 0x45AB)

How do we store Index?

⇒ Idea: Index is just a table (rows/columns). Same ideas

Store in pages
Persist on disk
Page into RAM buffer

If Index fits in RAM?

Lookups are fast

If Index does not fit in RAM?

Could page at random
Can we organize index pages better? (e.g. Index the index)

9 of 29

Index Types

9

Hash Tables

IO-aware hashing (e.g., linear or extendible hashing)

B-Trees (covered next)

Very good for range queries, sorted data
Some old databases only implemented B-Trees
We will look at a variant called B+ Trees

These data structures are “IO aware”

Real difference between structures:

costs of ops determines which index you pick and why

10 of 29

B+ Trees

11 of 29

Idea in B+ Trees

Search trees that are IO aware

Store each tree-node in 1 page
Balanced, height adjusted tree
Make leaves into a linked list (for range queries)

12 of 29

What you will learn about in this section

B+ Trees: Basics

B+ Trees: Design & Cost

Clustered Indexes

13 of 29

B+ Tree Exact Search

80

20		60

100		120		140

10		15		18

20		30		40		50

60		65

80		85		90

SKey = 30?

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

“Real” data layout, with full records

(including cname, prices, etc.)

Note: the pointers at the leaf level will be to the actual data records (rows).

We truncate and only display SValues for simplicity (as before)…

Data!

Index!

14 of 29

B+ Tree Exact Search

80

20		60

100		120		140

10		15		18

20		30		40		50

60		65

80		85		90

10

12

15

20

28

30

40

60

63

80

84

89

SKey = 30?

30 < 80

30 in [20,60)

Data! [simplified]

Not all nodes pictured

30 in [30,40)

15 of 29

Searching a B+ Tree

For exact SKey values:

Start at the root
Proceed down, to the leaf

For range queries:

As above
Then sequential traversal

SELECT cname

FROM Company

WHERE price = 25

SELECT cname

FROM Company

WHERE 20 <= price

AND price <= 30

16 of 29

B+ Tree Range Search

80

20		60

100		120		140

10		15		18

20		30		40		50

60		65

80		85		90

10

12

15

20

28

30

40

59

63

80

84

89

SKey in [30,85]?

30 < 80

30 in [20,60)

To the data!

Not all nodes pictured

30 in [30,40)

17 of 29

What you will learn about in this section

B+ Trees: Basics

B+ Trees: Design & Cost

How many search values per page?
How many levels in tree?

Clustered Indexes

18 of 29

B+ Tree Basics -- Root and non-leaf nodes

10		20		30

k < 10

21

22

27

28

30

33

35

37

15

11

Parameter f = fanout

(e.g. f = 3)

Each non-leaf node has

<= f SKeys

22		25		28

19 of 29

B+ Tree Basics -- Leaf nodes

10		20		30

22		25		28		29

Leaf nodes

32		34		37		38

Non-leaf or internal node

12		17

Leaf nodes

SKey slots contain pointers to data records

21

22

27

28

30

33

35

37

15

11

Pointer to the next leaf node, for faster sequential traversal

20 of 29

Costs of B+ trees

21 of 29

B+ Tree: High Fanout = Lower IO

As compared to e.g. binary search trees, B+ Trees have high fanout

Hence the depth of the tree is small → getting to any element requires very few IO operations!

Also can often store most/all of B+ Tree in RAM!

The fanout is defined as the number of pointers to child nodes coming out of a [leaf] node

Note that fanout is dynamic- we’ll often assume it’s constant just to come up with approximate eqns!

22 of 29

Cost Model for Indexes -- [Baseline simplest model]

Page

Row1

Row2

Row ...

Row...

. . . . .

Row ...

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

E.g, where is 19?

where is 33?

(e.g, disk block 0x45AB)

Question: What’s physical layout? What are costs?

Let us build an index for an SKey (e.g., CName) in data

NumSKeys = number of SValues for that SKey (e.g., 5000 cnames)
SKeySize = size of SKey (e.g. 4 bytes)
PointerSize = size of pointer (e.g. 8 bytes)

For B+ tree

f = fanout (we’ll assume it is constant for simplicity…)
h = height of tree (e.g., 1, 2., …)
N = number of index pages

Simplified cost model

N index pages (leaf)

M data

pages

‘f’ fanout

<value, record location>

23 of 29

Cost Model for Indexes -- [Baseline simplest model]

Page

Row1

Row2

Row ..

Row ...

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

E.g, where is 19?

where is 33?

(e.g, disk block 0x45AB)

N index pages (leaf)

M

data pages

‘f’ fanout

<value, record location>

Example 1:Search Amazon’s Product Catalog

SKeySize = 8 bytes,
PointerSize = 8 bytes
NumSKeys = 1 Trillion

AMAZING: Worst-case for 1 Trillion SKeys

2 IOs (for 64MB) for index
1 IO for data page

PageSize	64KB	64MB
f	~4000	~4 million
h	~4	~2

24 of 29

Simple Cost Model for Range Search

To do range search, we just follow the horizontal pointers

The IO cost is that of loading additional leaf nodes we need to access + the IO cost of loading each page of the results- we phrase this as “Cost(OUT)”

25 of 29

Fast Insertions & Self-Balancing

We won’t go into specifics of B+ Tree insertion algorithm, but has several attractive qualities:

~ Same cost as exact search

Self-balancing: B+ Tree remains balanced (with respect to height) even after insert

B+ Trees also (relatively) fast for single insertions!

However, can become bottleneck if many insertions (if fill-factor slack is used up…)

26 of 29

What you will learn about in this section

B+ Trees: Basics

B+ Trees: Design & Cost

Clustered Indexes

27 of 29

Clustered vs. Unclustered Index

30

22		25		28		29

32		34		37		38

19

22

27

28

30

33

35

37

30

22		25		28		29

32		34		37		38

19

22

27

28

30

33

35

37

Clustered

Unclustered

Index Entries

Data Records

(sorted data)

An index is clustered if the underlying data is ordered in the same way as the index’s data entries.

28 of 29

Clustered vs. Unclustered Index

Recall that sequential disk block IO is much faster than random IO

For exact search, no difference between clustered / unclustered

For range search over R values: difference between

[a] 1 random IO + R sequential IO and [b] R random IO:

A random IO costs ~ 10ms (sequential much much faster)
For R = 100,000 records- difference between ~10ms and ~17min!

29 of 29

Summary

We covered an algorithm + some optimizations for sorting larger-than-memory files efficiently

An IO aware algorithm!

We create indexes over tables in order to support fast (exact and range) search and insertion over multiple search keys

B+ Trees are one index data structure which support very fast exact and range search & insertion via high fanout

Clustered vs. unclustered makes a big difference for range queries too