1 of 29

Let’s buiId Indexes

1

2 of 29

Example [Reminder]

2

CName_Index

PriceDate_Index

  1. Index contains search values + Block #: e.g., DB block number.
    • In general, “pointer” to where the record is stored (e.g., RAM page, DB block number or even machine + DB block)
    • Index is conceptually a table. In practice, implemented very efficiently (see how soon)

  • Can have multiple indexes to support multiple search keys

Block #

Block #

Block #

How?

3 of 29

IO Aware

For big files (bigger than RAM), we need efficient data structures that work with HDD/SSD IO systems

    • An IO aware algorithm! (and data structures)

4 of 29

How to build?

  1. How is data organized?
    • Is data in Row or Column store?
    • Is data sorted or not?

  • How do we organize search values?

4

5 of 29

Recall

Data Layout

5

Company(CName, StockPrice, Date, Country)

Row1

Row3

Row5

Row8

Col3

Page

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

Page

Page … n (RAM/DIsk)

Page

CName

Page

Date

Page

Page

Price

Company

Row based storage

(aka Row Store)

Column based storage

(aka Column Store)

Logical Table

6 of 29

Index on row store

6

Query: Search for cname with specific price?

⇒ If ‘price’ is an indexed column, query will be fast.

⇒ ‘Price’ is search key(SKey). Values in price column are search values .

(e.g. ‘price’ == 19?)

Page

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

Index for Price

where is 19?

where is 33?

(e.g, disk block 0x45AB)

Note: search key does not mean it’s unique. It’s what are you searching for.

(vs Primary KEYs in SQL that are unique)

7 of 29

Our 1st Hashing index

7

21

22

27

28

30

33

35

37

19

11

Option2:

Sorted

19

22

27

28

30

33

35

37

11

21

If sorted, can do better. Maintain locations only of smallest SValue in each block. (e.g., 11, 27, 33)

Index for Price

How it works in practice?

1. Schema designer picks a column to keep data sorted by (e.g., price). Index for that column is cheap.

2. For other columns, index will be bigger (e.g., CName)

where is 19?

where is 33?

Hash

Function

SKey = <19>?

0x7c6D

0x7c6D

Goal of Index: where is location of each SValue

19

For simplicity, we’ll only show the SValues for Price index

Example row: <’goog’, price=19, date=Oct 1, ...> as <price=19> or <19>

Option1:

Unsorted

If unsorted, maintain locations of all SValues in each block.

8 of 29

Index on row store

Query: Search for cname with specific price?

Page

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

Index for Price

where is 19?

where is 33?

(e.g, block 0x45AB)

How do we store Index?

⇒ Idea: Index is just a table (rows/columns). Same ideas

  • Store in pages
  • Persist on disk
  • Page into RAM buffer

If Index fits in RAM?

  • Lookups are fast

If Index does not fit in RAM?

  • Could page at random
  • Can we organize index pages better? (e.g. Index the index)

9 of 29

Index Types

9

  • Hash Tables
    • IO-aware hashing (e.g., linear or extendible hashing)

  • B-Trees (covered next)
    • Very good for range queries, sorted data
    • Some old databases only implemented B-Trees
    • We will look at a variant called B+ Trees

These data structures are “IO aware”

Real difference between structures:

costs of ops determines which index you pick and why

10 of 29

B+ Trees

11 of 29

Idea in B+ Trees

Search trees that are IO aware

    • Store each tree-node in 1 page
    • Balanced, height adjusted tree
    • Make leaves into a linked list (for range queries)

12 of 29

What you will learn about in this section

  • B+ Trees: Basics

  • B+ Trees: Design & Cost

  • Clustered Indexes

13 of 29

B+ Tree Exact Search

80

20

60

100

120

140

10

15

18

20

30

40

50

60

65

80

85

90

SKey = 30?

Row1

Row2

Row N

Row N +1

. . . . .

Row 2N

. . . . .

“Real” data layout, with full records

(including cname, prices, etc.)

Note: the pointers at the leaf level will be to the actual data records (rows).

We truncate and only display SValues for simplicity (as before)…

Data!

Index!

14 of 29

B+ Tree Exact Search

80

20

60

100

120

140

10

15

18

20

30

40

50

60

65

80

85

90

10

12

15

20

28

30

40

60

63

80

84

89

SKey = 30?

30 < 80

30 in [20,60)

Data! [simplified]

Not all nodes pictured

30 in [30,40)

15 of 29

Searching a B+ Tree

  • For exact SKey values:
    • Start at the root
    • Proceed down, to the leaf

  • For range queries:
    • As above
    • Then sequential traversal

SELECT cname

FROM Company

WHERE price = 25

SELECT cname

FROM Company

WHERE 20 <= price

AND price <= 30

16 of 29

B+ Tree Range Search

80

20

60

100

120

140

10

15

18

20

30

40

50

60

65

80

85

90

10

12

15

20

28

30

40

59

63

80

84

89

SKey in [30,85]?

30 < 80

30 in [20,60)

To the data!

Not all nodes pictured

30 in [30,40)

17 of 29

What you will learn about in this section

  • B+ Trees: Basics

  • B+ Trees: Design & Cost
    • How many search values per page?
    • How many levels in tree?

  • Clustered Indexes

18 of 29

B+ Tree Basics -- Root and non-leaf nodes

10

20

30

k < 10

 

 

 

21

22

27

28

30

33

35

37

15

11

Parameter f = fanout

(e.g. f = 3)

Each non-leaf node has

<= f SKeys

22

25

28

19 of 29

B+ Tree Basics -- Leaf nodes

10

20

30

22

25

28

29

Leaf nodes

32

34

37

38

Non-leaf or internal node

12

17

Leaf nodes

SKey slots contain pointers to data records

21

22

27

28

30

33

35

37

15

11

Pointer to the next leaf node, for faster sequential traversal

20 of 29

Costs of B+ trees

21 of 29

B+ Tree: High Fanout = Lower IO

  • As compared to e.g. binary search trees, B+ Trees have high fanout

  • Hence the depth of the tree is small → getting to any element requires very few IO operations!
    • Also can often store most/all of B+ Tree in RAM!

The fanout is defined as the number of pointers to child nodes coming out of a [leaf] node

Note that fanout is dynamic- we’ll often assume it’s constant just to come up with approximate eqns!

22 of 29

Cost Model for Indexes -- [Baseline simplest model]

Page

Row1

Row2

Row ...

Row...

. . . . .

Row ...

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

E.g, where is 19?

where is 33?

(e.g, disk block 0x45AB)

Question: What’s physical layout? What are costs?

Let us build an index for an SKey (e.g., CName) in data

    • NumSKeys = number of SValues for that SKey (e.g., 5000 cnames)
    • SKeySize = size of SKey (e.g. 4 bytes)
    • PointerSize = size of pointer (e.g. 8 bytes)

For B+ tree

    • f = fanout (we’ll assume it is constant for simplicity…)
    • h = height of tree (e.g., 1, 2., …)
    • N = number of index pages

Simplified cost model

N index pages (leaf)

M data

pages

‘f’ fanout

<value, record location>

23 of 29

Cost Model for Indexes -- [Baseline simplest model]

Page

Row1

Row2

Row ..

Row ...

. . . . .

Page

Page … n (RAM/DIsk)

“Real” data layout, with full records

(including cname, prices, etc.)

E.g, where is 19?

where is 33?

(e.g, disk block 0x45AB)

N index pages (leaf)

M

data pages

‘f’ fanout

<value, record location>

Example 1:Search Amazon’s Product Catalog

    • SKeySize = 8 bytes,
    • PointerSize = 8 bytes
    • NumSKeys = 1 Trillion

AMAZING: Worst-case for 1 Trillion SKeys

  • 2 IOs (for 64MB) for index
  • 1 IO for data page

PageSize

64KB

64MB

f

~4000

~4 million

h

~4

~2

24 of 29

Simple Cost Model for Range Search

  • To do range search, we just follow the horizontal pointers

  • The IO cost is that of loading additional leaf nodes we need to access + the IO cost of loading each page of the results- we phrase this as “Cost(OUT)”

25 of 29

Fast Insertions & Self-Balancing

  • We won’t go into specifics of B+ Tree insertion algorithm, but has several attractive qualities:

    • ~ Same cost as exact search

    • Self-balancing: B+ Tree remains balanced (with respect to height) even after insert

B+ Trees also (relatively) fast for single insertions!

However, can become bottleneck if many insertions (if fill-factor slack is used up…)

26 of 29

What you will learn about in this section

  • B+ Trees: Basics

  • B+ Trees: Design & Cost

  • Clustered Indexes

27 of 29

Clustered vs. Unclustered Index

30

22

25

28

29

32

34

37

38

19

22

27

28

30

33

35

37

30

22

25

28

29

32

34

37

38

19

22

27

28

30

33

35

37

Clustered

Unclustered

Index Entries

Data Records

(sorted data)

An index is clustered if the underlying data is ordered in the same way as the index’s data entries.

28 of 29

Clustered vs. Unclustered Index

  • Recall that sequential disk block IO is much faster than random IO

  • For exact search, no difference between clustered / unclustered

  • For range search over R values: difference between
    • [a] 1 random IO + R sequential IO and [b] R random IO:
      • A random IO costs ~ 10ms (sequential much much faster)
      • For R = 100,000 records- difference between ~10ms and ~17min!

29 of 29

Summary

  • We covered an algorithm + some optimizations for sorting larger-than-memory files efficiently
    • An IO aware algorithm!

  • We create indexes over tables in order to support fast (exact and range) search and insertion over multiple search keys

  • B+ Trees are one index data structure which support very fast exact and range search & insertion via high fanout
    • Clustered vs. unclustered makes a big difference for range queries too