1 of 135

Discussion 4

Buffer Management & Spatial Indexes

2 of 135

Announcements

Project 3 (Joins and QO) has been released and pt1 is due 10/10 at 11:59 PM.

Vitamin 4 (Buffer Management) is due Monday 9/30 at 11:59 PM

Midterm 1 - Tuesday 10/1

1 pfe double-sided cheat sheet and a calculator is permitted

3 of 135

Review for Disk and Memory

4 of 135

Disk vs Memory

  • Disk: hard drive storage (1 TB), with latency around 13ms - SLOW & Lot of Space
  • Memory (which contains Buffers): RAM (Random Access Memory) limited space (16GB), with latency around 83 nanoseconds - FAST & Limited Space

  • Any computations (read OR write) must be performed in MEMORY
  • Use the DISK to store data pages we don’t immediately need
  • Must load a page from disk to memory to read it and must write it back from memory to disk if we make modifications

Any action moving a page of data from disk to memory (reading from that page) or from memory to disk (writing to that page) will incur 1 I/O

5 of 135

Buffer Management

6 of 135

DBMS Architecture

7 of 135

Buffer Management Abstraction

8 of 135

Motivation

  • Before this section, we assumed that memory was infinite�
    • But it’s not!�
    • We need Buffer Manager to manage which page gets loaded into memory and evicted from memory

9 of 135

Buffer Manager

  • Layer that manages which pages are loaded in memory�
  • Controls when pages are read from & written to disk�
    • When no space in memory, decides what page to evict
    • Decision process is the page replacement policy
      • Big impact on I/Os depending on access pattern

10 of 135

Page Replacement Policies

11 of 135

LRU (Least Recently Used)

  • Evict the least recently used page (idea: pages we haven’t used in a long time probably won’t be used soon)
    • On use: pin frame (pinned frames cannot be evicted)
    • Track time each frame last unpinned (at end of use)
    • Replace the frame which least recently unpinned
  • Works well for repeated accesses to popular pages (temporal locality)
  • Can be costly: need to maintain heap data structure

12 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

13 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

A

0

1

Scan

14 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

A

0

2

B

Scan

15 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

A

0

3

B

C

Scan

16 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

3

B

C

LRU

D

D

Scan

A

17 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

4

C

LRU

Scan

D

A

A

B

18 of 135

Clock

  • Efficient approximation of LRU
  • Arrange frames in a circle (like numbers on a clock)
  • Advance clock arm around the clock to find pages to evict
    • Only do this if you need to bring new page into buffer
  • To make this approximate least recently used (rather than least recently loaded): add a reference bit to each frame
    • Set to 1 on load/hit, 0 if clock arm passes the frame
    • Evict unpinned frame if clock arm reaches it and bit = 0
    • (bit = 0 means less recently used than those with bit = 1)

19 of 135

Clock Example

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

20 of 135

Clock Example

Want to insert page:

Current frame has�pin-count > 0 �→ Skip

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

A

B

C

D

E

F

Pinned

Pinned

G

21 of 135

Clock Example

Want to insert page:

Current frame has�pin-count > 0 �→ Skip

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

A

B

C

D

E

F

Pinned

Pinned

G

22 of 135

Clock Example

Want to insert page:

Not Pinned and �Ref. bit is set →

  1. Clear Ref. Bit
  2. Skip

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

A

B

C

D

E

F

Pinned

Pinned

G

23 of 135

Clock Example

Want to insert page:

Not pinned and�Ref. bit unset:

  1. Evict
  2. Copy Page
  3. Set pinned
  4. Set ref. bit
  5. Advanced clock

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

A

B

C

G

E

F

Pinned

Pinned

G

D

24 of 135

Clock Example

Request for Page C:

Cache Hit!:

  1. Pin (inc.)
  2. Set ref bit.

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

Empty Frame

A

B

C

G

E

F

Pinned

Pinned

Pinned

Pinned

C

25 of 135

LRU/Clock

  • Common policy
  • Good for repeated access to popular pages (when the access pattern has temporal locality)
  • LRU is costly, but Clock is fast (and a good approximation)
  • When do LRU/Clock perform poorly?
    • Consider repeated sequential scans of large files...

26 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

27 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

A

0

1

Scan

28 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

A

0

2

B

Scan

29 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

A

0

3

B

C

Scan

30 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

3

B

C

LRU

D

D

Scan

A

31 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

4

C

LRU

Scan

D

A

A

B

32 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

5

A

LRU

Scan

D

B

B

C

33 of 135

LRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

6

A

LRU

Scan

D

B

No Cache Hits!

Sequential Flooding

34 of 135

MRU (Most Recently Used)

  • Evict the most recently used page (idea: pages we just used probably won’t be needed again, such as in a scan)
    • Almost identical to LRU, but evict the most recently used page instead of the least recently used page
  • Performs much better on repeated scans...

35 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

0

Scan

36 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

1

Scan

A

MRU

37 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

2

Scan

A

B

MRU

38 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

3

Scan

A

B

C

MRU

39 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

0

4

Scan

A

B

D

MRU

D

D

40 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

1

5

Scan

A

B

D

MRU

Hit!

41 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

2

6

Scan

A

B

D

MRU

Hit!

42 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

2

Scan

A

C

D

MRU

7

C

C

43 of 135

MRU: Repeated Scan of File

A

B

C

D

File

Empty

Frame

Empty

Frame

Empty

Frame

Buffer

Scan

Cache Hits:

Attempts:

3

8

Scan

A

C

D

MRU

Hit!

Improved

Cache

Hit Rate!

44 of 135

Summary of Replacement Policies

  • LRU is best when access pattern has temporal locality where data is reused within a small duration of time
  • Clock is fast and a good approximation for the costly LRU policy
  • MRU is best for sequential scans

45 of 135

Worksheet

46 of 135

Note on access patterns

  • For any access patterns like A B C D A F A D G D G E D F, each page is immediately pinned and then unpinned before the next access.
  • Example:
    • We read in the first page, A, and this page is pinned while it is in use.
    • However, when we move onto page B, we no longer actively use page A anymore and we unpin it before reading in page B.

  • We will explicitly specify if a page remains pinned, i.e. in 1b, A, B, C, D, A, F (remains pinned), D, G, D, unpin F, G, E, D, F.

47 of 135

Worksheet: LRU

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A B C D A F A D G D G E D F

48 of 135

Worksheet: LRU

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A B C D A F A D G D G E D F

49 of 135

Worksheet: MRU

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A B C D A F A D G D G E D F

50 of 135

Worksheet: MRU

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A B C D A F A D G D G E D F

51 of 135

Worksheet: Clock

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A B C D A F A D G D G E D F

52 of 135

A B C D A F A D G D G E D F

* For this problem, none of the pages remained pinned. The animation for pinning/unpinning each page is not shown.

Empty Frame

Empty Frame

0

Empty Frame

0

Empty Frame

0

0

53 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

0

Empty Frame

0

Empty Frame

0

54 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

0

Empty Frame

0

Empty Frame

0

55 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

0

Empty Frame

0

B

1

56 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

0

Empty Frame

0

B

1

57 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

Empty Frame

0

B

1

C

1

58 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

Empty Frame

0

B

1

C

1

59 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

B

1

C

1

1

D

60 of 135

A B C D A F A D G D G E D F

1

A

Empty Frame

B

1

C

1

1

D

61 of 135

A B C D A F A D G D G E D F

Hit

Set ref bit = 1

Do not move clock hand

1

A

Empty Frame

B

1

C

1

1

D

62 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

A

Empty Frame

B

1

C

1

1

D

0

63 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

A

Empty Frame

B

C

1

1

D

0

0

64 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

A

Empty Frame

B

C

0

1

D

0

0

65 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

A

Empty Frame

B

C

0

0

D

0

0

66 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

B

C

0

0

D

1

0

67 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

B

C

0

0

D

1

0

68 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

A

C

0

0

D

1

1

69 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

A

C

0

0

D

1

1

70 of 135

A B C D A F A D G D G E D F

Hit

Set ref bit = 1

Do not move clock hand

F

Empty Frame

A

C

0

D

1

1

1

71 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

A

G

D

1

1

1

1

72 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

A

G

D

1

1

1

1

73 of 135

A B C D A F A D G D G E D F

Hit

Set ref bit = 1

Do not move clock hand

F

Empty Frame

A

G

D

1

1

1

1

74 of 135

A B C D A F A D G D G E D F

Hit

Set ref bit = 1

Do not move clock hand

F

Empty Frame

A

G

D

1

1

1

1

75 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

F

Empty Frame

A

G

D

1

1

0

1

76 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

F

Empty Frame

A

G

D

1

1

0

0

77 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

F

Empty Frame

A

G

D

1

0

0

0

78 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 1 →

Set ref bit to 0

Advance clock hand

F

Empty Frame

A

G

D

0

0

0

0

79 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

A

G

E

0

0

0

1

80 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

F

Empty Frame

A

G

E

0

0

0

1

81 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

D

Empty Frame

A

G

E

0

0

1

1

82 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

D

Empty Frame

A

G

E

0

0

1

1

83 of 135

A B C D A F A D G D G E D F

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

D

Empty Frame

F

G

E

0

1

1

1

84 of 135

A B C D A F A D G D G E D F (Hit rate = 4/14)

Current frame’s ref bit = 0 →

Evict existing page

Read in new page

Set ref bit = 1

Advance clock hand

D

Empty Frame

F

G

E

0

1

1

1

85 of 135

Another visualization for Clock Policy

86 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

87 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

88 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

89 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

90 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

91 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

92 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

93 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

94 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

95 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

96 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

97 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

98 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

99 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

100 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

101 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

102 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

103 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

104 of 135

Red frame outline is where the clock hand points at the end of the access

Red box means the second chance bit is zero

Green box means the second chance bit is one

* means a hit

105 of 135

Worksheet: LRU w/ Pinned Pages

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A, B, C, D, A, F (remains pinned), D, G, D, unpin F, G, E, D, F

106 of 135

Worksheet: LRU w/ Pinned Pages

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A, B, C, D, A, F (remains pinned), D, G, D, unpin F, G, E, D, F

A

*

E

6

13

B

F

x

x

x

x

x

*

C

G

*

D

*

*

*

* = hit Remember that unpinning doesn’t contribute to the hit count

X = pinned

107 of 135

Worksheet: MRU w/ Pinned Pages

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A, B, C, D, A, F (remains pinned), D, G, D, unpin F, G, E, D, F

108 of 135

Worksheet: MRU w/ Pinned Pages

  • Fill in the following tables for the given buffer replacement policies. You have 4 buffer pages, with the access pattern

A, B, C, D, A, F (remains pinned), D, G, D, unpin F, G, E, D, F

A

*

F

x

x

x

x

x

G

E

3

13

B

C

D

*

G

D

*

F

* = hit Remember that unpinning doesn’t contribute to the hit count!

X = pinned

109 of 135

Worksheet 1 (continued)

(c) Is MRU ever better than LRU?

(d) Why would we use a clock replacement policy over LRU?

(e) Why would it be useful for a database management system to implement its own buffer replacement policy? Why shouldn’t we just rely on the operating system?

110 of 135

Worksheet 1 (continued)

(c) Is MRU ever better than LRU?

Yes, MRU prevents sequential flooding during sequential scans

(d) Why would we use a clock replacement policy over LRU?

Efficiency (approximation of LRU; don’t need to maintain entire ordering)

(e) Why would it be useful for a database management system to implement its own buffer replacement policy? Why shouldn’t we just rely on the operating system?

The database management system knows its data access patterns, which allows it to optimize its buffer replacement policy for each case

111 of 135

Spatial Indexes

112 of 135

Spatial Indexes

  • Data structures to perform queries over vectors
    • “Find the nearest point to my query point”
    • “Find all points within distance 10 from my query point”
  • Must be defined with respect to some metric
    • Euclidean distance
    • Cosine similarity
    • Manhattan distance
  • k-d trees are a popular spatial index

113 of 135

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

Query: Find the closest vector to [9, 5] with respect to the manhattan distance

d([9, 5], [0, 10]) = |9 - 0| + |5 - 10| = 14

d([9, 5], [10, 0]) = |9 - 10| + |5 - 10| = 6

d([9, 5], [5, 5]) = |9 - 5| + |5 - 5| = 4

d([9, 5], [6, 2]) = |9 - 6| + |5 - 2| = 6

d([9, 5], [3, 8]) = |9 - 3| + |5 - 8| = 9

Closest

d([9, 5], [2, 3]) = |9 - 2| + |5 - 3| = 9

d([9, 5], [6, 9]) = |9 - 6| + |5 - 9| = 7

d([9, 3], [6, 2]) = |9 - 3| + |6 - 2| = 10

114 of 135

Using Indexes

  • In the previous example, we did a brute force search
    • Prohibitively slow for many real workloads requiring low latency
  • k-d trees partition the space so that only a subset of database vectors need to be compared to a given query vector
    • Similar to a binary search tree and B+ tree
  • k-d trees are typically used for lower dimensional data (e.g., up to 8 dimensions)

115 of 135

K-d Tree Construction

  • Choose a dimension and find the median (or mean) of all points along that dimension
  • Split points in half, separated by the median (or mean)
  • Recursively split both halves
    • Pick a different dimension in the next recursion step
      • Not necessarily required, but often done

116 of 135

Example

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

117 of 135

Stored as a Tree

L

R

U

D

U

D

L

L

L

L

R

R

R

R

[0, 10]

[3, 8]

[2, 3]

[3, 2]

[5, 5]

[6, 9]

[6, 2]

[10, 0]

118 of 135

K-d Tree Nearest Neighbor Search

  • Start from the root of the tree
  • Traverse the tree to find the leaf node containing the query point
    • Mark the database vector corresponding to that leaf node as the current closest vector
    • Note: this is not necessarily the closest neighbor!!
  • Backtrack up the tree and check if there can be a closer point on the other side of the split
    • If there is a possibility of a closer vector, recursively search the other side
  • After backtracking past the root, the closest vector found is the answer

119 of 135

Example

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

120 of 135

Example

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

121 of 135

Example

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

122 of 135

Example

7

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

123 of 135

Backtrack

7

3.5

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

124 of 135

Backtrack

4

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

125 of 135

Backtrack

4

1.5

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

126 of 135

Backtrack

4

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

127 of 135

Backtrack

4

6

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

128 of 135

Backtrack

4

6

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

129 of 135

Backtrack

4

5

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

130 of 135

Done

4

id

vec

1

[0, 10]

2

[10, 0]

3

[5, 5]

4

[6, 2]

5

[3, 8]

6

[2, 3]

7

[6, 9]

8

[3, 2]

131 of 135

Worksheet #2a

How many distance computations with database points are needed to find the nearest neighbor using brute force?

132 of 135

Worksheet #2a

How many distance computations with database points are needed to find the nearest neighbor using brute force?

8, since we need to compare our query point to each database point

133 of 135

Worksheet #2b

How many distance computations with database points are needed to find the nearest neighbor using our k-d tree?

134 of 135

Worksheet #2b

How many distance computations with database points are needed to find the nearest neighbor using our k-d tree?

2, since we need to check both the points towards the bottom right

135 of 135

Attendance Link