1 of 95

CSE 373 Wi24 Section 4

Hashing

2 of 95

Agenda

Announcements

Hashing Intro

3 of 95

Announcements

Mon (1/22)	Tues (1/23)	Wed (1/24)	Thurs (1/25)	Fri (1/26)	Sat (1/27)
	EX2 due @ 11:59 pm			E2 late due @ 11:59 pm
Mon (1/29)	Tues (1/30)	Wed (1/31)	Thurs (2/1)	Fri (2/2)	Sat (2/3)
	EX3 due @ 11:59 pm		P2 due @ 11:59 pm	Midterm!! @ 3:30 pm	P2 late due @ 11:59 pm

Ex3 released!! Due date pushed back to Tues Feb 6th
Ex2 late due date tomorrow
Midterm next Friday

Taken in-class during our normal lecture time
If you need any accommodation, please email us as soons as possible. cse373-instructors@cs.washington.edu

4 of 95

Mid Quarter Check-in

[INSERT YOUR FORM’S URL HERE]

5 of 95

Hashing Intro

6 of 95

Hash Tables

One way to implement the dictionary ADT.

Stores item with key x in array index (or “bucket”) h(x).�(The choice of h(x)is important for many reasons.)��

7 of 95

Collisions?

One approach: Separate chaining uses Linked Lists to handle collisions.

0

1

3124

4583

20382827

553256591

553256592

snack

...

potato

rotato

totato

dog

cat

...

8 of 95

Using Mod (Compression)

We use the modulus of the hashcode when it exceeds the number of buckets.

0

1

2

3

4

5

6

7

8

9

Q: If we use the 10 buckets below, where should our six items go?�

A: Put in bucket = h(x) % 10

potato

rotato

totato

snack

0

1

3124

4583

20382827

553256591

553256592

snack

...

potato

rotato

totato

dog

cat

...

cat

dog

9 of 95

Hash Function Quality

Note: The more h(x)evenly distributes the keys, the faster get is! (Why?)

0

1

2

3

4

5

6

7

8

9

potato

rotato

totato

snack

0

1

3124

4583

20382827

553256591

553256592

snack

...

potato

rotato

totato

dog

cat

...

cat

dog

10 of 95

Problem 1A + 1B:��Hash Table Insertion

11 of 95

Problem 1A

12 of 95

Hash Table With Separate Chaining

Remember you use hash function with key not with value!

Suppose we have a hash table implemented using separate chaining. This hash table has an internal capacity of 10. Its buckets are implemented using a linked list where new elements are appended to the end. Do not worry about resizing.

Show what this hashtable internally looks like after inserting the following key-value pairs in the order given using the hash function h(x) = 4x:

(1, a), (4, b), (2, c), (17, d), (12, e), (9, e), (19, f), (4, g), (8, c), (12, f)

13 of 95

Hash Table With Separate Chaining

Input: (1,a)

Hash function: h(x) = 4x

h(1) = 4

4% 10 = 4

				(1,a)

0 1 2 3 4 5 6 7 8 9

14 of 95

Hash Table With Separate Chaining

Input: (4,b)

Hash function: h(x) = 4x

h(4) = 16

16 % 10 = 6

				(1,a)		(4,b)

0 1 2 3 4 5 6 7 8 9

15 of 95

Hash Table With Separate Chaining

Input: (2,c)

Hash function: h(x) = 4x

h(2) = 8

8 % 10 = 8

				(1,a)		(4,b)		(2,c)

0 1 2 3 4 5 6 7 8 9

16 of 95

Hash Table With Separate Chaining

Input: (17,d)

Hash function: h(x) = 4x

h(17) = 68

68 % 10 = 8

				(1,a)		(4,b)		(2,c) ↓ (17,d)

0 1 2 3 4 5 6 7 8 9

17 of 95

Hash Table With Separate Chaining

Input: (12,e)

Hash function: h(x) = 4x

h(12) = 48

48 % 10 = 8

				(1,a)		(4,b)		(2,c) ↓ (17,d) ↓ (12,e)

0 1 2 3 4 5 6 7 8 9

18 of 95

Hash Table With Separate Chaining

Input: (9,e)

Hash function: h(x) = 4x

h(9) = 36

36 % 10 = 6

				(1,a)		(4,b) ↓ (9,e)		(2,c) ↓ (17,d) ↓ (12,e)

0 1 2 3 4 5 6 7 8 9

19 of 95

Hash Table With Separate Chaining

Input: (19,f)

Hash function: h(x) = 4x

h(19) = 76

76 % 10 = 6

				(1,a)		(4,b) ↓ (9,e) ↓ (19,f)		(2,c) ↓ (17,d) ↓ (12,e)

0 1 2 3 4 5 6 7 8 9

20 of 95

Hash Table With Separate Chaining

Input: (4,g)

Hash function: h(x) = 4x

h(4) = 16

16 % 10 = 6

				(1,a)		(4,g) ↓ (9,e) ↓ (19,f)		(2,c) ↓ (17,d) ↓ (12,e)

0 1 2 3 4 5 6 7 8 9

Since the key already exists, we update its value!

21 of 95

Hash Table With Separate Chaining

Input: (8,c)

Hash function: h(x) = 4x

h(8) = 32

32 % 10 = 2

		(8,c)		(1,a)		(4,g) ↓ (9,e) ↓ (19,f)		(2,c) ↓ (17,d) ↓ (12,e)

0 1 2 3 4 5 6 7 8 9

22 of 95

Hash Table With Separate Chaining

Input: (12,f)

Hash function: h(x) = 4x

h(12) = 48

48 % 10 = 8

		(8,c)		(1,a)		(4,g) ↓ (9,e) ↓ (19,f)		(2,c) ↓ (17,d) ↓ (12,f)

0 1 2 3 4 5 6 7 8 9

23 of 95

Problem 1B

24 of 95

Evaluating Hash Functions

25 of 95

Evaluating Hash Functions

h(x) = 4x

0 1 2 3 4 5 6 7 8 9 10 11

Issue with the hash function:

Keys will initially only be hashed to at most one of three spots

Collisions are very likely, decreasing performance!

0

6

93

1

7

94

2

8

95

26 of 95

Evaluating Hash Functions

h(x) = 4x

0 1 2 3 4 5 6 7 8 9 10 11

Issue with the resize:

12 13 14 15 16 17 18 19 20 21 22 23

We still only map to a fourth of the available indices!

27 of 95

Evaluating Hash Functions

Possible Fixes?

Pick a new hash function that’s relatively prime to 12 (e.g. h(x) = 5x)

Choose a different initial table capacity (11, 13)

Resize differently (such as choosing a prime that’s roughly double the size

Prime numbers will almost always improve our hashing!

28 of 95

Problem 4A-C:��Hashing and Mutation

29 of 95

Hashing and Mutation

30 of 95

Hashing and Mutation

This will look up the bucket (1 + 2) mod 4 = 3.

In the bucket 3, IntList.of(1, 2) is equivalent to [1, 2], so "dog" which is the stored value is returned.

a)

31 of 95

Hashing and Mutation

This will look up the bucket (1 + 2) mod 4 = 3.

In the bucket 3, IntList.of(1, 2) is NOT equivalent to [1, 2, 3], so we cannot find the matched key.

Hence, return null.

b)

32 of 95

Hashing and Mutation

This will look up the bucket (1 + 2 + 3) mod 4 = 2.

Since the bucket 2 is empty, we definitely cannot find the matched key.

Hence, return null.

b)

33 of 95

Hashing and Mutation

We changed a key while it was inside the hashtable, and caused it to be in the wrong bucket.��Never do this.

c)

34 of 95

Thank you & see you next week!

35 of 95

Agenda

Announcements

Mid Quarter Check-in

AVL Tree Review

Tries

36 of 95

MicroTeach: AVL Trees

37 of 95

Review: BST

38 of 95

But, a problem...

Normal BSTs can become unbalanced.��So find is slow.

39 of 95

What exactly does balanced mean?

For every node, compare the heights of its two sub trees.

Balanced when the difference in height between sub trees is no greater than 1 for every node.

40 of 95

What exactly does balanced mean?

Note: Balanced tree with n items has height of O(log(n)).

41 of 95

Solution: AVL Trees

These special BSTS remain balanced no matter what order items are inserted in.��They do this by rebalancing themselves using rotations each time an item gets inserted.

42 of 95

Don’t forget to check out the AVL rotation w alkthrough too!

43 of 95

Question 1A-C: Valid BST/AVL

44 of 95

Problem 1A

6

Valid BST?

7

43

2

45 of 95

Problem 1A

Valid BST?

No!

6

7

43

2

46 of 95

Problem 1A

Valid AVL?

6

7

43

2

47 of 95

Problem 1A

Valid AVL?

No!

(Must be BST to be AVL!)

6

7

43

2

48 of 95

Problem 1B

Valid BST?

6

7

43

42

59

63

11

21

49 of 95

Problem 1B

Valid BST?

Yes!

6

7

43

42

59

63

11

21

50 of 95

Problem 1B

Valid AVL?

6

7

43

42

59

63

11

21

51 of 95

Problem 1B

Valid AVL?

No!

42 violates

Balance condition.

6

7

43

42

59

63

11

21

52 of 95

Problem 1C

Valid BST?

6

7

43

21

59

63

11

42

53 of 95

Problem 1C

Valid BST?

Yes!

6

7

43

21

59

63

11

42

54 of 95

Problem 1C

Valid AVL?

6

7

43

21

59

63

11

42

55 of 95

Problem 1C

Valid AVL?��Yes!

6

7

43

21

59

63

11

42

56 of 95

Problem 5A-C: AVL Big-O

57 of 95

Big O

Write down a tight big-O for each of the following. Unless otherwise noted, give a bound in the worst case

(a) Insert and find in a BST.

� �

58 of 95

Big O

Insert: Inserting biggest/smallest item at the bottom of the tree

Find: Finding Biggest/smallest item in the skewed binary tree

� �

O(N)

Let’s think about worst case : Linked List like tree

O(N)

59 of 95

Big O

Write down a tight big-O for each of the following. Unless otherwise noted, give a bound in the worst case

� �

(b) Insert and find in AVL Tree

�

60 of 95

Big O

Insert: Balanced tree height is log(N)

�

Find: Balanced tree height is log(N)

O(log(N))

� �

O(log(N))

Let’s think about worst case: Insert/find at the bottom of our tree

61 of 95

Big O

Write down a tight big-O for each of the following. Unless otherwise noted, give a bound in the worst case

� �

(c) Finding the minimum value in an AVL tree containing

N elements

�

62 of 95

Big O

Find: AVL tree is balanced

Finding value by starting from root

and traverse to the left most branch

O(log(N))

� �

Let’s think about worst case:

63 of 95

Problem 6a-c): Analyzing Dictionary Implementations

64 of 95

Problem 6a)

a) What are the constraints on the data types you can store in an AVL Tree?

Keys needs to be comparable!�

Since AVL trees maintain their keys in sorted order, we have to be able to compare keys to decide whether to go left or right at each node.

Note that there are no restrictions on our value’s data type since AVL trees order data by keys, not values.

65 of 95

Problem 6b)

b) When is using an AVL tree preferred over a hash table?

One Example: Better worst case for get()! AVL trees guarantee O(logn) at all times. A hash table in the worst case, where every key is added to the same bucket, takes O(n) time.

*Caveat: In practice, you avoid this worst case scenario by

having a good hash function.

0

1

2

3

4

5

6

7

8

9

potato

rotato

totato

dog

cat

66 of 95

Problem 6c)

c) When is using a BST preferred over an AVL tree?

One reason: BSTs are waaaaaay easier to implement and debug!

67 of 95

MicroTeach: Tries

68 of 95

69 of 95

Trie (Pronounced like “try”)

A Data Structure that represents the dictionary or set ADT

In order for a Trie to implement one of those ADTs, it needs the following functionalities:

Adding an element
Finding an element
Removing an element

70 of 95

Adding to a Trie

s

add(“sap”):

s

a

s

a

p

71 of 95

Adding to a Trie

add(“sad”):

s

a

p

s

a

p

d

72 of 95

Adding to a Trie

add(“awls”):

s

a

p

d

a

s

a

p

d

a

w

s

a

p

d

a

w

l

s

a

p

d

a

w

l

s

73 of 95

Adding to a Trie

add(“a”):

s

a

p

d

a

w

l

s

a

p

d

a

w

l

s

74 of 95

Searching in Tries

The next operation or behavior of Set ADT’s is the contains() method. We achieve this by following a similar pattern, beginning at the root node and checking against the nodes at each depth to see if we arrive at a final node that is purple, or else we fall off the tree. For example, for checking if our trie contains the “sam” key, we would traverse down by following along with the characters in “sam”, and once we reached the end of our desired key, we’d check the node and see it’s highlighted, which tells us it’s a key in our trie set.

In the case of checking if our trie contains the “sa” key, we would traverse down until the ‘a’ node, and then when we checked the node we’d see it’s not highlighted, which tells us “sa” isn’t a key in our trie set.

If we checked if our trie contains the “saq” key, we would traverse down until the ‘a’ node, and then when we checked the children of the ‘a’ node, we wouldn’t find a node with the letter ‘q’, so we’d fall off the tree and we know “saq” isn’t a key in our trie set.

75 of 95

Removing from a Trie: Lazy Deletion

remove(“awls”):

s

a

p

d

a

w

l

s

m

e

s

a

p

d

a

w

l

s

m

e

76 of 95

Removing from a Trie: Normal Deletion

remove(“awls”):

s

a

p

d

a

w

l

s

m

e

s

a

p

d

a

w

l

m

e

s

a

p

d

a

w

m

e

s

a

p

d

a

m

e

77 of 95

Problem 1): Trie to Delete 0’s and 1’s?

78 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

If we deleted all binary numbers of length 2, how many nodes would we have to delete?

79 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

If we deleted all binary numbers of length 2, how many nodes would we have to delete? Ans – 0, since we still have pointers to the nodes for length-3 binary strings

80 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

If we deleted all binary numbers of length 2, how many nodes would we have to delete? Ans – 0, since we still have pointers to the nodes for length-3 binary strings

81 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete?

82 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)

83 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)

84 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)

85 of 95

Trie to Delete 0’s and 1’s (TA)

0

1

(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)

86 of 95

Problem 2a-c): Call Me Maybe

87 of 95

Problem 2a)

a) Suppose you want to transfer someone’s phone book to a data structure so that you can all all the phone numbers with a particular area code efficiently. What data structure would you use? How would you implement it? There are a few answers here

88 of 95

Problem 2a)

a) Suppose you want to transfer someone’s phone book to a data structure so that you can all all the phone numbers with a particular area code efficiently. What data structure would you use? How would you implement it? There are a few answers here

We could use a HashMap where the keys are area codes and values are a list of corresponding phone numbers. We would have to parse each phone number for its area code and identify all keys with that area code.

We could also use a Trie where we use phone numbers as our Trie “routes”. We would insert all phone numbers into our Trie and visit all children of the node corresponding to a particular area code.

89 of 95

Problem 2b)

b) What is the time complexity of your solution

90 of 95

Problem 2b)

b) What is the (in-practice) time complexity of your solution

HashMap:

θ(n) to insert all of our phone numbers
θ(e) to look up all of the phone numbers with a particular area code where e is the number of phone numbers with that area code

Trie:

θ(n) to insert all of our phone numbers
θ(e) to look up all of the phone numbers with a particular area code

91 of 95

Problem 2c)

c) What is the space complexity?

92 of 95

Problem 2c)

c) What is the space complexity?

HashMap:

O(n), where n is the number of phone numbers

Trie:

O(n), where n is the number of phone numbers
In theory, though, Tries will be more space-efficient as Tries will store near-duplicate phone numbers in less space than a HashMap

93 of 95

Problem 3): Let’s Trie to be Old School

94 of 95

Problem 3)

95 of 95

Problem 3)

We can use a Trie where the routes/branches are represented by the digits and the node’s values are a collection of words.

To populate the Trie, we would iterate through each word in the dictionary, convert the word into the corresponding sequence of number, and finally use that sequence as the key/route to traverse the Trie and add the word.