CSE 373 Wi24 Section 4
Hashing
Agenda
Announcements
Mon (1/22) | Tues (1/23) | Wed (1/24) | Thurs (1/25) | Fri (1/26) | Sat (1/27) |
| EX2 due @ 11:59 pm |
| | E2 late due @ 11:59 pm |
|
Mon (1/29) | Tues (1/30) | Wed (1/31) | Thurs (2/1) | Fri (2/2) | Sat (2/3) |
| EX3 due @ 11:59 pm | | P2 due @ 11:59 pm | Midterm!! @ 3:30 pm | P2 late due @ 11:59 pm |
Mid Quarter Check-in
[INSERT YOUR FORM’S URL HERE]
Hashing Intro
Hash Tables
One way to implement the dictionary ADT.
Stores item with key x in array index (or “bucket”) h(x).�(The choice of h(x)is important for many reasons.)��
Collisions?
One approach: Separate chaining uses Linked Lists to handle collisions.
0
1
3124
4583
20382827
553256591
553256592
snack
...
...
potato
rotato
totato
dog
cat
...
...
Using Mod (Compression)
We use the modulus of the hashcode when it exceeds the number of buckets.
0
1
2
3
4
5
6
7
8
9
Q: If we use the 10 buckets below, where should our six items go?�
A: Put in bucket = h(x) % 10
potato
rotato
totato
snack
0
1
3124
4583
20382827
553256591
553256592
snack
...
...
potato
rotato
totato
dog
cat
...
...
cat
dog
Hash Function Quality
Note: The more h(x)evenly distributes the keys, the faster get is! (Why?)
0
1
2
3
4
5
6
7
8
9
potato
rotato
totato
snack
0
1
3124
4583
20382827
553256591
553256592
snack
...
...
potato
rotato
totato
dog
cat
...
...
cat
dog
Problem 1A + 1B:��Hash Table Insertion
Problem 1A
Hash Table With Separate Chaining
Remember you use hash function with key not with value!
Suppose we have a hash table implemented using separate chaining. This hash table has an internal capacity of 10. Its buckets are implemented using a linked list where new elements are appended to the end. Do not worry about resizing.
Show what this hashtable internally looks like after inserting the following key-value pairs in the order given using the hash function h(x) = 4x:
(1, a), (4, b), (2, c), (17, d), (12, e), (9, e), (19, f), (4, g), (8, c), (12, f)
Hash Table With Separate Chaining
Input: (1,a)
Hash function: h(x) = 4x
h(1) = 4
4% 10 = 4
|
| | | (1,a) | | | | | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (4,b)
Hash function: h(x) = 4x
h(4) = 16
16 % 10 = 6
|
| | | (1,a) | | (4,b) | | | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (2,c)
Hash function: h(x) = 4x
h(2) = 8
8 % 10 = 8
|
| | | (1,a) | | (4,b) | | (2,c) | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (17,d)
Hash function: h(x) = 4x
h(17) = 68
68 % 10 = 8
|
| | | (1,a) | | (4,b) | | (2,c) ↓ (17,d) | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (12,e)
Hash function: h(x) = 4x
h(12) = 48
48 % 10 = 8
|
| | | (1,a) | | (4,b) | | (2,c) ↓ (17,d) ↓ (12,e) | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (9,e)
Hash function: h(x) = 4x
h(9) = 36
36 % 10 = 6
|
| | | (1,a) | | (4,b) ↓ (9,e) | | (2,c) ↓ (17,d) ↓ (12,e) | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (19,f)
Hash function: h(x) = 4x
h(19) = 76
76 % 10 = 6
|
| | | (1,a) | | (4,b) ↓ (9,e) ↓ (19,f) | | (2,c) ↓ (17,d) ↓ (12,e) | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (4,g)
Hash function: h(x) = 4x
h(4) = 16
16 % 10 = 6
|
| | | (1,a) | | (4,g) ↓ (9,e) ↓ (19,f) | | (2,c) ↓ (17,d) ↓ (12,e) | |
0 1 2 3 4 5 6 7 8 9
Since the key already exists, we update its value!
Hash Table With Separate Chaining
Input: (8,c)
Hash function: h(x) = 4x
h(8) = 32
32 % 10 = 2
|
| (8,c) | | (1,a) | | (4,g) ↓ (9,e) ↓ (19,f) | | (2,c) ↓ (17,d) ↓ (12,e) | |
0 1 2 3 4 5 6 7 8 9
Hash Table With Separate Chaining
Input: (12,f)
Hash function: h(x) = 4x
h(12) = 48
48 % 10 = 8
|
| (8,c) | | (1,a) | | (4,g) ↓ (9,e) ↓ (19,f) | | (2,c) ↓ (17,d) ↓ (12,f) | |
0 1 2 3 4 5 6 7 8 9
Problem 1B
Evaluating Hash Functions
Evaluating Hash Functions
h(x) = 4x
0 1 2 3 4 5 6 7 8 9 10 11
Issue with the hash function:
Keys will initially only be hashed to at most one of three spots
Collisions are very likely, decreasing performance!
0
6
93
1
7
94
2
8
95
Evaluating Hash Functions
h(x) = 4x
0 1 2 3 4 5 6 7 8 9 10 11
Issue with the resize:
12 13 14 15 16 17 18 19 20 21 22 23
We still only map to a fourth of the available indices!
Evaluating Hash Functions
Possible Fixes?
Prime numbers will almost always improve our hashing!
Problem 4A-C:��Hashing and Mutation
Hashing and Mutation
Hashing and Mutation
This will look up the bucket (1 + 2) mod 4 = 3.
In the bucket 3, IntList.of(1, 2) is equivalent to [1, 2], so "dog" which is the stored value is returned.
a)
Hashing and Mutation
This will look up the bucket (1 + 2) mod 4 = 3.
In the bucket 3, IntList.of(1, 2) is NOT equivalent to [1, 2, 3], so we cannot find the matched key.
Hence, return null.
b)
Hashing and Mutation
This will look up the bucket (1 + 2 + 3) mod 4 = 2.
Since the bucket 2 is empty, we definitely cannot find the matched key.
Hence, return null.
b)
Hashing and Mutation
We changed a key while it was inside the hashtable, and caused it to be in the wrong bucket.��Never do this.
c)
Thank you & see you next week!
Agenda
MicroTeach: AVL Trees
Review: BST
But, a problem...
Normal BSTs can become unbalanced.��So find is slow.
What exactly does balanced mean?
For every node, compare the heights of its two sub trees.
Balanced when the difference in height between sub trees is no greater than 1 for every node.
What exactly does balanced mean?
Note: Balanced tree with n items has height of O(log(n)).
Solution: AVL Trees
These special BSTS remain balanced no matter what order items are inserted in.��They do this by rebalancing themselves using rotations each time an item gets inserted.
Don’t forget to check out the AVL rotation walkthrough too!
Question 1A-C: Valid BST/AVL
Problem 1A
6
Valid BST?
7
43
2
Problem 1A
Valid BST?
No!
6
7
43
2
Problem 1A
Valid AVL?
6
7
43
2
Problem 1A
Valid AVL?
No!
(Must be BST to be AVL!)
6
7
43
2
Problem 1B
Valid BST?
6
7
43
42
59
63
11
21
Problem 1B
Valid BST?
Yes!
6
7
43
42
59
63
11
21
Problem 1B
Valid AVL?
6
7
43
42
59
63
11
21
Problem 1B
Valid AVL?
No!
42 violates
Balance condition.
6
7
43
42
59
63
11
21
Problem 1C
Valid BST?
6
7
43
21
59
63
11
42
Problem 1C
Valid BST?
Yes!
6
7
43
21
59
63
11
42
Problem 1C
Valid AVL?
6
7
43
21
59
63
11
42
Problem 1C
Valid AVL?��Yes!
6
7
43
21
59
63
11
42
Problem 5A-C: AVL Big-O
Big O
Write down a tight big-O for each of the following. Unless otherwise noted, give a bound in the worst case
(a) Insert and find in a BST.
Big O
Insert: Inserting biggest/smallest item at the bottom of the tree
Find: Finding Biggest/smallest item in the skewed binary tree
O(N)
Let’s think about worst case : Linked List like tree
O(N)
Big O
Write down a tight big-O for each of the following. Unless otherwise noted, give a bound in the worst case
(b) Insert and find in AVL Tree
Big O
Insert: Balanced tree height is log(N)
�
Find: Balanced tree height is log(N)
O(log(N))
O(log(N))
Let’s think about worst case: Insert/find at the bottom of our tree
Big O
Write down a tight big-O for each of the following. Unless otherwise noted, give a bound in the worst case
(c) Finding the minimum value in an AVL tree containing
N elements
Big O
Find: AVL tree is balanced
Finding value by starting from root
and traverse to the left most branch
O(log(N))
Let’s think about worst case:
Problem 6a-c): Analyzing Dictionary Implementations
Problem 6a)
a) What are the constraints on the data types you can store in an AVL Tree?
Keys needs to be comparable!�
Since AVL trees maintain their keys in sorted order, we have to be able to compare keys to decide whether to go left or right at each node.
Note that there are no restrictions on our value’s data type since AVL trees order data by keys, not values.
Problem 6b)
b) When is using an AVL tree preferred over a hash table?
One Example: Better worst case for get()! AVL trees guarantee O(logn) at all times. A hash table in the worst case, where every key is added to the same bucket, takes O(n) time.
*Caveat: In practice, you avoid this worst case scenario by
having a good hash function.
0
1
2
3
4
5
6
7
8
9
potato
rotato
totato
dog
cat
Problem 6c)
c) When is using a BST preferred over an AVL tree?
One reason: BSTs are waaaaaay easier to implement and debug!
MicroTeach: Tries
Trie (Pronounced like “try”)
A Data Structure that represents the dictionary or set ADT
In order for a Trie to implement one of those ADTs, it needs the following functionalities:
Adding to a Trie
s
add(“sap”):
s
a
s
a
p
Adding to a Trie
add(“sad”):
s
a
p
s
a
p
d
Adding to a Trie
add(“awls”):
s
a
p
d
a
s
a
p
d
a
w
s
a
p
d
a
w
l
s
a
p
d
a
w
l
s
Adding to a Trie
add(“a”):
s
a
p
d
a
w
l
s
s
a
p
d
a
w
l
s
Searching in Tries
Removing from a Trie: Lazy Deletion
remove(“awls”):
s
a
p
d
a
w
l
s
m
e
s
a
p
d
a
w
l
s
m
e
Removing from a Trie: Normal Deletion
remove(“awls”):
s
a
p
d
a
w
l
s
m
e
s
a
p
d
a
w
l
m
e
s
a
p
d
a
w
m
e
s
a
p
d
a
m
e
Problem 1): Trie to Delete 0’s and 1’s?
Trie to Delete 0’s and 1’s (TA)
0
0
0
0
0
0
0
1
1
1
1
1
1
1
Trie to Delete 0’s and 1’s (TA)
0
0
0
0
0
0
0
1
1
1
1
1
1
1
Trie to Delete 0’s and 1’s (TA)
0
0
0
0
0
0
0
1
1
1
1
1
1
1
Trie to Delete 0’s and 1’s (TA)
0
0
0
0
0
0
0
1
1
1
1
1
1
1
(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete?
Trie to Delete 0’s and 1’s (TA)
0
0
0
0
0
0
0
1
1
1
1
1
1
1
(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)
Trie to Delete 0’s and 1’s (TA)
0
0
0
0
0
0
0
1
1
1
1
1
1
1
(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)
Trie to Delete 0’s and 1’s (TA)
0
0
0
1
1
1
(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)
Trie to Delete 0’s and 1’s (TA)
0
1
(b) After part a, if we deleted all binary numbers of length 3, how many nodes would we have to delete? Ans: 12 (delete all nodes representing strings of length 2 and length 3)
Problem 2a-c): Call Me Maybe
Problem 2a)
a) Suppose you want to transfer someone’s phone book to a data structure so that you can all all the phone numbers with a particular area code efficiently. What data structure would you use? How would you implement it? There are a few answers here
Problem 2a)
a) Suppose you want to transfer someone’s phone book to a data structure so that you can all all the phone numbers with a particular area code efficiently. What data structure would you use? How would you implement it? There are a few answers here
Problem 2b)
b) What is the time complexity of your solution
Problem 2b)
b) What is the (in-practice) time complexity of your solution
Problem 2c)
c) What is the space complexity?
Problem 2c)
c) What is the space complexity?
Problem 3): Let’s Trie to be Old School
Problem 3)
Problem 3)