Lecture 22:
Hash Tables, Part 2
CS 136: Spring 2024
Katie Keith
Record on Zoom
📣 Announcements
📚Readings
CS 136: Learning Goals
Analysis
Tradeoffs of Time and Space
Advanced Programming
OOP, Java, JUnit, Git
Data Structures*
Use & Implement
*Later, we’ll clarify abstract data types versus data structures
Review: Python Dictionaries are Symbol Tables
gold_medals = {
    "USA": 40,
    "China": 40,
    "Japan": 20,
    "Saint Lucia": 1,
}
gold_medals["New Zealand"] = 10
print(gold_medals["USA"])
put operation
get operation (search!)
Key of a dictionary
Value of a dictionary
Preview: Python dictionaries are implemented with Hash Tables!
Analysis of Symbol Table’s Data Structures
Data Structure | put | get |
(Unordered) Linked List | O(n) | O(n) |
Two arrays (keys sorted, binary search) | O(n) | O(log n) |
Binary Search Tree | Ave = O(log n) Worst = O(n) | Ave = O(log n) Worst = O(n) |
Hash Table (with separate chaining) | | |
Hash table (with linear probing) | | |
Ideas will return with hash tables!
Hash Tables
A hash table is a data structure that stores key-value pairs, using a hash function to compute an index into an array.
1. Hash function: Maps a key to a hash code
Hash function properties
We want to create hash functions that obey the following properties:
1. Consistent: equal keys always produce the same hash code.
2. Efficient: the hash code is cheap to compute.
3. Uniform: hash codes spread keys as evenly as possible across the array indices.
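As a concrete illustration of these properties, here is a minimal sketch of a string hash function in the style of Java's String.hashCode() (the 31·h + c polynomial). The method name stringHash is ours, for illustration only, and is not part of the lecture code.

    // Deterministic and cheap to compute: equal strings always yield the same hash code.
    // Illustrative sketch only (the name stringHash is hypothetical).
    public static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // mix in each character
        }
        return h;
    }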
Review: Modular hashing with primes
Example:
Key | Hash code = Key % 4 | Hash code = Key % 7 |
10 | 2 | 3 |
20 | 0 | 6 |
30 | 2 | 2 |
40 | 0 | 5 |
50 | 2 | 1 |
60 | 0 | 4 |
Using M=4 (not prime), hash codes are all 0 or 2 (not evenly distributed).
Using M=7 (prime), hash codes are evenly distributed.
Hash function:
Key % M where M is prime
Why? Prime numbers have no divisors other than 1 and themselves, reducing the chance that patterns in the keys align with %M.
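A minimal runnable sketch of modular hashing for the integer keys in the table above (the class and method names are ours, for illustration); it reproduces the hash codes for M = 4 and M = 7.

// Minimal sketch of modular hashing for non-negative integer keys.
// Choosing M prime (e.g., 7) spreads these keys more evenly than M = 4.
public class ModularHashDemo {
    static int hash(int key, int M) {
        return key % M;                 // index in [0, M)
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60};
        for (int key : keys) {
            System.out.println(key + " -> M=4: " + hash(key, 4)
                                   + ", M=7: " + hash(key, 7));
        }
    }
}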
Hash Tables
A hash table is a data structure that stores key-value pairs, using a hash function to compute an index into an array.
1. Hash function: Maps a key to a hash code
2. Collision resolution: How to deal with multiple keys that map to the same hash code?
Two approaches:
1. Separate chaining
2. Linear probing
Symbol tables: Time-space tradeoff
(Spectrum: less space / more time  ↔  more space / less time)
Sequential search on an unordered linked list: search is O(N), space is O(N). (Less space, more time.)
Array of size M > N: search is an O(1) array look-up. (More space, less time.)
A hash table's collision resolution sits between these two extremes.
N = number of key-value pairs
M = size of the array
🎯 Today’s Learning Objectives
Collision Resolution: Separate Chaining
For each of the M array indices, build a linked list of the key-value pairs whose keys hash to that index.
Collision Resolution: Separate Chaining
public class SeparateChainingHashST<Key, Value> implements SymbolTable<Key, Value> {
    private final int M = 97;          // initial size of array
    private Node[] st = new Node[M];   // array of linked-list nodes
    private int n;                     // number of key-value pairs

    // Inner Node class
    private class Node {
        private Object key;
        private Object val;
        private Node next;
        …
    }
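To make separate chaining concrete, here is a sketch of how get and put might look inside this class, assuming the "…" in Node includes a Node(key, val, next) constructor and that hash() strips the sign bit before taking % M (as in the textbook's version). A sketch, not the exact lecture code:

    // Sketch: methods that would live inside SeparateChainingHashST.
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;   // non-negative index in [0, M)
    }

    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next) {   // walk the chain at index i
            if (key.equals(x.key)) return (Value) x.val;
        }
        return null;                                    // key not found
    }

    public void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next) {
            if (key.equals(x.key)) { x.val = val; return; }   // overwrite existing value
        }
        st[i] = new Node(key, val, st[i]);   // assumed Node(key, val, next) constructor
        n++;
    }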
What happens to the time-space tradeoff if we make M too large (compared to N)? What if M is too small?
M is the size of the array
N is the number of key-value pairs
💡Think-pair-share
Resizing in separate-chaining hash table
N/M is called the “load factor”
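One possible resize sketch for separate chaining: when the load factor N/M gets large (e.g., average chain length around 8), build a bigger table and re-insert every key-value pair. This assumes a capacity constructor and a non-final table-size field, which the slide's class (with final M = 97) does not have, so treat it as a variation rather than the lecture code. It uses a put like the one sketched earlier.

    // Sketch only: assumes SeparateChainingHashST(int chains) exists and M is not final.
    private void resize(int newM) {
        SeparateChainingHashST<Key, Value> temp = new SeparateChainingHashST<>(newM);
        for (int i = 0; i < M; i++) {
            for (Node x = st[i]; x != null; x = x.next) {
                temp.put((Key) x.key, (Value) x.val);   // re-hash into the larger table
            }
        }
        st = temp.st;       // adopt the larger array of chains
        // M = temp.M;      // would also need to update M (requires M to be non-final)
    }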
✅
🎯 Today’s Learning Objectives
Collision Resolution: Linear Probing
Linear probing: when a new key collides, scan forward through the array (wrapping around) until a free index is found, and put the key-value pair there.
Collision Resolution: Linear Probing
Example
Board work
Collision Resolution: Linear Probing: Example
Collision Resolution: Linear Probing
public class LinearProbingHashST<Key, Value> {
    private static final int INIT_CAPACITY = 16;

    private int n;          // number of key-value pairs in the symbol table
    private int m;          // size of linear probing table
    private Key[] keys;     // the keys
    private Value[] vals;   // the values

    public LinearProbingHashST(int capacity) {
        m = capacity;
        n = 0;
        keys = (Key[]) new Object[m];
        vals = (Value[]) new Object[m];
    }
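A sketch of how get and put typically look for this class (closely following the standard textbook version), assuming a hash() helper that strips the sign bit and takes % m; resizing before insertion is noted in a comment but omitted here.

    // Sketch: methods that would live inside LinearProbingHashST.
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % m;   // non-negative index in [0, m)
    }

    public Value get(Key key) {
        // Probe forward from hash(key), wrapping around, until the key or an empty slot is found.
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) return vals[i];
        }
        return null;   // hit an empty slot: key is not in the table
    }

    public void put(Key key, Value val) {
        // (A full version would resize here, e.g., when n >= m/2, to keep probes short.)
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) { vals[i] = val; return; }   // overwrite existing value
        }
        keys[i] = key;   // first empty slot found: insert here
        vals[i] = val;
        n++;
    }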
Resizing a linear-probing hash table
N/M is called the “load factor”
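A sketch of the resizing step (again in the style of the textbook version): build a bigger table and re-insert every key, because hash(key) % m changes when m changes. It assumes a put method like the one sketched above.

    // Sketch: double the table when the load factor n/m reaches about 1/2, so probes stay short.
    private void resize(int capacity) {
        LinearProbingHashST<Key, Value> temp = new LinearProbingHashST<>(capacity);
        for (int i = 0; i < m; i++) {
            if (keys[i] != null) {
                temp.put(keys[i], vals[i]);   // re-hash into the larger table
            }
        }
        keys = temp.keys;   // adopt the larger arrays
        vals = temp.vals;
        m = temp.m;
    }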
✅
✅
🎯 Today’s Learning Objectives
Python dictionaries are implemented with hash tables and open addressing (a scheme similar to linear probing).
Brainstorm explanations for the graph below.
[Graph: search time within put for one key (ns) vs. number of key-value pairs in the dictionary; note the log scale. Keys chosen from a uniform random distribution. Graph source: Victor Skvortsov]
💡Think-pair-share
[Same graph: search time for one key (ns) vs. number of key-value pairs in the dictionary. Graph source: Victor Skvortsov]
(1) Why non-constant time?
(2) Why zigzags?
Answer to (2): the zigzags occur at the moments when the hash table resizes.
💡Think-pair-share
Analysis
Assumption: the hash function results in a uniform distribution of hash values
Average time for search
Proofs are challenging (you might see them in an advanced algorithms course)
Linear probing is fast when the array is not very full (the next index is typically free).
Linear probing is slow when the array is almost full and we have to probe through almost every index.
N = number of key-value pairs
M = size of the array
Load factor = N/M
Separate chaining's search time grows linearly with the load factor (proportional to the average length of one linked list).
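For reference (stated without proof), the classic average-case probe counts under the uniform hashing assumption, with load factor α = N/M, are approximately:
Separate chaining: search miss ≈ α; search hit ≈ 1 + α/2
Linear probing: search hit ≈ ½ (1 + 1/(1−α)); search miss ≈ ½ (1 + 1/(1−α)²)
These match the intuition above: chaining degrades linearly in α, while linear probing blows up as α approaches 1.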
Analysis of Symbol Table’s Data Structures
Data Structure | put | get |
(Unordered) Linked List | O(n) | O(n) |
Two arrays (keys sorted, binary search) | O(n) | O(log n) |
Binary Search Tree | Ave = O(log n) Worst = O(n) | Ave = O(log n) Worst = O(n) |
Hash Table (with separate chaining) | Ave = O(1)*, Worst = O(n) | Ave = O(1)*, Worst = O(n) |
Hash table (with linear probing) | Ave = O(1)*, Worst = O(n) | Ave = O(1)*, Worst = O(n) |
*Depends on uniform hashing
Classic time-space tradeoff
Java’s Built-In Hash Tables
import java.util.HashMap;
HashMap<String, Integer> ages = new HashMap<>();
ages.put("Alice", 20);
int age = ages.get("Alice");
System.out.println("Alice is " + age + " years old.");
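As a small usage illustration (not from the slides), a common HashMap pattern is counting occurrences with getOrDefault:

// Illustrative example: counting word frequencies with a HashMap.
import java.util.HashMap;

public class WordCount {
    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        HashMap<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            // getOrDefault returns 0 the first time we see w, then the running count.
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        System.out.println(counts);   // e.g., {not=1, to=2, or=1, be=2}
    }
}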
Java 8: recommended to avoid using java.util.Hashtable
This is an old, legacy data structure whose codebase is locked to 1990s APIs
Initial capacity: 16
Load factor: 0.75
Data structure: separate chaining (singly linked list); since Java 8, a bucket converts to a red-black tree (self-balancing BST) once it holds 8+ entries and the table has at least 64 buckets
Source: OpenJDK
✅
✅
✅
🎯 Today’s Learning Objectives