1 of 31

Lecture 22:

Hash Tables, Part 2

CS 136: Spring 2024

Katie Keith

2 of 31

Record on Zoom

3 of 31

  • This week and next week: Katie’s office hours in TBL 302A
  • Lab 6: 1-on-1 testing and feedback this week.
    • Please be present and on time to lab.
  • Lab 7 released. Tree data structure you will learn about and implement
  • Looking ahead:
    • Calendar
    • Final project specifications on Wednesday

📣 Announcements

4 of 31

📣 Announcements

5 of 31

📚Readings

  • Sedgewick and Wayne. Algorithms. Section 3.4.

6 of 31

CS 136: Learning Goals

Analysis

Tradeoffs of Time and Space

Advanced Programming

OOP, Java, JUnit, Git

Data Structures*

Use & Implement

*Later, we’ll clarify abstract data types versus data structures

7 of 31

Review: Python Dictionaries are Symbol Tables

gold_medals = {

"USA": 40,

"China": 40,

"Japan": 20,

"Saint Lucia": 1,

}

gold_medals["New Zealand"] = 10

print(gold_medals["USA"])

put operation

get operation (search!)

Key of a dictionary

Value of a dictionary

Preview: Python dictionaries are implemented with Hash Tables!

8 of 31

Analysis of Symbol Table’s Data Structures

Data Structure

put

get

(Unordered) Linked List

O(n)

O(n)

Two arrays (keys sorted, binary search)

O(n)

O(log n)

Binary Search Tree

Ave = O(log n)

Worst = O(n)

Ave = O(log n)

Worst = O(n)

Hash Table (with separate chaining)

Hash table (with linear probing)

Ideas will return with hash tables!

9 of 31

Hash Tables

A hash table is a data structure that stores key-value pairs, using a hash function to compute an index into an array.

1. Hash function: Maps a key to a hash code

10 of 31

Hash function properties

We want to create hash functions that obey the following properties:

  • Efficient: The function should be fast to compute.
  • Deterministic: A given input value must always generate the same hash value.
  • Uniform: Maps the expected inputs as evenly as possible over the output range.
  • Equivalent: Any two values that are considered equivalent (.equals() in Java) should produce the same hash value.

11 of 31

Review: Modular hashing with primes

Example:

Key

Hash code = Key % 4

Hash code = Key % 7

10

2

3

20

0

6

30

2

2

40

0

5

50

2

1

60

0

4

Using M=4 (not prime), hash codes are all 0 or 2 (not evenly distributed).

Using M=7 (prime) hash codes are evenly distributed.

Hash function:

Key % M where M is prime

Why? Prime numbers have no divisors other than 1 and themselves, reducing the chance that patterns in the keys align with %M.

12 of 31

Hash Tables

A hash table is a data structure that stores key-value pairs, using a hash function to compute an index into an array.

1. Hash function: Maps a key to a hash code

2. Collision resolution: How to deal with multiple keys that map to the same hash code?

Two approaches:

1. Separate chaining

2. Linear probing

13 of 31

Symbol tables: Time-space tradeoff

Less time

More time

More space

Less space

Sequential search on a linked list is O(N).

Space is N.

Array of size M > N

Search: Array look-up is O(1)

Hash table’s collision resolution

N= Number of key-value pairs

M = Size of array

14 of 31

  • Collision resolution: Separate chaining
  • Collision resolution: Linear probing
  • Analyzing and Using Hash Tables

🎯 Today’s Learning Objectives

15 of 31

Collision Resolution: Separate Chaining

For each of the M array indices, build a linked list of the key-value pairs whose keys hash to that index.

  • Hash function: Map key to integer between 0 and M-1
  • Put: Insert new key-value pair at the front of ith chain (if key is not already there)
  • Get: Linear search on only the ith chain

16 of 31

Collision Resolution: Separate Chaining

public class SeparateChainingHashST<Key, Value> implements SymbolTable<Key, Value>{

private final int M = 97; // initial size of array

private Node[] st = new Node[M]; // array of linked-list nodes

private int n; // number of key-value pairs

// Inner Node class

private class Node {

private Object key;

private Object val;

private Node next;

}

17 of 31

Less time

More time

More space

Less space

What happens with the time-space tradeoff if we make M too large (compared to N)? M too small?

M is the size of the array

N is the number of key-value pairs

💡Think-pair-share

18 of 31

Resizing in separate-chaining hash table

N/M called the “load factor”

19 of 31

  • Collision resolution: Separate chaining
  • Collision resolution: Linear probing
  • Analyzing and Using Hash Tables

🎯 Today’s Learning Objectives

20 of 31

Collision Resolution: Linear Probing

Linear probing: when a new key collides, find a free index and put the key-value pair there.

  • Hash function: Map key to integer between 0 and M-1
  • Put: Put at table index i if free; if not try i+1, i+2, etc.
  • Get: Search table index i; if occupied but no match, try i+1, i+2, et

21 of 31

Collision Resolution: Linear Probing

Example

Board work

22 of 31

Collision Resolution: Linear Probing: Example

23 of 31

Collision Resolution: Linear Probing

public class LinearProbingHashST<Key, Value> {

private static final int INIT_CAPACITY = 16;

private int n; // number of key-value pairs in the symbol table

private int m; // size of linear probing table

private Key[] keys; // the keys

private Value[] vals; // the values

public LinearProbingHashST(int capacity) {

m = capacity;

n = 0;

keys = (Key[]) new Object[m];

vals = (Value[]) new Object[m];

}

24 of 31

Resizing a linear-probing hash table

N/M called the “load factor”

25 of 31

  • Collision resolution: Separate chaining
  • Collision resolution: linear probing
  • Analyzing and using Hash Tables

🎯 Today’s Learning Objectives

26 of 31

Python dictionaries are implemented with hash tables and linear probing.

Brainstorm explanations for the graph below.

Graph source: Victor Skvortsov

Search time

(within put)

for one key (ns)

Number of key-value pairs in dictionary

Note: log scale

For this graph, keys chosen from a uniform random distribution.

💡Think-pair-share

27 of 31

Graph source: Victor Skvortsov

Search time for one key (ns)

Number of key-value pairs in dictionary

(1)Why non-constant time?

  • Hash collisions are not the reason (keys chosen from random uniform distribution)
  • When hash table is small, fits completely into cache
  • As hash table grows larger, CPU has to access main memory more frequently

(2)Why zigzags?

Moments when the hash table resizes

1

2

Answer

💡Think-pair-share

28 of 31

Analysis

Assumption: hash function results in uniform distribution of hash values

Average time for search

Proofs are challenging (you might see them in an advanced algorithms course)

Linear probing is fast when array is not filled (next index typically free)

Linear probing is slow when array almost full and have to search through almost every index.

N= number of key-value pairs

M = Size of the array

Load factor = N/M

Chaining is linear growth rate (proportional to average size of one linked list)

29 of 31

Analysis of Symbol Table’s Data Structures

Data Structure

put

get

(Unordered) Linked List

O(n)

O(n)

Two arrays (keys sorted, binary search)

O(n)

O(log n)

Binary Search Tree

Ave = O(log n)

Worst = O(n)

Ave = O(log n)

Worst = O(n)

Hash Table (with separate chaining)

Ave = O(1)*

Worst= O(n)

Ave = O(1)*

Worst= O(n)

Hash table (with linear probing)

Ave = O(1)*

Worst= O(n)

Ave = O(1)*

Worst= O(n)

*Depends on uniform hashing

Classic time-space tradeoff

30 of 31

Java’s Built-In Hash Tables

import java.util.HashMap;

HashMap<String, Integer> ages = new HashMap<>();

ages.put("Alice", 20);

int age = ages.get("Alice");

System.out.println("Alice is " + age + " years old.");

Java 8: Recommended to avoid using java.util.HashTable

This is an old, legacy data structure whose codebase is locked to 1990s APIs

Initial capacity: 16�

Load factor: 0.75

Data Structure: Separate chaining (singly linked list), but once M>64, converts to red-black tree (self-balancing BST)

Source: Open JDK

31 of 31

  • Collision resolution: Separate chaining
  • Collision resolution: Linear probing
  • Analyzing and Using Hash Tables

🎯 Today’s Learning Objectives