1 of 29

Lecture 21:

Hash Tables, Part 1

CS 136: Spring 2024

Katie Keith

2 of 29

Record on Zoom

3 of 29

  • Working on Lab 6 this week
  • Construction in the CS Common Room so Katie’s office hours this week and next week in TBL 302A
  • TA eval form (Due Today; Fri April 18) : https://forms.gle/sbqCGVLAFnhUQ4i39
  • Colloquium this today: Dr. Ellie Pavlick

📣 Announcements

4 of 29

📚Readings

  • Sedgewick and Wayne. Algorithms. Section 3.4.

5 of 29

Multiple data structures can implement the same ADT

Binary Search Tree

Data Structures

Linked List

Hash Tables

List

Finite number of elements

(same element may occur more than once)

Two arrays (sorted)

Stack

Last in first out (LIFO) operations

Symbol Table

Associates a key with a value

Graph

A set of vertices, pairs of which are connected by edges

Queue

First in first out (FIFO) operations

String[] keys;

int[] values;

Mon

Mon

Weds

Lab 6: Implement!

Today!

ADT

6 of 29

Review: Python Dictionaries are Symbol Tables

gold_medals = {

"USA": 40,

"China": 40,

"Japan": 20,

"Saint Lucia": 1,

}

gold_medals["New Zealand"] = 10

print(gold_medals["USA"])

put operation

get operation (search!)

Key of a dictionary

Value of a dictionary

Preview: Python dictionaries are implemented with Hash Tables!

7 of 29

  • BST practice
  • Hash Tables: Hash functions
  • Java’s .hashCode()

🎯 Today’s Learning Objectives

8 of 29

Binary Search Tree: Inner Node Class

public class BST<Key, Value> implements SymbolTable<Key, Value> {

private Node root; // root node of BST

private int numPairs; // number of key-value pairs in the symbol table

private class Node {

private Key key; // BSTs maintain left/right subtree ordering by key

private Value val; // value for the node

private Node left; // node at the beginning of the left subtree

private Node right; // node at the beginning of the right subtree

...

The beginning of a Java implementation of a Binary Search Tree

9 of 29

BST put example

'E'

root node

the key of this node is ‘E’

105

'S'

30

'X'

106

'A'

35

'C'

12

'H'

24

'R'

45

the value of this node is 105

Step 1. Search for ‘L’. This ends in a null link.

Step 2. Create new node and update the link of the parent node.

'L’

76

BST<Character, Integer> bst = new BST<Character, Integer>();

bst.put(‘S’, 30);

bst.put(‘E’, 105);

bst.put(‘L’, 76);

10 of 29

The method below finds the smallest common ancestor node of two nodes’ keys. What is printed?

public static int sca(Node current, int pKey, int qKey){

while (current != null) {

if (pKey > current.key && qKey > current.key) {

current = current.right;

}

else if (pKey < current.key && qKey < current.key) {

current = current.left;

}

else {

return current.key;

}

}

return -1;

}

public static void main(String[] args){

BinarySearchTree<Integer, Integer> ss;

ss = new BinarySearchTree<Integer, Integer>();

ss.put(6, 1);

ss.put(2, 3);

ss.put(8, 9);

ss.put(0, 5);

ss.put(4, 2);

ss.put(7, 1);

ss.put(9, 9);

ss.put(3, 4);

ss.put(5, 0);

int low = sca(ss.root, 3, 0);

System.out.println(low);

}

put(key, value)

Keys are integers so we have dropped the comparator from put

💡Think-pair-share

11 of 29

Analysis of Symbol Table’s Data Structures

Data Structure

put

get

(Unordered) Linked List

O(n)

O(n)

Two arrays (keys sorted, binary search)

O(n)

O(log n)

Binary Search Tree

Ave = O(log n)

Worst = O(n)

Ave = O(log n)

Worst = O(n)

Hash Table (with separate chaining)

Hash table (with linear probing)

12 of 29

  • BST practice
  • Hash Tables: Hash functions
  • Java’s .hashCode()

🎯 Today’s Learning Objectives

13 of 29

Precursor idea: Arrays

Arrays are fast!

Performance guarantees:

  1. Read an element: O(1)
  2. Write an element: O(1)

“pqr”

null

“xyz”

“ijk”

Why?

In Java, all elements in an array are equally sized and stored in a contiguous block of memory, so obtaining the memory addresses is a simple arithmetic operation, O(1)

14 of 29

Precursor idea: Arrays

If keys were small integers, we could use an array to implement an unordered symbol table.

Key

Value

‘a’

“xyz”

‘b’

“pqr”

‘c’

“ijk”

Key

Hash Code

‘a’

2

‘b’

0

‘c’

3

Array index (where the key’s value will be stored)

“pqr”

null

“xyz”

“ijk”

hash code = index = 0

1

2

3

Need: A hash function to map keys to hash codes.

15 of 29

Hash function properties

We want to create hash functions that obey the following properties:

  • Efficient: The function should be fast to compute.
  • Deterministic: A given input value must always generate the same hash code.
  • Equivalent: Any two codes that are considered equivalent (.equals() in Java) should produce the same hash code.
  • Uniform: Maps the expected inputs as evenly as possible over the output range.

16 of 29

Examine the method below. Is this a “good” hash function? Which properties of hash functions does it align (not align) with?

💡Think-pair-share

17 of 29

Modular hashing with primes

Example:

Key

Hash code = Key % 4

Hash code = Key % 7

10

2

3

20

0

6

30

2

2

40

0

5

50

2

1

60

0

4

Using M=4 (not prime), hash codes are all 0 or 2 (not evenly distributed).

Using M=7 (prime) hash codes are evenly distributed.

Hash function:

Key % M where M is prime

Why? Prime numbers have no divisors other than 1 and themselves, reducing the chance that patterns in the keys align with %M.

18 of 29

Scenario:

Keys: n=0 to n=499 (ints)

M = 97

Hash function: n % M for each n

Questions:

  1. In this scenario, approximately how many collisions (when a key maps to a hash code that already has a key) will we have?
  2. What are the advantages and disadvantages of increasing M?

Figure credit: Adobe stock

💡Think-pair-share

19 of 29

Preview: Monday

Two approaches to collision resolution

We’ll talk about the time-space tradeoff!

20 of 29

  • BST practice
  • Hash Tables: Hash functions
  • Java’s .hashCode()

🎯 Today’s Learning Objectives

21 of 29

Java’s hashCode()

  • Java requires that every data type must implement a method called hashCode() which returns a 32-bit integer.
  • Java enforces that hashCode() must be consistent with equals().

22 of 29

Goal: Reproduce Java’s hash code for Strings

Key

Hash code

“Blah”

2073105

“great”

???

“dogs!”

???

Integers

String s = "Blah";

System.out.println(s.hashCode());

Prints: 2073105

How? Why?

23 of 29

ASCII code

https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html

  • First edition published in 1963
  • 7 bits to represent 128 unique characters (English letters, digits, and various symbols)

24 of 29

Aside: Unicode

  • Updated every year to the present
  • 149,813 characters and 161 scripts
  • https://home.unicode.org/about-unicode/

25 of 29

Goal: Reproduce Java’s hash code for Strings

// Method that replicates how Java makes a hashCode for a String

public static int ourStringHash(String s){

int R = 31; // Java chooses R as this small prime

int hash = 0;

for (int i = 0; i < s.length(); i++)

hash = R * hash + s.charAt(i);

return hash;

}

“Polynomial rolling hash function”: ensures that all characters in the string influence the final hash value, reducing the likelihood of collisions

Adds ASCII value of current character

26 of 29

HashCodePlay.java

💻

27 of 29

Java: Modular hashing function

public static int hash(Key key) {

return (key.hashCode() & 0x7fffffff) % M;

}

Java’s required hash code method for every data type

Hexadecimal number which is the maximum positive value for a 32-bit signed binary number in Java

Bitwise AND operation

Strips off the sign bit of the hash code, ensuring that result is a non-negative integer.

Mod operator

Size of array (typically we choose a very large prime number)

28 of 29

Custom classes: Must write .hashCode and .equals

class CustomClass{

...

public boolean equals(Object obj){

...

}

public int hashCode(){

...

}

}

Make sure these are consistent!

29 of 29

  • Wrap-up BSTs
  • Hashing functions
  • Java’s .hashCode()

🎯 Today’s Learning Objectives