Hash Tables

Carlo Cruz-Albrecht

What is a Map?

How might we implement one?

Btw a Set is like a map but it only contains the keys, there are no values.

Consider this Binary Search Tree

Chicago

Albany

NYC

Boston

BSTs can be used to implement a Map

  • The key is what is used for comparison

Chicago

Illinois

Albany

California

NYC

New York

Boston

Mass.

Draw Node class:

public class Node<K, V> {

K key;

V value;

Node left;

Node right;

}

BSTMap map = new BSTMap();

map.get(“Berkeley”)

map.put(“Houston”, “Texas”)

O(log N)

N = number of items in our map

Yay so we can use BSTs to make a map.

TreeMap in java uses a BST (a red black tree)

But can we do better??

Θ(1) time? 😄

Arrays can access any index position Θ(1)

Let’s take advantage of that.

Map for this scenario:

Student IDs -> Student Names:

….

30318234567 : Catherine

30315678902 : Joe

30367818234 : Carlo

Draw array, write class Map out

Let the key (ID number) determine the index position in the array.

problem?

public class Map {

String[] array = new String[300000000000];

public void add(int studentID, String studentName) {

array[studentID] = studentName;

}

}

Memory inefficient.

Solution:

Let’s try to use an array of size 10.

So, instead of using the raw ID numbers, we try to convert it to an index between 0 and 10.

Goal: Convert ID number to index between 0 and 100.

Solution: Modulus (Remainder operator)

567 % 10

189373762712 % 10

Any number % N gives us a number between 0 and N

public int convert(int studentID) {

return studentID % 10;

}

The function that assigns a specific integer given a key is called hash function

integer

key

hashCode

In this case making the hashCode was easy but this isn’t always the case (how do you hash a String?) We’ll learn more about hash functions tomorrow but the good news is java libraries already have hashCode implemented (Object class)

Every time you call the hashCode on the same key, you get the same integer back.

Animal a = new Animal();

a.hashCode();

Animal b = a;

b.hashCode();

public class Map {

CHANGE THIS!!

String[] array = new String[10];

public void add(int studentID, String studentName) {

CHANGE THIS:

Int index = hashCode(studentID);

array[index] = studentName;

}

public int hashCode(int studentID) {

return studentID % 10;

}

}

See how we drastically reduced the amount of space we use???

Only one small problem…. collisions.

303245698 : Srinath

303102038 : Tian

Both hash to 8. We get a collision in the array!

Solution: we create a linked list at each index position so we can hold multiple values

0

1

2

3

4

5

6

7

8

9

Tian

Srinath

Carlo

Joe

Catherine

0

1

2

3

4

5

6

7

8

9

Tian

Srinath

Carlo

Joe

Catherine

What’s bad here?

More collisions mean a worse runtime.

Many collisions means worse runtime.

Good:

public int hashCode() {

int h = hash;

for (int i = 0; i < value.length; i++) {

h = 31 * h + val[i];

}

return h;

}

Bad:

public int hashCode() {

return 3;

}

You want your hashCode to return a different integer for different keys.

Result: each index position in the array has around the same number of items

GOAL: keep collisions to a minimum

That’s the hash function’s job.

Don’t worry about this right now :)

RESIZING

Also, if we have too many keys we should make our array bigger to prevent long linked lists

Recap basic strategy:

  • Given a key, use hashCode(key) to get an index

  • Go to index in the array and get our value (Θ(1) time)
    • Might have to go down a linked list but a good hashCode + resizing ensures the linked list will be short ⇒ thus, still Θ(1) runtime

Just to recap: this is amazing!!

Even if we have thousands of elements we can put them in a HashMap and access any element in constant time!!*

*If those elements are a object we created, we must make sure to make a good hash code to get constant time

Hash Tables - Google Slides