Hash Tables II
1
Lecture 20
CS61B, Spring 2024 @ UC Berkeley
Slides Credit: Josh Hug
Visualization for Some Basic Cases
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
Sets and Maps
We’ve now seen the two implementation philosophies for Sets and Maps.
Set
TreeSet
HashSet
Map
TreeMap
HashMap
Hash Tables in Java: Recap
Hash tables:
đậu hũ
hashCode()
-2108180664
Math.floorMod(x, 4)
0
data
hash code
hash function
reduce
index
*: Indicates “on average”.
†: Assuming items are evenly spread.
| contains(x) | add(x) |
Bushy BSTs | Θ(log N) | Θ(log N) |
Separate Chaining Hash Table With No Resizing | Θ(N) | Θ(N) |
… With Resizing | Θ(1)† | Θ(1)*† |
Hash based set ops
HashTableVisualizer
Let’s play around with a hash table visualizer.
Goal, get a deeper understanding of:
ColoredNumbers
The objects we’re inserting into the HashTable are of type ColoredNumber. Each has 2 attributes:
Let’s see what happens when we insert ColoredNumbers 0 through 19 into a hash table with 6 buckets.
private int num;
private Color color;
Distribution of Items
What do you notice about the distribution of items?
Why do you we get this distribution?
Distribution of Items (Your Answer)
What do you notice about the distribution of items?
Why do you think we get this distribution?
Creating a Custom hashCode Function
Suppose we create a hashCode function that returns 0.
What distribution do we expect?
Let’s try it out.
@Override
public int hashCode() {
return 0;
}
The Zero HashCode Function
The resulting distribution is to the right.
Designing a Hash Function
What hash function will result in the distribution to the right?
private int num;
private Color color;
Designing a Hash Function
What hash function will result in the distribution to the right?
Your answer?
Let’s check it.
private int num;
private Color color;
Other More Exotic Hash Functions
We can define whatever hash function we want:
Let’s try one. Which one do you want to try?
Why Custom Hash Functions?
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
HashCode Comparison
We mentioned that the goal of a hash function is to try to spread items out evenly.
The Default hashCode: yellkey.com/part
We mentioned that the goal of a hash function is to try to spread items out evenly.
What do you think about the spread of the default hashCode, which returns the memory address?
The Default hashCode
We mentioned that the goal of a hash function is to try to spread items out evenly.
What do you think about the spread of the default hashCode, which returns the memory address?
Hard Question
If the default hashCode achieves good spread, why do we even bother to create custom hash functions?
contains
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
The equals Method for a ColoredNumber
Suppose the equals method for ColoredNumber is as below, i.e. two ColoredNumbers are equal if they have the same num.
@Override
public boolean equals(Object o) {
if (o instanceof ColoredNumber otherCn) {
return this.num == otherCn.num;
}
return false;
}
HashSet Behavior
Suppose the equals method for ColoredNumber is on the previous slide, i.e. two ColoredNumbers are equal if they have the same num.
Suppose we now check whether 12 is in the hash table.
What do we expect to be returned by contains?
int N = 20;
HashSet<ColoredNumber> hs = new HashSet<>();
for (int i = 0; i < N; i += 1) {
hs.add(new ColoredNumber(i));
}
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns ??
HashSet Behavior
Suppose the equals method for ColoredNumber is on the previous slide, i.e. two ColoredNumbers are equal if they have the same num.
Suppose we now check whether 12 is in the hash table.
What do we expect to be returned by contains?
int N = 20;
HashSet<ColoredNumber> hs = new HashSet<>();
for (int i = 0; i < N; i += 1) {
hs.add(new ColoredNumber(i));
}
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns true
Finding an Item Using the Default HashCode
Suppose we are using the default hash function (uses memory address), which yields the table to the right.
Suppose equals returns true if two ColoredNumbers have the same num (as on the previous slide).
What does the contains operation return. Why?
int N = 20;
HashSet<ColoredNumber> hs = new HashSet<>();
for (int i = 0; i < N; i += 1) {
hs.add(new ColoredNumber(i));
}
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns ??
Finding an Item Using the Default HashCode
Suppose we are using the default hash function (uses memory address), which yields the table to the right.
Suppose equals returns true if two ColoredNumbers have the same num (as on the previous slide).
What does the contains operation return. Why?
int N = 20;
HashSet<ColoredNumber> hs = new HashSet<>();
for (int i = 0; i < N; i += 1) {
hs.add(new ColoredNumber(i));
}
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns ??
Finding an Item Using the Default HashCode
hashCode: Based on memory address.
equals: Based on num.
There are two ColoredNumber objects with num = 12.
Each memory address is random.
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns ??
Finding an Item Using the Default HashCode
hashCode: Based on memory address.
equals: Based on num.
There are two ColoredNumber objects with num = 12.
Example: If object created by code above is in memory location 6000000, its hashCode % 6 is 0.
ColoredNumber twelve = new ColoredNumber(12);
hs.contains(twelve); // returns ??
Hard Question
If the default hashCode achieves good spread, why do we even bother to create custom hash functions?
Basic rule: If two objects are equal, they’d better have the same hashCode so the hash table can find it.
Duplicate Values
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
Overriding Equals but Not HashCode: yellkey.com/wide
Suppose we have the same equals method (comparing num), but we do not override hashCode.
Which can happen when we call add(zero)?
public boolean equals(Object o) {
... return this.num == otherCn.num; ...
}
ColoredNumber zero = new ColoredNumber(0);
hs.add(zero); // does another zero appear?
Overriding Equals but Not HashCode
Suppose we have the following equals method, but we do not override hashCode.
Which of the following can happen if we add 0 again?
@Override
public boolean equals(Object o) {
if (o instanceof ColoredNumber otherCn) {
return this.num == otherCn.num;
}
return false;
}
Overriding Equals but Not HashCode
Suppose we have the following equals method, but we do not override hashCode.
Which of the following can happen if we add 0 again?
The new zero ends up in a random bin.
@Override
public boolean equals(Object o) {
if (o instanceof ColoredNumber otherCn) {
return this.num == otherCn.num;
}
return false;
}
Takeaway: Equals and hashCode
Bottom line: If your class override equals, you should also override hashCode in a consistent manner.
If you don’t everything breaks:
Mutable vs. Immutable Types
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
Immutable Data Types
An immutable data type is one for which an instance cannot change in any observable way after instantiation.
Examples:
The final keyword will help the compiler ensure immutability.
public class Date {
public final int month;
public final int day;
public final int year;
private boolean contrived = true;
public Date(int m, int d, int y) {
month = m; day = d; year = y;
}
}
Which of the Classes Below Are Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
public class Pebble {
public int weight;
public Pebble() {
weight = 1;
}
}
public class Rock {
public final int weight;
public Rock (int w) {
weight = w;
}
}
public class RocksBox {
public final Rock[] rocks;
public RocksBox (Rock[] rox) {
rocks = rox;
}
}
public class SecretRocksBox {
private Rock[] rocks;
public SecretRocksBox(Rock[] rox) {
rocks = rox;
}
}
Which of the Classes Below Are Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
public class Pebble {
public int weight;
public Pebble() {
weight = 1;
}
}
Pebble p = new Pebble();
p.weight = 2;
Example mutation:
Which of the Classes Below Are Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
public class Rock {
public final int weight;
public Rock (int w) {
weight = w;
}
}
No mutation possible.
Unless you use the special “Reflections” library which lets you disobey access modifiers.
Which of the Classes Below Are Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
Example mutation:
public class RocksBox {
public final Rock[] rocks;
public RocksBox (Rock[] rox) {
rocks = rox;
}
}
Rock r1 = new Rock(10);
Rock r2 = new Rock(20);
Rock[] rox = {r1, r2};
RocksBox rb = new RocksBox(rox);
rb.rocks[1] = null;
Which of the Classes Below Are Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
public class SecretRocksBox {
private Rock[] rocks;
public SecretRocksBox(Rock[] rox) {
rocks = rox;
}
}
Example mutation:
Rock r1 = new Rock(10);
Rock r2 = new Rock(20);
Rock[] rox = {r1, r2};
SecretRocksBox rb = new SecretRocksBox(rox);
rox[0] = new Rock(-999);
Which of the Classes Below Are Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
public class Pebble {
public int weight;
public Pebble() {
weight = 1;
}
}
public class Rock {
public final int weight;
public Rock (int w) {
weight = w;
}
}
public class RocksBox {
public final Rock[] rocks;
public RocksBox (Rock[] rox) {
rocks = rox;
}
}
public class SecretRocksBox {
private Rock[] rocks;
public SecretRocksBox(Rock[] rox) {
rocks = rox;
}
}
How Would We Make SecretRocksBox Immutable?
Immutable: an instance cannot change in any observable way after instantiation.
public class SecretRocksBox {
private Rock[] rocks;
public SecretRocksBox(Rock[] rox) {
rocks = rox;
}
}
Example mutation:
Rock r1 = new Rock(10);
Rock r2 = new Rock(20);
Rock[] rox = {r1, r2};
SecretRocksBox rb = new SecretRocksBox(rox);
rox[0] = new Rock(-999);
How Would We Make SecretRocksBox Immutable?
To make SecretRocksBox immutable, we can make our own copy of the array.
public class SecretRocks {
private Rock[] rocks;
public SecretRocks(Rock[] rox) {
rocks = new Rock[rox.length];
System.arraycopy(rox, 0,
rocks, 0,
rox.length);
}
}
Rock r1 = new Rock(10);
Rock r2 = new Rock(20);
Rock[] rox = {r1, r2};
SecretRocksBox rb = new SecretRocksBox(rox);
rox[0] = new Rock(-999);
Immutability
Advantage: Less to think about: Avoids bugs and makes debugging easier.
Disadvantage: Must create a new object anytime anything changes.
charAt(int i)
compareTo(String s)
concat(String s)
split(String r)
String
Mutable Hash Table Keys
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
Mutable HashSet Keys
In principle, we can create a HashSet<List>.
Weird stuff happens if:
Example: hashCode
Consider an ArrayList equal to [0, 1].
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
System.out.println(items.hashCode());
Example
Consider an ArrayList equal to [0, 1].
�If we add this list to a HashSet with 4 buckets, it lands in bucket 2 (962 % 4 = 2).
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
System.out.println(items.hashCode());
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
0
1
2
3
[0, 1]
Example
First, we added [0, 1], which had hashCode 962, and landed in bucket 2.
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
hs.add(List.of(2, 3));
0
1
2
3
[0, 1]
[2, 3]
Example
Suppose we add [0, 1], then [2, 3].
Now suppose we add the number 7 to items.
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
hs.add(List.of(2, 3));
items.add(7);
0
1
2
3
[0, 1, 7]
[2, 3]
Example
Suppose we add [0, 1], then [2, 3].
The hashCode of a list with [0, 1, 7] is 29829.
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
hs.add(List.of(2, 3));
items.add(7);
0
1
2
3
[0, 1, 7]
[2, 3]
Example
Suppose we add [0, 1], then [2, 3].
The hashCode of a list with [0, 1, 7] is 29829.
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
hs.add(List.of(2, 3));
items.add(7);
System.out.println(hs.contains(items));
0
1
2
3
[0, 1, 7]
[2, 3]
Example
Suppose we add [0, 1], then [2, 3].
If we call contains(items), we have a problem.
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
hs.add(List.of(2, 3));
items.add(7);
System.out.println(hs.contains(items));
0
1
2
3
[0, 1, 7]
[2, 3]
Don’t Mutate Keys
Bottom line: Never mutate an Object being used as a key.
List<Integer> items = new ArrayList<>();
items.add(0);
items.add(1);
HashSet<List<Integer>> hs = new HashSet<>();
hs.add(items);
hs.add(List.of(2, 3));
items.add(7);
System.out.println(hs.contains(items));
0
1
2
3
[0, 1, 7]
[2, 3]
A Peek into Java HashSets
Lecture 20, CS61B, Spring 2024
Visualization for Some Basic Cases
hashCode and Equals
The Danger of Mutable Keys
A Peek into Java HashSets
HashSets in Java
We can look at the code that implements the HashSet in Java:
It simply delegates all of its work to a HashMap<K, Object> and ignores the value.
HashSets in Java
We can then look at the code that implements the HashMap in Java:
Reading the code, we can see that:
đậu hũ
hashCode()
-2108180664
(hc ^ (hc >>> 16)) & (N -1)
15
data
hash code
hash function
reduce
index
For N = 16
HashSets in Java
Josh wrote a class called HashSet probe that uses the reflections library to print out the size of the array holding the buckets.
HashSets in Java
Josh wrote a class called HashSet probe that uses the reflections library to print out the size of the array holding the buckets.
Resize occurred, N = 13, hash table array size = 32
Resize occurred, N = 25, hash table array size = 64
Resize occurred, N = 49, hash table array size = 128
Resize occurred, N = 97, hash table array size = 256
Resize occurred, N = 193, hash table array size = 512
Resize occurred, N = 385, hash table array size = 1024
Resize occurred, N = 769, hash table array size = 2048
Resize occurred, N = 1537, hash table array size = 4096
Visualizing a Hash Table With Load Factor 0.75.
Let’s run a simulation to see what happens if the load factor is kept to 0.75 or less.
Visualizing a Hash Table With Load Factor 0.75.
Let’s run a simulation to see what happens if the load factor is kept to 0.75 or less.
Longest bucket in the simulation looks like it’s around ~5.
Another Interesting Optimization
If we ctrl-F for “red-black” we find that that if a bin gets too full, it is converted into a red-black tree!
“The most useful algorithms are, unfortunately, not always the most beautiful.”
-Josh Hug
Citations
Hashbrowns in Cyberspace by Dall-E.
FAQ
What is the distinction between hash set, hash map, and hash table?
A hash set is an implementation of the Set ADT using the “hash table” as its engine.
A hash map is an implementation of the Map ADT using the “hash table” as its engine.
A “hash table” is a way of storing information, where you have M buckets that store N items. Each item has a “hashCode” that tells you which of M buckets to put that item in.