1 of 54

Lecture 7: Recurrences and Intro to Hashing

CSE 373: Data Structures and Algorithms

1

Please log onto Live Lecture Ed Thread to submit live lecture questions

Please log onto tinyurl.com/Summer373L7 to answer the daily lecture participation question

2 of 54

Announcements

  • Exercise 1 – Algorithm Analysis – Due Today
  • Exercise 2 – Out Today!

  • Project 1 – Deques – Due Yesterday (3-day late day submissions until Sunday)
  • Project 2 is out! Due Wednesday July 20th
  • - 2 week assignment, PLEASE PLEASE PLEASE START NOW

CSE 373 22 SP – CHAMPION

2

For real, though, it will take you 2 weeks, do not wait until next week to start

3 of 54

Warm Up!

CSE 373 SP 22 – CHAMPION

3

What’s the theta bound for the runtime function for this piece of code?

public void method1(int n) {

if (n <= 100) {

System.out.println(“:3”);

} else {

System.out.println(“:D”);

for (int i = 0; i<16; i++) {

method1(n / 4);

}

}

}

Please fill out the Poll at- tinyurl.com/Summer373L7

 

a = 16, b = 4, c = 0

 

 

 

 

 

 

 

 

 

 

 

If

If

If

then

then

then

Master Theorem

2 > 0

4 of 54

Questions

CSE 373 20 SP – CHAMPION & CHUN

4

5 of 54

Wednesday’s Warm Up

  •  

CSE 373 22 SP – CHAMPION

5

All false!

Try on your own!

6 of 54

Case Study: Linear Search

CSE 373 SP 22 - CHAMPION

int linearSearch(int[] arr, int toFind) {

for (int i = 0; i < arr.length; i++) {

if (arr[i] == toFind)

return i;

}

return -1;

}

2

3

9

4

5

arr

toFind

2

2

3

9

4

5

toFind

8

i

 

arr

i

i

i

i

i

7 of 54

Best Case

Worst Case

On Lucky Earth

On Unlucky Earth (where it’s 2020 every year)

2

3

9

4

5

arr

toFind

2

i

2

3

9

4

5

arr

toFind

8

i

f(n) = 3n + 1

f(n) = 2

O(1)

 

Θ(1)

O(n)

 

Θ(n)

After asymptotic analysis:

After asymptotic analysis:

8 of 54

Questions

CSE 373 20 SP – CHAMPION & CHUN

8

9 of 54

Modeling Recursive Code

CSE 373 20 SP – CHAMPION & CHUN

9

10 of 54

Recurrence to Big-Θ

It’s still really hard to tell what the big-O is just by looking at it.

But fancy mathematicians have a formula for us to use!

CSE 373 SP 20 – CHUN & CHAMPION

 

 

 

 

 

 

 

 

 

If

If

If

then

then

then

Master Theorem

 

11 of 54

Understanding Master Theorem

  • The log of a < c case
    • Recursive case does a lot of non recursive work in comparison to how quickly it divides the input size
    • Most work happens in beginning of call stack
    • Non recursive work in recursive case dominates growth, nc term

    • The log of a = c
    • Recursive case evenly splits work between non recursive work and passing along inputs to subsequent recursive calls
    • Work is distributed across call stack

  • The log of a > c case
    • Recursive case breaks inputs apart quickly and doesn’t do much non recursive work
    • Most work happens near bottom of call stack

CSE 373 SP 20 – CHUN & CHAMPION

11

  • A measures how many recursive calls are triggered by each method instance
  • B measures the rate of change for input
  • C measures the dominating term of the non recursive work within the recursive method
  • D measures the work done in the base case

 

 

 

 

 

 

 

 

If

If

If

then

then

then

Master Theorem

12 of 54

Recursive Patterns

  • Pattern #1: Halving the Input

  • Pattern #2: Constant size input and doing work

  • Pattern #3: Doubling the Input

CSE 373 20 WI – HANNAH TANG

12

Binary Search Θ(logn)

Merge Sort

13 of 54

Merge Sort

CSE 373 SP 18 - KASEY CHAMPION

13

0

1

2

3

4

5

6

7

8

9

8

2

91

22

57

1

10

6

7

4

Divide

0

1

2

3

4

8

2

91

22

57

5

6

7

8

9

1

10

6

7

4

Conquer

0

8

0

8

0

1

2

3

4

2

8

22

57

91

5

6

7

8

9

1

4

6

7

10

0

1

2

3

4

5

6

7

8

9

1

2

4

6

7

8

10

22

57

91

Combine

14 of 54

Merge Sort

CSE 373 SP 18 - KASEY CHAMPION

14

mergeSort(input) {

if (input.length == 1)

return

else

smallerHalf = mergeSort(new [0, ..., mid])

largerHalf = mergeSort(new [mid + 1, ...])

return merge(smallerHalf, largerHalf)

}

0

1

2

3

4

8

2

57

91

22

0

1

8

2

0

1

2

57

91

22

0

8

0

2

0

57

0

1

91

22

0

91

0

22

0

1

22

91

0

1

2

22

57

91

0

1

2

8

0

1

2

3

4

2

8

22

57

91

1 if n<= 1

2T(n/2) + n otherwise

T(n) =

Pattern #2 – Constant size input and doing work

Take a guess! What is the Big-O of worst case merge sort?

15 of 54

Merge Sort Recurrence to Big-Θ

CSE 373 SP 20 – CHUN & CHAMPION

 

 

 

 

 

 

 

 

If

If

If

then

then

then

Master Theorem

 

 

1 if n<= 1

2T(n/2) + n otherwise

T(n) =

16 of 54

Recursive Patterns

  • Pattern #1: Halving the Input

  • Pattern #2: Constant size input and doing work

  • Pattern #3: Doubling the Input

CSE 373 20 WI – HANNAH TANG

16

Binary Search Θ(logn)

Merge Sort Θ(nlogn)

Calculating Fibonacci

17 of 54

Calculating Fibonacci

  • public int fib(int n) {
  • if (n <= 1) {
  • return 1;
  • }
  • return fib(n-1) + fib(n-2);
  • }

CSE 373 20 WI – HANNAH TANG

17

Almost

f(4)

f(3)

f(2)

f(2)

f(2)

f(1)

f(1)

f(1)

f(1)

f(1)

f(1)

  • Each call creates 2 more calls
  • Each new call has a copy of the input, almost
  • Almost doubling the input at each call

Pattern #3 – Doubling the Input

18 of 54

Calculating Fibonacci Recurrence to Big-Θ

  • public int f(int n) {
  • if (n <= 1) {
  • return 1;
  • }
  • return f(n-1) + f(n-2);
  • }

CSE 373 20 WI – HANNAH TANG

18

d

2T(n-C1) + C2

 

 

Master Theorem

Can we use master theorem?

Uh oh, our model doesn’t match that format…

Maybe geometry can help!

Can we intuit a pattern?

T(1) = d

T(2) = 2T(2-1) + c = 2(d) + c

T(3) = 2T(3-1) + c = 2(2(d) + c) + c = 4d + 3c

T(4) = 2T(4-1) + c = 2(4d + 3c) + c = 8d + 7c

T(5) = 2T(5-1) + c = 2(8d + 7c) + c = 16d +25c

Looks like something’s happening but it’s tough

Finish the recurrence, what is the model for the recursive case?

2T(n-C1) + C2

19 of 54

Calculating Fibonacci Recurrence to Big-Θ

How many layers in the function call tree?

How many layers will it take to transform “n” to the base case of “1” by subtracting 1

For our example, 4 -> Height = n

How many function calls per layer?

CSE 373 20 WI – HANNAH TANG

19

 

Layer

Function calls

1

1

2

2

3

4

4

8

How many function calls on layer k?

2k-1

How many function calls TOTAL

for a tree of k layers?

1 + 2 + 3 + 4 + … + 2k-1

f(4)

f(3)

f(2)

f(2)

f(2)

f(1)

f(1)

f(1)

f(1)

f(1)

f(1)

20 of 54

Calculating Fibonacci Recurrence to Big-Θ

  • Patterns found:

CSE 373 20 SP – CHAMPION & CHUN

20

How many function calls on layer k?

2k-1

How many function calls TOTAL for a tree of k layers?

1 + 2 + 4 + 8 + … + 2k-1

Total runtime = (total function calls) x (runtime of each function call)

Total runtime = (1 + 2 + 4 + 8 + … + 2k-1) x (constant work)

1 + 2 + 4 + 8 + … + 2k-1 =

 

How many layers in the function call tree?

n

 

Summation Identity

Finite Geometric Series

 

21 of 54

Recursive Patterns

  • Pattern #1: Halving the Input

  • Pattern #2: Constant size input and doing work

  • Pattern #3: Doubling the Input

CSE 373 20 WI – HANNAH TANG

21

Binary Search Θ(logn)

Merge Sort Θ(nlogn)

Calculating Fibonacci Θ(2n)

22 of 54

Questions

CSE 373 20 SP – CHAMPION & CHUN

22

23 of 54

Intro to Hashing

CSE 373 20 SP – CHAMPION & CHUN

23

24 of 54

Dictionaries (aka Maps)

  • Every Programmer’s Best Friend
  • You’ll probably use one in almost every programming project.
    • Because it’s hard to make a big project without needing one sooner or later.

CSE 373 19 SU - ROBBIE WEBER

// two types of Map implementations supposedly covered in CSE 143

Map<String, Integer> map1 = new HashMap<>();

Map<String, String> map2 = new TreeMap<>();

25 of 54

Review: Maps

  • map: Holds a set of distinct keys and a collection of values, where each key is associated with one value.
    • a.k.a. "dictionary"

CSE 373 19 SU - ROBBIE WEBER

key

value

“you"

22

key

value

“in"

37

key

value

“the"

56

key

value

“at"

43

map.get("the")

56

Dictionary ADT

put(key, item) add item to collection indexed with key

get(key) return item associated with key

containsKey(key) return if key already in use

remove(key) remove item and associated key

size() return count of items

state

behavior

Set of items & keys

Count of items

supported operations:

    • put(key, value): Adds a given item into collection with associated key,
      • if the map previously had a mapping for the given key, old value is replaced.
    • get(key): Retrieves the value mapped to the key
    • containsKey(key): returns true if key is already associated with value in map, false otherwise
    • remove(key): Removes the given key and its mapped value

26 of 54

Implementing a Map with an Array

CSE 373 22 SP - CHAMPION

ArrayMap<K, V>

put find key, overwrite value if there. Otherwise create new pair, add to next available spot, grow array if necessary

get scan all pairs looking for given key, return associated item if found

containsKey scan all pairs, return if key is found

remove scan all pairs, replace pair to be removed with last pair in collection

size return count of items in dictionary

state

behavior

Pair<K, V>[] data

Big O Analysis – (if key is the last one looked at / not in the dictionary)

put()

get()

containsKey()

remove()

size()

O(1) constant

O(N) linear

O(N) linear

O(N) linear

O(N) linear

0

1

2

3

containsKey(‘c’)

get(‘d’)

put(‘b’, 97)

put(‘e’, 20)

(‘a’, 1)

(‘b’, 2)

Map ADT

put(key, item) add item to collection indexed with key

get(key) return item associated with key

containsKey(key) return if key already in use

remove(key) remove item and associated key

size() return count of items

state

behavior

Set of items & keys

Count of items

(‘c’, 3)

97)

(‘d’, 4)

Big O Analysis – (if the key is the first one looked at)

put()

get()

containsKey()

remove()

size()

O(1) constant

O(1) constant

O(1) constant

O(1) constant

O(1) constant

4

(‘e’, 20)

27 of 54

Implementing a Map with Nodes

CSE 373 19 SU - ROBBIE WEBER

LinkedMap<K, V>

put if key is unused, create new with pair, add to front of list, else replace with new value

get scan all pairs looking for given key, return associated item if found

containsKey scan all pairs, return if key is found

remove scan all pairs, skip pair to be removed

size return count of items in dictionary

state

behavior

front

size

containsKey(‘c’)

get(‘d’)

put(‘b’, 20)

Map ADT

put(key, item) add item to collection indexed with key

get(key) return item associated with key

containsKey(key) return if key already in use

remove(key) remove item and associated key

size() return count of items

state

behavior

Set of items & keys

Count of items

front

‘c’

9

‘b’

7

‘d’

4

‘a’

1

20

Big O Analysis – (if key is the last one looked at / not in the dictionary)

put()

get()

containsKey()

remove()

size()

O(1) constant

O(N) linear

O(N) linear

O(N) linear

O(N) linear

Big O Analysis – (if the key is the first one looked at)

put()

get()

containsKey()

remove()

size()

O(1) constant

O(1) constant

O(1) constant

O(1) constant

O(1) constant

28 of 54

Can we do better?

Let’s simplify the problem we’re working with + combine it with some facts about arrays.

Problem Simplification: only worry about supporting integer keys

Array Facts: accessing (data[i]) or updating an element (data[i] = …) at a given index takes Theta(1) runtime.

If we store the Key-Value pairs at the data[key] then we don’t have to do any looping to find it. For example consider `containsKey` or `get` -- we can just jump directly to data[key] to figure out the return answer.

CSE 373 SP 22 - CHAMPION

28

DirectAccessMap<Integer, V>

put put item at given index

get get item at given index

containsKey if data[] null at index, return false, return true otherwise

remove nullify element at index

size return count of items in dictionary

state

behavior

Data[]

size

indices

0

1

2

3

4

5

6

7

8

9

data

(3, Sherdil)

put(3, “Sherdil”);

get(3);

29 of 54

Can we do better? -- Direct Access Map impl.

  • public void put(int key, V value) {
  • this.array[key] = value;
  • }

  • public boolean containsKey(int key) {
  • return this.array[key] != null;
  • }

  • public V get(int key) {

return this.array[key];

  • }

  • public void remove(int key) {
  • this.array[key] = null;
  • }

CSE 373 SP 22 - CHAMPION

29

DirectAccessMap<Integer, V>

put put item at given index

get get item at given index

containsKey if data[] null at index, return false, return true otherwise

remove nullify element at index

size return count of items in dictionary

state

behavior

Data[]

size

Operation

Array w/ indices as keys

put(key,value)

best

𝚹(1)

worst

𝚹(1)

get(key)

best

𝚹(1)

worst

𝚹(1)

containsKey(key)

best

𝚹(1)

worst

𝚹(1)

30 of 54

Direct Access Map tradeoffs:

  •  

CSE 373 20 SP – CHAMPION & CHUN

- what’s a benefit of using DirectAccessMap?

- what’s a bad thing when using DirectAccessMap?

31 of 54

Can we do this for any integer?

  • Idea 1:
  • Create a GIANT array with every possible integer as an index
  • Problems:
    • Can we allocate an array big enough?
    • Super wasteful

  • Idea 2:
  • Create a smaller array, but create a way to translate given integer keys into available indices. Way less wasteful space-wise.
  • Problem:
    • How can we pick a good translation?

CSE 373 SU 19 - ROBBIE WEBER

31

32 of 54

Hash functions: translating a piece of data to an int

  • In our case: we want to translate int keys to a valid index in our array. If our array is length 10 but our input key is 500, we need to make sure we have a way of mapping that to a number between 0 and 9 (the valid indices for a length 10 array). This mapping that we decide on is a hash function.

  • One simple thing we can do (and that you will do when you implement this in your project):
  • Hash function: take your key and % it by the length of the array.
  • ex: key is 500, and array is length 10 – if you take 500 % 10, you will get the number 0, so we’d just plop 500 and it’s value at index 0.

CSE 373 20 SP – CHAMPION & CHUN

Hash function definition

hash function is any function that can be used to map data of arbitrary size to fixed-size values.

33 of 54

“review”: Integer remainder with % “mod”

  • The % operator computes the remainder from integer division.

14 % 4 is 2� � 3 43� 4 ) 14 5 ) 218� 12 202 18� 15� 3

  • Applications of % operator:
    • Obtain last digit of a number: 230857 % 10 is 7
    • See whether a number is odd: 7 % 2 is 1, 42 % 2 is 0
    • Limit integers to specific range: 8 % 12 is 8, 18 % 12 is 6

CSE 142 SP 18 – BRETT WORTZMAN

33

218 % 5 is 3

Limit keys to indices within array

Equivalently, to find a % b (for a,b > 0):

while(a > b-1)

a -= b;

return a;

34 of 54

First Hash Function: % table size

CSE 373 SP 22 - CHAMPION

34

indices

0

1

2

3

4

5

6

7

8

9

elements

put(0, “foo”);

put(5, “bar”);

put(11, “biz”)

put(18, “bop”);

“foo”

0 % 10 = 0

5 % 10 = 5

11 % 10 = 1

18 % 10 = 8

“bop”

“bar”

“biz”

35 of 54

Implement First Hash Function

  • public void put(int key, int value) {
  • data[hashToValidIndex(key)] = value;
  • }

  • public V get(int key) {
  • return data[hashToValidIndex(key)];
  • }

  • public int hashToValidIndex(int k) {
  • return k % this.data.length;
  • }

CSE 373 SU 19 - ROBBIE WEBER

35

SimpleHashMap<Integer>

put mod key by table size, put item at result

get mod key by table size, get item at result

containsKey mod key by table size, return data[result] == null remove mod key by table size, nullify element at result

size return count of items in dictionary

state

behavior

Data[]

size

Operation

Array w/ indices as keys

put(key,value)

best

𝚹(1)

worst

𝚹(1)

get(key)

best

𝚹(1)

worst

𝚹(1)

containsKey(key)

best

𝚹(1)

worst

𝚹(1)

Note: % is just a math operator like +, -, /, *, so it’s constant runtime

36 of 54

Questions?

things we talked about:

  • review of ArrayMap + LinkedMap
  • DirectAccessMap
  • % as a hash function andSimpleHashMap

37 of 54

First Hash Function: % table size

CSE 373 SP 22 - CHAMPION

37

indices

0

1

2

3

4

5

6

7

8

9

elements

put(0, “foo”);

put(5, “bar”);

put(11, “biz”)

put(18, “bop”);

Collision!

“foo”

0 % 10 = 0

5 % 10 = 5

11 % 10 = 1

18 % 10 = 8

20 % 10 = 0

“bop”

“bar”

“biz”

:(

put(20, “:(”);

38 of 54

Hash Obsession: Collisions

  • Collision: multiple keys translate to the same location of the array

  • Future big idea: the fewer the collisions, the better the runtime! (we’ll see this when we figure out that resolving these leads to worse runtime)

  • Two questions:
  • 1. When we have a collision, how do we resolve it?
  • 2. How do we minimize the number of collisions?

CSE 373 SU 19 - ROBBIE WEBER

38

39 of 54

Roadmap for lecture content today

  • Maps/Dictionary review
  • DirectAccessMap
    • a map implemented with an array with only integer keys
  • SimpleHashMap
    • a more flexible version of DirectAccessMap that uses a hash function on the key of interest to figure out where it is in the array
  • SeparateChainingHashMap
    • fixes some limitations of the above Maps while still being very fast (in-practice).
    • It’s what you’ll implement in project 2 / what Java’s official HashMap does -- it’s the back-bone data structure that powers so many Java programs and that you will definitely use if you keep programming. Get hyped!

CSE 373 20 SP – CHAMPION & CHUN

40 of 54

Strategies to handle hash collision

  • There are multiple strategies. In this class, we’ll cover the following ones:

  • 1. Separate chaining
  • 2. Open addressing
    • Linear probing
    • Quadratic probing
    • Double hashing

CSE 373 AU 18 – SHRI MARE

40

41 of 54

Separate chaining

  • Solution 1: Separate Chaining
  • Each index in our array represents a “bucket”. When an item x hashes to index h:
    • If the bucket at index h is empty: create a new list containing x
    • If the bucket at index h is already a list: add x if it is not already present

  • in other words:
  • If multiple things hash to the same index, then we’ll just put all of those in that same index bucket. Often, you’ll see the data structure chosen is a linked-list like structure.

CSE 373 ROBBIE WEBER + HANNAH TANG

41

42 of 54

Separate chaining

  • // some pseudocode

public boolean containsKey(int key) {

int bucketIndex = key % data.length;

loop through data[bucketIndex]

return true if we find the key in

data[bucketIndex]

return false if we get to here (didn’t

find it)

}

CSE 373 ROBBIE WEBER + HANNAH TANG

42

Reminder: the implementations of put/get/containsKey are all very similar, and almost always will have the same complexity class runtime

runtime analysis

Are there different possible states for our Hash Map that make this code run slower/faster, assuming there are already n key-value pairs being stored?

Yes! If we had to do a lot of loop iterations to find the key in the bucket, our code will run slower.

43 of 54

A best case situation for separate chaining

CSE 373 20 SP – CHAMPION & CHUN

0

1

2

3

4

5

6

7

8

9

(0, b)

(2, b)

(3, b)

(4, b)

(5, b)

(6, b)

(7, b)

(8, b)

It’s possible (and likely if you follow some best-practices) that everything is spread out across the buckets pretty evenly. This is the opposite of the last slide: when we have minimal collisions, our runtime should be less. For example, if we have a bucket with only 0 or 1 element in it, checking containsKey for something in that bucket will only take a constant amount of time.

We’re going to try a lot of stuff we can to make it more likely we achieve this beautiful state ☺.

44 of 54

In-practice situations for separate chaining

  •  

CSE 373 20 SP – CHAMPION & CHUN

Operation

Array w/ indices as keys

put(key,value)

best

𝚹(1)

In-practice

𝚹(1)

worst

𝚹(n)

get(key)

best

𝚹(1)

In-practice

𝚹(1)

worst

𝚹(1)

remove(key)

best

𝚹(1)

In-practice

𝚹(1)

worst

O(n)

Reminder: the in-practice runtimes are assuming an even distribution of the keys inside the array and following of best-practices to ensure the average chain length is low.

45 of 54

Best practices (pay attention to this for the hw)

  • what about resizing?
    • for data structures like ArrayMap or ArrayList or ArrayStack we had to resize when we’re full just because we couldn’t store any more things! But our Separate Chaining Hash Map is a little bit different: we aren’t ever forced to resize our main array, since the buckets are flexible size.

CSE 373 20 SP – CHAMPION & CHUN

It turns out we still want to resize “every so often” to make sure the average/expected length of each bucket is a small number.

Consider what happens if we had the array length 10 like on the left, but had 100 key-value pairs?

Assuming our in-practice niceness (not-worst case) you would expect on average each of the 10 buckets has about 10 key-value pairs in it.

What happens if we stick with the same size array but add 100 more key-value pairs? Each bucket gets about 10 more –key-value pairs and the runtime is getting worse and worse.

46 of 54

Best practices (pay attention to this for the hw)

CSE 373 20 SP – CHAMPION & CHUN

It turns out we still want to resize “every so often” to make sure the average/expected length of each bucket is a small number.

Consider what happens if we had the array length 10 like on the left, but had 100 key-value pairs?

Assuming our in-practice niceness (not-worst case) you would expect on average each of the 10 buckets has about 10 key-value pairs in it.

What happens if we stick with the same size array but add 100 more key-value pairs? Each bucket gets about 10 more –key-value pairs and the runtime is getting worse and worse.

The pattern we’re getting to is that the expected runtime is approximately: # of pairs / array.length (AKA n / c where n is the number of elements and c is the number of possible chains). If array.length is fixed for your whole program, then this is an order-n runtime, but if the array.length also increases (because you re-size) and you redistribute out the values evenly across the buckets, you can keep your runtime low. In particular, if you resize when when your n / c ratio increases to about 1, you’re expected to have 1 element or fewer in each bucket at all times. (do this on your homework).

Tip: make sure you re-hash (re-distribute) your keys by the new array length after re-sizing so they don’t get clustered in the old array length range.

47 of 54

Lambda + resizing rephrased

  • To be more precise, the in-practice runtime depends on λ, the current average chain length.

However, if you resize once you hit that 1:1 threshold, the current λ is expected to be less than 1 (which is a constant / constant runtime, so we can simplify to O(1)).

CSE 373 SU 19 - ROBBIE WEBER

47

Operation

Array w/ indices as keys

put(key,value)

best

O(1)

In-practice

O(λ)

worst

O(n)

get(key)

best

O(1)

In-practice

O(λ)

worst

O(n)

remove(key)

best

O(1)

In-practice

O(λ)

worst

O(n)

 

48 of 54

What about non integer keys?

  •  

CSE 373 SU 19 - ROBBIE WEBER

48

Hash function definition

hash function is any function that can be used to map data of arbitrary size to fixed-size values.

49 of 54

(Before we % by length, we have to convert the data into an int)

  • Implementation 1: Simple aspect of values
  • public int hashCode(String input) {
  • return input.length();
  • }
  • Implementation 2: More aspects of value
  • public int hashCode(String input) {
  • int output = 0;
  • for(char c : input) {
  • out += (int)c;
  • }
  • return output;
  • }
  • Implementation 3: Multiple aspects of value + math!
  • public int hashCode(String input) {
  • int output = 1;
  • for (char c : input) {
  • int nextPrime = getNextPrime();
  • out *= Math.pow(nextPrime, (int)c);

}

  • return Math.pow(nextPrime, input.length());
  • }

CSE 373 SU 19 - ROBBIE WEBER

49

Pro: super fast

Con: lots of collisions!

Pro: still really fast

Con: some collisions

Pro: few collisions

Con: slow, gigantic integers

50 of 54

Java’s hashCode (relevant for project)

  • Luckily, most of these design decisions have been made for us by smart people. All objects in java come with a `hashCode()` method that does some magic (see previous slide for the not-magic version) to turn any object type (like String, ArrayList, Point, Scanner) into an integer. These hashCodes are designed to distribute pretty evenly / not have lots of collisions, so we use them as the starting point for determining the bucket index.

  • high level steps to figure out which bucket a key goes into
    • call the key.hashCode() to get an int representation of the object
    • % by the array table length to convert it to a valid index for your hash map

CSE 373 20 SP – CHAMPION & CHUN

51 of 54

Best practices for an nice distribution of keys recap

  • resize when lambda (number of elements / number of buckets) increases up to 1
  • when you resize, you can choose a the table length that will help reduce collisions if you multiply the array length by 2 and then choose the nearest prime number
  • design the hashCode of your keys to be somewhat complex and lead to a distribution of different output numbers

CSE 373 20 SP – CHAMPION & CHUN

52 of 54

Practice

  • Consider an IntegerDictionary using separate chaining with an internal capacity of 10. Assume our buckets are implemented using a LinkedList where we append new key-value pairs to the end.
  • Now, suppose we insert the following key-value pairs. What does the dictionary internally look like?
  • (1, a) (5,b) (11,a) (7,d) (12,e) (17,f) (1,g) (25,h)

CSE 373 SU 19 - ROBBIE WEBER

52

0

1

2

3

4

5

6

7

8

9

(1, a)

(5, b)

(11, a)

(17, f)

(1, g)

(12, e)

(7, d)

(25, h)

53 of 54

Practice

  • Consider a StringDictionary using separate chaining with an internal capacity of 10. Assume our buckets are implemented using a LinkedList. Use the following hash function:
  • public int hashCode(String input) {
  • return input.length() % arr.length;
  • }
  • Now, insert the following key-value pairs. What does the dictionary internally look like?
  • (“a”, 1) (“ab”, 2) (“c”, 3) (“abc”, 4) (“abcd”, 5) (“abcdabcd”, 6) (“five”, 7) (“hello world”, 8)

CSE 373 SU 19 - ROBBIE WEBER

53

0

1

2

3

4

5

6

7

8

9

(“a”, 1)

(“abcd”, 5)

(“c”, 3)

(“five”, 7)

(“abc”, 4)

(“ab”, 2)

(“hello world”, 8)

(“abcdabcd”, 6)

54 of 54

Java and Hash Functions

  • Object class includes default functionality:
    • equals
    • hashCode

  • If you want to implement your own hashCode you should:
    • Override BOTH hashCode() and equals()

If a.equals(b) is true then a.hashCode() == b.hashCode() MUST also be true

  • That requirement is part of the Object interface. �Other people’s code will assume you’ve followed this rule.
  • Java’s HashMap (and HashSet) will assume you follow these rules and conventions for your custom objects if you want to use your custom objects as keys.

CSE 373 SU 19 - ROBBIE WEBER

54