1 of 29

Recitation 8:

Hashing

CIT 5940

March 29, 2023

2 of 29

Attendance Code:

47638

3 of 29

Announcements

  • HW5 due next Wednesday, April 3rd @ 11:59PM ET
  • Recitation due March 30th @ 11:59 PM
  • Final projects on the horizon: look forward to details in the coming weeks
  • Start considering groups/project ideas
  • Ed Post to help with project group formation

4 of 29

(Quick Recap) Open Hashing

5 of 29

Open Hashing

  • It is okay to hash two keys to the same position, just link it up
  • Uses array of linked lists to resolve the collision
  • Insert at head of the linked list to save time

Relatively straight forward, right?

6 of 29

Closed Hashing

Confusingly also called “Open Addressing”

7 of 29

Closed Hashing

  • A hash system where all records are stored 1 slot each inside the hash table
  • If a collision occurs, find another slot to put it in
  • Need some kind of collision resolutions policy to deal with collisions
    • No longer any access to “infinite space” via linked lists

8 of 29

Load Factor & Rehashing

Load Factor: The ratio of entries to table size before an increment in size of the data structure is required

  • Usually denoted by (m / n) where m is the number of entries, n is the number of slots
  • Default load factor is 0.75, why is this?

Rehashing: Technique where the table is resized (doubled)

  • Time Complexity: O(n)
  • Space Complexity: O(n)

9 of 29

Closed Hashing with Buckets

  • M slots, B buckets, each bucket with M/B slots
  • Hash function assigns a bucket which then looks for a slot inside the bucket
  • If the bucket is full, then it is sequentially added in the next bucket
  • If buckets are full, add the element to the overflow bucket (assumed infinite capacity)
  • Expensive process if many records are in the overflow bucket**

** Why is this??

10 of 29

Alternative to Buckets: Probing

Probe function: function used by a collision resolution method to calculate where to look next in the hash table

If a collision occurs, the probe function will traverse to the next slot as determined by the function starting from a home slot.

(home + p(key, i)) % M

home = hash(key)

p(key, i) = probe function

M = capacity of hash table

11 of 29

No Buckets: Linear Probing/Probing by Steps

Linear probing:

  • p(key, i) = i
  • Risk of primary clustering (empty slots not having same probability to be mapped to)

Linear probing by Steps:

  • p(key, i) = i * c
  • Like linear probing, but take larger steps equal to c
  • Help with mitigating the risk of primary clustering

12 of 29

No Buckets: Pseudo-Random probing and Quadratic

Pseudo-Random probing

  • p(key, i) = Permutation[i]
  • Permutation list: Stores a random permutation of the values from 1 to M − 1 in slots 1 to M − 1

Quadratic probing

  • p(key, i) = c1i + c2i2

(one variant)

  • Risk of secondary clustering: records with same home slots share probing sequence

* Easy to visualize *

* There’s a small problem with this one. What is it?

13 of 29

No Buckets: Double Hashing

  • p(key, i) = i * hash2(key)
    • hash2(key) = prime - (hash1(key) % prime)
  • Avoid secondary clustering
    • Secondary clustering: clustering in a separate location, away from the original hash location
  • Value returned by h2 cannot be 0 or M or it will result in an infinite loop.

14 of 29

Double Hashing Example

h1(k) = k % 11 (first hash function)

h2(k) = 8 - (k % 8) (second hash function)

Steps:

Step 1: K = 20, h1(20) = 20 mod 11 = 9 No collision occurs.

Step 2: K = 34, h1(34) = 34 mod 11 = 1, No collision occurs.

Step 3: K = 45, h1(45) = 45 mod 11 = 1, Collision occur because index 1 is already occupied by 34. Now we will use the second hash function to calculate the index for the key 45.

h2(45) = 8 - (45 mod 8) = 3

h(45, 1) = (1 + 1 * 3) mod 11 = 4

15 of 29

Let’s Play:

GUESS THAT COLLISION!

16 of 29

Which probe function is being used?

Insertions = {24, 19, 34, 9, 4}

34

9

4

24

19

17 of 29

Which probe function is being used?

Insertions = {24, 19, 34, 9, 4}

34

9

4

24

19

Linear Probing:

pos = (hash1(key) + i) % 5

18 of 29

Which probe function is being used?

Insertions = {24, 36, 34, 8, 30 }

8

30

24

36

34

19 of 29

Which probe function is being used?

Insertions = {24, 36, 34, 8, 30 }

8

30

24

36

34

Linear Probing with Steps = 2:

pos = (hash1(key) + 2i) % 10

20 of 29

Which probe function is being used?

30

23

24

36

33

34

Insertions = {24, 36, 34, 30, 23, 33. 43. 53 }

21 of 29

Which probe function is being used?

Insertions = {24, 36, 34, 30, 23, 33, 43, 53 }

30

23

24

36

33

34

Quadratic Probing with c1 = 5 and c2 = 10:

pos = (hash1(key) + c1i + c2i2) % 10

cluster 1

cluster 2

Not added

22 of 29

Which probe function is being used?

Insertions = {24, 23, 34, 22, 32, 62, 72}

22

23

24

32

34

62

72

23 of 29

Which probe function is being used?

22

23

24

32

34

62

72

Insertions = {24, 23, 34, 22, 32, 62, 72}

Double Hashing with prime 3:

pos = (hash1(key) + i * hash2(key)) % 10

hash2(key) = 3 - (hash1(key) % 3)

24 of 29

Recitation Activity:

Now you write the probe functions!

Harry’s Note:

Over the past few lectures, we studied various implementations of hash tables. While this analysis is useful to understanding how these data structures are implemented, our discussion focused solely on implementation details at the expense of any practice in using them. This activity will help remedy that and also serves as a preview for our upcoming discussion of Indexing. In completing this lab, you will develop a program that organizes data by two different keys. Reflect on how we adapt to the limitations of Maps by adding another level of indirection to our program.

25 of 29

Collision Resolution Policy: Strategy Pattern

<<interface>>

CollisionResolutionPolicy

probe() & setup()

LinearProbingCRP

LinearProbingStepsCRP

QuadraticProbingCRP

DoubleHashingCRP

Ignore (if present)

SUBMIT

SUBMIT

SUBMIT

SUBMIT

26 of 29

Important/ Runnable Files

Ignore (if present)

OpenAddressingBucket/HashTable

  • Contains buckets and a hash table that uses the probe methods
  • *add import java.util.Arrays at the top of OpenAddressingHashTable to fix an error

Experiment:

  • Run this to see how the probing methods work to improve insertion and search on large hash tables

HashingTest:

  • Run this to test your probing methods

SUBMIT

SUBMIT

SUBMIT

SUBMIT

27 of 29

TODOS:

QuadraticProbingCRP has been filled out for you to reference how make a probing function.

Fill out the probing functions in:

  • LinearProbingCRP
  • LinearProbingStepsCRP
  • DoubleHashingCRP

Run HashingTest to test if they work properly.

Ignore (if present)

SUBMIT

SUBMIT

SUBMIT

SUBMIT

28 of 29

Experiments and Observations

  • You can run the experiment file to see the difference between the probing methods.
  • Each experiment inserts 7500 keys into a 10000 capacity hash table, then averages the number of probes it takes to insert and search

29 of 29

Attendance Code:

47638