1 of 8

CSE 163

Hashing

��Hunter Schafer

💬Before Class: Favorite Tree

🎵Music: Lorde

2 of 8

Data Structures

  • List: Series of values, each with an index
    • list.append(v) O(1)
    • v in list O(n)
    • list[i] O(1)�
  • Set: Collection of unique values, no sense of “order”
    • set.add(v) O(1)
    • v in set O(1)
    • set[i] Not allowed�
  • Dictionary: Like a set, but maps distinct keys to values
    • dict[k] = v O(1)
    • k in dict O(1)
    • dict[k] O(1)

2

How can the set/dictionary have run-times that don’t depend on the number of elements in the data structure?

3 of 8

Set of Ints

  • Hash Function: h(v) = v % 10

3

nums = set()

nums.add(11) # 11 % 10 == 1

nums.add(49) # 49 % 10 == 9

nums.add(24) # 24 % 10 == 4

nums.add(7) # 7 % 10 == 7

print(49 in nums) # True

print(5 in nums) # False

0

0

0

0

0

0

0

1

2

3

4

5

6

7

8

9

0

0

0

11

0

49

24

7

4 of 8

Chained Hashing

  • If we experience a collision, just store both values at that index
  • Under some reasonable assumptions, these buckets will be small compared to the size of the whole set of items
  • Care about the load-factor (num items / size of table)
    • If the load-factor is too high, these buckets will be big
    • Can rehash into a larger table to lower load-factor

4

5 of 8

Hash Functions

  • Our hash table can handle collisions, but we still want to keep the number of collisions low (otherwise buckets are large)
  • Integers are relatively easy to hash, but what about strings?
  • Each object can implement a __hash__ method
    • Write code to turn your object into an int for indexing�
  • Requirements of hash function
    • Needs to return a number
    • Be “consistent”
  • What makes a good hash function
    • Should minimize the number of collisions
    • Should “look random” to maximize spread of values

5

6 of 8

Hashing in Practice

  • Hashing and equality are deeply related
    • If two values are equal, they must hash to same value
  • If you implement __eq__ and __hash__,�they need to be consistent
  • Generally, people let Python make a good hash function

6

class Example:

def __init__(self, a, b, c):

# Initialize _a, _b, _c

def __eq__(self, other):

return self._a == other._a \

and self._b = other._b

def __hash__(self):

return hash((self._a, self._b))

7 of 8

Count-Min Sketch

  • When we receive an item we look it up in all tables
  • What result should we return?

7

8 of 8

Group Work:

Best Practices

When you first working with this group:

  • Introduce yourself!
  • If possible, angle one of your screens so that everyone can discuss together

Tips:

  • Starts with making sure everyone agrees to work on the same problem
  • Make sure everyone gets a chance to contribute!
  • Ask if everyone agrees and periodically ask each other questions!
  • Call TAs over for help if you need any!

8