1 of 33

Sets

Jake Shoudy

Oct 12, 2022

CSCI 110 - Lecture 22

2 of 33

Announcements

3 of 33

Project 2: Part 1

Build a simple search engine using 1D lists :)

Due Friday October 14th, 11:59pm

4 of 33

HW8

Posted on Ed

Big O Practice and Coding problems using Sets

Due next Wednesday (Oct 19th)

5 of 33

Exam 2

2 weeks from today!

Wednesday 10/26

Will cover all material through Dictionaries (Friday) but with a focus on material since midterm #1

6 of 33

Quiz 6 Scores

Score

Mean

75%

Min

20%

Max

110%

Median

70%

Percent of class above 70%

52%

Percent of class above 90%

28%

7 of 33

Recap

8 of 33

Practice: Big O

def mystery(my_list):

copy_list = my_list.copy()

for item in copy_list:

my_list.append(item)

for item in copy_list

my_list.remove(item)

return my_list

O(n)

Time Complexity:

O(n) + O(n)*O(1) + O(n)*O(n)

= O(n) + O(n) + O(n^2)

= O(2n + n^2)

= O(n^2)

O(n)

Space complexity:

Copy of list is O(n). Also double the input list at some point. Still…

O(n)

O(1)

O(n)

O(n)

O(1)

9 of 33

Sets

10 of 33

Motivation

Let’s say it’s required to attend at least one office hour a semester, and we need to keep track of which students have attended.

Office Hours 1:

  • Sarah
  • Jordan

Office Hours 2:

  • Tyler
  • Sarah

11 of 33

Motivation

Issue? I don’t care if Sarah attended twice or when they attended...I’m only interested in knowing if they attended at least once.

Possible solution:

  • Loop through my list and ignore or remove duplicates

But this takes up memory and time...

What if we had thousands of students?

12 of 33

Solution

Sets solve this problem for us.

They are useful when we don’t care about ordering or repetition.

Sets cannot have duplicates and do not have a defined order!

13 of 33

Additional Characteristics

  • Cannot have duplicates
  • Do not have a defined order (so cannot be indexed into and sliced)
  • Mutable - just like lists!
  • Adding to a set is O(1)
  • Removing from a set is O(1)
  • The `in` operator in a set is O(1)

How? Hashing! Learn about this in the future.

14 of 33

Syntax: Option 1

Curly braces {}

Example:

years = {1995, 2020, 1992, 1920, 1776, 2020}

=> years will be {1995, 2020, 1992, 1920, 1776}

15 of 33

Syntax: Option 2

set(a) where a is a data type that can be iterated over

Example:

word = set('helloworld')

=> word will be {'h', 'e', 'l', 'o', 'w', 'r', 'd'}

items = set(['couch', 'pillow', 'couch', 'lamp', 'lamp', 'clock'])

=> items will be {'couch', 'pillow', 'lamp', 'clock'}

16 of 33

Empty Set

Has to be:

new_set = set()

Cannot be:

new_set = {}

{} creates a new dictionary, not a set (we’ll learn about these on Friday!)

17 of 33

Immutable Items Only

Sets must contain immutable items only

That is because they store items in memory using something called “hashing” and it depends on the values not changing.

Can put strings, ints, floats, booleans inside of sets

But not lists (or other sets)

18 of 33

Set operations

19 of 33

in operator

in operator: check whether value exists in set

On average runs in O(1) (constant time!)

years = {1995, 2020, 1992, 1920, 1776}

1995 in years

2019 in years

True

False

20 of 33

Mutation: Add an item

set.add(x)

  • Add element x to set
  • Runs in O(1)

21 of 33

Practice: Add an item

years = {1995, 2020, 1992, 1920, 1776}

years.add(2000)

len(years)

years.add(1995)

len(years)

years.add(1972)

len(years)

{1995, 2020, 1992, 1920, 1776, 2000}

6

{1995, 2020, 1992, 1920, 1776, 2000}

6

{1995, 2020, 1992, 1920, 1776, 2000, 1972}

7

22 of 33

Mutation: Delete an item

set.remove(x)

  • x is an item in set
  • If x does not exist in set, will raise KeyError
  • Runs in O(1)

23 of 33

Practice: Delete an item

years = {1995, 2020, 1992, 1920, 1776}

years.remove(2020)

len(years)

years.remove(2020)

len(years)

years.remove(1995)

len(years)

{1995, 1992, 1920, 1776}

4

KeyError

4

{1992, 1920, 1776}

3

24 of 33

Mutation: Delete a random item

set.pop()

  • Removes arbitrary item from set and returns item
  • If set is empty, will raise KeyError
  • Runs in O(1)

Why?

Could be useful if have only one item left in set,

or if you want to use every item in the set one at a time (like drawing from a hat).

25 of 33

Practice: Delete a random item

years = {1995, 2020}

years.pop()

len(years)

years.pop()

len(years)

years.pop()

returns 2020, years = {1995}

1

returns 1995, years = {}

0

KeyError

26 of 33

Calculate: Set Math

s1.union(s2)

  • Combines two sets
  • Runs in O(s + t) where s and t are the size of the two sets

s1.intersection(s2)

  • Finds overlap between two sets
  • Runs in O(min(s, t)) where s and t are the size of the two sets

s1.difference(s2)

  • Removes all items in s2 from s1
  • Runs in O(t) where t is the size of the first set

Note: none of these functions mutate the set

27 of 33

Practice: Set Math

my_food = {'coffee', 'grapes'}

your_food = {'bananas', 'grapes'}

print(my_food.union(your_food))

print(my_food.intersection(your_food))

print(my_food.difference(your_food))

print(your_food.difference(my_food))

{'coffee', 'grapes', 'bananas'}

{'grapes'}

{'coffee'}

{'bananas'}

'coffee'

'grapes'

'bananas'

28 of 33

Iteration

Can iterate through items of a set using for loop, just like lists

Runs in O(n), still have to look at every element

No guaranteed order!

years = {1995, 2020, 1992, 1920, 1776}

for year in years:

print(year)

1995

1992

1776

2020

1920

1992

1920

1995

2020

1776

Could print out this:

Or this:

1776

2020

1920

1995

1992

Or this:

...or some order

29 of 33

Accumulator Pattern

Often used to accumulate or count unique items in given data structure

sentence = 'hello how are you today'

result = set()

for letter in sentence:

result.add(letter)

print(result)

print(len(result))

{'h','e','l','o',' ','w','a','r','y','u','t','d',}

12

30 of 33

Sets vs Lists

How are sets similar to lists?

  • Store data
  • Have length
  • Mutable

How are sets different than lists?

  • Unique values
  • Values must be immutable in sets
  • Unordered
  • Faster runtimes!

31 of 33

Let’s Code!

32 of 33

Sets

What?

Unordered collection of immutable (integer, float, boolean, string) values

Same value can only be stored once

Can find len(), use in operator

Set functions: add, remove, pop

Why?

Contain data that can only have one entry per value

Identify / remove duplicates

33 of 33

Questions?