Sets
Jake Shoudy
Oct 12, 2022
CSCI 110 - Lecture 22
Announcements
Project 2: Part 1
Build a simple search engine using 1D lists :)
Due Friday October 14th, 11:59pm
HW8
Posted on Ed
Big O Practice and Coding problems using Sets
Due next Wednesday (Oct 19th)
Exam 2
2 weeks from today!
Wednesday 10/26
Will cover all material through Dictionaries (Friday) but with a focus on material since midterm #1
Quiz 6 Scores
| Score |
Mean | 75% |
Min | 20% |
Max | 110% |
Median | 70% |
Percent of class above 70% | 52% |
Percent of class above 90% | 28% |
Recap
Practice: Big O
def mystery(my_list):
copy_list = my_list.copy()
for item in copy_list:
my_list.append(item)
for item in copy_list
my_list.remove(item)
return my_list
O(n)
Time Complexity:
O(n) + O(n)*O(1) + O(n)*O(n)
= O(n) + O(n) + O(n^2)
= O(2n + n^2)
= O(n^2)
O(n)
Space complexity:
Copy of list is O(n). Also double the input list at some point. Still…
O(n)
O(1)
O(n)
O(n)
O(1)
Sets
Motivation
Let’s say it’s required to attend at least one office hour a semester, and we need to keep track of which students have attended.
Office Hours 1:
Office Hours 2:
Motivation
Issue? I don’t care if Sarah attended twice or when they attended...I’m only interested in knowing if they attended at least once.
Possible solution:
But this takes up memory and time...
What if we had thousands of students?
Solution
Sets solve this problem for us.
They are useful when we don’t care about ordering or repetition.
Sets cannot have duplicates and do not have a defined order!
Additional Characteristics
How? Hashing! Learn about this in the future.
Syntax: Option 1
Curly braces {}
Example:
years = {1995, 2020, 1992, 1920, 1776, 2020}
=> years will be {1995, 2020, 1992, 1920, 1776}
Syntax: Option 2
set(a) where a is a data type that can be iterated over
Example:
word = set('helloworld')
=> word will be {'h', 'e', 'l', 'o', 'w', 'r', 'd'}
items = set(['couch', 'pillow', 'couch', 'lamp', 'lamp', 'clock'])
=> items will be {'couch', 'pillow', 'lamp', 'clock'}
Empty Set
Has to be:
new_set = set()
Cannot be:
new_set = {}
{} creates a new dictionary, not a set (we’ll learn about these on Friday!)
Immutable Items Only
Sets must contain immutable items only
That is because they store items in memory using something called “hashing” and it depends on the values not changing.
Can put strings, ints, floats, booleans inside of sets
But not lists (or other sets)
Set operations
in operator
in operator: check whether value exists in set
On average runs in O(1) (constant time!)
years = {1995, 2020, 1992, 1920, 1776}
1995 in years
2019 in years
True
False
Mutation: Add an item
set.add(x)
Practice: Add an item
years = {1995, 2020, 1992, 1920, 1776}
years.add(2000)
len(years)
years.add(1995)
len(years)
years.add(1972)
len(years)
{1995, 2020, 1992, 1920, 1776, 2000}
6
{1995, 2020, 1992, 1920, 1776, 2000}
6
{1995, 2020, 1992, 1920, 1776, 2000, 1972}
7
Mutation: Delete an item
set.remove(x)
Practice: Delete an item
years = {1995, 2020, 1992, 1920, 1776}
years.remove(2020)
len(years)
years.remove(2020)
len(years)
years.remove(1995)
len(years)
{1995, 1992, 1920, 1776}
4
KeyError
4
{1992, 1920, 1776}
3
Mutation: Delete a random item
set.pop()
Why?
Could be useful if have only one item left in set,
or if you want to use every item in the set one at a time (like drawing from a hat).
Practice: Delete a random item
years = {1995, 2020}
years.pop()
len(years)
years.pop()
len(years)
years.pop()
returns 2020, years = {1995}
1
returns 1995, years = {}
0
KeyError
Calculate: Set Math
s1.union(s2)
s1.intersection(s2)
s1.difference(s2)
Note: none of these functions mutate the set
Practice: Set Math
my_food = {'coffee', 'grapes'}
your_food = {'bananas', 'grapes'}
print(my_food.union(your_food))
print(my_food.intersection(your_food))
print(my_food.difference(your_food))
print(your_food.difference(my_food))
{'coffee', 'grapes', 'bananas'}
{'grapes'}
{'coffee'}
{'bananas'}
'coffee'
'grapes'
'bananas'
Iteration
Can iterate through items of a set using for loop, just like lists
Runs in O(n), still have to look at every element
No guaranteed order!
years = {1995, 2020, 1992, 1920, 1776}
for year in years:
print(year)
1995
1992
1776
2020
1920
1992
1920
1995
2020
1776
Could print out this:
Or this:
1776
2020
1920
1995
1992
Or this:
...or some order
Accumulator Pattern
Often used to accumulate or count unique items in given data structure
sentence = 'hello how are you today'
result = set()
for letter in sentence:
result.add(letter)
print(result)
print(len(result))
{'h','e','l','o',' ','w','a','r','y','u','t','d',}
12
Sets vs Lists
How are sets similar to lists?
How are sets different than lists?
Let’s Code!
Sets
What?
Unordered collection of immutable (integer, float, boolean, string) values
Same value can only be stored once
Can find len(), use in operator
Set functions: add, remove, pop
Why?
Contain data that can only have one entry per value
Identify / remove duplicates
Questions?