CSE 163
Pandas
Suh Young Choi
🎶 Listening to: Death’s Door soundtrack
💬 Before Class: What is the best polygon?
This Time
Last Time
Announcements
Checking in with the Check-ins
Lost in translation?
Concerned about falling behind?
What about grading and the final project?
Recap
Imports
Pandas
Importing
Importing lets you use the contents defined in another Python file
# Method 1: Import module
import module
module.function()
# Method 2: Import and rename module
import module as m
m.function()
# Method 3: Import specific function from module
from module import function
function()
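As a concrete illustration of the three import styles, here is a small sketch using Python's built-in math module. The choice of math is only for illustration; the slide's `module` and `function` are placeholders.

```python
# Method 1: import the whole module, then use the module name as a prefix
import math
print(math.sqrt(16))   # 4.0

# Method 2: import the module under a shorter name
import math as m
print(m.floor(2.7))    # 2

# Method 3: import one function so it can be called without a prefix
from math import sqrt
print(sqrt(9))         # 3.0
```

The same pattern is why pandas code usually starts with `import pandas as pd`.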
DataFrame
| id | year | month | day | latitude | longitude | name | magnitude |
0 | nc72666881 | 2016 | 7 | 27 | 37.672333 | -121.619000 | California | 1.43 |
1 | us20006i0y | 2016 | 7 | 27 | 21.514600 | 94.572100 | Burma | 4.90 |
2 | nc72666891 | 2016 | 7 | 27 | 37.576500 | -118.859167 | California | 0.06 |
Columns
Index (row)
Series
0 | California |
1 | Burma |
2 | California |
df['name']
df['name'][1] # 'Burma'
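To try this out, here is a minimal sketch that rebuilds part of the earthquake table by hand and pulls out one column as a Series. Building the DataFrame from a dictionary, with only three of the columns, is an assumption made to keep the example short.

```python
import pandas as pd

# A hand-built subset of the earthquake table (three of the eight columns)
df = pd.DataFrame({
    'id': ['nc72666881', 'us20006i0y', 'nc72666891'],
    'name': ['California', 'Burma', 'California'],
    'magnitude': [1.43, 4.90, 0.06],
})

# Indexing a DataFrame with a column name gives back a Series
names = df['name']
print(type(names).__name__)  # Series

# Indexing the Series with a row label gives back a single value
print(names[1])              # Burma
```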
Series Operations
0 | California :) |
1 | Burma :) |
2 | California :) |
df['name'] += " :)"
df['magnitude'] *= 2
0 | 2.86 |
1 | 9.80 |
2 | 0.12 |
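The two operations above can be reproduced with a short sketch. The DataFrame here is a hand-built two-column subset of the earthquake data, which is an assumption for the sake of a runnable example.

```python
import pandas as pd

# Hand-built subset of the earthquake table
df = pd.DataFrame({
    'name': ['California', 'Burma', 'California'],
    'magnitude': [1.43, 4.90, 0.06],
})

# Arithmetic on a Series applies to every element at once, no loop needed
df['name'] += " :)"      # string concatenation on each name
df['magnitude'] *= 2     # doubles each magnitude

print(df)
```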
Series Operations
False |
True |
False |
0 | California |
1 | Burma |
2 | California |
df['name'] == "Burma"
Series Operations
df['name'] == "Burma"
| id | year | month | day | latitude | longitude | name | magnitude |
0 | nc72666881 | 2016 | 7 | 27 | 37.672333 | -121.619000 | California | 1.43 |
1 | us20006i0y | 2016 | 7 | 27 | 21.514600 | 94.572100 | Burma | 4.90 |
2 | nc72666891 | 2016 | 7 | 27 | 37.576500 | -118.859167 | California | 0.06 |
False |
True |
False |
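A comparison on a Series produces one True/False value per row, lined up with the rows of the DataFrame. A minimal sketch, assuming a hand-built subset of the earthquake table:

```python
import pandas as pd

# Hand-built subset of the earthquake table
df = pd.DataFrame({
    'id': ['nc72666881', 'us20006i0y', 'nc72666891'],
    'name': ['California', 'Burma', 'California'],
    'magnitude': [1.43, 4.90, 0.06],
})

# Comparing a Series to a value gives a Series of booleans
is_burma = df['name'] == "Burma"
print(is_burma.tolist())   # [False, True, False]

# Using that boolean Series as an indexer keeps only the True rows
print(df[is_burma])
```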
Filtering
mask = df['magnitude'] > 5
df[mask]
# Same as: df[df['magnitude'] > 5]
| id | year | month | day | latitude | longitude | name | magnitude |
30 | us20006i18 | 2016 | 7 | 27 | -24.286000 | -67.864700 | Chile | 5.60 |
114 | us20006i35 | 2016 | 7 | 27 | 36.492200 | 140.756800 | Japan | 5.30 |
421 | us1000683b | 2016 | 7 | 28 | -16.824200 | -172.515800 | Tonga | 5.10 |
df[(df['magnitude'] > 5) & ~(df['day'] == 27)]
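Here is a runnable sketch of both filters. The four rows are copied from the tables on these slides, but the default 0 to 3 index is an assumption (the real dataset has row labels like 30, 114, and 421).

```python
import pandas as pd

# Four rows taken from the earthquake tables, with a default index
df = pd.DataFrame({
    'name': ['California', 'Chile', 'Japan', 'Tonga'],
    'day': [27, 27, 27, 28],
    'magnitude': [1.43, 5.60, 5.30, 5.10],
})

# Single condition: build the mask, then index with it
mask = df['magnitude'] > 5
big = df[mask]
print(big['name'].tolist())         # ['Chile', 'Japan', 'Tonga']

# Combined conditions: & is "and", | is "or", ~ is "not"
big_not_27 = df[(df['magnitude'] > 5) & ~(df['day'] == 27)]
print(big_not_27['name'].tolist())  # ['Tonga']
```

The parentheses around each condition are required because `&` binds more tightly than `>` and `==`.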
Location
How to access data in pandas
Series
series[<indexer>]
DataFrame
df[<indexer>]
df.loc[<row indexer>, <column indexer>]
Options for indexers:
Remember that with .loc, the end of a slice is inclusive, unlike a standard Python slice
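A short sketch of these access patterns, assuming a hand-built subset of the earthquake table with the default integer row labels. It also shows the inclusive slice end of `.loc` next to an ordinary Python list slice.

```python
import pandas as pd

# Hand-built subset of the earthquake table
df = pd.DataFrame({
    'name': ['California', 'Burma', 'California'],
    'magnitude': [1.43, 4.90, 0.06],
})

# Single row label and single column name: one value
print(df.loc[1, 'name'])                 # Burma

# Slice of row labels and list of column names: a smaller DataFrame
subset = df.loc[0:1, ['name', 'magnitude']]
print(len(subset))                       # 2, because rows 0 AND 1 are included

# .loc works on a Series too, with a single indexer
print(df['magnitude'].loc[0:1].tolist())

# For comparison, a standard Python slice excludes the end
print(len(['a', 'b', 'c'][0:1]))         # 1
```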
Before Next Time
Next Time