CSE 163
Groupby and Apply
Suh Young Choi
🎶 Listening to: Minecraft soundtrack
💬 Before Class: If you were a kitchen appliance, what would you be?
This Time
Last Time
Announcements
Keyword Arguments
def div(a, b):
    return a / b
# Same behavior
div(1, 2)
div(a=1, b=2)
div(b=2, a=1)
# Different behavior
div(b=1, a=2)
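A minimal runnable sketch of the calls above, showing what each one returns. The variable names `same` and `different` are illustrative only and are not from the slides.

```python
def div(a, b):
    return a / b

# Positional and keyword forms that bind a=1, b=2 all compute 1 / 2.
same = [div(1, 2), div(a=1, b=2), div(b=2, a=1)]

# Swapping the values bound to the names changes the result: 2 / 1.
different = div(b=1, a=2)

print(same)       # [0.5, 0.5, 0.5]
print(different)  # 2.0
```

With keyword arguments, the order at the call site does not matter; only the name each value is bound to does.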
DataFrame
  | id         | year | month | day | latitude  | longitude   | name       | magnitude |
0 | nc72666881 | 2016 | 7     | 27  | 37.672333 | -121.619000 | California | 1.43      |
1 | us20006i0y | 2016 | 7     | 27  | 21.514600 | 94.572100   | Burma      | 4.90      |
2 | nc72666891 | 2016 | 7     | 27  | 37.576500 | -118.859167 | California | 0.06      |
Columns
Index (row)
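A small sketch of the two labels above, assuming pandas is installed. The three rows are retyped by hand from the table for illustration; in class the data would more likely be loaded from a file.

```python
import pandas as pd

# The three example rows from the table, rebuilt by hand (illustration only).
data = pd.DataFrame({
    'id': ['nc72666881', 'us20006i0y', 'nc72666891'],
    'year': [2016, 2016, 2016],
    'month': [7, 7, 7],
    'day': [27, 27, 27],
    'latitude': [37.672333, 21.514600, 37.576500],
    'longitude': [-121.619000, 94.572100, -118.859167],
    'name': ['California', 'Burma', 'California'],
    'magnitude': [1.43, 4.90, 0.06],
})

print(list(data.columns))    # the column labels
print(list(data.index))      # the row labels: [0, 1, 2]
print(data.loc[1, 'name'])   # row label 1, column 'name': Burma
```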
Groupby Demo
| col1 | col2 |
0 | A | 1 |
1 | B | 2 |
2 | C | 3 |
3 | A | 4 |
4 | C | 5 |
A | 1 |
B | 2 |
C | 3 |
A | 4 |
C | 5 |
result = data.groupby('col1')
A DataFrameGroupBy object
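What `groupby` returns is not a regular DataFrame; in pandas it is a `DataFrameGroupBy` object that holds the groups until you ask for a computation. A rough sketch of peeking inside it, using the demo table (the names `groups` and `seen` are illustrative only):

```python
import pandas as pd

data = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'A', 'C'],
    'col2': [1, 2, 3, 4, 5],
})

groups = data.groupby('col1')
print(type(groups).__name__)  # DataFrameGroupBy

# Iterating yields (key, sub-DataFrame) pairs, one per distinct col1 value.
seen = {}
for key, group in groups:
    seen[key] = group['col2'].tolist()
    print(key, seen[key])
```

Nothing has been summed yet at this point; the rows have only been sorted into groups.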
result = data.groupby('col1')['col2']
col2
result = data.groupby('col1')['col2'].sum()
col2
.sum() on each group (A, B, C)
Result:
  | col2 |
A | 5 |
B | 2 |
C | 8 |
Group By
result = data.groupby('col1')['col2'].sum()
Data (DataFrame)
  | col1 | col2 |
0 | A | 1 |
1 | B | 2 |
2 | C | 3 |
3 | A | 4 |
4 | C | 5 |
Split (one group per col1 value)
  | col2 |
A | 1, 4 |
B | 2 |
C | 3, 5 |
Apply (.sum() on each group)
A | 5 |
B | 2 |
C | 8 |
Combine (Series)
A | 5 |
B | 2 |
C | 8 |
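The split, apply, and combine steps can be sketched by hand and compared against the one-line `groupby` call. This is an illustration of the idea under the demo data, not how pandas implements it; `manual` and `combined` are made-up names.

```python
import pandas as pd

data = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'A', 'C'],
    'col2': [1, 2, 3, 4, 5],
})

manual = {}
for key in data['col1'].unique():
    rows = data[data['col1'] == key]          # Split: rows for one group
    manual[key] = int(rows['col2'].sum())     # Apply: sum that group's col2

# Combine: one Series indexed by group key.
combined = pd.Series(manual).sort_index()

# The same computation with groupby.
result = data.groupby('col1')['col2'].sum()

print(combined.to_dict())  # {'A': 5, 'B': 2, 'C': 8}
print(result.to_dict())    # {'A': 5, 'B': 2, 'C': 8}
```

The result is a Series whose index is the group key (`col1`), not a DataFrame.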
Apply
data['name'].str.len()
data['name'].str.upper()
data['name'].apply(len)
data['name'].apply(my_function)
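A small sketch of the four calls above on the `name` column from the earlier table. The body of `my_function` is a hypothetical stand-in (the slides do not define it); any function that takes one value and returns one value would work.

```python
import pandas as pd

data = pd.DataFrame({'name': ['California', 'Burma', 'California']})

# Hypothetical example function: returns the first letter of a string.
def my_function(s):
    return s[0]

lengths_str = data['name'].str.len()          # built-in string method
upper = data['name'].str.upper()              # built-in string method
lengths_apply = data['name'].apply(len)       # any function, called per value
initials = data['name'].apply(my_function)    # your own function, per value

print(lengths_str.tolist())    # [10, 5, 10]
print(upper.tolist())          # ['CALIFORNIA', 'BURMA', 'CALIFORNIA']
print(lengths_apply.tolist())  # [10, 5, 10]
print(initials.tolist())       # ['C', 'B', 'C']
```

Note that `apply` takes the function itself (`len`, `my_function`), not a call to it (`len()`).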
Group Work:
Best Practices
When you first start working with this group:
Tips:
Before Next Time
Next Time