Lecture 11
Pivots and Joins
DATA 8
Fall 2023
Announcements
Review: Grouping and Functions
Grouping and Collection
A
3
B
1
C
4
A
1
B
5
C
9
A
2
C
5
12
7
3
2
7
3
8
6
A
3
A
1
A
2
B
1
B
5
C
4
C
9
C
5
group
12
2
8
7
7
3
3
6
A
1
2
B
1
7
C
4
3
collect function (e.g., min)
Condenses results back into one table
(Demo)
Lists
Lists are Generic Sequences
A list is a sequence of values (just like an array), � but the values can have different types
[2+3, 'four', Table().with_column('K', [3, 4])]
(Demo)
Cross-Classification
Grouping By Multiple Columns
The group method can also aggregate all rows that share the combination of values in multiple columns
Group by one column (and minimize each other column): �t.group(“label”, collect=min) or t.group(“label”, min)
Group by two columns (and minimize each other): �t.group([“label 1”, “label 2”], collect=min) or�t.group([“label 1”, “label 2”], min)
(Demo)
Pivot Tables
Pivot
(Demo)
Let’s Practice
How to solve a table manipulation problem:
Discussion Question
(Demo)
sky
Group or Pivot?
Joins
Joining Two Tables
Drink | Cafe | Price |
Milk Tea | Asha | 5.5 |
Espresso | Strada | 1.75 |
Latte | Strada | 3.25 |
Espresso | FSM | 2 |
Coupon | Location |
10% | Asha |
25% | Strada |
5% | Asha |
drinks
discounts
Cafe | Drink | Price | Coupon |
Asha | Milk Tea | 5.5 | 10% |
Asha | Milk Tea | 5.5 | 5% |
Strada | Espresso | 1.75 | 25% |
Strada | Latte | 3.25 | 25% |
drinks.join('Cafe', discounts, 'Location')
Match rows in this table ...
… using values in this column ...
… with rows in that table ...
… using values in that column.
The joined column is sorted automatically
(Demo)
Columns from both tables
Table Review
Important Table Methods
t.select(column, …) or t.drop(column, …)
t.take([row_num, …]) or t.exclude([row_num, …])
t.sort(column, descending=False)
t.where(column, are.condition(...))
t.apply(function_name, column, …)
t.group(column) or t.group(column, function_name)
t.group([column, …]) or t.group([column, …], function_name)
t.pivot(cols, rows) or t.pivot(cols, rows, vals, function_name)
t.join(column, other_table, other_table_column)
Extra Example
Challenge Question
Generate a table of the names of the oldest buildings for each material for each city: