1 of 10

Lecture 12

Table Examples

DATA 8

Fall 2023

2 of 10

Announcements

  • No Lab Notebook this week
  • My office hours today will be from 7 pm - 8pm
  • Homework 4 due Wednesday 9/22 at 11pm
  • Project 1 due Friday 9/29 at 11pm
    • Checkpoint due Friday 9/22 at 11pm
      • Incorrect due date in project notebook
    • Early submission due Thursday 9/28 at 11pm
  • Midterm accommodations form released after lecture
  • Midterm is Friday Oct 13 7pm-9pm (not during lecture!)

3 of 10

Table Review

4 of 10

Important Table Methods

t.select(column, …) or t.drop(column, …)

t.take([row_num, …]) or t.exclude([row_num, …])

t.sort(column, descending=False)

t.where(column, are.condition(...))

t.apply(function_name, column, …)

t.group(column) or t.group(column, function_name)

t.group([column, …]) or t.group([column, …], function_name)

t.pivot(cols, rows) or t.pivot(cols, rows, vals, function_name)

t.join(column, other_table, other_table_column)

5 of 10

Table Practice

6 of 10

Join for Value Annotation

(Demo)

One common use of t.join(_, u, _):

  • A dataset t has a categorical variable x.
  • A table u has one row per possible value of x that describes some properties of that value.
  • The joined table has the same rows as t, but each row in t is now annotated with the properties of its x value.

7 of 10

Pivot

  • Cross-classifies according to two categorical variables
  • Produces a grid of counts or aggregated values
  • Two required arguments:
    • First: variable that forms column labels of the grid
    • Second: variable that forms row labels of the grid
  • Two optional arguments (include both or neither)
    • values=’column_label_to_aggregate’
    • collect=function_to_aggregate_with

8 of 10

Joining Two Tables

Drink

Cafe

Price

Milk Tea

Asha

5.5

Espresso

Strada

1.75

Latte

Strada

3.25

Espresso

FSM

2

Coupon

Location

10%

Asha

25%

Strada

5%

Asha

drinks

discounts

Cafe

Drink

Price

Coupon

Asha

Milk Tea

5.5

10%

Asha

Milk Tea

5.5

5%

Strada

Espresso

1.75

25%

Strada

Latte

3.25

25%

drinks.join('Cafe', discounts, 'Location')

Match rows in this table ...

… using values in this column ...

… with rows in that table ...

… using values in that column.

The joined column is sorted automatically

Columns from both tables

9 of 10

Group or Pivot?

  • When to Group:
    • aggregates of one categorical variable
    • aggregates of many variables
    • Multiple outputs (aggregate columns)

  • When to Pivot:
    • Aggregates of exactly two variables
    • Few unique values for column variable
    • Interested in every combination of values

10 of 10

Bike Sharing in SF Bay Area

(Demo)

Hourly bike sharing in the Bay Area began with a pilot program in 2014-2015 that produced a public dataset.

  • The SF Metropolitan Transportation Commission organized an Open Data Challenge in which participants visualized the dataset in interesting ways.

by Bjorn Vermeersch