1 of 15

Lecture 10

Groups

DATA 8

Fall 2018

Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)

2 of 15

Announcements

3 of 15

Project 1: World Progress

4 of 15

Example: Prediction

5 of 15

Apply with Multiple Columns

6 of 15

From last time: discussion Q

You have data about daily temperatures as shown. Which type of chart would you use to answer each question?

  • Are there more cloudy than sunny days?
  • What percentage of days have a high above 72º?
  • Did many days have a difference of more than 20 degrees between their high & low temperatures?

7 of 15

Apply

The apply method creates an array by calling a function on every element in one or more input columns

  • First argument: Function to apply
  • Other arguments: The input column(s)

table_name.apply(one_arg_function, 'column_label')

table_name.apply(two_arg_function,
                 'column_label_for_first_arg',
                 'column_label_for_second_arg')

Called with only a function, apply passes each entire row to that function

(Demo)
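
A minimal plain-Python sketch of what apply computes when given column labels. The dict-of-lists `table`, the helper `apply`, and the functions `above_72` and `temp_range` are assumptions made up for illustration; this is not the table library's own implementation, and it returns a list rather than an array.

```python
# Hypothetical stand-in for a table: column label -> list of values.
table = {
    'high': [75, 68, 80],
    'low': [55, 60, 52],
}

def apply(table, fn, *labels):
    """Call fn once per row on the values in the named columns."""
    columns = [table[label] for label in labels]
    return [fn(*values) for values in zip(*columns)]

def above_72(high):
    # One-argument function: takes a value from a single column.
    return high > 72

def temp_range(high, low):
    # Two-argument function: takes values from two columns, in order.
    return high - low

one_arg = apply(table, above_72, 'high')
two_arg = apply(table, temp_range, 'high', 'low')
print(one_arg)
print(two_arg)
```

The order of the column labels decides which value goes to which parameter, so swapping 'high' and 'low' would flip the sign of every difference.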

8 of 15

Grouping by One Attribute

9 of 15

Grouping by One Column

The group method aggregates all rows with the same value for a column into a single row in the resulting table.

  • First argument: Which column to group by
  • Second argument: (Optional) How to combine values
    • len — number of grouped values (default)
    • list — list of all grouped values
    • sum — total of all grouped values

(Demo)
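
A rough plain-Python sketch of grouping by one column with the three collectors listed above. The `rows` data and the helper `group` are hypothetical; the sketch only shows the idea of bucketing values by key and then combining each bucket, and sorting the keys is a choice made here for readable output.

```python
from collections import defaultdict

# Hypothetical rows: (flavor, price). The flavor column is the grouping key.
rows = [
    ('chocolate', 5),
    ('strawberry', 4),
    ('chocolate', 7),
    ('strawberry', 5),
    ('bubblegum', 5),
]

def group(rows, collect=len):
    """Bucket the values by key, then combine each bucket with collect."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[key].append(value)
    return {key: collect(values) for key, values in sorted(buckets.items())}

counts = group(rows)        # len: number of grouped values (the default)
lists = group(rows, list)   # list: all grouped values
totals = group(rows, sum)   # sum: total of all grouped values
print(counts)
print(lists)
print(totals)
```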

10 of 15

Cross-Classification

11 of 15

Grouping By Multiple Columns

The group method can also aggregate all rows that share the combination of values in multiple columns

  • First argument: A list of which columns to group by
  • Second argument: (Optional) How to combine values

(Demo)
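
One way to picture grouping by multiple columns, sketched in plain Python: the group key becomes the combination (here a tuple) of the values in the listed columns. The `rows` data and the helper `group` with its `value_label` parameter are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical rows, each a dict from column label to value.
rows = [
    {'flavor': 'chocolate', 'color': 'dark', 'price': 5},
    {'flavor': 'chocolate', 'color': 'light', 'price': 7},
    {'flavor': 'chocolate', 'color': 'dark', 'price': 6},
    {'flavor': 'strawberry', 'color': 'pink', 'price': 4},
]

def group(rows, labels, value_label, collect=len):
    """Group rows that share the same combination of values in labels."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[label] for label in labels)
        buckets[key].append(row[value_label])
    return {key: collect(values) for key, values in sorted(buckets.items())}

counts = group(rows, ['flavor', 'color'], 'price')
totals = group(rows, ['flavor', 'color'], 'price', sum)
print(counts)
print(totals)
```

Only combinations that actually occur get a row: there is no ('strawberry', 'dark') entry in either result.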

12 of 15

Pivot Tables

13 of 15

Pivot

  • Cross-classifies according to two categorical variables
  • Produces a grid of counts or aggregated values
  • Two required arguments:
    • First: variable that forms column labels of grid
    • Second: variable that forms row labels of grid
  • Two optional arguments (include both or neither)
    • values='column_label_to_aggregate'
    • collect=function_to_aggregate_with

(Demo)
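
A hedged plain-Python sketch of the grid that pivot describes. The helper `pivot`, the nested-dict result, and the `rows` data are hypothetical; filling empty cells with 0 is a choice made in this sketch, which suits counts and sums but not every collector.

```python
from collections import defaultdict

# Hypothetical rows, each a dict from column label to value.
rows = [
    {'flavor': 'chocolate', 'color': 'dark', 'price': 5},
    {'flavor': 'chocolate', 'color': 'light', 'price': 7},
    {'flavor': 'chocolate', 'color': 'dark', 'price': 6},
    {'flavor': 'strawberry', 'color': 'pink', 'price': 4},
]

def pivot(rows, column_label, row_label, values=None, collect=None):
    """Return {row value: {column value: count or aggregated value}}."""
    if (values is None) != (collect is None):
        raise ValueError('include both values and collect, or neither')
    cells = defaultdict(list)
    for row in rows:
        entry = row[values] if values is not None else 1
        cells[(row[row_label], row[column_label])].append(entry)
    aggregate = collect if collect is not None else len
    row_keys = sorted({r for r, _ in cells})
    col_keys = sorted({c for _, c in cells})
    return {
        r: {c: aggregate(cells[(r, c)]) if (r, c) in cells else 0
            for c in col_keys}
        for r in row_keys
    }

counts = pivot(rows, 'color', 'flavor')
totals = pivot(rows, 'color', 'flavor', values='price', collect=sum)
print(counts)
print(totals)
```

Unlike grouping by two columns, the grid has a cell for every row and column pairing, including combinations that never occur in the data.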

14 of 15

Challenge Question

Which NBA teams spent the most on their “starters” in 2015-2016?

Assume the “starter” for a team & position is the player with the highest salary on that team in that position.

(Demo)
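
One possible approach, sketched in plain Python on made-up data (the teams, positions, and salaries below are invented, not real NBA figures, and this is not the code from the demo): take the maximum salary for each team and position, then total those maxima per team.

```python
# Hypothetical rows: (team, position, salary in millions).
players = [
    ('Team A', 'PG', 10),
    ('Team A', 'PG', 4),
    ('Team A', 'C', 7),
    ('Team B', 'PG', 12),
    ('Team B', 'C', 3),
    ('Team B', 'C', 9),
]

# Step 1: group by team and position, keeping the highest salary (the starter).
starter_salary = {}
for team, position, salary in players:
    key = (team, position)
    starter_salary[key] = max(salary, starter_salary.get(key, salary))

# Step 2: group the starters by team and sum their salaries.
team_totals = {}
for (team, _), salary in starter_salary.items():
    team_totals[team] = team_totals.get(team, 0) + salary

# Step 3: sort teams from highest to lowest starter spending.
ranking = sorted(team_totals.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

The two steps correspond to a group by two columns with max as the collector, followed by a group by one column with sum.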

15 of 15

Take-Home Question

Generate a table of the names of the starters for each team