1 of 17

CSE 163

Section XX

TA 1 & TA 2

Question of the Day: Would you rather fight 1 Bowser-sized duck or 10 duck-sized Bowsers?

2 of 17

Housekeeping 🏡

Important Dates and Reminders

  • Homework 3: Education due tonight!
  • Project Group Finding Form due tonight at 11:59pm!
  • Section Assignment 4 due tomorrow at noon!
  • Reading Assignment 3 due Monday!

3 of 17

Game Plan

What We’ll Cover Today

  • Review
    • Groupby
    • Indexing
    • Data visualization
  • Practice Problems
    • groupby-hierarchical-indexing-win26.ipynb


4 of 17

Recap

What we’ve learned so far:

Last week:

  • Groupby
  • Hierarchical indexing

This week:

  • Data visualization
    • Seaborn
    • Matplotlib

5 of 17

Groupby Demo

6 of 17

Group By

result = data.groupby('col1')['col2'].sum()

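Data (DataFrame)

   col1  col2
0     A     1
1     B     2
2     C     3
3     A     4
4     C     5

Split (one group per value of col1)

  • A: col2 = 1, 4
  • B: col2 = 2
  • C: col2 = 3, 5

Apply (sum col2 within each group)

  • A: 1 + 4 = 5
  • B: 2
  • C: 3 + 5 = 8

Combine (Series indexed by col1)

col1
A    5
B    2
C    8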
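The same split-apply-combine can be checked in code. This is a minimal sketch that rebuilds the toy DataFrame from the diagram (same names data, col1, col2 as the slide’s code), prints each group produced by the split step, and then runs the slide’s one-liner to get the combined Series.

```python
import pandas as pd

# Toy DataFrame from the diagram: col1 holds group labels, col2 holds values
data = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'A', 'C'],
    'col2': [1, 2, 3, 4, 5],
})

# Split: iterating over a groupby yields (label, sub-DataFrame) pairs
for label, group in data.groupby('col1'):
    print(label, group['col2'].tolist())
# prints: A [1, 4] / B [2] / C [3, 5] (one per line)

# Apply + Combine: sum col2 within each group -> Series indexed by col1
result = data.groupby('col1')['col2'].sum()
print(result)
```

Note that the group labels (A, B, C) become the index of the result, which is why the output is a Series rather than a DataFrame.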

7 of 17

Hierarchical Indexing

8 of 17

Hierarchical Indexing

  • DataFrames can have a hierarchical (MultiIndex) index.
    • Each combination of index level values typically identifies one row (pandas doesn’t enforce uniqueness, but unique keys keep lookups unambiguous).
  • You can access rows using .loc[].
    • To select specific index levels, pass a tuple as the row indexer (the first argument to .loc[]), with one element per index level.
    • Use slice(None) inside the tuple to select all values at a given level, since a bare colon (:) is not valid syntax inside a tuple.

import pandas as pd

# Create MultiIndex
df = pd.read_csv('cats.csv')
cats = df.set_index(['id', 'age']).sort_index()

# Finding weights of 5-year-old cats
weights = cats.loc[(slice(None), 5), 'weight']
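Since cats.csv isn’t included here, this is a minimal sketch with a small made-up DataFrame (the values are hypothetical; only the column names id, age, and weight come from the slide). It contrasts a full-tuple lookup with the slice(None) pattern, and shows pd.IndexSlice, a pandas helper that lets you write : directly.

```python
import pandas as pd

# Hypothetical stand-in for cats.csv (made-up values, same column names)
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'age': [5, 3, 5, 8],
    'weight': [4200, 3900, 4600, 5100],
})

# Two-level MultiIndex (id, age); sorting keeps level slicing well-behaved
cats = df.set_index(['id', 'age']).sort_index()

# One exact row: a full tuple with one value per level
print(cats.loc[(3, 5), 'weight'])

# Every id, but only age 5: slice(None) stands in for ':'
weights = cats.loc[(slice(None), 5), 'weight']
print(weights)

# Same selection spelled with pd.IndexSlice, which accepts ':' syntax
idx = pd.IndexSlice
same = cats.loc[idx[:, 5], 'weight']
print(same.equals(weights))
```

Both spellings return a Series that keeps the full (id, age) index, so you can still see which cat each weight belongs to.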

9 of 17

Data Viz

10 of 17

Libraries!

Libraries extend Python with functionality beyond its built-in defaults.

  • They contain objects and functions that vanilla Python doesn’t have!
  • Always remember your import statements!

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

11 of 17

Data Visualization

Seaborn and Matplotlib make common plots easy, right out of the box

Seaborn

  • Specify x, y, and data
    • More arguments for specializations (hue, style, size)
  • .relplot, .catplot…

Matplotlib

  • Seaborn is built on top of Matplotlib (mpl)
  • Use plt to access figure customizations
    • .xlabel, .ylabel, .title

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simple line plot (kind='line' draws the mean weight at each age)
df = pd.read_csv('cats.csv')
sns.relplot(x='age', y='weight', data=df, kind='line')

# Customizing with mpl
plt.xlabel('Age (years)')
plt.ylabel('Weight (g)')
plt.title('Avg Cat Weight with Age')

12 of 17

Axes vs. FacetGrid

What’s the difference?

13 of 17

Axes

  • An Axes is an object that represents one plotting area
  • Use an Axes when you want a single plot (e.g. one plot with several layers overlaid)
    • Axes-level seaborn functions such as scatterplot() and lineplot() draw onto an Axes

14 of 17

FacetGrid

  • A FacetGrid is a seaborn-specific object that manages one or more Axes (one per subplot)
    • Returned by figure-level functions such as relplot() and catplot()
  • Axes = one whiteboard
  • FacetGrid = a classroom wall of whiteboards, one per group
    • You still draw on each whiteboard (Axes), but the FacetGrid decides how many there are and how they’re arranged.
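A minimal sketch of the whiteboard analogy, again with hypothetical cat data (the coat column, the values, and the 4000 g reference line are all made up). Passing col='coat' to relplot makes the FacetGrid create one Axes per coat value; you can then customize the whole grid at once or loop over the individual Axes.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no plot window needed
import pandas as pd
import seaborn as sns

# Hypothetical cat data (made-up values)
df = pd.DataFrame({
    'age': [1, 1, 3, 3, 5, 5, 8, 8],
    'weight': [2500, 2900, 3900, 4300, 4200, 4600, 5100, 4700],
    'coat': ['tabby', 'black'] * 4,
})

# col='coat' -> one subplot (Axes) per coat value, side by side
g = sns.relplot(x='age', y='weight', data=df, col='coat')

# The grid stores its Axes in a 2D array: 1 row x 2 columns here
print(g.axes.shape)  # (1, 2)

# Grid-level customization applies to every facet
g.set_axis_labels('Age (years)', 'Weight (g)')
g.set_titles('{col_name} cats')

# You can still draw on each individual whiteboard (Axes) yourself
for ax in g.axes.flat:
    ax.axhline(4000, color='gray', linestyle='--')  # hypothetical reference line
```

The FacetGrid decides how many Axes exist and where they go; anything you would do on a single Axes still works on each one inside g.axes.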

15 of 17

Practice Problems!

16 of 17

Section code:

17 of 17

Solutions
