1 of 16

CSE 163

Pandas

Suh Young Choi

🎶 Listening to: Death’s Door soundtrack

💬 Before Class: What is the best polygon?

2 of 16

This Time

  • Imports
  • Pandas!
    • How to read a file
    • Accessing columns
    • Element-wise operators
    • Filtering
    • loc

Last Time

  • Dictionary Methods
    • How to loop over a dictionary
  • CSVs
  • List of Dictionaries

3 of 16

Announcements

  • Data Exploration + LR 1 due tonight!

  • THA 1 due on Thursday!

4 of 16

Checking in with the Check-ins

Lost in translation?

  • Pseudocode might help get ideas out before grammar
  • For Java users, check out Java 2 Python 3 by Hunter Schafer!

Concerned about falling behind?

  • Reaching out might be scary, but having no help could be scarier!
  • Office hours, Ed discussion board (you can be private or anonymous!)
  • In-class and section announcements to track due dates

5 of 16

What about grading and the final project?

6 of 16

Recap

Imports

  • A way for us to get functionality from modules we didn’t write ourselves!

Pandas

  • New data types: DataFrames and Series
  • More tabular-looking than the list of dictionaries

7 of 16

Importing

Importing lets you use the contents defined in another Python file

  • We call a Python file a module
  • Generally there are three ways to import!

# Method 1: Import module

import module

module.function()

# Method 2: Import and rename module

import module as m

m.function()

# Method 3: Import specific function from module

from module import function

function()
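
To make the three methods concrete, here is a small sketch using Python's built-in math module in place of the placeholder names module and function. The choice of math and sqrt is only for illustration; any module works the same way.

```python
# Method 1: import the module, then use the module name as a prefix
import math
root_1 = math.sqrt(16)

# Method 2: import the module under a shorter name
import math as m
root_2 = m.sqrt(16)

# Method 3: import one specific function and call it with no prefix
from math import sqrt
root_3 = sqrt(16)

# All three calls run the same function, so the results match
print(root_1, root_2, root_3)
```

Method 2 is the one you will see most often with pandas, where the usual convention is import pandas as pd.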

8 of 16

DataFrame

  • One of the basic data types from pandas is a DataFrame
    • It’s essentially a table with columns and rows!

   id          year  month  day  latitude   longitude    name        magnitude
0  nc72666881  2016  7      27   37.672333  -121.619000  California  1.43
1  us20006i0y  2016  7      27   21.514600  94.572100    Burma       4.90
2  nc72666891  2016  7      27   37.576500  -118.859167  California  0.06

Columns: the labels across the top (id through magnitude)

Index (row): the labels down the left side (0, 1, 2)
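
As a rough sketch of how a table like this becomes a DataFrame, the code below builds one from a list of dictionaries (the structure from last time) holding the three example rows. In practice the data would usually be read from a CSV file with pd.read_csv, but no file name appears on these slides, so none is assumed here.

```python
import pandas as pd

# The three example rows, written as a list of dictionaries
rows = [
    {'id': 'nc72666881', 'year': 2016, 'month': 7, 'day': 27,
     'latitude': 37.672333, 'longitude': -121.619000,
     'name': 'California', 'magnitude': 1.43},
    {'id': 'us20006i0y', 'year': 2016, 'month': 7, 'day': 27,
     'latitude': 21.514600, 'longitude': 94.572100,
     'name': 'Burma', 'magnitude': 4.90},
    {'id': 'nc72666891', 'year': 2016, 'month': 7, 'day': 27,
     'latitude': 37.576500, 'longitude': -118.859167,
     'name': 'California', 'magnitude': 0.06},
]

# Each dictionary becomes a row; each key becomes a column
df = pd.DataFrame(rows)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the column labels
print(list(df.index))    # the row labels (the index)
```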

9 of 16

Series

  • A Series is like a 1-dimensional DataFrame (no columns!)
    • Has an index
    • It’s basically like a fancy dictionary/list hybrid
  • For example

df['name']

0    California
1    Burma
2    California

df['name'][1]  # 'Burma'
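
A minimal sketch of the same idea as runnable code, assuming a small DataFrame that holds only the name and magnitude columns from the example table:

```python
import pandas as pd

# A cut-down version of the example table (two columns only)
df = pd.DataFrame({'name': ['California', 'Burma', 'California'],
                   'magnitude': [1.43, 4.90, 0.06]})

names = df['name']            # selecting one column gives a Series
print(type(names).__name__)   # Series
print(list(names.index))      # the Series keeps the row index: [0, 1, 2]
print(names[1])               # look up by index label: Burma
```

The lookup by index label is the dictionary-like part; the fixed order of the values is the list-like part.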

10 of 16

Series Operations

  • You can do Series-wide operations without loops!

df['name'] += " :)"

0    California :)
1    Burma :)
2    California :)

df['magnitude'] *= 2

0    2.86
1    9.80
2    0.12
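
Here is a short sketch of those two element-wise operations, again assuming a cut-down DataFrame with just the name and magnitude columns. It stores the results in new variables (hypothetical names) instead of updating the columns in place, so the original data stays visible for comparison.

```python
import pandas as pd

df = pd.DataFrame({'name': ['California', 'Burma', 'California'],
                   'magnitude': [1.43, 4.90, 0.06]})

# The operator is applied to every element; no loop needed
happy_names = df['name'] + ' :)'
doubled = df['magnitude'] * 2

print(happy_names.tolist())
print(doubled.tolist())

# The equivalent loop over a plain list, for comparison
doubled_with_loop = []
for value in [1.43, 4.90, 0.06]:
    doubled_with_loop.append(value * 2)
```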

11 of 16

Series Operations

  • A leading question: What does this operation turn into?

0    California
1    Burma
2    California

df['name'] == "Burma"

0    False
1    True
2    False

13 of 16

Series Operations

  • A leading question: What does this operation turn into?

df['name'] == "Burma"

   id          year  month  day  latitude   longitude    name        magnitude
0  nc72666881  2016  7      27   37.672333  -121.619000  California  1.43       False
1  us20006i0y  2016  7      27   21.514600  94.572100    Burma       4.90       True
2  nc72666891  2016  7      27   37.576500  -118.859167  California  0.06       False
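
To answer the leading question in code: comparing a Series to a value is also element-wise, so the result is a Series of bools with one entry per row. A small sketch, assuming a cut-down DataFrame with only the name column (is_burma is a made-up variable name):

```python
import pandas as pd

df = pd.DataFrame({'name': ['California', 'Burma', 'California']})

# One True/False per row, lined up with the original index
is_burma = df['name'] == 'Burma'

print(is_burma.tolist())       # [False, True, False]
print(list(is_burma.index))    # [0, 1, 2]
print(is_burma.sum())          # True counts as 1, so this counts matching rows
```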

14 of 16

Filtering

  • Can use a bool Series (a mask) to select which rows to keep from the dataset

  • Can use multiple filters with: & (and), | (or), ~ (not)

mask = df['magnitude'] > 5

df[mask]

# Same as: df[df['magnitude'] > 5]

     id          year  month  day  latitude    longitude    name   magnitude
30   us20006i18  2016  7      27   -24.286000  -67.864700   Chile  5.60
114  us20006i35  2016  7      27   36.492200   140.756800   Japan  5.30
421  us1000683b  2016  7      28   -16.824200  -172.515800  Tonga  5.10

df[(df['magnitude'] > 5) & ~(df['day'] == 27)]
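
A sketch of filtering end to end, assuming a tiny DataFrame with four of the example rows (name, day, and magnitude only) and the row labels shown on the slides. The variable names are made up for illustration. Note the parentheses around each condition: & and | are evaluated before comparisons like >, so leaving the parentheses out raises an error or gives the wrong mask.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['California', 'Chile', 'Japan', 'Tonga'],
     'day': [27, 27, 28 - 1, 28],
     'magnitude': [1.43, 5.60, 5.30, 5.10]},
    index=[0, 30, 114, 421])

# One filter: keep rows where the mask is True
strong = df[df['magnitude'] > 5]

# Two filters combined with & (and) and ~ (not)
strong_not_27 = df[(df['magnitude'] > 5) & ~(df['day'] == 27)]

# Two filters combined with | (or)
very_strong_or_28 = df[(df['magnitude'] > 5.5) | (df['day'] == 28)]

print(list(strong.index))             # [30, 114, 421]
print(list(strong_not_27.index))      # [421]
print(list(very_strong_or_28.index))  # [30, 421]
```

The filtered results keep the original row labels, which is why the index in the table above skips from 30 to 114 to 421.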

15 of 16

Location

How to access data in pandas

Series

series[<indexer>]

DataFrame

df[<indexer>]

df.loc[<row indexer>, <column indexer>]

Options for indexers:

  • A single value
  • A list of values or a slice
  • A mask
  • : to select all values

Remember: with loc, the end of a slice is inclusive, unlike Python’s standard slicing, where the end is excluded
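
The sketch below tries each kind of indexer with loc, assuming a cut-down DataFrame with the name, day, and magnitude columns from the example table. The variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['California', 'Burma', 'California'],
                   'day': [27, 27, 27],
                   'magnitude': [1.43, 4.90, 0.06]})

# A single value for both row and column: one cell
one_cell = df.loc[1, 'name']

# A slice of rows: with loc the end label (1) is included
first_two_names = df.loc[0:1, 'name']

# : for all rows, plus a list of columns: a smaller DataFrame
two_columns = df.loc[:, ['name', 'magnitude']]

# A mask as the row indexer
big_names = df.loc[df['magnitude'] > 1, 'name']

# Compare with a standard Python list slice, where the end is excluded
plain_slice = ['California', 'Burma', 'California'][0:1]

print(one_cell)                   # Burma
print(first_two_names.tolist())   # ['California', 'Burma']
print(two_columns.shape)          # (3, 2)
print(big_names.tolist())         # ['California', 'Burma']
print(plain_slice)                # ['California']
```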

16 of 16

Before Next Time

  • Complete Lesson 7
    • Remember: lessons are not for points, but they do go towards Checkpoint Tokens
  • Data Exploration + LR 1 due today!

Next Time

  • More Pandas functions!
  • Specifically, groupby and apply
