1 of 9

Pandas Continued; Networked Data

More data

2 of 9

Announcements

  1. HW5 (Tic Tac Toe): Due tonight (Thursday). Get started now — it’s tricky!
  2. Project 1 Grades: Will be posted on Monday.
  3. Project 2: Needs another day to be posted. An announcement will go out when it’s ready.

3 of 9

Outline

  • Continue Congress activity (from Tuesday)
    1. Filter, sort, group, plot
  • Working with networked data
    • Text files (CSV, TXT, HTML)
    • Binary files (e.g. images and media files)
    • Data APIs
  • Exercises
    • Download and save files
    • Display different kinds of files in Pandas
    • Manipulate and transform files

4 of 9

Pandas Cheat Sheet

| Purpose        | Command                                               | Description                                                            |
|----------------|-------------------------------------------------------|------------------------------------------------------------------------|
| Create         | df = pd.read_csv(<path...>)                           | Loads a dataframe from a CSV file                                      |
| Create         | df = pd.DataFrame(some_list)                          | Creates a dataframe from a list of dictionaries                        |
| Select Columns | df = dataframe[['a', 'b']]                            | Only shows columns a and b from the dataframe                          |
| Filter         | rule = df['gender'] == 'F'; only_females = df[rule]   | Filters the data by a rule                                             |
| Sort           | df.sort_values('last_name', ascending=True)           | Sorts values according to a column                                     |
| Group          | df = df.groupby(['a', 'b']).count()                   | Performs an aggregate calculation (count, min, max) by columns a and b |
| Set Index      | df.set_index('store_id')                              | Sets the table index (row labels)                                      |
| Join           | df.join(df1)                                          | Merges two tables together by matching on the table indexes            |
| Plot           | df.plot(kind='barh', figsize=(8, 10), width=0.65)     | Plots data according to a particular kind of chart                     |
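To see how these commands fit together, here is a minimal sketch that chains most of them on a tiny in-memory table. The records, the column names (member_id, last_name, gender, party, terms) and the variable names are made up for illustration and are not taken from the Congress notebook. Plotting is left out because it needs matplotlib and a display.

```python
import pandas as pd

# Hypothetical records, invented for this sketch (not real Congress data).
members = [
    {"member_id": 1, "last_name": "Nguyen", "gender": "F", "party": "D"},
    {"member_id": 2, "last_name": "Baker", "gender": "M", "party": "R"},
    {"member_id": 3, "last_name": "Ortiz", "gender": "F", "party": "R"},
    {"member_id": 4, "last_name": "Chen", "gender": "M", "party": "D"},
    {"member_id": 5, "last_name": "Adams", "gender": "F", "party": "D"},
]

# Create: build a dataframe from a list of dictionaries.
df = pd.DataFrame(members)

# Select columns: keep only the two columns we care about.
names = df[["last_name", "gender"]]

# Filter: build a boolean rule, then use it to pick matching rows.
rule = df["gender"] == "F"
only_females = df[rule]

# Sort: order the filtered rows alphabetically by last name.
sorted_females = only_females.sort_values("last_name", ascending=True)

# Group: count the members in each (party, gender) combination.
counts = df.groupby(["party", "gender"])["last_name"].count()

# Set index + join: both tables are indexed by member_id, so join()
# lines the rows up by matching index values.
by_id = df.set_index("member_id")
terms = pd.DataFrame(
    {"member_id": [1, 2, 3, 4, 5], "terms": [2, 5, 1, 3, 4]}
).set_index("member_id")
joined = by_id.join(terms)

print(sorted_females)
print(counts)
print(joined)
```

Note that sort_values, set_index and join all return new dataframes here, so the result has to be assigned to a variable to be kept.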

5 of 9

Your Turn...

  1. Open the “01. Looking at Congress.ipynb” notebook in your web browser
  2. Attempt the exercises, using the previous notebook as a model

6 of 9

Working with networked data

7 of 9

Consuming Files / Data over the Internet

  • Files exist everywhere, and can be loaded from disk or from a location on another server
  • Different file types work a little bit differently
  • [DEMO] lecture_15/notebooks/02. Getting Files from the Internet.ipynb
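As a rough illustration of how text and binary files differ, the sketch below uses Python's standard-library urllib rather than whatever the demo notebook uses. The helper names fetch_text and download_file are hypothetical. Text files (CSV, TXT, HTML) are decoded into a string, while binary files (images, media) are saved as raw bytes with no decoding.

```python
import urllib.request
from pathlib import Path


def fetch_text(url, encoding="utf-8"):
    """Download a text resource (CSV, TXT, HTML) and decode it to a str."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


def download_file(url, destination):
    """Download any resource as raw bytes and save it unchanged.

    This is the safe choice for binary files such as images, which
    would be corrupted if they were decoded as text.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    path = Path(destination)
    path.write_bytes(data)
    return path
```

A call might look like fetch_text("https://example.com/data.csv") or download_file("https://example.com/photo.jpg", "photo.jpg"), where both URLs are placeholders. For CSV files specifically, pd.read_csv also accepts a URL directly in place of a local path.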

8 of 9

More Practice: Analyzing Divvy Data

[DEMO] lecture_15/notebooks/03. Divvy Bikes Demo.ipynb

1. Select columns

2. Filter and sort

3. Search by dynamic criteria

4. Parse / feed data into new format
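The four steps above can be sketched on a small made-up table. The station names, the column names (station_name, bikes_available) and the find_stations helper are assumptions for illustration. The real Divvy data in the notebook may be shaped differently.

```python
import pandas as pd

# Hypothetical station records, invented for this sketch.
stations = pd.DataFrame([
    {"station_name": "Clark St & Elm St", "bikes_available": 7},
    {"station_name": "State St & Kinzie St", "bikes_available": 0},
    {"station_name": "Clark St & Lake St", "bikes_available": 3},
    {"station_name": "Wabash Ave & Grand Ave", "bikes_available": 12},
])

# 1. Select columns
summary = stations[["station_name", "bikes_available"]]

# 2. Filter and sort: stations with at least one bike, fullest first
has_bikes = summary[summary["bikes_available"] > 0]
ranked = has_bikes.sort_values("bikes_available", ascending=False)


# 3. Search by dynamic criteria: the rule is built from function
#    arguments instead of being hard-coded.
def find_stations(df, name_contains, min_bikes=1):
    matches_name = df["station_name"].str.contains(
        name_contains, case=False, regex=False
    )
    enough_bikes = df["bikes_available"] >= min_bikes
    return df[matches_name & enough_bikes]


clark_results = find_stations(stations, "clark", min_bikes=5)

# 4. Parse / feed data into a new format: a list of dictionaries,
#    one per row, which is easy to loop over or convert to JSON.
records = ranked.to_dict(orient="records")

print(clark_results)
print(records)
```

Combining the two boolean rules with & keeps only the rows where both conditions hold. Use | instead to keep rows where either one holds.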

9 of 9

More Practice: Spotify Data

[DEMO] lecture_15/notebooks/04. Spotify Demo.ipynb
