Pandas Continued; Networked Data
More data
Announcements
Outline
Purpose | Command | Description |
Create | df = pd.read_csv(<path...>) | Loads a dataframe from a CSV file |
Create | df = pd.DataFrame(some_list) | Creates a dataframe from a list of dictionaries |
Select Columns | df[['a', 'b']] | Returns only columns a and b of the dataframe |
Filter | rule = df['gender'] == 'F'; only_females = df[rule] | Keeps only the rows that satisfy a rule (a boolean mask) |
Sort | df.sort_values('last_name', ascending=True) | Sorts rows by the values in a column |
Group | df = df.groupby(['a', 'b']).count() | Performs an aggregate calculation (count, min, max) for each combination of values in columns a and b |
Set Index | df.set_index('store_id') | Sets the table index (row labels) |
Join | df.join(df1) | Merges two tables by matching on the table indexes |
Plot | df.plot(kind='barh', figsize=(8, 10), width=0.65) | Plots data as a particular kind of chart |
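The commands in the table can be chained on a small dataset. The sketch below is only an illustration: the records, the `sales` and `city` columns, and the `stores` table are made up for the example and do not come from the lecture data. Plotting is left out because it needs matplotlib installed.

```python
import pandas as pd

# Hypothetical sample records, invented for this sketch.
people = [
    {"store_id": 1, "last_name": "Nguyen", "gender": "F", "sales": 120},
    {"store_id": 2, "last_name": "Alvarez", "gender": "M", "sales": 80},
    {"store_id": 3, "last_name": "Baker", "gender": "F", "sales": 95},
    {"store_id": 1, "last_name": "Chen", "gender": "M", "sales": 60},
]

# Create: build a dataframe from a list of dictionaries.
df = pd.DataFrame(people)

# Select columns: keep only the two named columns.
names = df[["last_name", "gender"]]

# Filter: build a boolean rule, then keep the rows where it is True.
rule = df["gender"] == "F"
only_females = df[rule]

# Sort: returns a new dataframe ordered by last name.
by_name = df.sort_values("last_name", ascending=True)

# Group: count the non-missing values per gender.
counts = df.groupby(["gender"]).count()

# Set index + join: match the two tables on their store_id indexes.
stores = pd.DataFrame([
    {"store_id": 1, "city": "Springfield"},
    {"store_id": 2, "city": "Riverton"},
    {"store_id": 3, "city": "Lakeside"},
])
joined = df.set_index("store_id").join(stores.set_index("store_id"))

print(joined)
```

Note that `sort_values` and `set_index` return new dataframes and leave `df` unchanged unless the result is assigned back.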
Your Turn...
Working with networked data
Consuming Files / Data over the Internet
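One possible shape for this, as a minimal sketch: download text with the standard library, then parse it into a dataframe. The function names `fetch_text` and `json_text_to_frame` and the sample JSON body are assumptions made for illustration; the lecture notebooks may do this differently. The download function is defined but not called, so the sketch runs offline.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_text(url, timeout=10):
    """Download the body of `url` and decode it as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def json_text_to_frame(text):
    """Parse JSON text holding a list of objects into a dataframe."""
    records = json.loads(text)
    return pd.DataFrame(records)


# Stand-in for a downloaded response body (hypothetical data).
sample_body = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
frame = json_text_to_frame(sample_body)
print(frame)
```

With a real endpoint, the two steps would combine as `json_text_to_frame(fetch_text(url))`. For CSV files, `pd.read_csv` also accepts a URL in place of a local path.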
More Practice: Analyzing Divvy Data
[DEMO] lecture_15/notebooks/03. Divvy Bikes Demo.ipynb
1. Select columns
2. Filter and sort
3. Search by dynamic criteria
4. Parse / feed data into new format
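The four steps above can be sketched on a small table. This is not the demo notebook's code: the station names, the column names, and the `search_stations` helper are hypothetical stand-ins chosen for illustration.

```python
import pandas as pd

# Hypothetical station records, invented for this sketch.
stations = pd.DataFrame([
    {"station_name": "Main St & 1st Ave", "available_bikes": 4, "total_docks": 15},
    {"station_name": "Main St & 5th Ave", "available_bikes": 0, "total_docks": 11},
    {"station_name": "Lake Ave & 2nd St", "available_bikes": 9, "total_docks": 19},
    {"station_name": "Park Rd & Main St", "available_bikes": 2, "total_docks": 23},
])

# 1. Select columns
subset = stations[["station_name", "available_bikes"]]

# 2. Filter and sort: stations with at least one bike, most bikes first
with_bikes = subset[subset["available_bikes"] > 0].sort_values(
    "available_bikes", ascending=False
)


# 3. Search by dynamic criteria: the search term and threshold are arguments
def search_stations(frame, term, min_bikes=1):
    """Return rows whose name contains `term` and that have enough bikes."""
    matches = frame["station_name"].str.contains(term, case=False, regex=False)
    enough = frame["available_bikes"] >= min_bikes
    return frame[matches & enough]


found = search_stations(subset, "main")

# 4. Parse / feed data into a new format: a list of plain dictionaries
records = found.to_dict(orient="records")
print(records)
```

Passing the criteria as function arguments lets the same filter be reused with values typed in by a user.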
More Practice: Spotify Data
[DEMO] lecture_15/notebooks/04. Spotify Demo.ipynb