1 of 29

Working With Data

Advantages of a programmatic approach

Zachary del Rosario (He/Him)

2 of 29

Workshop Schedule

Extract

Wrangle + Tidy

Friday

Saturday

Visualize

Model

Sunday

Monday

Tabula + WebPlotDigitizer

Python + Jupyter

Concepts

Execution

Concepts

Execution

Concepts

Fin

Focus

Live

Take-Home

3 of 29

An Example: We extracted these data...

That’s weird… there are two “blocks”

5 of 29

How Should We Handle These?

  • Option: Manually edit the spreadsheets, take notes somewhere
    • What if we make a mistake?
    • What if data get separated from the notes?
  • Option: Use a Jupyter Notebook to record the data processing

6 of 29

Power of a Programmatic Approach

I’m going to give you some ideas on how a programmatic approach to data can help!

7 of 29

Programmatic Data Management

Beyond spreadsheets

8 of 29

Comparison: Two Workflows

Spreadsheet

  1. Record data in spreadsheet
  2. Correct errors directly in sheet
  3. Plot in spreadsheet

  • Easy! Point and click
  • Tracking errors?
  • Plots are not reusable!

Programmatic

  1. Record data in spreadsheet
  2. Correct errors in notebook
  3. Plot in notebook

  • Raw data unedited
  • Errors documented
  • Plots reusable
  • Have to code!
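
A minimal sketch of the programmatic workflow, assuming a small CSV export with made-up specimen names and values. The correction and its reason live in code, so the raw data stay unedited and the fix is documented. Every name here (`RAW_CSV`, `CORRECTIONS`, `apply_corrections`) is hypothetical and only for illustration.

```python
import csv
import io

# Stand-in for a raw spreadsheet export; the values are invented.
RAW_CSV = """specimen,strength_mpa
A1,412
A2,4050
A3,398
"""

# Each correction is recorded in code, with the reason next to it.
# (Hypothetical example: a suspected dropped decimal point.)
CORRECTIONS = [
    {
        "specimen": "A2",
        "column": "strength_mpa",
        "new": "405.0",
        "reason": "decimal point dropped during extraction",
    },
]


def load_rows(text):
    """Parse CSV text into a list of dicts without changing the text."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_corrections(rows, corrections):
    """Return corrected copies of the rows; the input rows are not modified."""
    fixed = [dict(row) for row in rows]
    for fix in corrections:
        for row in fixed:
            if row["specimen"] == fix["specimen"]:
                row[fix["column"]] = fix["new"]
    return fixed


raw_rows = load_rows(RAW_CSV)
working_rows = apply_corrections(raw_rows, CORRECTIONS)
print(raw_rows[1]["strength_mpa"], "->", working_rows[1]["strength_mpa"])
```

Rerunning the cell reproduces the same working data from the same raw data, and the `reason` field answers "why did this value change?" later on.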

11 of 29

What Does a Programmatic Approach Look Like?

12 of 29

What Does a Programmatic Approach Look Like?

Raw data subdirectory

13 of 29

What Does a Programmatic Approach Look Like?

Cache figures, for publications!

14 of 29

What Does a Programmatic Approach Look Like?

Processed data subdirectory

Raw data subdirectory
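
One way to set up such a layout, as a sketch only. The directory names (`data/raw`, `data/processed`, `figures`) and the helper `make_layout` are assumptions for illustration, not the names used in the workshop project.

```python
import tempfile
from pathlib import Path


def make_layout(root):
    """Create the project subdirectories and return them keyed by role."""
    layout = {
        "raw": root / "data" / "raw",              # untouched source files
        "processed": root / "data" / "processed",  # output of processing notebooks
        "figures": root / "figures",               # cached figures
    }
    for path in layout.values():
        path.mkdir(parents=True, exist_ok=True)
    return layout


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    layout = make_layout(Path(tmp))
    created = sorted(p.relative_to(tmp).as_posix() for p in layout.values())
    print(created)
```

Keeping raw and processed data in separate places makes it hard to overwrite the raw files by accident.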

15 of 29

What Does a Programmatic Approach Look Like?

Filenames imply execution order
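
A small sketch of why numeric prefixes help, using hypothetical notebook names. Plain alphabetical sorting recovers the intended order only when the prefixes are zero-padded.

```python
# Hypothetical notebook names with zero-padded prefixes.
notebooks = ["03_visualize.ipynb", "01_extract.ipynb", "02_tidy.ipynb"]
ordered = sorted(notebooks)
print(ordered)

# Without zero-padding, "10_" sorts before "2_" because "1" < "2".
unpadded = sorted(["2_tidy.ipynb", "10_model.ipynb"])
print(unpadded)
```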

16 of 29

What Does a Programmatic Approach Look Like?

Common utils keep notebooks clean

17 of 29

What Does a Programmatic Approach Look Like?

Load raw data into notebook

18 of 29

What Does a Programmatic Approach Look Like?

Load raw data into notebook

Process raw into working data
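
The load-then-process step could look roughly like this. It is a sketch under stated assumptions: the file names, the column names, and the whitespace-stripping step are all invented, and `process_raw` is a hypothetical helper. The raw file is only ever read; the working data go to a separate file.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path, processed_path):
    """Read the raw CSV, clean it, and write a separate working copy."""
    with open(raw_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Example processing step: strip stray whitespace from every cell.
    cleaned = [{k: v.strip() for k, v in row.items()} for row in rows]
    with open(processed_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(cleaned[0].keys()))
        writer.writeheader()
        writer.writerows(cleaned)
    return cleaned


# Demonstrate with temporary files; the contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "raw.csv"
    processed_path = Path(tmp) / "processed.csv"
    raw_text = "specimen,strength_mpa\nA1, 412 \nA2,398\n"
    raw_path.write_text(raw_text)

    working = process_raw(raw_path, processed_path)
    raw_unchanged = raw_path.read_text() == raw_text
    processed_lines = processed_path.read_text().splitlines()

print(working[0], raw_unchanged)
```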

19 of 29

Note Well…

  • A programmatic approach isn’t about fancy algorithms
  • It’s about discipline and clarity

  • Future you will thank past you!

20 of 29

Back to the Weibull Data

I wouldn’t edit the spreadsheet!

I would write a processing notebook

(you will!)

21 of 29

Tidy Data

22 of 29

An Example: We extracted these data...

That’s weird… there are two “blocks”

How do we handle this?

24 of 29

An Example: We extracted these data...

That’s weird… there are two “blocks”

Rows have two observations each (not one)

25 of 29

A Magnificent Function!

Q: How do we fix this “two observation” problem?

A: With pivoting functions

pivot_longer - Tidyverse (R)

tf_pivot_longer - Grama (Python)

In today’s Live Exercise!
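
To show what a pivot-longer operation does, here is a plain-Python sketch. It is not the Tidyverse or Grama implementation, and it makes no claim about their signatures: `pivot_longer_rows`, the column names, and the values are all invented for illustration. Each input row holds two observations; each output row holds one.

```python
def pivot_longer_rows(rows, columns, names_to, values_to):
    """Turn each listed column into its own row (hypothetical helper)."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are copied onto every output row.
        kept = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            long_rows.append({**kept, names_to: col, values_to: row[col]})
    return long_rows


# Made-up "two block" table: two observations per row.
wide = [
    {"row": 1, "block_1": 10.0, "block_2": 20.3},
    {"row": 2, "block_1": 11.5, "block_2": 21.0},
]

tidy = pivot_longer_rows(
    wide,
    columns=["block_1", "block_2"],
    names_to="block",
    values_to="value",
)
for record in tidy:
    print(record)
```

Two wide rows become four long rows, one per observation, with the former column name stored in the new `block` column.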

26 of 29

Today’s Exercises

27 of 29

Exercise and Notebook

  • Live: Wrangling and Tidying
    • Introduction to the main ideas

  • Take-Home: Programmatic Data Management
    • Hands-on with Python coding

28 of 29

If Your Install Isn’t Working...

I’ve prepared a Google Colab option:

https://github.com/zdelrosario/mi101-colab

Will paste this in chat

N.B. Also linked from the MI101 Workshop site

29 of 29

Today’s Exercise

  • Live: Wrangling and Tidying (~12:30 to 1:30pm)
    • Working with data in code
    • Tidy Data
    • Solving data mysteries

  • Stand by for breakout rooms...
