1 of 42

Materials Informatics 101

A programmatic approach to science

Zachary del Rosario (he/him)


2 of 42

A Programmatic Approach

Programmatic (adj.): Done using computer code rather than by hand, especially to support reproducible science

This workshop is about using informatics tools for programmatic materials science


3 of 42

Why?

What’s at stake?

What opportunities?


4 of 42

Why? and Hello!

What’s at stake?

What opportunities?

Hello!

  • Zachary del Rosario (he/him)
  • Faculty at Olin College
  • “I help scientists and engineers reason under uncertainty.”


5 of 42

Why: What’s at Stake?

Reproducibility, Credibility, Scientific Progress, etc.


6 of 42

REPRODUCIBILITY

CRISIS


8 of 42

Crisis in Empirical, Inferential Work

In psychology and medicine, but surely not in the serious physical sciences...


10 of 42

Stodden et al. (2014) PNAS

The journal Science!


11 of 42

Why it Matters

Openness is crucial to science

  • E.g. “What killed alchemy?”, A. Gelman

Honest reproducibility is nontrivial

  • Programmatic tools help!


12 of 42

Why: What Opportunities?

From cat detectors to serious science


13 of 42

Only ~$20!


14 of 42

More Seriously….

Big Tech is using algorithms to spread misinformation, mine your attention, and destroy democracy….

Can we do anything useful with the same techniques?

del Rosario (2021) “Why STEM Students Need to Learn Design Refusal,” Liberal Education


15 of 42

Yes We Can!

Data Extraction -> Automate tedious tasks

Data Mining -> Find and understand patterns

Prediction -> Fill gaps in our knowledge


16 of 42

Yes, Scientists Are Already Doing This!

HT-DFT (high-throughput density functional theory) + Data Science


17 of 42

What We’ll Cover

  • Data Extraction
  • Data Management
  • Data Visualization
  • Basics of Machine Learning


18 of 42

What We’ll Cover

There is so much attention on Machine Learning...


19 of 42

What We’ll Cover

Data Science is so much more than just Machine Learning!


20 of 42

What We’ll Cover

Quick demo!


21 of 42

Today’s Exercises


22 of 42

A True Workshop

This is a workshop, so you’re going to do some hands-on work!


23 of 42

Workshop Schedule

[Schedule figure. Days: Thursday, Friday. Blocks: Block 1, Block 2, Block 3. Sessions shown: Tabula + WebPlotDigitizer; Python + Jupyter; Tidy Data; Wrangling Data; Take-Home; Visual Hierarchy; Visualizing Data; Machine Learning; Fin]


24 of 42

Quick Orientation!

  • Open your browser:

bit.ly/wellesley101


25 of 42

Quick Orientation

Live page for in-workshop activities

Please open this now


26 of 42

Exercise Time!

Let’s get to work on Data Extraction and Management


27 of 42

Pause for Survey

  • Did you find the Survey link at the end of the exercise?
    • Take a few seconds to do that survey….


29 of 42

Software Setup

  • If you ran the check_install.ipynb notebook, you are good to go!
  • If you’re having trouble, please ask for help
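If you want a quick sanity check of your own, the sketch below shows one way an install check could work. It is not the contents of check_install.ipynb; the helper name check_packages and the package list are assumptions made for illustration.

```python
import importlib


def check_packages(names):
    """Try to import each package; map its name to a version string, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment
            report[name] = None
        else:
            # Not every package exposes __version__, so fall back to a label
            report[name] = getattr(module, "__version__", "unknown")
    return report


# Hypothetical package list, chosen for illustration only
report = check_packages(["numpy", "pandas"])
for name, version in report.items():
    status = "MISSING" if version is None else f"ok ({version})"
    print(f"{name}: {status}")
```

Any package reported as MISSING is a good reason to ask for help before the exercises start.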


30 of 42

01_python_assignment

Let’s get to work on Intro to Python and Jupyter Notebooks


31 of 42

Pause for Survey

  • Did you find the Survey link at the end of the exercise?
    • Take a few seconds to do that survey….


33 of 42

An Example: We extracted these data...

That’s weird… there are two “blocks”


34 of 42

How Should We Handle These?

  1. Option: Manually edit the spreadsheets, take notes somewhere
    1. What if we make a mistake?
    2. What if data get separated from the notes?
  2. Option: ...


35 of 42

How Should We Handle These?

  • Option: Manually edit the spreadsheets, take notes somewhere
    • What if we make a mistake?
    • What if data get separated from the notes?
  • Option: Use a Jupyter Notebook to record the data processing
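To make the second option concrete, here is a minimal sketch of what recording the processing in code could look like. It assumes the two “blocks” are sets of columns sitting side by side in the extracted table; the RAW text, the column names, and the helper stack_blocks are all made up for illustration and are not the workshop's data or code.

```python
import csv
import io

# Made-up stand-in for an extracted table: two side-by-side "blocks",
# each holding the same pair of columns (sample, strength)
RAW = """sample,strength,sample,strength
A,10.1,C,12.3
B,11.5,D,9.8
"""


def stack_blocks(raw_text, block_width=2):
    """Stack side-by-side blocks of columns into one list of records."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header, body = rows[0], rows[1:]
    names = header[:block_width]  # column names taken from the first block
    records = []
    for start in range(0, len(header), block_width):
        for row in body:
            cells = row[start:start + block_width]
            # Skip padding where one block is shorter than the other
            if any(cell.strip() for cell in cells):
                records.append(dict(zip(names, cells)))
    return records


records = stack_blocks(RAW)
for record in records:
    print(record)
```

The raw text is never edited by hand: the code itself is the note about what was done, so the record cannot get separated from the data, and a mistake can be fixed by changing the code and re-running it.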


36 of 42

02_tidy_assignment

Let’s get to work on Intro to Data Wrangling and Tidy Data
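As a preview of the idea, the sketch below reshapes a small “wide” table into tidy form, where each variable is a column and each observation is a row. It assumes pandas is installed; the values and column names are made up for illustration and do not come from the assignment.

```python
import pandas as pd

# Made-up values for illustration; not data from the workshop.
# Wide layout: one column per test temperature (in degrees C)
wide = pd.DataFrame({
    "alloy": ["A", "B"],
    "20": [310.0, 275.0],
    "200": [290.0, 250.0],
})

# Tidy layout: one row per (alloy, temperature) observation
tidy = wide.melt(
    id_vars="alloy",
    var_name="temperature_C",
    value_name="strength_MPa",
)
# The old column headers were strings; convert them to numbers
tidy["temperature_C"] = tidy["temperature_C"].astype(int)

print(tidy)
```

In the tidy layout, temperature is a variable you can plot or filter on directly, rather than information hidden in the column headers.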


37 of 42

Pause for Survey

  • Did you find the Survey link at the end of the exercise?
    • Take a few seconds to do that survey….


38 of 42

Looking Ahead

What’s going on tomorrow?


39 of 42

Workshop Schedule

[Schedule figure repeated, with one callout: “Totally Optional!”]

