Materials Informatics
Case Studies
Zachary del Rosario (He/Him)
1
Workshop Schedule
Extract
Wrangle + Tidy
Friday
Saturday
Visualize
Model
Sunday
Monday
Tabula + WebPlotDigitizer
Python + Jupyter
Concepts
Execution
Concepts
Execution
Concepts
Fin
Focus
Live
Take-Home
2
Survey Time!
3
Case Studies
4
Full Disclosure
Citrine Informatics + Olin College
I work closely with the Citrine folks, which affects my selection of examples!
MPEA Database Checking
The power of simple statistics
6
MPEA Database
Multi-Principal Element Alloys (MPEA)
Borg, C.K.H. et al. Sci Data (2020).
DOI: 10.1038/s41597-020-00768-9
7
MPEA a “Data Literature” Review
9
MPEA Finding Errors!
Plotly and boxplots!
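A boxplot flags points beyond its whiskers, conventionally those more than 1.5 times the interquartile range outside the quartiles. Below is a minimal sketch of that rule using only the standard library. The hardness values and the helper name `flag_outliers` are made up for illustration and are not taken from the MPEA database.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside the boxplot whiskers (Tukey's rule)."""
    # With n=4, statistics.quantiles returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Hypothetical hardness values (HV); one entry looks like a unit or typing error
hardness = [410, 395, 430, 402, 388, 4150, 420, 415]
suspects = flag_outliers(hardness)
print(suspects)
```

In a notebook, `plotly.express.box` shows the same idea visually: points beyond the whiskers are drawn individually, so a suspicious entry stands out and can be checked against its source paper.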
11
MPEA Takeaways
12
Starrydata2
WebPlotDigitizer for data extraction
13
Starrydata2 Concept
Thermoelectric Datasets
Katsura et al. (2019) Science and Technology of Advanced Materials
DOI: 10.1080/14686996.2019.1603885
14
Starrydata2 Schematic
Manual, streamlined data extraction
WebPlotDigitizer!
17
Starrydata2 Findings
Compared doping model to experimental data
18
Starrydata2 Takeaways
19
DFT Database Comparison
Data pipeline work and statistics(!)
20
High-Throughput DFT Database Comparison
21
DFT-DB Workflow
Hegde et al. (2020) arXiv preprint arXiv:2007.01988 (under review)
23
DFT-DB Bug Story!
24
DFT-DB Percent Disagreement
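One way to quantify disagreement between two databases is a median relative absolute difference over the entries they share. The sketch below assumes that definition. The exact metric used by Hegde et al. may differ, and the compound labels and property values here are invented for illustration.

```python
import statistics

def percent_disagreement(a, b):
    """Median relative absolute difference (in percent) over shared keys."""
    shared = sorted(set(a) & set(b))
    diffs = []
    for key in shared:
        # Scale by the mean magnitude so neither database is treated as "truth"
        scale = (abs(a[key]) + abs(b[key])) / 2
        if scale == 0:
            diffs.append(0.0)  # both values are zero: perfect agreement
            continue
        diffs.append(100 * abs(a[key] - b[key]) / scale)
    return statistics.median(diffs)

# Hypothetical band gaps (eV) for the same compounds in two databases
db_a = {"A2B": 1.00, "AB": 2.00, "AB3": 0.50, "A3B": 3.10}
db_b = {"A2B": 1.10, "AB": 2.00, "AB3": 0.40, "C2D": 1.70}

result = percent_disagreement(db_a, db_b)
print(f"{result:.1f}% median disagreement over shared entries")
```

Only compounds present in both dictionaries are compared. A median is used here instead of a mean so that a few wildly different entries do not dominate the summary.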
25
DFT-DB Takeaways
26
Heusler Prediction
Machine Learning + Materials Intuition
27
Heusler Prediction
Class of intermetallic compound
Attractive thermoelectric and spintronic candidates
Heusler structure unstable for some compounds!
Oliynyk et al. (2016) Chem. Mater.
28
Heusler Prediction - Methods
29
Heusler Prediction - Model
Machine Learning Features (22 total)
I could not have come up with these features!
Materials intuition important for feature engineering
32
Heusler Prediction - Model
Machine Learning Features (22 total)
Hard to hold 22 facts in your head!
33
Heusler Prediction - Model
Model: Decision Tree
But decision trees tend to overfit!
35
Heusler Prediction - Model
Idea: Random Forest
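The overfitting concern and the random-forest remedy can be sketched with scikit-learn. The data here are synthetic (`make_classification` with 22 features, to echo the feature count above), not the Heusler dataset, and the hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 22 features, with about 10% of labels randomized as noise
X, y = make_classification(
    n_samples=400, n_features=22, n_informative=6, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A single unpruned tree can memorize the training set, noise included
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A forest averages many trees, each fit to a bootstrap resample
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    X_train, y_train
)

tree_train, tree_test = tree.score(X_train, y_train), tree.score(X_test, y_test)
forest_train = forest.score(X_train, y_train)
forest_test = forest.score(X_test, y_test)

print(f"tree:   train={tree_train:.2f}  test={tree_test:.2f}")
print(f"forest: train={forest_train:.2f}  test={forest_test:.2f}")
```

A large gap between the tree's training and test accuracy is the overfitting symptom. A forest typically narrows that gap on held-out data, though the exact numbers depend on the data and the random seed.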
36
Heusler Prediction
ML model used to predict Heusler formation
Candidates chosen specifically because valence electron count suggests they are unlikely to form
Formation tested by experiment
37
Heusler Prediction Takeaways
38
Closing Takeaways
39
What We Covered
40
What We Covered
“Locked up” data do us no good!
Tabula + WebPlotDigitizer help us liberate data
41
What We Covered
Data are messy, untidy, and sometimes in error!
We can wrangle and tidy in a reproducible notebook
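As a small illustration of tidying, the sketch below uses pandas to reshape a wide table (one column per measured property) into a long one (one row per measurement). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per alloy, one column per property
df_wide = pd.DataFrame(
    {
        "alloy": ["A", "B"],
        "hardness_HV": [410.0, 395.0],
        "yield_MPa": [1200.0, None],  # a missing measurement
    }
)

# Tidy form: one row per (alloy, property) observation, missing values dropped
df_tidy = (
    df_wide.melt(id_vars="alloy", var_name="property", value_name="value")
    .dropna(subset=["value"])
    .reset_index(drop=True)
)
print(df_tidy)
```

Because every step is written as code in the notebook, the same cleaning can be rerun unchanged when the source table is corrected or extended.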
42
What We Covered
Tables are useless for finding patterns
Informative visuals are easy with plotnine / plotly
43
What We Covered
Simple heuristics can miss trends
Machine Learning can utilize our materials intuition (features)
44
What We Covered
I hope you enjoyed the workshop!
45
Recordings and Slides
All recordings and slides will be posted to the MI 101 Workshop Website
46
Recommended Courses (from AJ)
47
Shhh, A Secret...
All of the solutions to all of the notebooks are on GitHub
These slides are linked from the `Slides` page!
48
One Last Survey!
49
While You’re Still Here…
I want this workshop to be as useful as possible
Please fill this out:
https://forms.gle/D1tDSTubxLnHxVfP8
(I will paste link in chat)
50
Questions? Thoughts? Comments?
51