The data science curriculum
Ariel Rokem
HSI STEM HUB + WBDIH Data Science Training and Collaboration workshop, September 16th 2019
Data science: the fourth paradigm
Jim Gray
The first paradigm: empirical research
Experimental and observational
The second paradigm: mathematical theory
Maxwell’s laws
The third paradigm: computational simulations
The fourth paradigm: data-intensive research
Sloan Digital Sky Survey
Large Hadron Collider
Tremendous impact (HGP fact sheet)
Recent estimate: $3.8B investment that drove $796B in economic impact.
But that was just the start!
<= laptop
<= x1000
<= x1M
<= x1B
Data-driven discovery is everywhere!
Social science
Josh Blumenstock et al. (Berkeley ISchool)
Data science for social good
Summer program
~16 students working on 4 projects for 10 weeks
Together with program lead + data scientist
Biomedical science
Nick Reder (UW Pathology), Adam Glaser, Jon Liu (UW Mech E)
Even in the humanities!
Adam Anderson, UC Berkeley
Meanwhile, in industry
Web-scale data + new computing paradigms => data science
2009: Halevy, Norvig, Pereira (Google): “The unreasonable effectiveness of data”
~2008: Jeff Hammerbacher (Facebook) and DJ Patil (LinkedIn) coin the term “data scientist”
The data science Venn diagram
Drew Conway (2013)
50 years of data science (Donoho, 2017)
John Tukey
1962
1970
Tukey, 1962
Sounds good! But is it?
The data science curriculum
Statistics and machine learning
Computing
Data visualization and data explanation
The human aspect of data science
Statistical learning and data-driven discovery
Data management
Open source software for science
Tools for teaching and learning
Understanding and explaining data
Explainable machine learning
Olah et al., 2018
The human side of data science
How?
Standard curriculum (coming up next!): courses, degrees, ...
Nimble curriculum:
NeuroHackademy
Maybe data science is not a discipline?
Questions?