Please go to the following website to vote!
Pick your favorite buzzword:
Data Science, Big Data, Machine Learning
Sahir Bhatnagar
@syfi_24
sahirbhatnagar.com
EBOH Research Day Student Keynote Presentation
March 16, 2018
http://etc.ch/Zf8v
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Best Job in America Rankings - #1
https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm
2016, 2017, 2018
https://www.simplilearn.com/big-data-and-analytics/senior-data-scientist-masters-program-training
http://midas.umich.edu/dsi/
What is a Data Scientist?
http://www.datascienceassn.org/code-of-conduct.html
Data Scientist (n.)
A professional who uses scientific methods to liberate and create meaning from raw data.
http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram/
And it just keeps getting more ridiculous
Two different searches
Natural Language Processing
“Control” Dataset: Epidemiology, Epidemiologist
Data Science, Data Scientist
(Bio)Statistics, (Bio)Statistician
Marie Davidian - North Carolina State University (2013)
Bin Yu - University of California at Berkeley (2014)
Data science is statistics
Karl Broman - University of Wisconsin–Madison (2013)
If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.
When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.
How did we get here?
Donoho’s 6 Divisions of Data Science
1. Data Gathering, Preparation, and Exploration
2. Data Representation and Transformation
3. Computing with Data
4. Data Modeling (Breiman’s 2 Cultures)
5. Data Visualization and Presentation
6. Science about Data Science
Donoho D. 50 Years of Data Science. Journal of Computational and Graphical Statistics. 2017 Oct 2;26(4):745-66.
The Focus of Traditional Statistics has been on:
1. Data Gathering, Preparation, and Exploration
2. Data Representation and Transformation
3. Computing with Data
4. Data Modeling
5. Data Visualization and Presentation
6. Science about Data Science
Donoho D. 50 Years of Data Science. Journal of Computational and Graphical Statistics. 2017 Oct 2;26(4):745-66.
Hadley Wickham - RStudio (2015)
The fact that data science exists as a field is a colossal failure of statistics.
Data munging and manipulation is hard and statistics has just said that’s not our domain.
To me, that is what statistics is all about. It is gaining insight from data using modelling and visualization.
How Many Papers Provide Code?
14,614 Abstracts Scraped
1,312 (9%) of Abstracts with a match
Distribution of the 1,312 matches
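The counts above (14,614 abstracts scraped, 1,312 with a match) come from searching abstract text for mentions of code. A minimal sketch of that kind of count is below. The search pattern, the function name `share_with_match`, and the toy abstracts are all assumptions made for illustration; the slides do not say which terms were actually matched.

```python
import re

# Hypothetical pattern: the slides do not state which terms were searched for.
CODE_PATTERN = re.compile(r"github|bitbucket|source code", re.IGNORECASE)


def share_with_match(abstracts, pattern=CODE_PATTERN):
    """Return (number of matching abstracts, total abstracts, proportion)."""
    matches = sum(1 for text in abstracts if pattern.search(text))
    total = len(abstracts)
    proportion = matches / total if total else 0.0
    return matches, total, proportion


# Toy abstracts, invented for illustration only.
toy_abstracts = [
    "Our R package is available at https://github.com/example/pkg.",
    "We conducted a cohort study of 500 patients.",
    "Source code for all analyses is provided in the supplement.",
    "A randomized trial of two screening strategies.",
]

matches, total, proportion = share_with_match(toy_abstracts)
print(f"{matches} of {total} abstracts ({proportion:.0%}) mention code")

# Arithmetic check on the figures reported in the slides.
print(f"{1312 / 14614:.1%}")  # 9.0%
```

The last line confirms that 1,312 of 14,614 rounds to the 9% quoted on the slide.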
2. Outdated Course Material
First Day of a Data Science Course
First Day in a Statistics Course
Please go to the following website to vote!
3. Marketing and Perception
A Statistician's Perspective
Statistician
Data Scientist
A Data Scientist’s Perspective
Statistician
Data Scientist
Everyone Else’s Perspective
Statistician
Data Scientist
(Bio)Statistician / Epidemiologist
Objective: Mortality risk score to screen for palliative vs. curative care
Machine Learner
This example was inspired by: http://www.fharrell.com/post/medml/#fn:2
Mesmerized by Machine Learning?
How to become a Data Scientist in Canada?
Data Science Related Masters in Canada
[Figure: length of program (months) and cost for domestic students (CAD in thousands), one point per program]
UBC: Master of Data Science (32k)
Queen’s: Master of Management Analytics (45k)
SFU: Professional Master’s in Big Data (30k)
UofT: MSc in Applied Computing - Concentration in Data Science (22k)
Saint Mary’s: MSc in Computing & Data Analytics (17k)
Ryerson: MSc in Data Science and Analytics (11k)
Trent: MSc in Big Data Analytics (10k)
Waterloo: MSc in Statistics - Data Science Specialisation (9k)
Western: Master of Data Analytics (30k)
https://www.mcgill.ca/datascience/
What is being taught in these Masters programs?
https://masterdatascience.science.ubc.ca/
https://masterdatascience.science.ubc.ca/program/courses
What is being taught in Bootcamps?
Galvanize Data Science: 13 weeks, $16k USD
Metis Data Science Bootcamp: 12 weeks, $16k USD
NYC Data Science Academy: 12 weeks, $18k USD
What to do going forward?
Larry Wasserman - Carnegie Mellon (https://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/)
Data Science: The End of Statistics?
2. We need to make sure our students are competitive ... serious computing, data structures, distributed computing and multiple programming languages
2. The importance of programming needs to be acknowledged
Acknowledgements
Code: github.com/sahirbhatnagar/talks
Slides: sahirbhatnagar.com
I’m not denying that CS people are doing good work. We are too. Stick to your epi/biostats guns.
Big Good Quality Data
Dream Selling
The activities of Greater Data Science (GDS) are classified into six divisions:
1. Data Gathering, Preparation, and Exploration
2. Data Representation and Transformation
3. Computing with Data
4. Data Modeling
5. Data Visualization and Presentation
6. Science about Data Science
Big Data
It’s all about perception
How many of us will end up in academia vs. industry? The people in this room can appreciate the value of our degree. But what about industry? What perception do they have of the value of our degree?
Why is it that they want to hire data scientists?
Here is one thought: (then show the slides about Netflix/Facebook/Google vs. urns)