DataTrail - building inclusive data science communities
https://www.datatrail.org/
cansavvy.com/talks
Talent is equally distributed . . .
. . . opportunity is not.
Income mobility
https://www.opportunityatlas.org/
Poverty is pervasive in East Baltimore, the Johns Hopkins University neighborhood
Source: https://www.opportunityatlas.org/ via jtleek.com/talks
The median family income in this neighborhood is $18,000 for individuals in their mid-thirties.
Income mobility is limited near FHCRC (Fred Hutchinson Cancer Research Center)
https://www.opportunityatlas.org/
Wealthy medical centers near opportunity deserts
Fred Hutchinson (Seattle)
UPMC (Pittsburgh)
UChicago Medicine (Chicago)
Mt. Sinai (New York City)
Moffitt Cancer Center (Tampa Bay)
University of Michigan (Ann Arbor)
Income inequality as a public health problem
Source: http://www.equality-of-opportunity.org/data/
What does this have to do with education?
Mobility Rate = (Access) x (Top Quintile Success Rate)
Source: https://www.nber.org/papers/w23618
Access: fraction of students whose parents are in the bottom quintile of income
Top Quintile Success Rate: fraction of those students who reach the top quintile of income by age 34
More education, higher income mobility
Source: https://www.brookings.edu/wp-content/uploads/2016/07/02_economic_mobility_sawhill_ch8.pdf
| College | Mobility Rate | Access | Top Quintile Success Rate |
|---|---|---|---|
| Cal State University - LA | 9.9% | 33.1% | 29.9% |
| Pace University - New York | 8.4% | 15.2% | 55.6% |
| SUNY - Stony Brook | 8.4% | 16.4% | 51.2% |
| Technical Career Institutes | 8.0% | 40.3% | 19.8% |
| University of Texas - Pan American | 7.6% | 38.7% | 19.8% |
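A worked check of the formula on the first row of the table. This is a sketch only: the published rates are rounded, so recomputing the product can be off by 0.1 percentage point (for example, Pace University gives 15.2% × 55.6% ≈ 8.5% rather than the listed 8.4%).

```latex
\begin{aligned}
\text{Mobility Rate} &= \text{Access} \times \text{Top Quintile Success Rate} \\
&= 0.331 \times 0.299 \\
&\approx 0.099 = 9.9\%
\end{aligned}
```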
What does this have to do with scalable data science education?
Source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3260695
MOOCs can help increase income!
Figure legend (highest education level): Ph.D., other professional degrees, master's degree, college degree, some college, associate degree, high school, less than high school
But MOOCs aren’t reaching the people who could use them!
MOOCs typically benefit the already-well-educated
April 2019
Our Coursera MOOCs and most other data science training programs
Can we build something for everyone else?
The trail to becoming a data scientist
Know about data science
Income security
Expensive computer
Appropriate programs
Access to instruction
Right jobs being posted
Access to connections
Andrea Kozak
Michael Bannon
Alison Bernstein
Casey Greene
Jackie Taroni
Jeff Leek
Who helped me build a career as a data scientist?
Who invested in you to get you where you are today?
DataTrail Program Components
Who?
Source: https://hebcac.org/
Community Members We Serve
Our Ideal Candidate
Eligibility Requirements
How?
DataTrail’s ecosystem approach supports the whole person
For-Profit Companies
Colleges & Universities
Non-Profit Community Partners
Provide tutors and educational support
Provide social and life support
Provide mentors & opportunities
Job Search Assistance
Free Laptops
Online Support
Payment for Course Completions
Connections to internships and mentors
Source: https://simplystatistics.org/2017/08/29/data-science-on-a-chromebook/
Source: http://slides.google.com
Source: http://sheets.google.com
Source: https://rstudio.cloud
Form a question
Get the data
Clean the data
Plot the data
Get stats
Share results
rmarkdown
---
title: "My awesome website"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cerulean
---
# This is Jeff's awesome website
flexdashboard
---
title: "How does your BMI measure up?"
output: flexdashboard::flex_dashboard
runtime: shiny
---

Inputs {.sidebar}
-------------------------------------

```{r}
library(flexdashboard); library(NHANES); library(plotly); library(dplyr); library(ggplot2)
sliderInput("height", "Height in inches", 0, 100, 72)
sliderInput("weight", "Weight in pounds", 0, 500, 100)
sliderInput("age", "Age in years", 0, 120, 50)
```

Column
-------------------------------------

### Chart 1

```{r}
# Random sample of 100 NHANES participants to compare against
nhanes = sample_n(NHANES, 100)
renderPlotly({
  # BMI = kg / m^2: convert pounds to kg (x 0.4536) and inches to m (x 0.0254)
  df = data.frame(bmi = c(nhanes$BMI, input$weight * 0.4536 / (input$height * 0.0254)^2),
                  age = c(nhanes$Age, input$age),
                  who = c(rep("nhanes", 100), "you"))
  ggplotly(ggplot(df) +
    geom_point(aes(x = age, y = bmi, color = who)) +
    scale_x_continuous(limits = c(0, 90)) +
    scale_y_continuous(limits = c(0, 60)) +
    theme_minimal()
  )
})
```
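The dashboard converts the slider inputs to metric units before computing BMI (kg/m²). A short derivation, assuming 1 lb ≈ 0.4536 kg and 1 in = 0.0254 m, shows that the two conversion factors collapse into the familiar constant of about 703:

```latex
\mathrm{BMI} = \frac{0.4536\, w_{\mathrm{lb}}}{(0.0254\, h_{\mathrm{in}})^2}
= \frac{0.4536}{0.00064516} \cdot \frac{w_{\mathrm{lb}}}{h_{\mathrm{in}}^2}
\approx 703 \cdot \frac{w_{\mathrm{lb}}}{h_{\mathrm{in}}^2}
```

With the rounder factors 0.45 and 0.025, the constant would be 0.45 / 0.025² = 720, which overstates BMI by about 2.4%.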
dbplyr
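library(bigrquery)
library(DBI)    # dbConnect()
library(dplyr)  # %>%, tbl(), count()
# Authenticate with a service-account key file
bq_auth(path = "file.json")
con <- dbConnect(
  bigquery(),
  project = "project_name",
  dataset = "dataset_name"
)
# dbplyr translates count() to SQL; the query runs on BigQuery when the result is printed
unique_elements = con %>%
  tbl("dataset1") %>%
  count()

unique_elements
Running job 'job_id.US'...
Complete
Billed: 32.51 MB
Downloading 10 rows in 1 pages.
# Source: lazy query [?? x 2]
# Database: BigQueryConnection
        n
    <int>
1 3700675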
DataTrail on GitHub @datatrail-jhu
https://github.com/datatrail-jhu/DataTrail
Free Resources to Run DataTrail
Free courses
Online Community
Starter Kit
The outcome?
It is a very compelling experience
https://magazine.jhsph.edu/2019/data-science-careers-baltimores-underserved-community-members
How can you get involved?
Tell others about DataTrail!
Encourage more inclusive hiring practices
Recommend datasets for learning examples
Host a DataTrail graduate as an intern!
Start your own franchise!
Helping companies write inclusive ads
Two Approaches
https://streamlinedatascience.io/
Direct Hire:
Contracting:
Content is online: https://datatrail-jhu.github.io/DataTrail/
Self-taught version is available for free on Leanpub:
To contribute a data learning example:
https://github.com/datatrail-jhu/DataTrail/issues
Internships
What is required for hosting a DataTrail intern?
Email me at csavonen@fredhutch.org to express your interest
To run a franchise
Questions or want to get involved?
Thanks
Jeff Leek and Michael Rosenblum
Students
Simina Boca, Hilary Parker, Andrew Jaffe, Alyssa Frazee, Nick Carchedi, Leo Collado Torres, Leslie Myint, Prasad Patil, Claire Ruberman, Jack Fu, Sara Wang, Kayode Sosina, Sarah McClymont + many visitors/interns/student collaborators!
Postdocs
Abhi Nellore, Kai Kammers, Shannon Ellis, Aboozar Hadavand, Lucy D’Agostino McGowan
Genomics Collaborators
Ben Langmead, Andrew Jaffe, Kasper Hansen, Margaret Taub, all of Hopkins Genomics + many more
JHU DaSL Collaborators
Roger Peng, Brian Caffo, Stephanie Hicks, John Muschelli, Leah Jager
JHU DaSL Staff
Ira Gooding, Jessica Crowell, Sean Kross, Nick Carchedi, Ashley Johnson, Simone Sawyer, Davon Person, Allissa Dillman, Liz Torres Brown
Hopkins Admin
Karen Bandeen-Roche, Christy Wyskiel, Sukon Kanchanaraksa, Mike Klag, Ellen MacKenzie, Ron Daniels + many others
Hebcac/YO/HeartSmiles
Ed Sabatino, Joni Hollifield
Problem Forward/Streamline
Jamie McGovern, Kenny Morales, Ju Kim, Will Richardson, Ryland Sumner and more...
Funders
*The views expressed are my own and not those of our funders