1 of 74

DataTrail - building inclusive data science communities

https://www.datatrail.org/

cansavvy.com/talks

2 of 74

Talent is equally distributed . . .

3 of 74

. . . opportunity is not.

  • Leila Janah

4 of 74

Income mobility

5 of 74

https://www.opportunityatlas.org/

6 of 74

Poverty is pervasive in East Baltimore, the neighborhood around Johns Hopkins University

Source: https://www.opportunityatlas.org/ via jtleek.com/talks

The median family income in this neighborhood is $18,000 for individuals in their mid-thirties.

7 of 74

Income mobility is limited near Fred Hutchinson Cancer Research Center (FHCRC)

https://www.opportunityatlas.org/

8 of 74

Wealthy medical centers near opportunity deserts

Fred Hutchinson (Seattle)

UPMC (Pittsburgh)

UChicago Medicine (Chicago)

Mt. Sinai (New York City)

Moffitt Cancer Center (Tampa Bay)

University of Michigan (Ann Arbor)

9 of 74

Income inequality as a public health problem

10 of 74

Source: http://www.equality-of-opportunity.org/data/

11 of 74

12 of 74

What does this have to do with education?

13 of 74

Mobility Rate = (Access) x (Top Quintile Success Rate)

Source: https://www.nber.org/papers/w23618

14 of 74

Mobility Rate = (Access) x (Top Quintile Success Rate)

Source: https://www.nber.org/papers/w23618

Access: fraction of students with parents in the bottom quintile of income

Top Quintile Success Rate: fraction of those students who reach the top quintile of income by age 34

15 of 74

16 of 74

17 of 74

18 of 74

More education, higher income mobility

Source: https://www.brookings.edu/wp-content/uploads/2016/07/02_economic_mobility_sawhill_ch8.pdf

19 of 74

College                              Mobility Rate   Access   Success
Cal State University - LA                     9.9%    33.1%     29.9%
Pace University - New York                    8.4%    15.2%     55.6%
SUNY - Stony Brook                            8.4%    16.4%     51.2%
Technical Career Institutes                   8.0%    40.3%     19.8%
University of Texas - Pan American            7.6%    38.7%     19.8%
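As a rough check on the formula Mobility Rate = (Access) x (Top Quintile Success Rate), the sketch below recomputes the mobility rate for the five colleges listed above. The data frame and its column names are illustrative assumptions; only the percentages come from the slide. Because the published inputs are rounded, the products should land within about 0.1 percentage point of the reported mobility rates rather than match them exactly.

```r
# Access and success rates as listed on the slide, written as proportions.
# Column names are illustrative, not taken from the source paper.
colleges <- data.frame(
  college = c("Cal State University - LA",
              "Pace University - New York",
              "SUNY - Stony Brook",
              "Technical Career Institutes",
              "University of Texas - Pan American"),
  access  = c(0.331, 0.152, 0.164, 0.403, 0.387),
  success = c(0.299, 0.556, 0.512, 0.198, 0.198),
  stringsAsFactors = FALSE
)

# Mobility Rate = (Access) x (Top Quintile Success Rate)
colleges$mobility_rate <- colleges$access * colleges$success

# Mobility rates reported on the slide, in percent
reported_pct <- c(9.9, 8.4, 8.4, 8.0, 7.6)

# Difference between the recomputed and reported values, in percentage points
colleges$diff_pct_points <- 100 * colleges$mobility_rate - reported_pct

print(colleges[, c("college", "mobility_rate", "diff_pct_points")])
```

The comparison shows two routes to a similar mobility rate: high access with a modest success rate, or lower access with a high success rate.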

20 of 74

What does this have to do with scalable data science education?

21 of 74

Source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3260695

22 of 74

Source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3260695

MOOCs can help increase income!

23 of 74

Education levels shown (highest to lowest): Ph.D., Other Professional Degrees, Master's Degree, College Degree, Some College, Associate Degree, High School, Less than High School

But, MOOCs aren’t getting to the people who could use them!

24 of 74

MOOCs typically benefit the already-well-educated

April 2019

Our Coursera MOOCs and most other data science training programs

25 of 74

Can we build something for everyone else?

Our Coursera MOOCs and most other data science training programs

26 of 74

The trail to becoming a data scientist

27 of 74

Know about data science

Income security

Expensive computer

Appropriate programs

Access to instruction

Right jobs being posted

Access to connections

28 of 74

29 of 74

30 of 74

Who helped me build a career as a data scientist?

Andrea Kozak

Michael Bannon

Alison Bernstein

Casey Greene

Jackie Taroni

Jeff Leek

31 of 74

Who invested in you to get you where you are today?

32 of 74

Know about data science

Income security

Expensive computer

Appropriate programs

Access to instruction

Right jobs being posted

Access to connections

33 of 74

DataTrail Program Components

34 of 74

Who?

35 of 74

Source: https://hebcac.org/

36 of 74

37 of 74

Community Members we Serve

Our Ideal Candidate

Eligibility Requirements

  • Baltimore City resident between the ages of 18 and 24
  • Registered member of Youth Opportunity (YO!) Baltimore (via HEBCAC)
  • Possesses a high school diploma or GED
  • Interest in computers, programming, coding or related subject matters
  • Ability to complete a 14-week course of study
  • Interested in full-time employment upon completion
  • Able to pass a background check and drug screen
  • Strong communication skills
  • Perseverant, self-motivated learner

38 of 74

How?

39 of 74

DataTrail’s ecosystem approach supports the whole person

For-Profit Companies

Colleges & Universities

Non-Profit Community Partners

Provide tutors and educational support

Provide social and life support

Provide mentors & opportunities

40 of 74

Job Search Assistance

Free Laptops

Online Support

Payment for Course Completions

Connections to internships and mentors

41 of 74

Source: https://simplystatistics.org/2017/08/29/data-science-on-a-chromebook/

42 of 74

Source: http://slides.google.com

43 of 74

Source: http://sheets.google.com

44 of 74

Source: https://rstudio.cloud

45 of 74

Form a question

Get the data

Clean the data

Plot the data

Get stats

Share results
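The six steps above can be sketched end to end in base R. This is only an illustration: the question, the built-in `mtcars` dataset, and every variable name are assumptions chosen for the example, not material from the DataTrail curriculum.

```r
# 1. Form a question: do heavier cars tend to get fewer miles per gallon?

# 2. Get the data (mtcars ships with base R)
cars <- datasets::mtcars

# 3. Clean the data: keep the two columns of interest, drop incomplete rows,
#    and convert weight from thousands of pounds to pounds
cars <- cars[complete.cases(cars[, c("wt", "mpg")]), c("wt", "mpg")]
cars$weight_lb <- cars$wt * 1000

# 4. Plot the data
plot(cars$weight_lb, cars$mpg,
     xlab = "Weight (lb)", ylab = "Miles per gallon",
     main = "Fuel efficiency vs. weight")

# 5. Get stats: fit a simple linear regression
fit <- lm(mpg ~ weight_lb, data = cars)
slope <- coef(fit)[["weight_lb"]]

# 6. Share results: a one-line summary that could go into a report
summary_line <- sprintf(
  "Each additional 1,000 lb is associated with a change of %.1f mpg.",
  slope * 1000
)
cat(summary_line, "\n")
```

In practice the "share" step would more likely be an R Markdown document than a printed sentence, but the order of the steps stays the same.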

46 of 74

rmarkdown

---
title: "My awesome website"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cerulean
---

# This is Jeff's awesome website

![](https://media.giphy.com/media/drXGoW1iudhKw/giphy.gif)

47 of 74

flexdashboard

---
title: "How does your BMI measure up?"
output: flexdashboard::flex_dashboard
runtime: shiny
---

Inputs {.sidebar}
-------------------------------------

```{r}
library(flexdashboard); library(NHANES); library(plotly); library(dplyr)
sliderInput("height", "Height in inches", 0, 100, 72)
sliderInput("weight", "Weight in pounds", 0, 500, 100)
sliderInput("age", "Age in years", 0, 120, 50)
```

Column
-------------------------------------

### Chart 1

```{r}
nhanes = sample_n(NHANES, 100)
renderPlotly({
  df = data.frame(bmi = c(nhanes$BMI, input$weight*0.45/(input$height*0.025)^2),
                  age = c(nhanes$Age, input$age),
                  who = c(rep("nhanes", 100), "you"))
  ggplotly(ggplot(df) +
    geom_point(aes(x=age, y=bmi, color=who)) +
    scale_x_continuous(limits=c(0,90)) +
    scale_y_continuous(limits=c(0,60)) +
    theme_minimal()
  )
})
```
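The dashboard above converts pounds and inches to metric with the rounded factors 0.45 and 0.025 before computing BMI (weight in kg divided by height in m squared). The sketch below isolates that arithmetic. The function names are hypothetical, and the second function swaps in the exact conversion factors (0.45359237 kg per pound, 0.0254 m per inch) for comparison; it is not part of the original dashboard.

```r
# BMI with the rounded conversion factors used in the dashboard code
bmi_rounded <- function(weight_lb, height_in) {
  weight_lb * 0.45 / (height_in * 0.025)^2
}

# BMI with exact conversion factors (an addition for comparison only)
bmi_exact <- function(weight_lb, height_in) {
  (weight_lb * 0.45359237) / (height_in * 0.0254)^2
}

# Example inputs: 150 lb and 68 in (arbitrary illustrative values)
rounded <- bmi_rounded(150, 68)
exact   <- bmi_exact(150, 68)

cat("Rounded factors:", round(rounded, 2), "\n")
cat("Exact factors:  ", round(exact, 2), "\n")
cat("Ratio:          ", round(rounded / exact, 3), "\n")
```

The rounded factors overstate BMI by a couple of percent, which is likely fine for a teaching demo but worth knowing before comparing the "you" point against the NHANES values.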

48 of 74

dbplyr

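library(bigrquery)
library(DBI)    # provides dbConnect()
library(dplyr)  # provides %>%, tbl(), and count()

# Authenticate with a service account key file.
# Newer bigrquery releases use bq_auth(path = "file.json") instead.
set_service_token("file.json")

con <- dbConnect(
  bigquery(),
  project = "project_name",
  dataset = "dataset_name"
)

# Count the rows of the table; the query runs in BigQuery, not in R
unique_elements = con %>%
  tbl("dataset1") %>%
  count()

unique_elements
Running job 'job_id.US'...
Complete
Billed: 32.51 MB
Downloading 10 rows in 1 pages.
# Source:   lazy query [?? x 2]
# Database: BigQueryConnection
        n
    <int>
1 3700675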
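To see why the count above runs in the database rather than in R, the sketch below uses dbplyr's simulated backend, so no BigQuery project or credentials are needed. The table and its `id` column are hypothetical stand-ins for `dataset1`, and the SQL that dbplyr generates may differ by backend and package version.

```r
library(dplyr)
library(dbplyr)

# A simulated remote table: no database connection is opened.
# The column name `id` is hypothetical.
remote_tbl <- lazy_frame(id = 1:3)

# The same pipeline shape as on the slide: nothing is computed yet,
# dplyr only records the operations to translate
row_count <- remote_tbl %>%
  count()

# Render the SQL that would be sent to the database
generated_sql <- as.character(sql_render(row_count))
print(generated_sql)
```

With a real connection, the query is only sent when the result is printed or collected, which is why the job log on the slide appears after `unique_elements` is evaluated.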

49 of 74

DataTrail on GitHub @datatrail-jhu

https://github.com/datatrail-jhu/DataTrail

50 of 74

Free Resources to Run DataTrail

Free courses

Online Community

Starter Kit

51 of 74

The outcome?

52 of 74

It is a very compelling experience

https://magazine.jhsph.edu/2019/data-science-careers-baltimores-underserved-community-members

53 of 74

How can you get involved?

54 of 74

How can you get involved?

Tell others about DataTrail!

Encourage more inclusive hiring practices

Recommend datasets for learning examples

Host a DataTrail graduate as an intern!

Start your own franchise!

55 of 74

Find out more at datatrail.org/

56 of 74

How can you get involved?

Tell others about DataTrail!

Encourage more inclusive hiring practices

Recommend datasets for learning examples

Host a DataTrail graduate as an intern!

Start your own franchise!

57 of 74

58 of 74

59 of 74

Helping companies write inclusive ads

60 of 74

Two Approaches

Direct Hire:

  • JHSPH directly hires graduates
  • DataTrail provides HR support
  • DataTrail provides ongoing mentoring
  • DataTrail provides technical support

Contracting:

  • Streamline PBC hires graduates
  • Streamline contracts with companies
  • DataTrail provides ongoing mentoring
  • DataTrail provides technical support

https://streamlinedatascience.io/

61 of 74

How can you get involved?

Tell others about DataTrail!

Encourage more inclusive hiring practices

Recommend datasets for learning examples

Host a DataTrail graduate as an intern!

Start your own franchise!

62 of 74

DataTrail on GitHub @datatrail-jhu

https://github.com/datatrail-jhu/DataTrail

63 of 74

64 of 74

Self-taught version is available for free on Leanpub:

https://leanpub.com/c/datatrail

65 of 74

  • R Markdown based
  • Data should not require too much background knowledge
  • Data needs to be publicly available - or okay to be made so

66 of 74

To contribute a data learning example:

  1. Email me with your idea at csavonen@fredhutch.org
  2. Or open an issue with your idea on GitHub: https://github.com/datatrail-jhu/DataTrail/issues

67 of 74

How can you get involved?

Tell others about DataTrail!

Encourage more inclusive hiring practices

Recommend datasets for learning examples

Host a DataTrail graduate as an intern!

Start your own franchise!

68 of 74

Internships

69 of 74

What is required for hosting a DataTrail intern?

  • Funding is available
  • An entry level data science project that you have in mind
    • We can help determine whether a project is a good fit
  • Time to mentor the intern
    • Meet once or twice a week
    • Postdoc or senior grad student could be the direct mentor
  • Preferably a duration of ~8 weeks or more

Email me with your interest at csavonen@fredhutch.org

70 of 74

How can you get involved?

Tell others about DataTrail!

Encourage more inclusive hiring practices

Recommend datasets for learning examples

Host a DataTrail graduate as an intern!

Start your own franchise!

71 of 74

To run a franchise

  • A non-profit training partner for student identification and wraparound support
  • Funding for Chromebooks and stipends ($3k per student)
  • A team to support students through the program
    • Program leader
    • Typically 10-20% effort from 2-3 tutors
    • ~14 weeks per cohort
  • Initial job hiring partners - ideally 8+ weeks with significant mentorship
    • Your institution
    • Corporate partners

72 of 74

Questions or want to get involved?

Email me at csavonen@fredhutch.org

Find out more at datatrail.org/

73 of 74

Thanks

Jeff Leek and Michael Rosenblum

Students

Simina Boca, Hilary Parker, Andrew Jaffe, Alyssa Frazee, Nick Carchedi, Leo Collado Torres, Leslie Myint, Prasad Patil, Claire Ruberman, Jack Fu, Sara Wang, Kayode Sosina, Sarah McClymont + many visitors/interns/student collaborators!

Postdocs

Abhi Nellore, Kai Kammers, Shannon Ellis, Aboozar Hadavand, Lucy D’Agostino McGowan

Genomics Collaborators

Ben Langmead, Andrew Jaffe, Kasper Hansen, Margaret Taub, all of Hopkins Genomics + many more

JHU DaSL Collaborators

Roger Peng, Brian Caffo, Stephanie Hicks, John Muschelli, Leah Jager

JHU DaSL Staff

Ira Gooding, Jessica Crowell, Sean Kross, Nick Carchedi, Ashley Johnson, Simone Sawyer, Davon Person, Allissa Dillman, Liz Torres Brown

Hopkins Admin

Karen Bandeen-Roche, Christy Wyskiel, Sukon Kanchanaraksa, Mike Klag, Ellen MacKenzie, Ron Daniels + many others

HEBCAC/YO!/HeartSmiles

Ed Sabatino, Joni Hollifield

Problem Forward/Streamline

Jamie McGovern, Kenny Morales, Ju Kim, Will Richardson, Ryland Sumner and more...

74 of 74

Funders

  • Johns Hopkins University
  • Abell Foundation
  • Bloomberg Philanthropies

*Views expressed are not those of our funders but my own