1 of 64

Machine Learning in Biomedicine

Sean Davis, MD, PhD

Center for Cancer Research, National Cancer Institute,

National Institutes of Health

July 9, 2017

@seandavis12

2 of 64

Gartner Hype Cycle

3 of 64

4 of 64

ML Tech Triggers

  • Lots of Big Data
  • Transition to digital in all aspects of life and science
  • Systems and ecosystems that drive and are driven by data production and consumption
    • Advertising
    • Social media
    • Finance
    • Health care
    • Energy
    • Transportation

5 of 64

ML Tech Triggers

  • Cheap and easy access to lots of compute resources
  • Build a small-scale prototype system, then immediately scale to petabytes or larger
  • Advances in coprocessors and GPUs
  • Bigger, faster memory and storage

6 of 64

ML Tech Triggers

  • New software
  • Commodity ecosystems for big data and machine learning
  • Workforce increasingly skilled and engaged
  • National and international recognition
    • Cyber
    • USA Chief Data Scientist

7 of 64

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell, Machine Learning, 1997)

8 of 64

9 of 64

Applications

10 of 64

Algorithms

11 of 64

Supervised learning

12 of 64

Unsupervised Learning

13 of 64

14 of 64

An early example: a gene expression classifier

15 of 64

An early genomics example of supervised ML

16 of 64

17 of 64

The result is a program that performs predictions!

18 of 64

19 of 64

20 of 64

Naive Bayes

21 of 64

Individual features as predictors

22 of 64

Likelihood ratio

23 of 64

Feature relationships

24 of 64

Exploiting independence
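The independence assumption can be sketched numerically: under naive Bayes, features are assumed conditionally independent given the class, so the joint likelihood ratio is simply the product of the per-feature likelihood ratios. A minimal sketch with made-up numbers (not data from these slides):

```python
import math

# Hypothetical per-feature likelihood ratios P(x_i | class 1) / P(x_i | class 0)
# for three observed features; the values are purely illustrative.
likelihood_ratios = [2.0, 0.5, 4.0]
prior_odds = 1.0  # P(class 1) / P(class 0) before seeing any features

# Naive Bayes: conditional independence lets us multiply the per-feature
# ratios to get the posterior odds.
posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr

posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(posterior_odds)  # 4.0
print(posterior_prob)  # 0.8
```

In log space this product becomes a sum of per-feature log likelihood ratios, which is why naive Bayes scales easily to many features.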

25 of 64

26 of 64

Random Forests

Slides from Anshul Kundaje

27 of 64

Functional Elements in the genome

Ecker et al. 2012

[Figure: dynamic regulation of gene expression. Labeled elements: repressed and active genes, transcription factors (regulatory proteins), enhancer, insulator, promoter, nucleosomes, chromatin (histone) modifications, DNA, motif, mRNA, protein.]

28 of 64

Chromatin immunoprecipitation (ChIP-seq)

Protein-DNA binding maps

Maps of histone modifications

Maps of histone variants

29 of 64

Chromatin accessibility (DNase-seq, FAIRE-seq, ATAC-seq) and nucleosome sequencing (MNase-seq)

DNase-seq (ATAC-seq) ~= sum(ChIP-seq for all TFs)

30 of 64

ENCODE functional signal maps

Chromatin modification maps

Transcription factor binding map

Example histone marks: H3K4me3, H3K27ac, H3K4me1, H3K36me3, H3K27me3

31 of 64

Relationship between chromatin marks and gene expression

Aggregation analysis and simple univariate correlation analysis suggest strong positive or negative relationships between gene expression and enrichment of chromatin marks at gene promoters

What is the collective predictive power of a set of chromatin marks? Which ones are more predictive?

 

32 of 64

Multivariate predictive model

Linear regression model: y = β0 + β1·x1 + … + βp·xp

Minimize squared error to find the betas: choose the β that minimizes Σi (yi − β0 − Σj βj·xij)²

Input variables (features): the chromatin-mark signals xj
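A minimal sketch of the regression step, with synthetic data standing in for chromatin-mark signals (the feature meanings and coefficients are illustrative, not values from the slides):

```python
import numpy as np

# Synthetic stand-ins for two chromatin-mark signals at gene promoters;
# the "true" coefficients are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # two mark signals per gene
y = X @ np.array([1.5, -0.8]) + 0.1 * rng.normal(size=200)

# Minimize squared error to find the betas: ordinary least squares,
# with a column of ones for the intercept.
A = np.column_stack([np.ones(len(y)), X])
betas, *_ = np.linalg.lstsq(A, y, rcond=None)
print(betas)  # intercept near 0, slopes near 1.5 and -0.8
```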

33 of 64

Regression coefficients, correlation, independence

 

34 of 64

A non-linear model (decision/regression tree)

  • Top-down, greedy procedure
  • Recursively splits data based on a single feature
  • Goal is to determine subsets of data with the same classes (for classification), or similar values (for regression)
  • Predict the majority class / average value for the subset

[Decision tree figure; example splits: H3K4me3 > 5, H3K4me1 > 10, H3K27me3 > 10]
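The greedy top-down procedure described above can be sketched for regression in a few lines (a toy implementation, not the tool behind the slides):

```python
import numpy as np

def best_split(x, y):
    # Greedy: over every feature and candidate threshold, pick the split
    # that most reduces within-subset squared error (variance).
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            left, right = y[x[:, j] <= t], y[x[:, j] > t]
            cost = left.var() * len(left) + right.var() * len(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best[1], best[2]

def grow(x, y, depth=0, max_depth=2):
    # Stop when the subset is pure or the tree is deep enough;
    # a leaf predicts the subset's average value.
    if depth == max_depth or len(np.unique(y)) == 1:
        return float(y.mean())
    j, t = best_split(x, y)
    mask = x[:, j] <= t
    return (j, t, grow(x[mask], y[mask], depth + 1, max_depth),
                  grow(x[~mask], y[~mask], depth + 1, max_depth))

def predict(tree, row):
    # Walk the recursive splits until reaching a leaf value.
    while not isinstance(tree, float):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Toy data: the target depends only on whether feature 0 exceeds ~5.
x = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
tree = grow(x, y)
print(predict(tree, [8.5]))  # 1.0
```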

35 of 64

Multivariate predictive model (Random forest)

    • Ensemble of decision or regression trees learned on bootstrapped samples of training data
    • Average prediction from all trees

36 of 64

Random forests

Utilizes random sampling over

    • Features - only allow splits on a small subset
    • Examples - train trees on subsamples of data

to construct a collection of decision trees (forest) with significantly improved prediction performance
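The two sampling steps can be sketched with a toy forest of depth-1 trees (stumps); real random forests grow much deeper trees, and all names and data here are illustrative:

```python
import numpy as np

def fit_stump(x, y):
    # Depth-1 regression tree: best single split; each side predicts its mean.
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            l, r = y[x[:, j] <= t], y[x[:, j] > t]
            cost = l.var() * len(l) + r.var() * len(r)
            if best is None or cost < best[0]:
                best = (cost, j, t, l.mean(), r.mean())
    return best[1:]

def fit_forest(x, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))             # bootstrap examples
        feats = rng.choice(x.shape[1], size=1, replace=False)  # feature subset
        j, t, lm, rm = fit_stump(x[np.ix_(idx, feats)], y[idx])
        forest.append((feats[j], t, lm, rm))
    return forest

def forest_predict(forest, row):
    # Average the prediction over all trees in the forest.
    return float(np.mean([lm if row[j] <= t else rm
                          for j, t, lm, rm in forest]))

# Toy data: only feature 0 matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(float)
forest = fit_forest(X, y)
```

Trees that happened to sample the informative feature split correctly, while the rest contribute near-constant noise, so the averaged prediction still separates the two classes.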

37 of 64

Learning algorithm

  • Use random forests to generate rules
    • Sample variable-depth decision trees (average depth, i.e. rule size, ~6)
    • 2000 to 5000 potentially redundant rules

  • L1 regularization to learn sparse set of non-redundant rules
    • Optimize squared-error ramp loss

    • L1 penalty for sparsity
    • Tries to make many coefficients 0

RuleFit3: Friedman et al. 2005
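The effect of the L1 penalty can be illustrated with its proximal operator, soft thresholding, which is what drives small coefficients exactly to zero (a generic illustration, not the RuleFit3 optimizer itself):

```python
import numpy as np

def soft_threshold(betas, lam):
    # Proximal operator of the L1 penalty: shrink every coefficient toward
    # zero by lam, and set coefficients with magnitude below lam exactly to zero.
    return np.sign(betas) * np.maximum(np.abs(betas) - lam, 0.0)

betas = np.array([2.5, -0.3, 0.05, -1.2])
sparse = soft_threshold(betas, 0.5)
print(sparse)  # the two small coefficients are now exactly zero
```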

[Figure: candidate rules rk built from transcription factor features TF1, TF2, TF3]

38 of 64

Projected co-association scores

Partial dependence of F(X) on a set g of TFs (e.g., pairs)

G = complement of g

Co-association score between pairs of TFs

RuleFit3: Friedman et al. 2005

39 of 64

Deep learning example

Several slides adapted from Anshul Kundaje

40 of 64

Deep learning

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

41 of 64

What does deep learning learn?

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

42 of 64

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

43 of 64

44 of 64

Chromatin architecture cartoon

45 of 64

46 of 64

ATAC-Seq and fragment lengths

47 of 64

Chromatin architecture and chromatin state

48 of 64

Transform a 1D dataset into 2D

49 of 64

Transformed to an image classification problem!
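A minimal sketch of the transformation, with made-up counts (the actual binning used in the slides' method is not specified here):

```python
import numpy as np

# Hypothetical: per-position read counts, already grouped by fragment-length
# bin and flattened into one 1D vector (the numbers are illustrative).
signal_1d = np.arange(12, dtype=float)

# Reshape into a 2D "image": rows = fragment-length bins, columns = genomic
# positions. The prediction task then looks like image classification.
image = signal_1d.reshape(3, 4)
print(image.shape)  # (3, 4)
```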

50 of 64

Convolution

51 of 64

Convolution

52 of 64

Convolution learns multiple low-level features

53 of 64

Pooling, or smoothing, to gain robustness
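These two operations can be sketched in a few lines (a toy 1D version with a hand-picked kernel; real networks learn their kernels from data):

```python
import numpy as np

def conv1d(signal, kernel):
    # "Valid" cross-correlation, as used in convolutional nets: slide the
    # kernel along the signal and take a dot product at each offset.
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel for i in range(n)])

def max_pool(x, width=2):
    # Non-overlapping max pooling: keep the strongest response per window,
    # which makes the result robust to small shifts in feature position.
    return np.array([x[i:i + width].max()
                     for i in range(0, len(x) - width + 1, width)])

signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
step_down = np.array([1.0, -1.0])     # fires on downward steps in the signal
response = conv1d(signal, step_down)  # [0, -1, 0, 1, 0, -1, 1]
pooled = max_pool(response)           # [0, 1, 0]
```

Stacking many such kernels gives the multiple low-level features from the previous slide; pooling then smooths their responses.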

54 of 64

How does learning proceed?

55 of 64

How does learning proceed?

56 of 64

How does learning proceed?

57 of 64

Learn from sequence as well

58 of 64

Kundaje’s Chromputer

59 of 64

Closing thoughts

60 of 64

MANY, MANY more applications

61 of 64

https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/blue-ribbon-panel/blue-ribbon-panel-report-2016.pdf

62 of 64

63 of 64

Internet of Things

Potential to fundamentally change the way we interact with research subjects, patients, and the general population.

64 of 64