1 of 64

Machine Learning in Biomedicine

Sean Davis, MD, PhD

Center for Cancer Research, National Cancer Institute,

National Institutes of Health

July 9, 2017

@seandavis12

2 of 64

Gartner Hype Cycle

3 of 64

4 of 64

ML Tech Triggers

  • Lots of Big Data
  • Transition to digital in all aspects of life and science
  • Systems and ecosystems that drive and are driven by data production and consumption
    • Advertising
    • Social media
    • Finance
    • Health care
    • Energy
    • Transportation

5 of 64

ML Tech Triggers

  • Cheap and easy access to lots of compute resources
  • Build a small-scale prototype system, then immediately scale to petabytes or larger
  • Advances in coprocessors and GPUs
  • Bigger, faster memory and storage

6 of 64

ML Tech Triggers

  • New software
  • Commodity ecosystems for big data and machine learning
  • Workforce increasingly skilled and engaged
  • National and international recognition
    • Cyber
    • USA Chief Data Scientist

7 of 64

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell, Machine Learning, 1997)

8 of 64

9 of 64

Applications

10 of 64

Algorithms

11 of 64

Supervised learning

12 of 64

Unsupervised Learning

13 of 64

14 of 64

An early example: a gene expression classifier

15 of 64

An early genomics example of supervised ML

16 of 64

17 of 64

The result is a program that performs predictions!

18 of 64

19 of 64

20 of 64

Naive Bayes

21 of 64

Individual features as predictors

22 of 64

Likelihood ratio

23 of 64

Feature relationships

24 of 64

Exploiting independence
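The independence assumption can be sketched numerically: under naive Bayes, features are assumed conditionally independent given the class, so the joint likelihood ratio is simply the product of the per-feature likelihood ratios. A minimal sketch with made-up numbers (not data from these slides):

```python
import math

# Hypothetical per-feature likelihood ratios P(x_i | class 1) / P(x_i | class 0)
# for three observed features; the values are purely illustrative.
likelihood_ratios = [2.0, 0.5, 4.0]
prior_odds = 1.0  # P(class 1) / P(class 0) before seeing any features

# Naive Bayes: conditional independence lets us multiply the per-feature
# ratios to get the posterior odds.
posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr

posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(posterior_odds)  # 4.0
print(posterior_prob)  # 0.8
```

In log space this product becomes a sum of per-feature log likelihood ratios, which is why naive Bayes scales easily to many features.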

25 of 64

26 of 64

Random Forests

Slides from Anshul Kundaje

27 of 64

Functional Elements in the genome

Ecker et al. 2012

[Figure: dynamic regulation of gene expression. Labeled elements: repressed and active genes, transcription factors (regulatory proteins), enhancer, insulator, promoter, nucleosomes, chromatin (histone) modifications, DNA, motif, mRNA, protein.]

28 of 64

Chromatin immunoprecipitation (ChIP-seq)

Protein-DNA binding maps

Maps of histone modifications

Maps of histone variants

29 of 64

Chromatin accessibility (DNase-seq, FAIRE-seq, ATAC-seq) and nucleosome sequencing (MNase-seq)

DNase-seq (ATAC-seq) ~= sum(ChIP-seq for all TFs)

30 of 64

ENCODE functional signal maps

Chromatin modification maps

Transcription factor binding map

Example histone marks: H3K4me3, H3K27ac, H3K4me1, H3K36me3, H3K27me3

31 of 64

Relationship between chromatin marks and gene expression

Aggregation analysis and simple univariate correlation analysis suggest strong positive or negative relationships between gene expression and enrichment of chromatin marks at gene promoters

What is the collective predictive power of a set of chromatin marks? Which ones are more predictive?

 

32 of 64

Multivariate predictive model

Linear regression model: y = β0 + β1·x1 + … + βp·xp

Minimize squared error to find the betas: choose the β that minimizes Σi (yi − β0 − Σj βj·xij)²

Input variables (features): the chromatin-mark signals xj
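A minimal sketch of the regression step, with synthetic data standing in for chromatin-mark signals (the feature meanings and coefficients are illustrative, not values from the slides):

```python
import numpy as np

# Synthetic stand-ins for two chromatin-mark signals at gene promoters;
# the "true" coefficients are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # two mark signals per gene
y = X @ np.array([1.5, -0.8]) + 0.1 * rng.normal(size=200)

# Minimize squared error to find the betas: ordinary least squares,
# with a column of ones for the intercept.
A = np.column_stack([np.ones(len(y)), X])
betas, *_ = np.linalg.lstsq(A, y, rcond=None)
print(betas)  # intercept near 0, slopes near 1.5 and -0.8
```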

33 of 64

Regression coefficients, correlation, independence

 

34 of 64

A non-linear model (decision/regression tree)

  • Top-down, greedy procedure
  • Recursively splits data based on a single feature
  • Goal is to determine subsets of data with the same classes (for classification), or similar values (for regression)
  • Predict the majority class / average value for the subset

[Decision tree figure; example splits: H3K4me3 > 5, H3K4me1 > 10, H3K27me3 > 10]
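The greedy top-down procedure described above can be sketched for regression in a few lines (a toy implementation, not the tool behind the slides):

```python
import numpy as np

def best_split(x, y):
    # Greedy: over every feature and candidate threshold, pick the split
    # that most reduces within-subset squared error (variance).
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            left, right = y[x[:, j] <= t], y[x[:, j] > t]
            cost = left.var() * len(left) + right.var() * len(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best[1], best[2]

def grow(x, y, depth=0, max_depth=2):
    # Stop when the subset is pure or the tree is deep enough;
    # a leaf predicts the subset's average value.
    if depth == max_depth or len(np.unique(y)) == 1:
        return float(y.mean())
    j, t = best_split(x, y)
    mask = x[:, j] <= t
    return (j, t, grow(x[mask], y[mask], depth + 1, max_depth),
                  grow(x[~mask], y[~mask], depth + 1, max_depth))

def predict(tree, row):
    # Walk the recursive splits until reaching a leaf value.
    while not isinstance(tree, float):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Toy data: the target depends only on whether feature 0 exceeds ~5.
x = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
tree = grow(x, y)
print(predict(tree, [8.5]))  # 1.0
```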

35 of 64

Multivariate predictive model (Random forest)

    • Ensemble of decision or regression trees learned on bootstrapped samples of training data
    • Average prediction from all trees

36 of 64

Random forests

Utilizes random sampling over

    • Features - only allow splits on a small subset
    • Examples - train trees on subsamples of data

to construct a collection of decision trees (forest) with significantly improved prediction performance
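The two sampling steps can be sketched with a toy forest of depth-1 trees (stumps); real random forests grow much deeper trees, and all names and data here are illustrative:

```python
import numpy as np

def fit_stump(x, y):
    # Depth-1 regression tree: best single split; each side predicts its mean.
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            l, r = y[x[:, j] <= t], y[x[:, j] > t]
            cost = l.var() * len(l) + r.var() * len(r)
            if best is None or cost < best[0]:
                best = (cost, j, t, l.mean(), r.mean())
    return best[1:]

def fit_forest(x, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))             # bootstrap examples
        feats = rng.choice(x.shape[1], size=1, replace=False)  # feature subset
        j, t, lm, rm = fit_stump(x[np.ix_(idx, feats)], y[idx])
        forest.append((feats[j], t, lm, rm))
    return forest

def forest_predict(forest, row):
    # Average the prediction over all trees in the forest.
    return float(np.mean([lm if row[j] <= t else rm
                          for j, t, lm, rm in forest]))

# Toy data: only feature 0 matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(float)
forest = fit_forest(X, y)
```

Trees that happened to sample the informative feature split correctly, while the rest contribute near-constant noise, so the averaged prediction still separates the two classes.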

37 of 64

Learning algorithm

  • Use random forests to generate rules
    • Sample variable-depth decision trees (average depth, i.e. rule size, ~6)
    • 2000 to 5000 potentially redundant rules

  • L1 regularization to learn sparse set of non-redundant rules
    • Optimize squared-error ramp loss

    • L1 penalty for sparsity
    • Tries to make many coefficients 0

RuleFit3: Friedman et al. 2005
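The effect of the L1 penalty can be illustrated with its proximal operator, soft thresholding, which is what drives small coefficients exactly to zero (a generic illustration, not the RuleFit3 optimizer itself):

```python
import numpy as np

def soft_threshold(betas, lam):
    # Proximal operator of the L1 penalty: shrink every coefficient toward
    # zero by lam, and set coefficients with magnitude below lam exactly to zero.
    return np.sign(betas) * np.maximum(np.abs(betas) - lam, 0.0)

betas = np.array([2.5, -0.3, 0.05, -1.2])
sparse = soft_threshold(betas, 0.5)
print(sparse)  # the two small coefficients are now exactly zero
```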

[Figure: candidate rules rk built from transcription factor features TF1, TF2, TF3]

38 of 64

Projected co-association scores

Partial dependence of F(X) on a set g of TFs (e.g., pairs)

G = complement of g

Co-association score between pairs of TFs

RuleFit3: Friedman et al. 2005

39 of 64

Deep learning example

Several slides adapted from Anshul Kundaje

40 of 64

Deep learning

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

41 of 64

What does deep learning learn?

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

42 of 64

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

43 of 64

44 of 64

Chromatin architecture cartoon

45 of 64

46 of 64

ATAC-Seq and fragment lengths

47 of 64

Chromatin architecture and chromatin state

48 of 64

Transform a 1D dataset into 2D

49 of 64

Transformed to an image classification problem!
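A minimal sketch of the transformation, with made-up counts (the actual binning used in the slides' method is not specified here):

```python
import numpy as np

# Hypothetical: per-position read counts, already grouped by fragment-length
# bin and flattened into one 1D vector (the numbers are illustrative).
signal_1d = np.arange(12, dtype=float)

# Reshape into a 2D "image": rows = fragment-length bins, columns = genomic
# positions. The prediction task then looks like image classification.
image = signal_1d.reshape(3, 4)
print(image.shape)  # (3, 4)
```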

50 of 64

Convolution

51 of 64

Convolution

52 of 64

Convolution learns multiple low-level features

53 of 64

Pooling, or smoothing, to gain robustness
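These two operations can be sketched in a few lines (a toy 1D version with a hand-picked kernel; real networks learn their kernels from data):

```python
import numpy as np

def conv1d(signal, kernel):
    # "Valid" cross-correlation, as used in convolutional nets: slide the
    # kernel along the signal and take a dot product at each offset.
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel for i in range(n)])

def max_pool(x, width=2):
    # Non-overlapping max pooling: keep the strongest response per window,
    # which makes the result robust to small shifts in feature position.
    return np.array([x[i:i + width].max()
                     for i in range(0, len(x) - width + 1, width)])

signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
step_down = np.array([1.0, -1.0])     # fires on downward steps in the signal
response = conv1d(signal, step_down)  # [0, -1, 0, 1, 0, -1, 1]
pooled = max_pool(response)           # [0, 1, 0]
```

Stacking many such kernels gives the multiple low-level features from the previous slide; pooling then smooths their responses.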

54 of 64

How does learning proceed?

55 of 64

How does learning proceed?

56 of 64

How does learning proceed?

57 of 64

Learn from sequence as well

58 of 64

Kundaje’s Chromputer

59 of 64

Closing thoughts

60 of 64

MANY, MANY more applications

61 of 64

https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/blue-ribbon-panel/blue-ribbon-panel-report-2016.pdf

62 of 64

63 of 64

Internet of Things

Potential to fundamentally change the way we interact with research subjects, patients, and the general population.

64 of 64