1 of 90

(XKCD, by Randall Munroe)

(https://imgs.xkcd.com/comics/machine_learning.png)

2 of 90

Machine Learning in the Thomson Lab

David Merrell

Thomson Lab meeting 2019-03-29

3 of 90

Motivation

Science is a contest of hypotheses.

4 of 90

Motivation

Science is a contest of hypotheses.

(Taken *without* permission from Li-Fang Chu)

5 of 90

Motivation

Science is a contest of hypotheses.

Machine Learning (ML) can be useful at any point in the lifespan of a hypothesis.

(Taken *without* permission from Li-Fang Chu)

6 of 90

Motivation

Science is a contest of hypotheses.

Machine Learning (ML) can be useful at any point in the lifespan of a hypothesis.

I’ll describe some of the ML work I’ve been doing with Ron Stewart.

(Taken *without* permission from Li-Fang Chu)

7 of 90

The Scientific Method

8 of 90

The Scientific Method

Machine Learning can be injected anywhere in this process!

9 of 90

The Scientific Method

Machine Learning can be injected anywhere in this process!

KinderMiner

Statistical Hypothesis Testing

Data processing

Clustering

PCA

Automation

Statistical Design of Experiments

Active Learning

EHR mining

10 of 90

The Scientific Method

Machine Learning can be injected anywhere in this process!

Generally speaking, ML can augment any process that involves:

  • prediction,
  • finding patterns, or
  • decision-making.

KinderMiner

Statistical Hypothesis Testing

Data processing

Clustering

PCA

Automation

Statistical Design of Experiments

Active Learning

EHR mining

11 of 90

Using ML for prediction: Supervised Learning

12 of 90

Using ML for prediction: Supervised Learning

13 of 90

Using ML for prediction: Supervised Learning

Train the predictor:

14 of 90

Using ML for prediction: Supervised Learning

Train the predictor:

15 of 90

Using ML for prediction: Supervised Learning

Train the predictor:

Test the predictor; measure its performance:
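The train-then-test loop can be made concrete with a tiny nearest-centroid classifier (a sketch invented for illustration, not from the talk; all data and names are toy examples):

```python
def train_centroids(X, y):
    """'Train': average the feature vectors of each class into a centroid."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict(centroids, x):
    """Predict the class whose centroid is nearest to x."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], x))

# Toy data: two well-separated classes.
train_X = [(0.1, 0.2), (0.0, 0.3), (2.0, 2.1), (1.9, 2.2)]
train_y = ["low", "low", "high", "high"]
test_X  = [(0.2, 0.1), (2.1, 2.0)]
test_y  = ["low", "high"]

model = train_centroids(train_X, train_y)
accuracy = sum(predict(model, x) == t for x, t in zip(test_X, test_y)) / len(test_y)
print(accuracy)  # → 1.0
```

The key discipline is that performance is measured on the held-out test set, never on the training data.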

16 of 90

Supervised Learning for Drug Repurposing

Aliper et al., 2016

17 of 90

Supervised Learning for Drug Repurposing

Aliper et al., 2016

Use a Neural Network to predict drugs’ therapeutic uses...

18 of 90

Supervised Learning for Drug Repurposing

Aliper et al., 2016

Use a Neural Network to predict drugs’ therapeutic uses...

19 of 90

Aliper et al., 2016

When the neural network predicts the wrong therapeutic use, maybe that’s actually a drug repurposing opportunity.

(Figure: Predicted vs. Known Therapeutic Use)

20 of 90

(Aside: Neural Networks)

21 of 90

(Aside: Neural Networks)

  • If you’ve ever fit a line in Excel, then you know something about neural networks!

22 of 90

(Aside: Neural Networks)

  • A Neural Network is very similar to linear regression or logistic regression.
    • We’re just fitting a function to data.
    • (The function happens to be kind of fancy.)

Linear Regression:

23 of 90

(Aside: Neural Networks)

  • A Neural Network is very similar to linear regression or logistic regression.
    • We’re just fitting a function to data.
    • (The function happens to be kind of fancy.)

Linear Regression:

Logistic Regression:

24 of 90

(Aside: Neural Networks)

  • A Neural Network is very similar to linear regression or logistic regression.
    • We’re just fitting a function to data.
    • (The function happens to be kind of fancy.)

Linear Regression:

Logistic Regression:

Neural Network:

25 of 90

(Aside: Neural Networks)

  • A Neural Network is very similar to linear regression or logistic regression.
    • We’re just fitting a function to data.
    • (The function happens to be kind of fancy.)

Linear Regression:

Logistic Regression:

“Deep” Neural Network:
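The "fitting a function to data" idea can be written out in a few lines: the same gradient-descent loop that fits a line in Excel is, with a fancier function inside, a neural network. A minimal sketch on invented toy data:

```python
# Fit y ≈ w*x + b by gradient descent on squared error -- the same
# least-squares line fit Excel performs, written out explicitly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # Average gradient of the squared error over the data.
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w, b = w - lr * dw, b - lr * db

print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

Swap `w * x + b` for a composition of many such layers with nonlinearities between them, and this loop becomes neural-network training.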

26 of 90

[Aliper et al., 2016] Details

27 of 90

[Aliper et al., 2016] Details: Dataset

  • Original Dataset: Broad Institute LINCS L1000
    • 1,319,138 drug-perturbed gene expression profiles (microarray)
    • 976 + 11,350 = 12,797 genes
    • 51,383 perturbagens
    • 76 cell lines

28 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797

29 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs

30 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976

31 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations

32 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations
  → Discard “insignificantly perturbed” profiles (p > 0.05)

33 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations
  → Discard “insignificantly perturbed” profiles (p > 0.05)
9,352 x 976 (genes)

34 of 90

[Aliper et al., 2016] Details: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations
  → Discard “insignificantly perturbed” profiles (p > 0.05)
9,352 x 976 (genes)  /  9,352 x 271 (pathways)

35 of 90

[Aliper et al., 2016] Details: Machine Learning

Data: 9,352 x 976 (genes); 9,352 x 271 (pathways)

36 of 90

[Aliper et al., 2016] Details: Machine Learning

Data: 9,352 x 976 (genes); 9,352 x 271 (pathways)

Learning Systems: “Deep” Neural Networks

37 of 90

[Aliper et al., 2016] Details: Machine Learning

Data: 9,352 x 976 (genes); 9,352 x 271 (pathways)

Learning Systems: “Deep” Neural Networks; Support Vector Machines (Baseline)

38 of 90

[Aliper et al., 2016] Details: Machine Learning

Data: 9,352 x 976 (genes); 9,352 x 271 (pathways)

Learning Systems: “Deep” Neural Networks; Support Vector Machines (Baseline)

Cross Validation Testing Framework
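Cross validation holds out each chunk of the data in turn, trains on the rest, and averages the held-out accuracy. A minimal Python sketch of the idea (the names `k_fold_indices` and `cross_validate`, and the deliberately dumb majority-class model, are invented for illustration):

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal held-out folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def cross_validate(X, y, k, train_fn, predict_fn):
    """Average held-out accuracy over k train/test splits."""
    accs = []
    for held_out in k_fold_indices(len(X), k):
        train = [i for i in range(len(X)) if i not in held_out]
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in held_out)
        accs.append(correct / len(held_out))
    return sum(accs) / k

# A deliberately dumb model: always predict the most common training label.
train_fn = lambda X, y: Counter(y).most_common(1)[0][0]
predict_fn = lambda model, x: model

X = list(range(10))
y = ["a"] * 8 + ["b"] * 2
print(cross_validate(X, y, 5, train_fn, predict_fn))  # → 0.8
```

Because every point is held out exactly once, the averaged score is a much fairer estimate of real-world performance than training-set accuracy.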

39 of 90

(Aside: Support Vector Machines)

Very simple idea for a predictor:

Find a line which separates the classes.

40 of 90

(Aside: Support Vector Machines)

Very simple idea for a predictor:

Find a line which separates the classes.

Classify new points by the side of the line they land on.

41 of 90

(Aside: Support Vector Machines)

Very simple idea for a predictor:

Find a line which separates the classes.

Classify new points by the side of the line they land on.

42 of 90

(Aside: Support Vector Machines)

Very simple idea for a predictor:

Find a line which separates the classes.

Classify new points by the side of the line they land on.

43 of 90

(Aside: Support Vector Machines)

Very simple idea for a predictor:

Find a line which separates the classes.

Classify new points by the side of the line they land on.

There are tricks for making very powerful classifiers based on this concept.
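The "which side of the line?" rule can be sketched with a perceptron -- a simpler cousin of the SVM that finds *a* separating line, though without the SVM's margin maximization. Data and names here are invented for illustration:

```python
def train_perceptron(X, y, epochs=100):
    """Labels are +1 / -1; nudge the line toward each misclassified point."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in zip(X, y):
            if label * (w1 * x1 + w2 * x2 + b) <= 0:  # wrong side of the line
                w1 += label * x1
                w2 += label * x2
                b += label
    return w1, w2, b

def classify(model, x):
    """Classify by which side of the line w·x + b = 0 the point lands on."""
    w1, w2, b = model
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else -1

X = [(0, 0), (1, 0), (3, 3), (4, 3)]
y = [-1, -1, 1, 1]
model = train_perceptron(X, y)
print([classify(model, x) for x in X])  # → [-1, -1, 1, 1]
```

The "tricks" the slide alludes to (maximum margins, kernels) build powerful classifiers on exactly this sign-of-a-linear-function foundation.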

44 of 90

[Aliper et al., 2016] Details: Results

45 of 90

[Aliper et al., 2016] Details: Results

Drug repurposing opportunities???

  • Otenzepad:
    • cardiovascular → nervous system
  • Pinacidil:
    • cardiovascular → nervous system

(That’s all they mention in the paper)

46 of 90

[Aliper et al., 2016] Replication

47 of 90

[Aliper et al., 2016] Replication: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976

48 of 90

[Aliper et al., 2016] Replication: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations

49 of 90

[Aliper et al., 2016] Replication: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations -- PROPRIETARY!

50 of 90

[Aliper et al., 2016] Replication: Preprocessing

1,319,138 x 12,797
  → Restrict to A549, MCF7, PC3 cell lines; 678 drugs
26,420 x 976
  → OncoFinder: gene expressions → pathway activations -- PROPRIETARY!

51 of 90

[Aliper et al., 2016] Replication: Machine Learning

Data: 26,420 x 976

52 of 90

[Aliper et al., 2016] Replication: Machine Learning

Data: 26,420 x 976

Learning Systems: “Deep” Neural Networks

53 of 90

[Aliper et al., 2016] Replication: Machine Learning

Data: 26,420 x 976

Learning Systems: “Deep” Neural Networks; Support Vector Machines

54 of 90

[Aliper et al., 2016] Replication: Machine Learning

Data: 26,420 x 976

Learning Systems: “Deep” Neural Networks; Support Vector Machines; Naive Bayes

55 of 90

[Aliper et al., 2016] Replication: Machine Learning

Data: 26,420 x 976

Learning Systems: “Deep” Neural Networks; Support Vector Machines; Naive Bayes; Random Forests

56 of 90

[Aliper et al., 2016] Replication: Machine Learning

Data: 26,420 x 976

Learning Systems: “Deep” Neural Networks; Support Vector Machines; Naive Bayes; Random Forests

(correct) Cross Validation Testing Framework

57 of 90

(Aside: Decision Trees & Random Forests)

A very practical decision tree:

58 of 90

(Aside: Decision Trees & Random Forests)

Decision Trees: Start at the top and answer questions until you reach the bottom.

59 of 90

(Aside: Decision Trees & Random Forests)

Decision Trees: Start at the top and answer questions until you reach the bottom.

There are algorithms to build these trees from labeled data.

60 of 90

(Aside: Decision Trees & Random Forests)

Random Forests: Build many decision trees, but inject some randomness into them. Combine the trees’ decisions via plurality vote.

61 of 90

(Aside: Decision Trees & Random Forests)

Random Forests: Build many decision trees, but inject some randomness into them. Combine the trees’ decisions via plurality vote.

This collection of “cognitively diverse” decision trees can make better decisions than any individual tree!
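The build-many-randomized-trees-then-vote recipe can be sketched in a few dozen lines. This is a toy illustration, not the talk's implementation: each "tree" is a one-split decision stump, and (as a simplification of the usual recipe, to keep the demo deterministic) the randomness comes from bootstrap resampling within each class:

```python
import random
from collections import Counter

def best_stump(X, y):
    """Fit the error-minimizing one-feature threshold split ('decision stump')."""
    best = None
    for feat in range(len(X[0])):
        vals = sorted(set(x[feat] for x in X))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            below = [lab for x, lab in zip(X, y) if x[feat] <= t]
            above = [lab for x, lab in zip(X, y) if x[feat] > t]
            vote = lambda side: Counter(side).most_common(1)[0][0]
            lo_lab, hi_lab = vote(below), vote(above)
            err = sum(lab != (hi_lab if x[feat] > t else lo_lab)
                      for x, lab in zip(X, y))
            if best is None or err < best[0]:
                best = (err, feat, t, lo_lab, hi_lab)
    return best[1:]

def train_forest(X, y, n_trees=15, seed=0):
    """Inject randomness by training each stump on a bootstrap resample."""
    rng = random.Random(seed)
    labels = list(dict.fromkeys(y))
    forest = []
    for _ in range(n_trees):
        sample = []
        for lab in labels:  # resample within each class
            members = [(x, l) for x, l in zip(X, y) if l == lab]
            sample += [rng.choice(members) for _ in members]
        bx, by = zip(*sample)
        forest.append(best_stump(list(bx), list(by)))
    return forest

def forest_predict(forest, x):
    votes = [hi if x[feat] > t else lo for feat, t, lo, hi in forest]
    return Counter(votes).most_common(1)[0][0]  # plurality vote

X = [(0.0,), (1.0,), (9.0,), (10.0,)]
y = ["small", "small", "big", "big"]
forest = train_forest(X, y)
print([forest_predict(forest, x) for x in X])  # → ['small', 'small', 'big', 'big']
```

Real random forests grow full trees on bootstrap samples and also randomize the features considered at each split, but the plurality-vote mechanism is exactly this.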

62 of 90

[Aliper et al., 2016] Replication: Results

63 of 90

[Aliper et al., 2016] Replication: Results

Most drugs were mislabeled -- a full spreadsheet is available on request.

Given the low quality of prediction, it’s hard to say how useful they would be...

64 of 90

[Aliper et al., 2016] Replication: Lessons Learned

  • How to (not) conduct reproducible research
    • Conflicts of interest (OncoFinder coefficients)
    • Code organization (no centralized repository)
  • There is a lot of hype around neural networks -- in many cases, a simpler model suffices. → Perform due diligence in model selection.
  • The authors made enormous improvements by converting gene expression profiles to signaling pathway activations. This was at least as important as their model choice.

65 of 90

Current & Future Work:

66 of 90

Current & Future Work: Unsupervised Learning

67 of 90

Current & Future Work: Unsupervised Learning

  • In Supervised Learning we were given a set of labeled data.
    • Our job was to predict labels for new data.
      • It was like having a teacher “supervise” the algorithm -- letting it know whether it’s making correct predictions.

68 of 90

Current & Future Work: Unsupervised Learning

  • In Supervised Learning we were given a set of labeled data.
    • Our job was to predict labels for new data.
      • It was like having a teacher “supervise” the algorithm -- letting it know whether it’s making correct predictions.
  • In Unsupervised Learning, the data has no labels.
    • Our job is to find patterns, regularities, or structure in the data.
      • There’s no “teacher” giving feedback to the algorithm -- the algorithm doesn’t really make predictions, because it doesn’t even know what it should predict.

69 of 90

Unsupervised Learning: Finding Patterns in Data

70 of 90

Unsupervised Learning: Finding Patterns in Data

71 of 90

Unsupervised Learning: Finding Patterns in Data

72 of 90

Unsupervised Learning: Finding Patterns in Data

Classic unsupervised learning tasks:

  • Clustering (e.g., hierarchical or k-means)
  • Dimension Reduction (e.g., Principal Components Analysis)

These tasks (and many others) can be formulated using Bayesian Statistics.
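The clustering task can be sketched concretely with one-dimensional k-means (a toy illustration, not from the talk): alternately (1) assign each point to its nearest center, and (2) move each center to the mean of its assigned points.

```python
def kmeans(points, centers, iters=10):
    """Lloyd's algorithm on 1-D data: assign, then re-center, repeatedly."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two obvious clumps; no labels anywhere -- the structure is discovered.
data = [1.0, 2.0, 3.0, 9.0, 10.0, 11.0]
print(sorted(kmeans(data, [0.0, 5.0])))  # → [2.0, 10.0]
```

Note that nothing in the loop ever sees a label: the algorithm finds the two clumps purely from the data's own structure, which is the hallmark of unsupervised learning.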

73 of 90

Exciting New Bayesian Tools!

  • Math & Algorithms
    • Black Box Variational Inference
    • Hamiltonian MCMC
    • Major developments within the past 10 years

74 of 90

Exciting New Bayesian Tools!

  • Math & Algorithms
    • Black Box Variational Inference
    • Gradient-based MCMC
    • Major developments within the past 10 years

75 of 90

Exciting New Bayesian Tools!

  • Math & Algorithms
    • Black Box Variational Inference
    • Gradient-based MCMC
    • Major developments within the past 10 years
  • Technologies & Software: Probabilistic Programming
    • Edward (TensorFlow)
    • Pyro (Uber AI Labs)
    • PyMC3
    • Stan
    • GPU-accelerated inference
    • Major developments within the past 5 years

76 of 90

Exciting New Bayesian Tools!

  • Math & Algorithms
    • Black Box Variational Inference
    • Gradient-based MCMC
    • Major developments within the past 10 years
  • Technologies & Software: Probabilistic Programming
    • Edward (TensorFlow)
    • Pyro (Uber AI Labs)
    • PyMC3
    • Stan
    • GPU-accelerated inference
    • Major developments within the past 5 years

77 of 90

Probabilistic Programming

A convenient way to write down statistical models and perform inference.

→ Therefore, a convenient way to write down testable hypotheses.

78 of 90

Bayesian Hypothesis Testing

Classical frequentist hypothesis test: “Do we reject the null hypothesis?”

(p-values, significance levels)

vs.

79 of 90

Bayesian Hypothesis Testing

Bayesian hypothesis test: “Which hypothesis is more probable?”

(Goodbye, significance. Hello, Bayes factors!)

Statistical methodologies beyond p-values and significance levels...

vs.
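A Bayes factor is just the ratio of how well two hypotheses predict the observed data. A toy worked example (invented for illustration): a coin lands heads 8 times in 10 flips, and we compare "fair coin" against "biased coin with p = 0.8".

```python
from math import comb

# Observed: k = 8 heads in n = 10 flips.
# H1: biased coin, p = 0.8.  H2: fair coin, p = 0.5.
k, n = 8, 10
likelihood = lambda p: comb(n, k) * p**k * (1 - p)**(n - k)

# Bayes factor: how many times better H1 predicts the data than H2.
bayes_factor = likelihood(0.8) / likelihood(0.5)
print(round(bayes_factor, 1))  # → 6.9
```

Multiplying the Bayes factor by the prior odds gives the posterior odds, so the question "which hypothesis is more probable?" gets a direct numerical answer rather than a reject/fail-to-reject verdict.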

80 of 90

Thank You

In particular:

Ron Stewart

Finn Kuusisto

David Page (BMI Dept)

The Bioinformatics Team

81 of 90

Questions?

82 of 90

EXTRA SLIDES

83 of 90

The Scientific Method

84 of 90

The Scientific Method & Artificial Intelligence

85 of 90

The Scientific Method & Artificial Intelligence

86 of 90

The Scientific Method & Artificial Intelligence

87 of 90

The Scientific Method & Artificial Intelligence

& Machine Learning

88 of 90

The Scientific Method & Artificial Intelligence

& Machine Learning

Supervised Learning

(Regression and Classification)

89 of 90

The Scientific Method & Artificial Intelligence

& Machine Learning

Unsupervised Learning

(finding patterns in data)

90 of 90

The Scientific Method & Artificial Intelligence

& Machine Learning

Reinforcement Learning / Active Learning

(autonomous control)