1 of 27

Error-controlled interaction discovery in generic machine learning models

Yang Lu

University of Waterloo

Scientific discovery

Data

2 of 27

Machine learning is revolutionizing biomedical research

Hypothesis-driven paradigm: Hypothesis → Data → Evaluation

Data-driven paradigm: Big data → ML model → Hypothesis → Evaluation

3 of 27

Hypothesis generation in a data-driven paradigm

Prediction

Hypothesis generation

Generated disease-specific and testable hypotheses

Input: biomedical data

ML models

Output: disease

Biomarkers

Pathways

Interactions

Causalities

Lu et al. NeurIPS (2018)

Coming Soon!

Future work!

4 of 27

We are interested in non-additive interactions

Definition: A non-additive interaction cannot be decomposed into a sum of univariate functions.
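
For two features, one common way to write this definition down is sketched below. The symbols $f$, $g$, $h$, $x_i$, and $x_j$ are notation assumed here for illustration, not taken from the slide.

```latex
f \text{ has a non-additive interaction between } x_i \text{ and } x_j
\iff
\nexists\, g, h \ \text{such that}\ f(x_i, x_j) = g(x_i) + h(x_j)
```

For example, $f = x_i x_j$ is non-additive, whereas $f = \log(x_i x_j) = \log x_i + \log x_j$ (for positive inputs) is additive.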

 

 

Non-additivity

 

 

 

 

5 of 27

Challenges in data-driven interaction discovery

“Black-box” ML models
  • How to interpret models to understand interactions?

Overwhelming hypotheses
  • How to prioritize important hypotheses?

Many false positives
  • How to distinguish false positive hypotheses?

6 of 27

We aim to provide confidence estimation for data-driven non-additive interaction discovery

 

Features

Samples

 

ML models

Predict

Goal: Discover a subset of non-additive interactions (e.g., gene-gene interactions) from ML models that are likely to be relevant without too many false positives.

Interpret

Train

7 of 27

We use false discovery rate (FDR) as the confidence measure

Benjamini & Hochberg. Journal of the Royal Statistical Society: Series B (1995)

$\mathrm{FDR} = \mathbb{E}\left[\dfrac{\#\,\text{false positive interactions}}{\#\,\text{total accepted interactions}}\right]$

How to estimate FDR in “black-box” ML models?

[Figure labels: total accepted interactions; score cutoff threshold]

Estimating false positives typically involves handling p-values
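
For contrast, the classical p-value-based route is the Benjamini-Hochberg step-up procedure cited on this slide. The sketch below is a minimal pure-Python version; the function name and the example p-values are made up for illustration. It shows why the approach needs a valid p-value per hypothesis, which generic ML interpretation scores do not provide.

```python
def benjamini_hochberg(pvalues, q):
    """Return the indices of hypotheses rejected at target FDR level q."""
    m = len(pvalues)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value passes the step-up threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            k = rank
    # Reject every hypothesis ranked at or above k.
    return sorted(order[:k])


# Illustrative p-values only (not from the talk).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(example_p, q=0.05)
```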

8 of 27

Features are quantified using interpretation methods

[Figure: Features × Samples data → Train → ML models → Interpret → feature-level importance]

Popular interpretation methods:

  • Partial derivatives
  • Shapley values
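
As a minimal sketch of the partial-derivative route, a pairwise importance score can be read off the mixed second derivative of the model output with respect to two features. The helper name `mixed_partial`, the step size, and the toy function are assumptions for illustration; a real pipeline would more likely use automatic differentiation on the trained model.

```python
def mixed_partial(f, x, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at x, for i != j."""
    def shifted(di, dj):
        z = list(x)
        z[i] += di
        z[j] += dj
        return f(z)
    return (shifted(h, h) - shifted(h, -h)
            - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)


# Toy "model": features 0 and 1 interact multiplicatively; feature 2 is additive.
toy_model = lambda z: z[0] * z[1] + z[2] ** 2
point = [0.3, -0.7, 1.2]

score_01 = mixed_partial(toy_model, point, 0, 1)  # close to 1: interaction present
score_02 = mixed_partial(toy_model, point, 0, 2)  # close to 0: purely additive
```

A mixed partial derivative that is zero everywhere is exactly what an additive pair of features produces, which ties this score back to the definition of non-additivity.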

Pairwise importance

Marginal importance

9 of 27

We distill non-additive interactive effects from the reported pairwise importance

[Figure: the reported pairwise importance (features × features) comprises the non-additive interactive effect, prediction-dependent marginal effects, and prediction-independent feature biases. Other panel labels: marginal importance, presence of features.]
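
A minimal sketch of the distillation idea follows. It assumes (this is an assumption, not the talk's stated procedure) that marginal effects and feature biases enter the reported pairwise score additively per feature. Under that assumption, two-way double centering removes every component of the form a_i + b_j and leaves the non-additive remainder. The function name and the toy matrix are hypothetical.

```python
def double_center(M):
    """Subtract row and column main effects from a square score matrix M."""
    n = len(M)
    row = [sum(r) / n for r in M]
    col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[M[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]


# Purely additive scores: every entry is a_i + a_j, so nothing should remain.
a = [1.0, 2.0, 3.0]
additive = [[a[i] + a[j] for j in range(3)] for i in range(3)]
residual_additive = double_center(additive)

# Add a genuine pairwise bump between features 0 and 1.
bumped = [row[:] for row in additive]
bumped[0][1] += 6.0
bumped[1][0] += 6.0
residual_bumped = double_center(bumped)
```

Double centering is crude: part of the bump leaks into other cells, so it only illustrates why subtracting additive components matters before comparing original and knockoff scores.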

10 of 27

FDR control is obtained by using knockoffs

Barber and Candes. The Annals of Statistics (2015)

 

[Figure: Features × Samples → Knockoff Generator → Knockoff features × Samples]

11 of 27

The knockoffs are designed to replicate the intra- and inter-correlation structure of the input

The knockoffs preserve the correlation structure of the input

The knockoffs preserve correlations between themselves and input
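
In the fixed-design formulation of Barber and Candes (2015), these two properties are commonly written as conditions on Gram matrices. The symbols below ($X$ for the standardized feature matrix, $\tilde{X}$ for its knockoffs, $\Sigma = X^\top X$, and a non-negative vector $s$) are notation assumed here for illustration.

```latex
\tilde{X}^\top \tilde{X} = X^\top X = \Sigma,
\qquad
\tilde{X}^\top X = \Sigma - \operatorname{diag}(s)
```

The first condition says the knockoffs reproduce the correlation structure of the input. The second says each knockoff is correlated with every other original feature exactly as its original counterpart is, differing only on the diagonal.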


12 of 27

The intuition behind knockoff design

Exchangeability

[Figure: Features × Samples matrix beside Knockoff features × Samples matrix]

The input gene and its knockoff counterpart should be:

Equally likely to be correlated with noise

Equally likely to be correlated with signal

Equally likely to be a false positive
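
One concrete generator with this exchangeability property is the Gaussian Model-X construction of Candes et al. (2018). The sketch below assumes zero-mean, jointly Gaussian features with a known correlation matrix and an equicorrelated choice of `s`, shrunk by 0.9 to keep the conditional covariance positive definite. The function name and the AR(1) example are made up for illustration; the deep generators used for real data are not shown.

```python
import numpy as np


def gaussian_knockoffs(X, Sigma, rng):
    """Sample knockoffs for rows of X ~ N(0, Sigma), Sigma a correlation matrix."""
    p = Sigma.shape[0]
    lam_min = np.linalg.eigvalsh(Sigma)[0]       # smallest eigenvalue
    s = 0.9 * min(2.0 * lam_min, 1.0)            # equicorrelated choice, shrunk
    S = s * np.eye(p)
    Sigma_inv_S = np.linalg.solve(Sigma, S)      # Sigma^{-1} S
    cond_mean = X - X @ Sigma_inv_S              # E[knockoff | X]
    cond_cov = 2.0 * S - S @ Sigma_inv_S         # Cov[knockoff | X]
    L = np.linalg.cholesky(cond_cov)
    return cond_mean + rng.standard_normal(X.shape) @ L.T


rng = np.random.default_rng(0)
p = 4
# Illustrative AR(1) correlation structure with rho = 0.5.
Sigma = np.array([[0.5 ** abs(i - j) for j in range(p)] for i in range(p)])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=50000)
Xk = gaussian_knockoffs(X, Sigma, rng)

knockoff_cov = np.cov(Xk, rowvar=False)          # should be close to Sigma
cross_cov = X.T @ Xk / X.shape[0]                # close to Sigma off the diagonal
```

Off the diagonal, the cross-covariance matches `Sigma`, so swapping a feature with its knockoff leaves the joint distribution unchanged. On the diagonal it is reduced by `s`, which keeps each knockoff distinguishable from its original.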

13 of 27

We train and interpret ML models using both the original features and their knockoffs

 

 

 

ML models

Predict

Train

 

Interpret
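
Once a model is trained on the augmented input, every scored feature pair falls into one of three groups: original-original (OO), knockoff-original (KO), or knockoff-knockoff (KK). The sketch below assumes the knockoff columns are appended after the `p` original columns. That layout and the helper name are assumptions for illustration.

```python
def interaction_kind(i, j, p):
    """Label a feature pair in the augmented [original | knockoff] layout."""
    # Columns 0..p-1 are originals; columns p..2p-1 are their knockoffs.
    n_knockoff = (i >= p) + (j >= p)
    return ("OO", "KO", "KK")[n_knockoff]


p = 3
kinds = [interaction_kind(0, 1, p),   # two originals
         interaction_kind(0, 4, p),   # original with a knockoff
         interaction_kind(3, 5, p)]   # two knockoffs
```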

14 of 27

FDR is estimated based on knockoff-involving interactions

Intuition of FDR estimation:

Important original interactions have large scores

Knockoff interactions and irrelevant original interactions have similar score distributions

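Estimated FDR at a score cutoff:

$\widehat{\mathrm{FDR}} = \dfrac{\text{False discoveries}}{\text{Total discoveries}} = \dfrac{\#\,\mathrm{KO} - \#\,\mathrm{KK}}{\#\,\mathrm{OO}}$

where OO, KO, and KK count the original-original, knockoff-original, and knockoff-knockoff interactions scoring above the cutoff.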

 

 

 

 

Walzthoeni et al. Nature Methods (2012)

Stop at the last score cutoff at which the ratio is still below the target FDR level
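
The stopping rule can be sketched as a single pass down the ranked list of scored interactions. This is a minimal illustration assuming each interaction carries a score and a label in {"OO", "KO", "KK"}, with the false-discovery estimate clipped at zero. The function name and the example scores are made up.

```python
def select_cutoff(scored, q):
    """Return (cutoff, accepted OO count) for the last cutoff whose ratio is <= q."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    counts = {"OO": 0, "KO": 0, "KK": 0}
    best_cutoff, n_accepted = None, 0
    for score, kind in ranked:
        counts[kind] += 1
        # Estimated false discoveries among accepted OO interactions.
        false_est = max(counts["KO"] - counts["KK"], 0)
        ratio = false_est / max(counts["OO"], 1)
        if counts["OO"] > 0 and ratio <= q:
            best_cutoff, n_accepted = score, counts["OO"]
    return best_cutoff, n_accepted


# Illustrative scores only (not from the talk).
example = [(9, "OO"), (8, "OO"), (7, "OO"), (6, "KO"), (5, "OO"),
           (4, "KO"), (3, "KK"), (2, "KO"), (1, "OO")]
loose = select_cutoff(example, q=0.3)
strict = select_cutoff(example, q=0.2)
```

With these toy scores the looser target accepts four original interactions and the stricter one accepts three, showing how the cutoff moves with the target level.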

15 of 27

We evaluated our method on a test suite of simulation functions

Simulation function:

FDR

Power
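
On simulated data the true interactions are known, so both metrics can be computed directly for a single run and then averaged over repetitions. The sketch below computes the false discovery proportion and power for one run. The function name and the example pairs are hypothetical.

```python
def fdp_and_power(selected, truth):
    """False discovery proportion and power for one simulation run."""
    selected, truth = set(selected), set(truth)
    false_discoveries = len(selected - truth)
    fdp = false_discoveries / max(len(selected), 1)
    power = len(selected & truth) / max(len(truth), 1)
    return fdp, power


# Interactions are unordered feature pairs, stored as frozensets.
pair = lambda i, j: frozenset((i, j))
selected = [pair(0, 1), pair(2, 3), pair(4, 5)]
truth = [pair(1, 0), pair(2, 3), pair(6, 7), pair(8, 9)]
fdp, power = fdp_and_power(selected, truth)
```

Averaging `fdp` over many simulated datasets gives the empirical FDR that is compared against the target level.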

16 of 27

Distilled non-additive interactive effects are important for FDR control

[Figure: two FDR panels, one using non-additive interaction effects and one using reported interaction importance]

17 of 27

Distilled non-additive interactive effects exhibit similar distributions between original and knockoff interactions

[Figure: two cumulative density panels, one for non-additive interaction effects and one for reported interaction importance]

18 of 27

We applied our method to a real Drosophila enhancer dataset to study the enhancer activity

Data:

  • 7809 sequence samples in Drosophila embryos
  • Feature: 23 TFs and 13 histone modifications
  • Normalized fold-enrichment values
  • Response: Binarized enhancer status per sequence

Basu et al. Proceedings of the National Academy of Sciences (2018)

Task:

  • Identify important interactions with FDR control
  • FDR threshold q=0.2

Evaluation:

  • Literature support
  • Consistency with a list of well-studied TF-TF interactions in early Drosophila embryos

FDR threshold

19 of 27

Distilled non-additive interactive effects support the synergy between the proteins Krueppel and Twist

20 of 27

Conclusion: A general pipeline for data-driven scientific discovery

“Black-box” ML models
  • How to interpret models to understand interactions? → Interpretable AI

Overwhelming hypotheses
  • How to prioritize important hypotheses? → Synergy distillation

Many false positives
  • How to distinguish false positive hypotheses? → Knockoff-based FDR control

Impact: Pioneering effort demonstrating that interpretation of ML models could achieve statistical guarantees!

21 of 27

Questions?

22 of 27

23 of 27

Conclusion

Our method enables error-controlled interaction discovery in generic ML models without relying on p-values

Key idea:

  • Distilling non-additive interaction effects
  • Employing knockoffs for FDR control
  • Applicable to various ML models

Impact:

  • Pioneering effort demonstrating that interpretation of ML models could achieve statistical guarantees.

24 of 27

Research vision: Empower data-driven biology by developing a hypothesis generation engine

Data-driven hypothesis generation engine

Hypotheses

Scientific discovery

Data

Users

Modelling

  • BioMANIA [Dong et al. bioRxiv (2023)]
  • COCACOLA [Lu et al. Bioinformatics (2017)]
  • Hetero-RP [Lu et al. Nucleic Acids Research (2017a)]
  • CRAFT [Lu et al. Bioinformatics (2021)]
  • DIAmeter [Lu et al. ISMB/Bioinformatics (2021)]
  • MELT [Lu et al. Briefings in Bioinformatics (2022)]
  • SGPvalue [Lu et al. bioRxiv (2023)]
  • SONATA [Zhou et al. bioRxiv (2023)]
  • ACE [Lu et al. ICML (2021a)]
  • DANCE [Lu et al. ICML (2021b)]
  • MIOSTONE [Jiang et al. bioRxiv (2023)]
  • Ledidi [Schreiber et al. ICML Workshop on Compbio (2020)]

  • DeepPINK [Lu et al. NeurIPS (2018)]
  • DeepROCK [Chen et al. bioRxiv (2023)]
  • SGPvalue [Lu et al. bioRxiv (2023)]

25 of 27

Existing work to generate knockoffs

  • Vanilla knockoffs [Barber and Candes. The Annals of Statistics (2015)]
  • Model-X knockoffs [Candes et al. Journal of the Royal Statistical Society: Series B (2018)]
  • KnockoffGAN [Jordon et al. ICLR (2019)]
  • Deep knockoffs [Romano et al. Journal of the American Statistical Association (2020)]
  • Deep direct likelihood knockoffs [Sudarshan et al. NeurIPS (2020)]

No method supports challenging settings such as generic ML models

26 of 27

We distill non-additive interactive effects from the reported pairwise importance

[Figure: the reported pairwise importance (features × features) comprises the non-additive interactive effect, prediction-dependent marginal effects, and prediction-independent feature biases. Other panel labels: marginal importance, presence of features.]

27 of 27

We evaluated on a test suite of simulation functions

Tsang et al. International Conference on Learning Representations (2018)