1 of 45

Better Simulations for Validating Causal Discovery with the DAG-Adaptation of the Onion Method

Erich Kummerfeld

Research Assistant Professor

Institute for Health Informatics, University of Minnesota


2 of 45

Hi!

Some background about me:

  • PhD at CMU
    • Mixture of causal discovery and philosophy of science
    • Trained under Peter Spirtes, Clark Glymour, David Danks, etc.
  • Postdoc at UPitt, Center for Causal Discovery
    • Transition to health informatics, under Greg Cooper
  • Faculty at UMN, Institute for Health Informatics
    • Institute led by Constantin Aliferis


3 of 45

Causal discovery experience

  • Developing new causal methods
  • Data analysis best practices
  • Applying data analysis, including causal discovery, to specific domain problems. Mostly:
    • Addiction and alcohol
    • Aging
    • Neuroimaging
    • Psychiatry
    • Other areas


4 of 45

Talk structure

  1. Describe some experiences with applied work

  2. Summarize my perspective on how causal discovery fits into domain science right now

  3. Describe a methods project that targets specific weaknesses of causal discovery in modern science

  4. Then the open discussion phase


5 of 45

Some examples of applications


How do nursing homes benefit from an on-site APRN?

6 of 45


What are the causes and effects of PTSD symptoms in populations with PTSD diagnosis?

7 of 45


What is the causal explanation for comorbid INTD and AUD?

8 of 45


What are the mechanisms that relate brain and behavior variables, and ultimately AUD?

9 of 45


How do individuals differ in terms of what causes them to drink?

10 of 45


How are brain networks causally connected during rest?

11 of 45


How is brain connectivity different during psychosis?

12 of 45


How do conspiracy theory beliefs relate to vaccine intentions and attitudes?

13 of 45


How can we improve treatment for psychosis?

14 of 45


What brain connectivity changes does neuromodulation cause, and what brain connectivities cause relapse?

15 of 45

Some position papers led by domain scientists promoting Causal Discovery


16 of 45

A podcast?!


17 of 45

Summary of work: everything is ad hoc

  • Many different data types, sizes, shapes
  • Many different project goals
  • Many different algorithms used
  • Many different roles for CD in the approach

  • Important lesson for anyone wanting to do applications: the graph is usually not the Primary Research Product


18 of 45

In most projects

The graph is merely one of multiple stepping stones to the primary finding.


19 of 45

What do domain scientists think of CD?

My impressions of applied scientists and clinicians

  • They want to answer a human understandable question about their topic
  • They don’t really care what methods are used
  • They are more worried about whether things are being measured correctly
  • To them, causal discovery is new and interesting, but it’s unclear what to do with it or if they should trust it


20 of 45

Trust in AI?

  • Scientist attitudes towards AI, statistics, etc. can vary wildly.
  • Most scientists have a rudimentary understanding of statistics. For many, it’s just an annoying hurdle to get over for publications and grants.

  • Scientists can spin stories rapidly and support them with literature. The story is most important. For some of them AI appears to be a muse.


21 of 45

Passing the Statistics Gate

  • While many research scientists are fast and loose, they are still beholden to their statistics experts. (Otherwise they may not do statistics at all.)


22 of 45

Validating Causal Discovery

  • “Our journal only accepts observational studies using new methods if they are externally validated.”
  • Major problem for CD: validation is lacking!
  • Supervised learning
    • holdout samples, cross-validation, etc.
    • models tested in separate populations
  • Experiments and regressions
    • confidence intervals, p-values, etc.
  • What does causal discovery have??
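For contrast, the kind of finite-sample validation supervised learning enjoys can be sketched in a few lines. This is a generic illustration (not from the talk); `cross_val_mse`, `ols_fit`, and `ols_predict` are hypothetical stand-in names.

```python
import numpy as np

def cross_val_mse(X, y, fit, predict, k=5, seed=0):
    """Estimate out-of-sample error by k-fold cross-validation: a
    direct, finite-sample accuracy check of the sort supervised
    learning has and causal discovery currently lacks."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errs.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errs))

# Ordinary least squares as an illustrative stand-in model.
def ols_fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ols_predict(beta, X):
    return X @ beta
```

The point is not the model but the yardstick: the held-out error is an empirical estimate of real-world performance, which no comparable statistic exists for a discovered graph.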


23 of 45

Decision makers need confidence

  • Clinical doctor: how can I make a medical decision based on this model if I don’t know how accurate it is? This patient’s life is on the line, and I’m liable for any mistakes.


24 of 45

The current reality of CD validation

  • How are causal discovery methods validated?
  • Proofs of correctness. But these are pointwise limit theorems, and the existing proofs about finite samples make completely unrealistic assumptions.
  • Simulations. But the simulations are completely ad hoc, unrealistic, inconsistent, and cherry-picked. There is no reason to expect current simulations to indicate how real world applications will perform.


25 of 45

Quick shoutout to other methods

  • There are some other approaches gaining traction
    • resampling stability
    • model fit statistics (what SEM-based fields use)
    • performance on graph-informed predictive modeling
    • etc.

  • But none of these are very mature, and their limitations are unknown
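The first of these, resampling stability, is easy to sketch. Below is a minimal bootstrap edge-stability loop; `edge_stability` and `corr_skeleton` are hypothetical names, and the threshold-on-correlation "discovery method" is a toy stand-in for a real causal discovery call.

```python
import numpy as np

def edge_stability(data, fit_skeleton, n_boot=100, seed=0):
    """Resampling stability: refit a discovery method on bootstrap
    resamples and record how often each edge is selected. Edges that
    appear in most resamples are the more trustworthy ones."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    freq = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        freq += fit_skeleton(data[idx])
    return freq / n_boot

def corr_skeleton(data, thresh=0.3):
    """Toy stand-in for a causal discovery call: connect variables
    whose absolute marginal correlation exceeds a threshold."""
    A = (np.abs(np.corrcoef(data, rowvar=False)) > thresh).astype(int)
    np.fill_diagonal(A, 0)
    return A
```

Note the limitation flagged on the slide: a stable edge is a reproducible edge, not necessarily a correct one, so stability alone cannot certify accuracy.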


26 of 45

One direction: improve simulations

  • Our contribution: make simulations better.
  • Simulations should
    • Generate data from a well-characterized and reasonable distribution of data distributions.
    • Include all possible real-world scenarios
    • NOT include simulation artifacts that don’t exist in real world data
    • NOT permit cherry-picking to make algorithms look better or worse than they are


27 of 45

How do existing simulation methods do on these criteria?

    • Generate data from a well-characterized and reasonable distribution of data distributions.
    • Include all possible real-world scenarios
    • NOT include simulation artifacts that don’t exist in real world data
    • NOT permit cherry-picking to make algorithms look better or worse than they are


28 of 45

How do existing simulation methods do on these criteria?

Existing simulations are insufficient for even comparing relative performance of methods.

They are nowhere near providing evidence for expecting good performance on real data!


29 of 45

Our idea: sample uniformly

All of those points (and more!) can be achieved by sampling uniformly from the space of correlation matrices

  1. A DAG is used as input.
  2. Randomly sample from correlation matrices that are consistent with that DAG. This simultaneously and uniquely assigns values to all free parameters of the DAG, including edge weights and variance terms.
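The unconstrained building block here is short enough to sketch. The onion construction below draws a correlation matrix uniformly from the space of all d×d correlation matrices (the LKJ distribution with η = 1); the DAG adaptation in the paper constrains this so the sample is consistent with the input DAG, which is not shown here. `sample_uniform_correlation` is an illustrative name, not the paper's API.

```python
import numpy as np

def sample_uniform_correlation(d, seed=None):
    """Sample a d x d correlation matrix ~uniformly via the onion
    method: grow the matrix one variable at a time, drawing each new
    row's correlation vector with a Beta-distributed squared radius
    and a uniform direction so the joint density stays uniform."""
    rng = np.random.default_rng(seed)
    R = np.eye(d)
    for k in range(1, d):
        # Squared radius: Beta parameters chosen for the uniform
        # (eta = 1) case of the LKJ distribution.
        r2 = rng.beta(k / 2.0, (d - k + 1) / 2.0)
        # Direction: uniform on the unit sphere in k dimensions.
        u = rng.standard_normal(k)
        u /= np.linalg.norm(u)
        # Map through the Cholesky factor of the existing block so the
        # result stays a valid (positive-definite) correlation matrix.
        L = np.linalg.cholesky(R[:k, :k])
        w = np.sqrt(r2) * (L @ u)
        R[k, :k] = w
        R[:k, k] = w
    return R
```

Because every free parameter is implied by the sampled matrix, there are no edge-weight or noise-variance knobs left for an experimenter to tune, which is what blocks cherry-picking.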


30 of 45

Shoutout to Bryan

Bryan Andrews basically did everything.


31 of 45

What does sampling uniformly look like?

  • [show examples from 3D plot]
  • [see right for reference graphs]
  • Proof is in the paper (preprint on arXiv, manuscript currently under peer review)
    • https://arxiv.org/abs/2405.13100


[Reference graphs: emitter, chain, collider]

32 of 45

Simulated model parameters


ZARX: NOTEARS papers. Tetrad: BOSS paper.

33 of 45

Parameter distributions (edges and errors)



34 of 45

R² sortability?



35 of 45

Definition of evaluation statistics
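The slide's table of definitions is not reproduced here. As an illustration, the most common adjacency-level statistics can be computed as follows; `adjacency_stats` is a hypothetical helper, restricted to the undirected skeleton, and the paper's exact (e.g. orientation-aware) definitions may differ.

```python
import numpy as np

def adjacency_stats(true_adj, est_adj):
    """Adjacency precision, recall, and structural Hamming distance
    (SHD) over the undirected skeletons of two adjacency matrices."""
    # Symmetrize to compare skeletons, then look only at the upper
    # triangle so each variable pair is counted once.
    true_skel = ((true_adj + true_adj.T) > 0).astype(int)
    est_skel = ((est_adj + est_adj.T) > 0).astype(int)
    iu = np.triu_indices_from(true_skel, k=1)
    t, e = true_skel[iu], est_skel[iu]
    tp = int(np.sum((t == 1) & (e == 1)))  # correctly recovered edges
    fp = int(np.sum((t == 0) & (e == 1)))  # spurious edges
    fn = int(np.sum((t == 1) & (e == 0)))  # missed edges
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    shd = fp + fn  # skeleton-level SHD: additions plus deletions
    return precision, recall, shd
```

For example, against a true chain 0 → 1 → 2, an estimate with edges 0 → 1 and 0 → 2 scores precision 0.5, recall 0.5, and SHD 2.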


36 of 45

CD methods on DaO data


dLiNGAM used the same models but with non-Gaussian (exponential) errors.

With standard evaluations, DaO is a difficult test for most algorithms.

37 of 45

And non-DaO simulations…

37


Some algorithms that do very poorly on DaO suddenly do extremely well on non-DaO simulations

38 of 45

Going forward (1)

  • DaO should be a standard that any global causal discovery algorithm based on the covariance matrix must be evaluated on
  • This serves as a foundation to start empirically evaluating finite-sample performance in a way that extends to real world data


39 of 45

Going forward (2)

  • Other methods of evaluation are likely very important
    • Because different types of science questions require different types of method evaluation
    • e.g. does the model estimate total effects well?

  • There are many opportunities to extend DaO to better reflect more types of distributions and real world scenarios, such as latent confounding, time-series data, cyclic models, etc.


40 of 45

Some limitations

  • DaO currently makes no attempt to simulate specific real world situations
    • Growing list of simulation methods for specific domains, such as fMRI data, gene expression data, survey data…
  • Current evaluations on DaO depend heavily on performance on small effect sizes
    • Most real-world effect sizes that scientists care about are moderate or large.
  • More limitations in paper. For time, let’s move on!


41 of 45

Discussion Questions

  1. How should we a priori validate causal discovery algorithms to ensure that they are ready for use in real world applications where real lives are at stake?
  2. How should we post hoc quantify the uncertainty of the results of causal discovery after it has been applied to data?
  3. What types of scientific questions are causal discovery methods best suited to answer compared to other methods, and can we quantify the performance of causal discovery in answering those questions, either a priori or post hoc?
  4. When causal discovery is used as part of a larger analysis pipeline, how can we quantify the uncertainty or variability of results for the entire pipeline?



45 of 45
