
Discussion of Anne’s OCIS Talk: “What are we discovering? Two perspectives on interpretable evaluation of causal discovery algorithms”

Vanessa Didelez

Leibniz Institute for Prevention Research and Epidemiology – BIPS

Faculty of Mathematics and Computer Science, University of Bremen, Germany

OCIS – January 2025


?? Causal DAGs ??

Quoting Dominik Janzing (keynote lecture, UAI, 2024): “All DAGs are wrong, but some are useful”

“Causal discovery… not only are the results often wrong – even worse, we rarely know whether they are wrong – and even worse, we rarely understand what ‘wrong’ means.”



Validation in Causal Inference

Ultimate validation: Carry out the relevant intervention(s) and check if your causal claims hold up

  • In very many contexts, this is not possible – or not meaningful
    • If not meaningful, then why aim at causal DAG?

Choose evaluation to match purpose of analysis

  • Causal DAG: one must intervene on many nodes to check the whole DAG (see the sketch below)
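To make this concrete, here is a minimal sketch (purely illustrative; the SCM and all numbers are hypothetical) of validating a single claimed edge X → Y by intervening in a simulated system. Checking a whole DAG would require such checks on many nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

def do_x(value, n):
    # Intervention do(X = value): X is set externally; Y's mechanism
    # (hypothetical: Y = 2X + noise) is left unchanged.
    x = np.full(n, value)
    return 2.0 * x + rng.normal(size=n)

# The claim "X causes Y" predicts that E[Y | do(X = x)] changes with x:
y_low, y_high = do_x(-1.0, 10_000), do_x(+1.0, 10_000)
print(y_high.mean() - y_low.mean())  # ~4.0: Y responds to the intervention
```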



Validation in Causal Inference

In simulations: what is the baseline?

  • Anne: “random guessing is lowest(?) bar”
    • Easy to do: always include a random guess in simulations (see the sketch below)!
    • Needed, given the surge of new (non-transparent?) methods
    • Wanted: some sort of test of “better than random guessing”
    • But Anne’s test requires knowledge of the truth
    • NB: other than an Erdős–Rényi-type random guess?
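A random-guess baseline is indeed easy to add. The sketch below (sizes, metric, and stand-in graphs are hypothetical choices of mine) draws Erdős–Rényi-type guesses with the same number of edges as the estimate and reports an empirical “better than random guessing” p-value; note that, as remarked above, this uses knowledge of the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_skeleton(p, n_edges):
    """Erdős–Rényi-type guess: n_edges undirected edges placed uniformly."""
    adj = np.zeros((p, p), dtype=int)
    iu, ju = np.triu_indices(p, k=1)
    pick = rng.choice(len(iu), size=n_edges, replace=False)
    adj[iu[pick], ju[pick]] = 1
    return adj

def f1(true_adj, est_adj):
    """F1 score for undirected edge recovery (upper triangle only)."""
    iu = np.triu_indices(true_adj.shape[0], k=1)
    t, e = true_adj[iu], est_adj[iu]
    tp = int(np.sum((t == 1) & (e == 1)))
    if tp == 0:
        return 0.0
    prec, rec = tp / e.sum(), tp / t.sum()
    return 2 * prec * rec / (prec + rec)

p = 10
true_adj = random_skeleton(p, 12)  # stand-in for the simulated ground truth
est_adj = random_skeleton(p, 12)   # stand-in for an algorithm's output

# Empirical test of "better than random guessing":
baseline = np.array([f1(true_adj, random_skeleton(p, 12)) for _ in range(1000)])
print("p-value:", np.mean(baseline >= f1(true_adj, est_adj)))
```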

Real-world ground truth very rarely available

  • Anne: interactively “compare with experts’ (consensus) DAG”




Random Guessing

  • Expectation under random guessing
  • Can do this for undirected graphs (skeletons)
  • Anne: wants distribution (see the sketch below)
  • And: other measures, also on DAGs!

[Figure: random guess (proportion true)]
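For skeletons, not only the expectation but the whole distribution of the “proportion true” metric under uniform random guessing is available in closed form: if a guess picks k of the N = p(p−1)/2 possible edges and m are truly present, the number of correct edges is hypergeometric. A minimal sketch (all sizes hypothetical):

```python
from scipy.stats import hypergeom

p, m, k = 10, 12, 12           # nodes, true edges, guessed edges (hypothetical)
N = p * (p - 1) // 2           # number of possible undirected edges

dist = hypergeom(N, m, k)      # number of correctly guessed edges
print("E[proportion true] =", dist.mean() / k)   # equals m / N
print("P(at least 5 correct) =", dist.sf(4))     # full distribution available
```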



Random Guessing

Anne: “can be viewed as negative control concept”

I disagree:

  • “Negative controls” are about detecting / quantifying bias
  • “Random guessing” is about a null hypothesis H0: roughly, that the DAG does not use information in the data

Two different things



What is Random Guessing?

  • Evaluate a user-given DAG against data: needs a “random” baseline
  • Idea: construct random draws by node permutations (sketched below)
  • Can also be used to evaluate causal discovery algorithms
    • e.g., on the Sachs data, NOTEARS and CAM did no better than random draws
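A minimal sketch of the node-permutation idea (the linear-Gaussian BIC score and the toy data are illustrative choices of mine, not necessarily Anne’s construction): hold the graph fixed up to relabelling, randomize which variable sits at which node, and compare fit scores.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_bic(data, adj):
    """BIC of a linear-Gaussian model: regress each node on its parents."""
    n, p = data.shape
    ll, n_params = 0.0, 0
    for j in range(p):
        pa = np.flatnonzero(adj[:, j])                 # adj[i, j] = 1 means i -> j
        X = np.column_stack([np.ones(n), data[:, pa]])
        beta, *_ = np.linalg.lstsq(X, data[:, j], rcond=None)
        sigma2 = max(np.var(data[:, j] - X @ beta), 1e-12)
        ll += -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        n_params += len(pa) + 2                        # slopes + intercept + variance
    return ll - 0.5 * n_params * np.log(n)

def permutation_baseline(data, adj, n_perm=500):
    """Score of the given DAG vs. DAGs obtained by permuting node labels."""
    p = adj.shape[0]
    draws = [gaussian_bic(data, adj[np.ix_(perm, perm)])
             for perm in (rng.permutation(p) for _ in range(n_perm))]
    return gaussian_bic(data, adj), np.array(draws)

# Toy illustration: truth X0 -> X1, X0 -> X2, X2 -> X3.
n = 500
x0 = rng.normal(size=n)
data = np.column_stack([x0,
                        x0 + rng.normal(size=n),
                        x2 := x0 + rng.normal(size=n),
                        x2 + rng.normal(size=n)])
user_dag = np.zeros((4, 4), dtype=int)
user_dag[[0, 0, 2], [1, 2, 3]] = 1
score, baseline = permutation_baseline(data, user_dag)
print("p-value:", np.mean(baseline >= score))  # small: the DAG uses the data
```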



Caution

Evaluation against some form of random guessing:

Still only about “statistical” model fit…

… not about the causal nature

“Causality” needs evaluation under (something like) interventions

    • changing circumstances, distribution shifts, natural experiments, etc. (see the invariance sketch below)
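One way to probe the causal rather than merely statistical nature without full experiments, in the spirit of invariance ideas: check whether the claimed mechanism for Y stays stable when the environment shifts. A crude sketch (all mechanisms hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def residual_shift_test(x1, y1, x2, y2):
    """Fit Y ~ claimed parent X in environment 1; test whether the residual
    distribution shifts in environment 2 (a crude invariance check)."""
    X1 = np.column_stack([np.ones_like(x1), x1])
    beta, *_ = np.linalg.lstsq(X1, y1, rcond=None)
    r1 = y1 - X1 @ beta
    r2 = y2 - np.column_stack([np.ones_like(x2), x2]) @ beta
    return stats.ttest_ind(r1, r2, equal_var=False).pvalue

# Env 1: observational; env 2: the cause X is shifted, Y's mechanism is not.
x1 = rng.normal(size=500);           y1 = 2 * x1 + rng.normal(size=500)
x2 = rng.normal(2.0, 1.0, size=500); y2 = 2 * x2 + rng.normal(size=500)
print(residual_shift_test(x1, y1, x2, y2))  # large p-value: mechanism invariant

# Compare: regressing the cause on the effect is not invariant under the shift.
print(residual_shift_test(y1, x1, y2, x2))  # small p-value
```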



Expert- & Data-Driven DAGs



Expert Constructed DAGs

In my experience, expert knowledge often does not come in the form of individual directed edges

  • Single directed edge = direct causal effect relative to other nodes
  • “Conditioning” on other nodes is challenging
    • past studies in literature may not have used same set of variables
  • Expert knowledge: “A causes Z” – direct or indirect effect?
    • but often even the directions are unclear
  • Other issues: overlooking non-nodes and non-edges
  • Also: “confirmation bias”



Combining Experts & Data for DAGs

Various proposals exist in the literature

  • Not seen much practical use, e.g., in epidemiology
  • Perhaps because expert knowledge comes in “unwieldy” forms

Important challenge:

  • Combining data-driven DAG construction with typical and different types of expert knowledge
    • e.g., ancestral, partial, with varying degrees of uncertainty, etc. (a minimal encoding of tiered knowledge is sketched below)
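Tiered (temporal) knowledge is one common, if “unwieldy”, form that is easy to encode. A minimal sketch (variable names and the screening step are hypothetical) of turning tiers into forbidden-edge constraints that a discovery algorithm could then respect:

```python
import numpy as np

# Hypothetical tiered (temporal) background knowledge: variables in later
# tiers cannot cause variables in earlier tiers.
tiers = {"age": 0, "exposure": 1, "mediator": 2, "outcome": 3}

def forbidden_mask(variables, tiers):
    """Boolean mask: True where an edge i -> j is forbidden by the tiers."""
    p = len(variables)
    mask = np.zeros((p, p), dtype=bool)
    for i, vi in enumerate(variables):
        for j, vj in enumerate(variables):
            if tiers[vi] > tiers[vj]:  # edge would point backwards in time
                mask[i, j] = True
    return mask

variables = list(tiers)
forbid = forbidden_mask(variables, tiers)
# A discovery algorithm's candidate edges can then be screened, e.g.:
# candidate_adj[forbid] = 0
print(forbid.astype(int))
```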



Thanks for the nice paper!
And thanks for your attention!

Contact

Vanessa Didelez

didelez@leibniz-bips.de

www.leibniz-bips.de/en

Leibniz Institute for Prevention Research and Epidemiology – BIPS

Achterstraße 30

D-28359 Bremen