Causal graph structure learning
Sep 25th 2025
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter
Plan for this section
Readings
Plan for today
What do we mean by causality?
Uhler and Squires 2022; Luo et al., 2020
Different terminologies for causal graphs
Interventional vs Observational data
Observational
Interventional
Chevalley et al., 2023
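To make the distinction concrete, here is a small simulation (a hypothetical three-variable linear model, not taken from the cited papers) showing why observational data alone can mislead: a confounder biases the observed X-Y association, while intervening on X recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear model: Z -> X, Z -> Y, X -> Y (Z confounds X and Y)
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = 0.5 * X + 0.8 * Z + rng.normal(size=n)

# Observational: the regression slope of Y on X is biased upward by Z
obs_slope = np.cov(X, Y)[0, 1] / np.var(X)

# Interventional: do(X = x) severs the Z -> X edge, so the slope
# recovers the true causal effect of 0.5
X_do = rng.normal(size=n)  # X set by the experimenter, independent of Z
Y_do = 0.5 * X_do + 0.8 * Z + rng.normal(size=n)
int_slope = np.cov(X_do, Y_do)[0, 1] / np.var(X_do)
```

With these coefficients the observational slope comes out around 0.89, while the interventional slope is close to the true effect of 0.5.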
Directed Graphical Causal Models
Glymour, C., Zhang, K., Spirtes, P., 2019.
[Figure: two-node causal graph with variables X and Y]
Interpretation of a directed edge
Glymour, C., Zhang, K., Spirtes, P., 2019.
Functional Causal Models
Glymour, C., Zhang, K., Spirtes, P., 2019.
d-separation or d-connecting
Glymour, C., Zhang, K., Spirtes, P., 2019.; Charniak 1999
d-separation notation
d-separation constructs
Linear path / directed (chain): A → B → C
Converging / collider: A → B ← C
Diverging (fork): A ← B → C
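The three constructs can be checked numerically. A sketch with simulated linear-Gaussian data (variable names and coefficients are illustrative): in a chain or a fork, A and C are marginally dependent but become independent given B; in a collider, the pattern reverses.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

# Chain A -> B -> C: A and C dependent, but independent given B
A = rng.normal(size=n); B = A + rng.normal(size=n); C = B + rng.normal(size=n)
chain = (np.corrcoef(A, C)[0, 1], partial_corr(A, C, B))

# Fork A <- B -> C: same independence pattern as the chain
B = rng.normal(size=n); A = B + rng.normal(size=n); C = B + rng.normal(size=n)
fork = (np.corrcoef(A, C)[0, 1], partial_corr(A, C, B))

# Collider A -> B <- C: A and C independent, but dependent given B
A = rng.normal(size=n); C = rng.normal(size=n); B = A + C + rng.normal(size=n)
collider = (np.corrcoef(A, C)[0, 1], partial_corr(A, C, B))
```

Note that the chain and the fork produce the same (in)dependence pattern from data, which is why they end up in the same Markov equivalence class.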
d-separation
d-separation example
Consider the following graph
[Figure: causal graph over nodes N, W, H, O, P]
Consider W and P, and suppose we observe H: are W and P d-separated?
N: Night sleep
W: Weather
H: Happiness
O: Outfit
P: Productivity
d-separation example
Consider the following graph
[Figure: causal graph over nodes N, W, H, O, P]
Consider N and W, and suppose we observe H: are N and W d-separated?
Causal Markov Assumption
Markov Equivalence Class
Local Markov condition example
[Figure: example DAG over nodes A, B, C, D, E]
Friedman 2000
Here E and A are marginally independent
Local Markov condition: each variable is conditionally independent of its non-descendants given its parents
Faithfulness
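A minimal sketch of a faithfulness violation (hypothetical coefficients chosen so that path effects cancel): X causes Y both directly and through Z, yet the two paths cancel exactly, so X and Y look independent and a constraint-based method would wrongly delete the edge.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Hypothetical SEM in which path effects cancel exactly:
# X -> Z (coef +1), Z -> Y (coef +1), and X -> Y (coef -1)
X = rng.normal(size=n)
Z = X + rng.normal(size=n)
Y = Z - X + rng.normal(size=n)

# Total effect of X on Y: (+1)(+1) + (-1) = 0, so X and Y appear
# marginally independent although X is a genuine cause of Y
r_xy = np.corrcoef(X, Y)[0, 1]
r_xz = np.corrcoef(X, Z)[0, 1]
```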
Top Hat Question
Plan for today
Classes of methods for causal graph structure discovery
The PC algorithm
Steps of the PC algorithm
Steps of the PC algorithm contd
Skeleton: A – B – C
If A and B are adjacent and B and C are adjacent, but A and C are not, orient the triple as a v-structure: A → B ← C
Provided B was not in the “conditioning” set when A and C became independent.
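The adjacency-search (skeleton) phase described above can be sketched as follows. This is a simplified illustration, not a full PC implementation: there is no edge orientation, and a fixed Fisher-z partial-correlation test is assumed as the conditional-independence test.

```python
import numpy as np
from itertools import combinations

def gauss_ci(data, i, j, S, z_crit=2.58):
    """Fisher-z partial-correlation test; True means X_i indep. X_j given X_S."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(S) - 3)
    return abs(z) < z_crit  # z_crit = 2.58 corresponds to alpha ~ 0.01

def pc_skeleton(data):
    """Adjacency search of PC: start from the complete graph and delete an
    edge when its endpoints are independent given some subset of the
    current neighbors, with conditioning sets of growing size."""
    d = data.shape[1]
    adj = {i: set(range(d)) - {i} for i in range(d)}
    size = 0
    while any(len(adj[i]) - 1 >= size for i in range(d)):
        for i in range(d):
            for j in list(adj[i]):
                for S in combinations(adj[i] - {j}, size):
                    if gauss_ci(data, i, j, S):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return adj

# Chain X0 -> X1 -> X2: the skeleton should be X0 - X1 - X2 (no X0 - X2)
rng = np.random.default_rng(4)
n = 5000
X0 = rng.normal(size=n)
X1 = X0 + rng.normal(size=n)
X2 = X1 + rng.normal(size=n)
skel = pc_skeleton(np.column_stack([X0, X1, X2]))
```

The X0 - X2 edge survives the marginal tests (size 0) but is removed once the algorithm conditions on X1 (size 1), mirroring the stepwise search described above.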
Remarks for the PC algorithm
Limitations of constraint-based approaches
Classes of methods for causal graph structure discovery
Combinatorial algorithms for causal discovery
Classes of methods for causal graph structure discovery
Continuous optimization methods
Uhler and Squires 2022
Continuous optimization: DAG NOTEARS
Vowels et al., ACM Comput Surv
Notation
Structural Equation Model (SEM)
n: # of samples
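A linear SEM can be simulated directly from its weighted adjacency matrix. A sketch with a hypothetical 3-node DAG (solving X = XW + E gives X = E(I − W)⁻¹):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical weighted adjacency matrix for a 3-node DAG: 0 -> 1 -> 2
d, n = 3, 1000  # d: # of variables, n: # of samples
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 1.5],
              [0.0, 0.0, 0.0]])

# Linear SEM: X = X W + E, so X = E (I - W)^{-1}
E = rng.normal(size=(n, d))
X = E @ np.linalg.inv(np.eye(d) - W)
```

Each column then satisfies its structural equation, e.g. X[:, 1] = 2·X[:, 0] + E[:, 1].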
Continuous vs discrete optimization framework
Discrete optimization
Continuous optimization
Continuous optimization makes the DAG learning problem solvable efficiently with gradient-based methods.
W is the weighted adjacency matrix
F(W) is the score/loss function
G(W) is the graph induced from W
h(W) checks for “DAGness”: h(W) = 0 exactly when G(W) is acyclic
NOTEARS objective: min over W of F(W) subject to h(W) = tr(e^{W ∘ W}) − d = 0
∘: Hadamard (elementwise) product; e^M: matrix exponential; tr(·): trace
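The acyclicity function can be evaluated in a few lines (using SciPy's matrix exponential; the example matrices are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """NOTEARS acyclicity function: tr(exp(W o W)) - d.
    Equals zero exactly when the graph induced from W is a DAG."""
    return np.trace(expm(W * W)) - W.shape[0]

W_dag = np.array([[0.0, 1.0],
                  [0.0, 0.0]])   # edge 0 -> 1: acyclic, h = 0
W_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])   # edges 0 -> 1 and 1 -> 0: cyclic, h > 0
```

The trace of the matrix exponential counts weighted closed walks, so any cycle makes h(W) strictly positive, which is what lets a continuous optimizer push W toward a DAG.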
Incorporating non-linearities in the continuous optimization framework
Deep learning and causal networks
Luo et al., 2019
Plan for today
Benchmarking studies of causal structure discovery
Evaluation of causal graph structure
Empirical comparisons of different algorithms
ER, SF: different synthetic network topologies. ER: Erdős–Rényi; SF: scale-free
SHD: Structural Hamming Distance
SID: Structural Intervention Distance
Lachapelle et al 2020, GraN-DAG suite
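SHD can be computed from two directed adjacency matrices. A sketch using the common convention that a reversed edge counts as one error (conventions vary across papers):

```python
import numpy as np

def shd(A_true, A_est):
    """Structural Hamming Distance between binary directed adjacency
    matrices: # of edge insertions, deletions, and reversals needed to
    turn the estimate into the truth (a reversal counts once)."""
    mismatch = np.abs(A_true - A_est)
    # A reversed edge appears as two directed mismatches; count it once
    reversals = ((A_true == 1) & (A_est == 0)
                 & (A_true.T == 0) & (A_est.T == 1)).sum()
    return int(mismatch.sum() - reversals)

# Truth: 0 -> 1 -> 2.  Estimate: 1 -> 0, 1 -> 2, 0 -> 2
A_true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
A_est  = np.array([[0, 0, 1], [1, 0, 1], [0, 0, 0]])
# one reversal (0-1) plus one extra edge (0 -> 2): SHD = 2
```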
CausalBench
Chevalley, M., Roohani, Y., Mehrjou, A., Leskovec, J., Schwab, P., 2023. CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data.
How do methods perform?
Green: methods use observational data; Pink: methods use interventional data
Methods explicitly modeling interventional data don’t outperform others
CausalBench Ranking
Take away points