1 of 66

Accelerating Recursive Partition-Based Causal Structure Learning

1Department of Computer Science and Engineering, University of Dhaka, Bangladesh

2School of Electrical and Computer Engineering, Purdue University, USA

3Department of Computer Science and Engineering, University of South Carolina, USA

Md Musfiqur Rahman1

Research Assistant

musfiq14shohan@gmail.com

@musfiq_shohan

Ayman Rasheed1

Research Assistant

aymanrasheed7

@gmail.com

Md. Mosaddek Khan1

Assistant Professor

mosaddek@du.ac.bd

Mohammad Ali Javidian2

Postdoctoral Scholar

mjavidia@purdue.edu

@ali_javidian

Pooyan Jamshidi3

Assistant Professor

pjamshid@cse.sc.edu

@PooyanJamshidi

Md. Mamun-Or-Rashid1

Professor

mamun@cse.du.ac.bd

2 of 66

Correlation does not imply causation!

2

Observational Data

Existing machine learning approaches

3 of 66

Correlation does not imply causation!

3

Observational Data

Existing machine learning approaches

Causal Inference

4 of 66

Correlation does not imply causation!

4

Observational Data

Existing machine learning approaches

Causal Inference

Incorrect decision-making!

5 of 66

Correlation does not imply causation!

5

Observational Data

Existing machine learning approaches

Causal Inference

Incorrect decision-making!

Causal structures/graphs

6 of 66

Causal Graphs and Causal Discovery

  • Causal graphs: A directed acyclic graph (DAG), G = (V,E)
    • V= {v1, v2, …, vn} is a set of variables and E is a set of directed edges.
    • Each edge u → v represents cause and effect relation

6

7 of 66

Causal Graphs and Causal Discovery

  • Causal graphs: A directed acyclic graph (DAG), G = (V,E)
    • V= {v1, v2, …, vn} is a set of variables and E is a set of directed edges.
    • Each edge u → v represents cause and effect relation

  • Causal discovery: Identifying causal relationships from large quantities of data
    • A data sample set D
    • T represents the true graph or ground truth

7

8 of 66

How we infer causal relations.

  • D-separation

8

9 of 66

How we infer causal relations.

  • D-separation

9

10 of 66

How we infer causal relations.

  • D-separation

10

11 of 66

How we infer causal relations.

  • D-separation

  • When controlled experiments are not possible.
  • Applications: 1. Gene regulatory network 2. Medical decision support systems

11

12 of 66

Recursive Causal Discovery is appropriate for large problems.

  • Constraint-based methods with conditional independence tests
  • Recursive split-and-merge strategies

12

13 of 66

Recursive Causal Discovery is appropriate for large problems.

  • Constraint-based methods with conditional independence tests
  • Recursive split-and-merge strategies

13

14 of 66

Recursive Causal Discovery is appropriate for large problems.

  • Constraint-based methods with conditional independence tests
  • Recursive split-and-merge strategies

14

15 of 66

Recursive Causal Discovery is appropriate for large problems.

  • Constraint-based methods with conditional independence tests
  • Recursive split-and-merge strategies

15

16 of 66

Recursive Causal Discovery is appropriate for large problems.

  • Constraint-based methods with conditional independence tests
  • Recursive split-and-merge strategies

16

17 of 66

Recursive Causal Discovery is appropriate for large problems.

  • Constraint-based methods with conditional independence tests
  • Recursive split-and-merge strategies

17

18 of 66

Benchmark algorithms are not perfect. Therefore, they execute refinement steps to remove undesired/redundant edges. But these refinement procedures are very expensive.

18

19 of 66

Existing approaches experience expensive refinement.

SADA

Costly partitioning with violation. Removes the less significant edges.

19

20 of 66

Existing approaches experience expensive refinement.

SADA

Costly partitioning with violation. Removes the less significant edges.

CAPA

Severe execution time overhead due to the third partition.

20

21 of 66

Existing approaches experience expensive refinement.

SADA

Costly partitioning with violation. Removes the less significant edges.

CAPA

Severe execution time overhead due to the third partition.

CP

D-separation violation. CI-tests for every pair of variables.

21

22 of 66

The Dsep-CP algorithm

22

In terms of contributions, Dsep-CP:

  • Improves the scalability without sacrificing the solution quality

23 of 66

The Dsep-CP algorithm

23

In terms of contributions, Dsep-CP:

  • Improves the scalability without sacrificing the solution quality
  • Explores d-separation violation and the graphical analysis for the false edges.

24 of 66

The Dsep-CP algorithm

24

In terms of contributions, Dsep-CP:

  • Improves the scalability without sacrificing the solution quality
  • Explores d-separation violation and the graphical analysis for the false edges.
  • Illustrates a substantial reduction in execution time up to 14% (and 86% reduction in CI-tests during refinement)

25 of 66

Preliminaries for Dsep-CP algorithm

  • Y-structure

25

26 of 66

Preliminaries for Dsep-CP algorithm

  • Y-structure
  • Independence Matrix

26

1⟂5 | {4}

27 of 66

The Dsep-CP Algorithm: The improved recursive learning

  • Find Causal Partitions (Line 4)

27

28 of 66

The Dsep-CP Algorithm: The improved recursive learning

  • Find Causal Partitions (Line 4)
  • Recursive Dsep-CP (Line 9-10)

28

29 of 66

The Dsep-CP Algorithm: The improved recursive learning

  • Find Causal Partitions (Line 4)
  • Recursive Dsep-CP (Line 9-10)
  • Dsep-CP Refinement (Line 12)

29

30 of 66

The Dsep-CP starts with a variable set and observational data

30

31 of 66

Finds efficient causal partitions to split the variable set.

31

32 of 66

Recursively solves the sub-problems with PC algorithm

32

PC algorithm is not always efficient.

33 of 66

Merges the discovered sub-graphs (May include false edges).

33

34 of 66

Detects the false edges with Y-structure.

34

Expensive!

35 of 66

Detects the false edges with Y-structure.

35

36 of 66

Detects the false edges with Y-structure.

36

37 of 66

Detects the false edges with Y-structure.

37

38 of 66

Detects the false edges with Y-structure.

38

39 of 66

Detects the false edges with Y-structure.

39

40 of 66

Detects the false edges with Y-structure.

40

41 of 66

Detects the false edges with Y-structure.

41

42 of 66

Detects the false edges with Y-structure.

42

The final discovered graph

43 of 66

Y-structures can detect false edges- Theoretically proved

43

44 of 66

Experimental Evaluation

44

  • On Simulated causal structures
  • On eight real-world causal networks
  • Source code: http://github.com/softsys4ai/Dsep-CP

45 of 66

Dsep-CP produces solutions of similar or better quality.

Experiments on Simulated Structures

45

46 of 66

Dsep-CP produces solutions of similar or better quality.

Experiments on Simulated Structures

46

47 of 66

Dsep-CP produces solutions of similar or better quality.

Experiments on Simulated Structures

47

48 of 66

Experiments on Simulated Structures

48

49 of 66

For 200-400 nodes, Dsep-CP runs 83-92% faster than SADA and 13-14% faster than CP.

Experiments on Simulated Structures

49

50 of 66

Experiments on Simulated Structures

50

51 of 66

Experiments on Simulated Structures

51

52 of 66

Reduces refining time and refining CI-tests by 98-99% compared to SADA and by 73-75% compared to CP.

Experiments on Simulated Structures

52

53 of 66

Experiments on Real-World Structures (Sample size = 500)

53

54 of 66

Experiments on Real-World Structures (Sample size = 500)

54

55 of 66

Experiments on Real-World Structures (Sample size = 500)

55

56 of 66

Experiments on Real-World Structures (Sample size = 500)

56

57 of 66

Produces solutions of similar or better quality for all eight real-world networks.

Experiments on Real-World Structures (Sample size = 500)

57

58 of 66

Experiments on Real-World Structures (Network: Mildew)

58

59 of 66

Experiments on Real-World Structures (Network: Mildew)

Also, for Large real-world networks (200+ nodes), 37- 58% faster than SADA and around 11-13% faster than CP.

59

60 of 66

Experiments on Real-World Structures (Network: Mildew)

60

61 of 66

Experiments on Real-World Structures (Network: Mildew)

61

62 of 66

Experiments on Real-World Structures (Network: Mildew)

Reducing refining time and CI-tests by 51-81% compared to SADA and by 81-86% compared to CP.

62

63 of 66

63

64 of 66

64

65 of 66

65

66 of 66

66