Summer school "Machine Learning for Non-Matrix Data" presentation

From supervised learning to causal inference in large dimensional settings

Gianluca Bontempi

ULB, Brussels, Belgium

Summer school "Machine Learning for Non-Matrix Data", PhD Course, Politecnico di Milano 


"We are drowning in data and starving for knowledge" is an old adage of data scientists that nowadays should be rephrased as "we are drowning in associations and starving for causality". The democratization of machine learning software and big-data platforms increases the risk of ascribing causal meaning to simple, and sometimes brittle, associations. This risk is particularly evident in settings (like bioinformatics, social sciences, and economics) characterised by high dimensionality, multivariate interactions, and dynamic behaviour, where direct manipulation is not only unethical but also impractical. The conventional ways to recover a causal structure from observational data are score-based and constraint-based algorithms. Their limitations, mainly in high dimension, opened the way to alternative learning algorithms that pose the problem of causal inference as the classification of probability distributions. The rationale of those algorithms is that the existence of a causal relationship induces a constraint on the observational multivariate distribution. In other words, causality leaves footprints in the data distribution that can hopefully be used to reduce the uncertainty about the causal structure. The first part of the presentation will introduce some basics of causal inference and discuss the state of the art in machine learning for causality (notably causal feature selection), with some applications to bioinformatics. The second part of the talk will focus on the D2C approach, which featurizes observed data by means of asymmetric information-theoretic measures to extract meaningful hints about the causal structure.
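The "footprints" idea can be made concrete with a small numerical sketch. In an additive-noise model Y = f(X) + N with N independent of X, regressing Y on X leaves residuals that are (approximately) independent of the regressor, while regressing in the anti-causal direction does not; this asymmetry is one such footprint. The example below is an illustrative toy (the cubic mechanism, uniform noise, and the crude dependence proxy are all assumptions for the sketch, not part of the talk):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy additive-noise model: x causes y through a cubic mechanism.
n = 2000
x = rng.uniform(-1, 1, n)
y = x**3 + rng.uniform(-0.3, 0.3, n)

def dependence_score(u, v):
    """Fit a cubic regression of v on u, then measure how strongly the
    residual magnitude still depends on the regressor (crude proxy for
    residual/regressor dependence; near zero in the causal direction)."""
    p = np.polyfit(u, v, 3)
    r = v - np.polyval(p, u)
    return abs(np.corrcoef(np.abs(r), u**2)[0, 1])

forward = dependence_score(x, y)   # residuals look like pure noise
backward = dependence_score(y, x)  # residuals still carry structure in y
direction = "x -> y" if forward < backward else "y -> x"
print(direction, forward, backward)
```

In the causal direction the residual-dependence score is close to zero, while in the anti-causal direction it is clearly larger, so the asymmetry recovers the true orientation of this toy pair.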
The D2C algorithm performs three steps to predict the existence of a directed causal link between two variables in a multivariate setting: (i) it estimates the Markov blankets of the two variables of interest and ranks their components in terms of their causal nature, (ii) it computes a number of asymmetric descriptors, and (iii) it learns a classifier (e.g. a Random Forest) returning the probability of a causal link given the descriptor values. The final part of the presentation is more prospective and will introduce some recent work on implementing counterfactual prediction in a data-driven setting.


  1. Introduction
  2. Probabilistic and statistical foundations
  3. Potential outcomes
  4. Graphical models
  5. Causal discovery
  6. From supervised learning to causal discovery

Recorded videos

Part I

Part II



My articles