Correlation and Causation
Lior Pachter
California Institute of Technology
Lecture 4
Caltech Bi/BE/CS183
Spring 2023
These slides are distributed under the CC BY 4.0 license
Stochastic gene expression in a single cell
mean
How to think formally about intrinsic and extrinsic noise
intrinsic noise
extrinsic noise
Sample estimates of intrinsic and extrinsic noise
intrinsic noise
extrinsic noise
sample estimate (assuming n is large and c̅ = y̅)
normalized sample covariance estimate
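A minimal sketch of the two sample estimates, assuming the normalized estimators usually attributed to Elowitz et al. (2002) for two reporters c and y (e.g. CFP and YFP) measured in the same n cells: intrinsic noise as the mean squared difference between the reporters divided by 2·c̅·y̅, and extrinsic noise as the sample covariance divided by c̅·y̅. The function name `noise_estimates` and the simulated model (a shared cell-level factor z plus independent noise on each reporter) are hypothetical.

```python
import random


def noise_estimates(c, y):
    """Normalized intrinsic and extrinsic noise from paired reporter levels c, y
    measured in the same cells (estimators as attributed to Elowitz et al., 2002)."""
    n = len(c)
    c_bar = sum(c) / n
    y_bar = sum(y) / n
    # Intrinsic: mean squared difference between the two reporters, over 2 * c_bar * y_bar.
    intrinsic = sum((ci - yi) ** 2 for ci, yi in zip(c, y)) / (2 * n * c_bar * y_bar)
    # Extrinsic: sample covariance (1/n version) normalized by c_bar * y_bar.
    extrinsic = (sum(ci * yi for ci, yi in zip(c, y)) / n - c_bar * y_bar) / (c_bar * y_bar)
    return intrinsic, extrinsic


# Hypothetical model: a shared factor z (extrinsic) scales both reporters in a cell,
# and each reporter also gets its own independent multiplicative noise (intrinsic).
rng = random.Random(1)
c, y = [], []
for _ in range(20000):
    z = 100 * (1 + 0.2 * rng.gauss(0, 1))
    c.append(z * (1 + 0.1 * rng.gauss(0, 1)))
    y.append(z * (1 + 0.1 * rng.gauss(0, 1)))

intrinsic, extrinsic = noise_estimates(c, y)
print(f"intrinsic noise ~ {intrinsic:.4f}, extrinsic noise ~ {extrinsic:.4f}")
```

For this simulated model the population values work out to about 0.0104 (intrinsic) and 0.04 (extrinsic), so the estimates should land close to those.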
Covariance of random variables
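A small exact check of the definition Cov(X, Y) = E[XY] − E[X]E[Y] for a discrete joint distribution. The pmf below is a made-up example in which X and Y tend to take the same value; Fractions keep the arithmetic exact.

```python
from fractions import Fraction


def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X] E[Y] for a discrete joint pmf given as {(x, y): probability}."""
    ex = sum(p * x for (x, _y), p in joint.items())
    ey = sum(p * y for (_x, y), p in joint.items())
    exy = sum(p * x * y for (x, y), p in joint.items())
    return exy - ex * ey


# Hypothetical joint pmf on {0, 1} x {0, 1} where X and Y usually agree.
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}
print(covariance(joint))  # 3/20

# Independent X and Y: the joint pmf factorizes and the covariance is 0.
independent = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
print(covariance(independent))  # 0
```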
Sample covariance
Geometric interpretation of the sample covariance
less efficient for computing, more useful for understanding
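Assuming the geometric picture here is the usual one in terms of pairs of points, the identity behind it is Σ_{i<j} (x_i − x_j)(y_i − y_j) = n Σ_i (x_i − x̄)(y_i − ȳ), so the unbiased sample covariance equals half the average signed area of the rectangles spanned by pairs of points. The sketch computes it both ways. The pairwise form touches n(n − 1)/2 pairs instead of n points, which is why it is less efficient for computing. Function names are illustrative.

```python
from itertools import combinations


def sample_cov(x, y):
    """Unbiased sample covariance: sum of centered cross-products divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def sample_cov_pairs(x, y):
    """The same number from pairs of points: half the average signed area
    (x_i - x_j)(y_i - y_j) of the rectangle each pair spans. The area is positive
    when the pair rises from left to right and negative when it falls.
    Costs O(n^2) pairs instead of O(n) terms."""
    pairs = list(combinations(zip(x, y), 2))
    areas = [(xi - xj) * (yi - yj) for (xi, yi), (xj, yj) in pairs]
    return sum(areas) / (2 * len(pairs))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
print(sample_cov(x, y), sample_cov_pairs(x, y))
```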
Geometric interpretation of intrinsic and extrinsic noise
Single-cell data from Elowitz et al., 2002
On the matter of bias in estimators
?
Bessel’s correction
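Bessel's correction (dividing by n − 1 rather than n) can be checked exactly rather than by simulation. The sketch enumerates every possible sample of size n from a small hypothetical population, sampled uniformly with replacement, and averages both estimators. The divide-by-n estimator comes out at (n − 1)/n · σ², and the divide-by-(n − 1) estimator at exactly σ².

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]  # hypothetical population, sampled uniformly with replacement
n = 3                      # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)


def expected_variance_estimate(divisor):
    """Exact E[sum((x_i - x_bar)^2) / divisor] over all len(population)**n equally likely samples."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for s in samples:
        x_bar = Fraction(sum(s), n)
        total += sum((Fraction(v) - x_bar) ** 2 for v in s) / divisor
    return total / len(samples)


biased = expected_variance_estimate(n)        # divide by n
unbiased = expected_variance_estimate(n - 1)  # Bessel's correction: divide by n - 1
print(sigma2, biased, unbiased)  # 5/4 5/6 5/4
```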
The bias - variance tradeoff
On the matter of bias in estimators
more bias, minimal mean squared error
no bias, more mean squared error
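The tradeoff can be made concrete, assuming i.i.d. Gaussian data. Σ(x_i − x̄)² is distributed as σ²χ²_{n−1}, so for the estimator Σ(x_i − x̄)²/d the bias, the variance, and hence MSE = bias² + variance all have closed forms. The sketch evaluates them exactly (in units of σ⁴) for d = n − 1 (unbiased), d = n, and d = n + 1. For n = 10 the MSE drops from 2/9 to 19/100 to 2/11, so the most biased of the three has the smallest mean squared error. Whether the slide uses these particular divisors is an assumption.

```python
from fractions import Fraction


def mse_of_variance_estimator(n, divisor):
    """Exact MSE, in units of sigma^4, of sum((x_i - x_bar)^2) / divisor for n i.i.d.
    Gaussian observations. Uses sum((x_i - x_bar)^2) ~ sigma^2 * chi^2_{n-1}, which has
    mean (n - 1) sigma^2 and variance 2 (n - 1) sigma^4."""
    bias = Fraction(n - 1, divisor) - 1             # (E[estimate] - sigma^2) / sigma^2
    variance = Fraction(2 * (n - 1), divisor ** 2)  # Var[estimate] / sigma^4
    return bias ** 2 + variance                     # MSE = bias^2 + variance


n = 10
for d in (n - 1, n, n + 1):
    print(d, mse_of_variance_estimator(n, d))
```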
Correlation
Sample correlation coefficient
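A from-scratch sketch of the sample (Pearson) correlation coefficient r = s_xy / (s_x s_y). The n − 1 factors cancel between numerator and denominator, so the code uses raw sums of centered products. The function name and data are illustrative.

```python
import math


def pearson_r(x, y):
    """Sample correlation r = s_xy / (s_x * s_y). The 1/(n - 1) factors cancel,
    so only sums of centered products are needed."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
r = pearson_r(x, y)
print(round(r, 3))                                      # 0.822
print(round(pearson_r(x, [7 + 3 * v for v in y]), 3))   # unchanged by a + b*y with b > 0
print(round(pearson_r(x, [7 - 3 * v for v in y]), 3))   # sign flips when b < 0
```

Because r is computed from centered and rescaled data, it is unchanged by an affine transformation a + b·y with b > 0 and flips sign when b < 0.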
Anscombe’s quartet (Anscombe, 1973)
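Anscombe's point can be checked directly. The sketch uses the four data sets as they are commonly reproduced from Anscombe (1973) and computes the same summaries for each: the mean of y, Pearson's r, and the least-squares slope and intercept. All four give a mean of about 7.50, r ≈ 0.816, and a fitted line of about y = 3.00 + 0.500x, even though they look completely different when plotted.

```python
import math

# Anscombe's four data sets; the first three share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Mean of y, Pearson's r, and the least-squares line y ~ intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx
    return {"mean_y": y_bar, "r": sxy / math.sqrt(sxx * syy),
            "slope": slope, "intercept": y_bar - slope * x_bar}


results = {name: summary(x, y) for name, (x, y) in quartet.items()}
for name, s in results.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```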
An update of Anscombe’s quartet
Zero correlation does not mean zero structure
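The standard counterexample: if x is symmetric about zero and y = x², then y is completely determined by x, yet the Pearson correlation is exactly zero, because correlation only measures linear association. The five-point data set is a hypothetical illustration.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]      # y is an exact (but nonlinear) function of x
print(pearson_r(x, y))       # 0.0
```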
Exploratory Data Analysis (EDA)
Analysis by Svensson et al., 2017
Recall (from Lecture 3)
no noise?
linear regression lines
Spearman rank correlation coefficient
Spearman correlation is less sensitive than Pearson correlation to outliers
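A sketch of Spearman's ρ as the Pearson correlation of ranks (tied values receive the average of the ranks they span), applied to a hypothetical data set with one extreme value. Because y is still monotone in x, the ranks are unchanged and ρ = 1, while the single large value pulls Pearson's r down to roughly 0.53.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of the 1-based positions i+1, ..., j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y increases with x, but the last value is extreme.
x = list(range(1, 11))
y = list(range(1, 10)) + [1000]
print(round(pearson_r(x, y), 2), spearman_rho(x, y))
```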
The coefficient of determination (R²)
residuals
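For simple linear regression fitted by least squares with an intercept, R² = 1 − SS_res/SS_tot equals the square of Pearson's r. The sketch computes R² from the residuals of the fitted line and compares it with r²; the data are made up.

```python
import math


def fit_line(x, y):
    """Least-squares line y ~ a + b*x; returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
         / sum((u - x_bar) ** 2 for u in x))
    return y_bar - b * x_bar, b


def r_squared(x, y):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot for the fitted line."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))  # squared residuals
    ss_tot = sum((v - y_bar) ** 2 for v in y)                    # squared deviations from the mean
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
print(round(r_squared(x, y), 4), round(pearson_r(x, y) ** 2, 4))
```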
Geometric interpretation of R²
Using correlations for network science
Single-cell network inference
Single-cell RNA-seq example (Xue et al., 2013): Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing
A problem with correlation based networks
[Figure: network on three variables A, B, C]
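A hypothetical simulation of the problem: A drives both B and C and there is no direct B-C link, yet thresholding pairwise correlations (cutoff 0.5, chosen arbitrarily) draws an edge between B and C as well, because they share the common input A. With these noise levels the population correlations are about 0.89 for A-B and A-C, and 0.8 for B-C.

```python
import math
import random


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical model: A drives both B and C; there is no direct B-C link.
rng = random.Random(0)
n = 5000
A = [rng.gauss(0, 1) for _ in range(n)]
B = [a + 0.5 * rng.gauss(0, 1) for a in A]
C = [a + 0.5 * rng.gauss(0, 1) for a in A]
data = {"A": A, "B": B, "C": C}

threshold = 0.5  # arbitrary cutoff for drawing an edge
names = sorted(data)
edges = []
for i, u in enumerate(names):
    for v in names[i + 1:]:
        r = pearson_r(data[u], data[v])
        if abs(r) > threshold:
            edges.append((u, v, round(r, 2)))
print(edges)  # includes a B-C edge even though B and C are not directly linked
```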
Partial correlation
[Figure: B and C are each regressed on A; the correlation is then calculated between the B residuals and the C residuals]
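The residual procedure, sketched in Python for a hypothetical model in which A drives both B and C: regress B on A and C on A by ordinary least squares, then correlate the two residual vectors. The raw B-C correlation is strong (about 0.8), but the partial correlation given A is close to zero. The last lines check the result against the familiar closed form for one controlling variable, (r_BC − r_AB·r_AC)/√((1 − r_AB²)(1 − r_AC²)).

```python
import math
import random


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
         / sum((u - x_bar) ** 2 for u in x))
    return [v - (y_bar + b * (u - x_bar)) for u, v in zip(x, y)]


def partial_corr(b, c, a):
    """Correlation of B and C after removing the linear effect of A from each."""
    return pearson_r(residuals(b, a), residuals(c, a))


# Hypothetical model: A drives both B and C; B and C are not directly linked.
rng = random.Random(0)
n = 5000
A = [rng.gauss(0, 1) for _ in range(n)]
B = [a + 0.5 * rng.gauss(0, 1) for a in A]
C = [a + 0.5 * rng.gauss(0, 1) for a in A]

r_bc = pearson_r(B, C)
p_bc = partial_corr(B, C, A)
print(round(r_bc, 2), round(p_bc, 2))

# Cross-check with the closed form for a single controlling variable.
r_ab, r_ac = pearson_r(A, B), pearson_r(A, C)
closed_form = (r_bc - r_ab * r_ac) / math.sqrt((1 - r_ab ** 2) * (1 - r_ac ** 2))
```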
Computing partial correlation
Partial correlation of pairs of variables controlling for the rest
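$\rho_{ij \cdot \mathrm{rest}} = -\dfrac{p_{ij}}{\sqrt{p_{ii}\,p_{jj}}}$, where $P = (p_{ij}) = \Omega^{-1}$ is the precision matrix (the inverse of the covariance matrix $\Omega$).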
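With more than one controlling variable, regressing out "the rest" pair by pair is wasteful; inverting the covariance matrix once gives every partial correlation at the same time. A sketch with NumPy, assuming Ω is estimated by the sample covariance matrix. The second function recomputes one entry by the residual method as a check, and the simulated data (A driving both B and C) are hypothetical.

```python
import numpy as np


def partial_corr_matrix(X):
    """Partial correlation of every pair of columns of X given all other columns:
    rho_ij = -p_ij / sqrt(p_ii * p_jj), with P = Omega^{-1}."""
    omega = np.cov(X, rowvar=False)   # sample covariance matrix (columns = variables)
    P = np.linalg.inv(omega)          # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)          # by convention, each variable with itself
    return R


def partial_corr_by_residuals(X, i, j):
    """Check for one pair: correlate the residuals of columns i and j after
    least-squares regression on all remaining columns plus an intercept."""
    rest = [k for k in range(X.shape[1]) if k not in (i, j)]
    Z = np.column_stack([np.ones(X.shape[0]), X[:, rest]])
    res_i = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
    res_j = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]


# Hypothetical data: A drives both B and C, which are not directly linked.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=n)
B = A + 0.5 * rng.normal(size=n)
C = A + 0.5 * rng.normal(size=n)
X = np.column_stack([A, B, C])

R = partial_corr_matrix(X)
print(np.round(R, 2))                          # B-C entry (row 1, column 2) is near 0
print(round(partial_corr_by_residuals(X, 1, 2), 2))
```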
Remarks about partial correlation
A note about causality
[Figure: two causal diagrams on variables A, B, C]
Correlation does not imply causation
Regularized partial correlation
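The slide does not say which regularizer is meant. One simple option, shown here as an assumption, is ridge-type shrinkage: invert S + λI instead of S, where S is the sample correlation matrix. This keeps the precision matrix defined when there are more genes than cells (p > n) and S is singular. The graphical lasso, which puts an L1 penalty on the precision matrix, is another common choice. The value λ = 0.1 and the random data are arbitrary.

```python
import numpy as np


def regularized_partial_corr(X, lam):
    """Partial correlations from the ridge-regularized precision matrix
    P = (S + lam * I)^{-1}, with S the sample correlation matrix of the columns of X."""
    S = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(S + lam * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R


# Hypothetical data: n = 5 cells, p = 10 genes, so S has rank at most n - 1 = 4.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))
S = np.corrcoef(X, rowvar=False)
print(np.linalg.matrix_rank(S))   # less than 10: S itself is singular

R = regularized_partial_corr(X, lam=0.1)
print(np.round(R[:3, :3], 2))
```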
Network deconvolution
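Assuming this slide refers to network deconvolution as proposed by Feizi et al. (2013), the core step models the observed network as the sum of direct effects and indirect paths of every length, G_obs = G_dir + G_dir² + ⋯ = G_dir(I − G_dir)⁻¹, which inverts to G_dir = G_obs(I + G_obs)⁻¹. The sketch applies the forward model to a small hypothetical chain network and then inverts it. It omits the eigenvalue rescaling used in practice, and recovery is exact here only because the data were generated from this model.

```python
import numpy as np


def network_deconvolution(G_obs):
    """Closed-form deconvolution step: G_dir = G_obs (I + G_obs)^{-1}.
    Assumes G_obs = G_dir (I - G_dir)^{-1}; no eigenvalue rescaling is applied."""
    I = np.eye(G_obs.shape[0])
    return G_obs @ np.linalg.inv(I + G_obs)


# Hypothetical direct network: a chain 0 - 1 - 2 with weak edges (spectral radius < 1).
G_dir = np.array([[0.0, 0.3, 0.0],
                  [0.3, 0.0, 0.3],
                  [0.0, 0.3, 0.0]])
I = np.eye(3)
G_obs = G_dir @ np.linalg.inv(I - G_dir)   # adds indirect paths of every length
print(np.round(G_obs, 3))                  # nonzero 0-2 entry from the indirect path
print(np.round(network_deconvolution(G_obs), 3))
```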
Correlation can result from causation
autocorrelation
Spatial autocorrelation and Moran’s I
slope of the fitted line is Moran’s I
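A sketch of Moran's I, I = (n/S₀) Σ_ij w_ij z_i z_j / Σ_i z_i², with z_i = x_i − x̄ and S₀ = Σ_ij w_ij. With row-standardized weights S₀ = n, and I equals the least-squares slope of the spatial lag Wz plotted against z, which is the line in this plot. The sites on a line and their neighbor weights are a hypothetical example: a smooth trend gives I > 0, and an alternating pattern gives I = −1.

```python
def morans_i(x, w):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    with z = x - mean(x) and S0 the sum of all weights."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)


def chain_weights(n):
    """Row-standardized weights for n sites on a line; neighbors are i - 1 and i + 1."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1 / len(nbrs)
    return w


def moran_scatter_slope(x, w):
    """Least-squares slope of the spatial lag (W z) against z."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    # z has mean zero, so the OLS slope reduces to sum(z * lag) / sum(z^2).
    return sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)


w = chain_weights(6)
trend = [1, 2, 3, 4, 5, 6]             # smooth: neighbors are similar
alternating = [1, -1, 1, -1, 1, -1]    # neighbors always differ
print(morans_i(trend, w), moran_scatter_slope(trend, w))
print(morans_i(alternating, w))
```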
Some remarks on measurement of performance
The confusion matrix
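A sketch of the four confusion-matrix counts and the rates built from them, for binary labels with 1 = positive. Here TPR (recall, sensitivity) = TP/(TP + FN), FPR = FP/(FP + TN), and precision = TP/(TP + FP). The labels are a hypothetical set of true and predicted edges.

```python
def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives/negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def rates(cm):
    """True positive rate (recall, sensitivity), false positive rate, and precision."""
    return {
        "TPR": cm["TP"] / (cm["TP"] + cm["FN"]),
        "FPR": cm["FP"] / (cm["FP"] + cm["TN"]),
        "precision": cm["TP"] / (cm["TP"] + cm["FP"]),
    }


# Hypothetical example: true edges (1) vs. edges called by some network inference method.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm, rates(cm))
```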
The Receiver Operating Characteristic (ROC) Curve
Area Under the Receiver Operating Characteristic Curve (AUROC)
probability that a positive item selected uniformly at random is ranked higher than a negative item selected uniformly at random
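A sketch connecting the ROC curve and its area. The ROC curve traces (FPR, TPR) as the score threshold is lowered through each distinct score, and the trapezoidal area under it is the AUROC. The same number comes out of the pairwise probability definition when ties are counted as 1/2. The scores and labels are hypothetical; for them both calculations give 13/16 = 0.8125.

```python
from itertools import groupby


def roc_curve(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, group in groupby(ranked, key=lambda t: t[0]):
        for _s, label in group:        # all items tied at this score move together
            if label == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores, labels):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points = roc_curve(scores, labels)
print(points)
print(auc_trapezoid(points), auc_pairwise(scores, labels))
```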
Summary
Covariance
Pearson’s correlation
Coefficient of determination
Partial correlation
Additional References