Multiple testing (issues)
Lior Pachter
California Institute of Technology
Lecture 14
Caltech Bi/BE/CS183
Spring 2023
These slides are distributed under the CC BY 4.0 license
Finding marker genes (Lecture 13)
The problem with pre-specified clusters (Lecture 13)
Selective inference
Lucy Gao
Jacob Bien
Daniela Witten
QQ plot
Selective inference
Lucy Gao
Jacob Bien
Daniela Witten
Selective inference
Lucy Gao
Jacob Bien
Daniela Witten
“Extra things”
Lucy Gao
Jacob Bien
Daniela Witten
Recall: the familywise error rate (Lecture 13)
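A minimal sketch of why the familywise error rate grows with the number of tests, assuming m independent tests that are all truly null and each performed at level alpha. The function name `fwer_independent` is illustrative and does not come from the slides.

```python
def fwer_independent(m, alpha=0.05):
    """Probability of at least one false rejection among m independent
    true-null tests, each performed at level alpha (an assumed setting)."""
    return 1.0 - (1.0 - alpha) ** m


# The FWER climbs quickly as the number of uncorrected tests increases.
for m in (1, 10, 100):
    print(m, round(fwer_independent(m), 3))
```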
Recall: The Bonferroni correction (Lecture 13)
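A minimal sketch of the Bonferroni correction, assuming a plain list of p-values. The function name `bonferroni` and the example p-values are illustrative and do not come from the slides.

```python
def bonferroni(pvalues, alpha=0.05):
    """Return Bonferroni-adjusted p-values and the resulting reject flags."""
    m = len(pvalues)
    # Multiply each p-value by the number of tests, capping at 1.
    adjusted = [min(1.0, p * m) for p in pvalues]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


pvals = [0.001, 0.01, 0.03, 0.5]  # illustrative values only
adjusted, reject = bonferroni(pvals)
```

Rejecting when the adjusted p-value is at most alpha is the same as rejecting when the raw p-value is at most alpha / m.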
The false discovery rate
The Benjamini-Hochberg multiple testing correction
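A minimal sketch of the Benjamini-Hochberg step-up procedure, written in plain Python and assuming a list of p-values. The function name `benjamini_hochberg` and the example p-values are illustrative and do not come from the slides.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return BH-adjusted p-values and reject flags at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and keeping
    # a running minimum so the adjusted values are monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    reject = [q <= alpha for q in adjusted]
    return adjusted, reject


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # illustrative
adjusted, reject = benjamini_hochberg(pvals)
```

With these illustrative values only the two smallest p-values are rejected at alpha = 0.05, because 0.039 exceeds its threshold of 3 * 0.05 / 8 and no larger rank meets its own threshold.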
q-values
Example
Example
The Benjamini-Yekutieli multiple testing correction
The Benjamini-Yekutieli multiple testing correction
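A minimal sketch of the Benjamini-Yekutieli correction, which inflates each scaled p-value by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m so that the FDR is controlled under arbitrary dependence between the tests. The function name `benjamini_yekutieli` and the example p-values are illustrative and do not come from the slides.

```python
def benjamini_yekutieli(pvalues, alpha=0.05):
    """Return BY-adjusted p-values and reject flags at FDR level alpha."""
    m = len(pvalues)
    # Harmonic-sum penalty that pays for arbitrary dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps the adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    reject = [q <= alpha for q in adjusted]
    return adjusted, reject


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # illustrative
adjusted, reject = benjamini_yekutieli(pvals)
```

The price of the weaker assumption is conservativeness: with these illustrative values only the smallest p-value is rejected at alpha = 0.05.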
Performing multiple testing correction in R
Performing multiple testing correction in Python
Power
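A minimal sketch of how a stricter significance threshold reduces power, assuming a one-sided z-test whose statistic is N(effect, 1) under the alternative. The function name `z_test_power`, the effect size, and the number of tests are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def z_test_power(effect, alpha):
    """Power of a one-sided z-test at level alpha when the test statistic
    is N(effect, 1) under the alternative (effect in standard-error units)."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)  # rejection threshold
    return 1.0 - std_normal.cdf(critical - effect)


m = 1000      # hypothetical number of tests
effect = 3.0  # hypothetical effect size
power_single = z_test_power(effect, 0.05)          # no correction
power_bonferroni = z_test_power(effect, 0.05 / m)  # Bonferroni threshold
print(round(power_single, 2), round(power_bonferroni, 2))
```

Under these assumptions the same effect that is detected most of the time in a single test is missed most of the time at the Bonferroni threshold, which is the Type I versus Type II trade-off.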
Aggregation instead of multiple testing
Motivating examples
Motivating examples
Motivating examples
Aggregation then analysis instead of analysis then aggregation
Šidák aggregation
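A minimal sketch of Šidák aggregation, assuming n independent p-values that are to be combined into a single p-value (for example, per-isoform p-values into one per gene). The function name `sidak_aggregate` and the example values are illustrative and do not come from the slides.

```python
def sidak_aggregate(pvalues):
    """Combine independent p-values using only the smallest one.

    Under the global null the minimum of n independent uniform p-values
    is at most x with probability 1 - (1 - x) ** n.
    """
    n = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** n


combined = sidak_aggregate([0.01, 0.5, 0.9])  # illustrative values only
```

Because only the minimum enters the formula, every other p-value is ignored apart from being counted in n.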
Šidák aggregation can fail in the collapsing scenario
Fisher and Lancaster’s method
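A minimal sketch of Fisher's method, assuming k independent p-values that are all strictly positive. The statistic -2 * sum(log p_i) follows a chi-squared distribution with 2k degrees of freedom under the global null, and for even degrees of freedom the upper tail has a closed form, so no external library is needed. Lancaster's method generalizes this by giving each p-value its own weight in the form of degrees of freedom; only the unweighted Fisher case is sketched here. The function names are illustrative and do not come from the slides.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1, built up term by term.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_aggregate(pvalues):
    """Combine independent, strictly positive p-values with Fisher's method."""
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(statistic, 2 * len(pvalues))


combined = fisher_aggregate([0.05, 0.05])  # illustrative values only
```

Unlike an approach based on the minimum alone, every p-value contributes to the statistic, so several moderately small p-values can add up to a small combined p-value.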
Chi-squared distribution
Chi-squared goodness of fit test
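A minimal sketch of the chi-squared goodness-of-fit statistic, assuming observed counts in k categories and expected counts fully specified by the null hypothesis (no parameters estimated from the data). The function name and the die-roll counts are illustrative and do not come from the slides.

```python
def chi_squared_statistic(observed, expected):
    """Return the goodness-of-fit statistic and its degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum over categories of (observed - expected)^2 / expected.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters were fitted
    return statistic, df


observed = [18, 22, 16, 25, 19, 20]  # hypothetical counts from 120 die rolls
expected = [20] * 6                  # fair-die null hypothesis
statistic, df = chi_squared_statistic(observed, expected)
```

The p-value is the upper-tail probability of a chi-squared distribution with `df` degrees of freedom evaluated at `statistic`; this sketch stops at the statistic.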
Lancaster aggregation works well to aggregate isoform results to the gene-level
Summary
Selective inference: p-values must be computed with care when the null hypothesis was itself chosen using the data (e.g. testing for differences between clusters estimated from the same data); otherwise the p-values are not valid.
Multiple testing: a correction for multiple testing requires choosing which error rate to control (e.g. FWER vs. FDR). This choice dictates the correction procedure.
Type I and Type II error: controlling Type I error is important, but the power of a test, i.e. one minus the Type II error rate, must also be taken into consideration.
Aggregation: experimental design must consider sample size, the number of tests to be performed, and the resolution of results desired. Aggregating p-values, as is done in meta-analysis, can be a useful way to control resolution.
Choosing a model
Twenty questions XIX
T̶w̶e̶n̶t̶y̶ ̶q̶u̶e̶s̶t̶i̶o̶n̶s̶ XX
Forty two questions XXI
Forty two questions XXII
Forty two questions XXIII
Forty two questions XXIV
Forty two questions XXV
Forty two questions XXVI
Forty two questions XXVII
Forty two questions XXVIII
Forty two questions XXIX
Additional References