Lecture 19
P-values; A/B Testing
DATA 8
Fall 2018
Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)
Announcements
Statistical Significance
The GSI’s Defense
GSI’s position (Null Hypothesis):
Alternative:
Quantifying Conclusions
How big a coincidence would you have to accept in order to believe the null hypothesis?
Evaluating GSI's defense hypothesis
This area is how big a coincidence
Tail Areas
Conventions About Inconsistency
(Demo)
Definition of the P-value
Formal name: observed significance level
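The P-value is the chance, under the null hypothesis, that the test statistic is equal to the value observed in the data or is even further in the direction of the alternative.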
Quantifying Conclusions
P(the test statistic would be equal to or more extreme than the observed test statistic under the null hypothesis)
Evaluating Mendel's pea flower hypothesis
This area is the P-value (approximately)
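The tail area described above can be approximated from simulated values of the test statistic. The following is a minimal sketch, not the lecture demo itself: the function name `empirical_p_value`, the `tail` parameter, and the toy list of simulated statistics are all assumptions made for illustration.

```python
import numpy as np

def empirical_p_value(simulated_stats, observed_stat, tail="right"):
    """Proportion of simulated statistics that are equal to or more
    extreme than the observed statistic, in the chosen direction.

    `tail` (hypothetical parameter) says which direction favors the
    alternative: "right" for large values, "left" for small values.
    """
    simulated_stats = np.asarray(simulated_stats)
    if tail == "right":
        extreme = simulated_stats >= observed_stat
    elif tail == "left":
        extreme = simulated_stats <= observed_stat
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return float(np.mean(extreme))

# Toy values standing in for statistics simulated under the null.
toy_simulated = [1, 2, 3, 4, 5]
print(empirical_p_value(toy_simulated, 4))  # 2 of the 5 values are >= 4
```

The result is only approximate because it is based on a finite number of simulated values rather than on the exact distribution of the statistic under the null.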
An Error Probability
Can the Conclusion be Wrong?
Yes.
| | Null is true | Alternative is true |
| Test rejects the null | ❌ | ✅ |
| Test doesn’t reject the null | ✅ | ❌ |
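The top-left cell of the table (the null is true but the test rejects it) can be explored by simulation. The sketch below assumes a hypothetical example that is not taken from the slides: a fair coin tossed 100 times, the statistic |heads - 50|, and a 5% cutoff. It repeats the test many times with the null actually true and counts how often the test rejects anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tosses = 100   # assumed sample size for this illustration
cutoff = 0.05    # the conventional 5% cutoff

def statistic(heads):
    # Distance between the number of heads and the expected 50.
    return np.abs(heads - n_tosses / 2)

# Empirical distribution of the statistic under the null (fair coin).
null_stats = statistic(rng.binomial(n_tosses, 0.5, size=20_000))

def p_value(observed_stat):
    # Chance, under the null, of a statistic at least this large.
    return np.mean(null_stats >= observed_stat)

# Run 2,000 new experiments in which the null really is true.
experiments = statistic(rng.binomial(n_tosses, 0.5, size=2_000))
p_values = np.array([p_value(s) for s in experiments])

# Proportion of experiments where the test wrongly rejects the null.
false_rejection_rate = float(np.mean(p_values < cutoff))
print(false_rejection_rate)
```

Under these assumptions the proportion should come out at roughly the cutoff or somewhat below it, since the statistic takes only whole-number values and the tail area cannot hit 5% exactly.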
An Error Probability
(Demo)
Origin of the Conventions
Sir Ronald Fisher, 1890-1962
Sir Ronald Fisher, 1925
“It is convenient to take this point [5%] as a limit in judging whether a deviation is to be considered significant or not.”
–– Statistical Methods for Research Workers
Sir Ronald Fisher, 1926
“If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 percent point), or one in a hundred (the 1 percent point). Personally, the author prefers to set a low standard of significance at the 5 percent point …”
Review / P-values
A/B Testing
Comparing Two Samples
(Demo)
The Groups and the Question
Hypotheses
Test Statistic
Group B average - Group A average
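A minimal sketch of this statistic, using only the five rows visible on the "Simulating Under the Null" slide. Treating non-smokers as Group A and smokers as Group B is an assumption made here for illustration, as are the pairing of labels with weights in slide order and the function name `difference_of_means`.

```python
import numpy as np

# Five rows as they appear on the slide; pairing by order is assumed.
weights = np.array([120, 113, 128, 108, 136])  # ounces
labels = np.array(["Non-smoker", "Non-smoker", "Smoker", "Smoker", "Non-smoker"])

def difference_of_means(values, groups, group_a, group_b):
    """Return (Group B average) - (Group A average)."""
    mean_a = values[groups == group_a].mean()
    mean_b = values[groups == group_b].mean()
    return float(mean_b - mean_a)

# Assumption: Group A = non-smokers, Group B = smokers.
observed = difference_of_means(weights, labels, "Non-smoker", "Smoker")
print(observed)  # (128 + 108) / 2 - (120 + 113 + 136) / 3 = 118 - 123 = -5.0
```

A negative value means the Group B average is lower than the Group A average in these rows; five rows are far too few to draw any conclusion.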
Simulating Under the Null
...
| Non-smoker | 120 oz |
| Non-smoker | 113 oz |
| Smoker | 128 oz |
| Smoker | 108 oz |
| Non-smoker | 136 oz |
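One common way to simulate under the null in an A/B test is to shuffle the group labels and recompute the statistic each time. The sketch below applies that idea to the five rows shown above. It is an illustration under stated assumptions, not the lecture demo: the pairing of labels with weights follows slide order, smokers are treated as Group B, and the names `group_difference` and `simulated` are hypothetical.

```python
import numpy as np

# Five rows from the slide; pairing by order is assumed.
weights = np.array([120, 113, 128, 108, 136])  # ounces
labels = np.array(["Non-smoker", "Non-smoker", "Smoker", "Smoker", "Non-smoker"])

def group_difference(values, groups):
    # Smoker average minus non-smoker average (assumed Group B - Group A).
    return values[groups == "Smoker"].mean() - values[groups == "Non-smoker"].mean()

rng = np.random.default_rng(0)
observed = group_difference(weights, labels)

# Under the null the labels carry no information about the weights,
# so any reassignment of the same labels is equally likely.
repetitions = 2_000
simulated = np.array([
    group_difference(weights, rng.permutation(labels))
    for _ in range(repetitions)
])

print("observed difference:", observed)
print("average simulated difference:", simulated.mean())
```

Shuffling keeps the group sizes fixed (two smokers, three non-smokers) and changes only which weights they are attached to, so the simulated differences center near zero. With the full data set the same loop would run over all rows.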
Simulating Under the Null
(Demo)