Lecture 17
Comparing Distributions
DATA 8
Spring 2023
Weekly Goals
Two Viewpoints
Model and Alternative
Steps in Assessing a Model
Discussion Questions
In each of (a) and (b), choose a statistic that will help you decide between the two viewpoints.
Data: the results of 400 tosses of a coin
(a)
(b)
“Fair”
For both (a) and (b),
Answers
(a) Large values of the percent of heads suggest “biased towards heads.”
(b) Very large or very small values of the percent of heads suggest “not fair.”
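As a rough sketch of the two answers, the statistics could be computed as below. The function names are hypothetical, and using the distance from 50 for (b) is an assumption: it is one common way to turn "very large or very small" into a single quantity where large values point to "not fair."

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility


def percent_heads(tosses):
    """Statistic for (a): percent of heads in an array of 0/1 tosses."""
    return 100 * np.mean(tosses)


def distance_from_50(tosses):
    """Assumed statistic for (b): how far the percent of heads is from 50."""
    return abs(percent_heads(tosses) - 50)


# Simulate one set of 400 tosses of a fair coin (1 = heads, 0 = tails).
tosses = rng.integers(0, 2, size=400)

print(percent_heads(tosses), distance_from_50(tosses))
```

With the distance statistic, both 40% heads and 60% heads give the same value of 10, so only large values count as evidence against "fair."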
Comparing Distributions
Jury Selection in Alameda County
Jury Panels
Section 197 of California's Code of Civil Procedure says, "All persons selected for jury service shall be selected at random, from a source or sources inclusive of a representative cross section of the population of the area served by the court."
Eligible jurors in a county
Jury
List of eligible residents
Jury panel
(Demo)
A New Statistic
Distance Between Distributions
(Demo)
Total Variation Distance
Every distance has a computational recipe
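Total Variation Distance (TVD):
For each category, compute the difference in proportions between the two distributions
Take the absolute value of each difference
Sum, and then divide the sum by 2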
(Demo)
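A minimal sketch of the TVD computation, assuming each distribution is given as an array of proportions over the same categories in the same order. The function name and the example proportions are hypothetical and are not the Alameda County figures.

```python
import numpy as np


def total_variation_distance(dist1, dist2):
    """Half the sum of absolute differences between two categorical distributions."""
    dist1 = np.asarray(dist1, dtype=float)
    dist2 = np.asarray(dist2, dtype=float)
    return np.sum(np.abs(dist1 - dist2)) / 2


# Hypothetical proportions for three categories (illustration only).
eligible = [0.50, 0.30, 0.20]
panel = [0.40, 0.35, 0.25]

print(total_variation_distance(eligible, panel))
```

Dividing by 2 keeps the distance between 0 and 1: identical distributions give 0, and distributions with no categories in common give 1.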
Summary of the Method
To assess whether a sample was drawn randomly from a known categorical distribution:
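One way to sketch this assessment in code is to simulate many random samples from the known distribution and record the TVD between each sample and that distribution. The function names, sample size, number of repetitions, and proportions below are all assumptions made for illustration.

```python
import numpy as np


def total_variation_distance(dist1, dist2):
    """Half the sum of absolute differences between two categorical distributions."""
    dist1 = np.asarray(dist1, dtype=float)
    dist2 = np.asarray(dist2, dtype=float)
    return np.sum(np.abs(dist1 - dist2)) / 2


def simulate_tvds(model, sample_size, repetitions, rng):
    """TVDs between the model and random samples drawn from the model."""
    model = np.asarray(model, dtype=float)
    tvds = np.empty(repetitions)
    for i in range(repetitions):
        # Draw category counts for one random sample of the given size.
        counts = rng.multinomial(sample_size, model)
        tvds[i] = total_variation_distance(counts / sample_size, model)
    return tvds


rng = np.random.default_rng(0)       # seed chosen only for reproducibility
model = [0.50, 0.30, 0.20]           # hypothetical known distribution
simulated = simulate_tvds(model, sample_size=1000, repetitions=2000, rng=rng)

print(simulated.mean(), simulated.max())
```

The observed sample's TVD from the model can then be compared with these simulated values: if it lies far beyond them, the sample does not look like a random draw from the known distribution.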
Testing Hypotheses
Null and Alternative
The method only works if we can simulate data under one of the hypotheses.
Test Statistic
Questions before choosing the statistic:
Prediction Under the Null Hypothesis
Conclusion of the Test
Resolve the choice between the null and alternative hypotheses
Whether a value is consistent with a distribution:
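A picture of the simulated values is often enough to judge this. As a numerical complement, one possible sketch (an assumption, not a rule stated in these slides) is to compute the fraction of simulated statistics that are at least as large as the observed one. The function name and all values below are hypothetical.

```python
import numpy as np


def fraction_at_least(simulated_stats, observed_stat):
    """Fraction of simulated statistics that are >= the observed statistic."""
    simulated_stats = np.asarray(simulated_stats, dtype=float)
    return np.count_nonzero(simulated_stats >= observed_stat) / len(simulated_stats)


# Hypothetical simulated statistics and observed value (illustration only).
simulated_stats = [0.01, 0.02, 0.03, 0.05, 0.08]
observed_stat = 0.05

print(fraction_at_least(simulated_stats, observed_stat))
```

A small fraction means the observed value sits in the tail of the simulated distribution, which counts against the model. This assumes the statistic was chosen so that large values favor the alternative.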