P-values
CSCI 104: Data Science and Computing for All
Williams College, Fall 2025
Announcements
Learning Objectives
Review: Null and alternative hypotheses
Null Hypothesis: Differences between observations and a population are due to random chance
Alt. Hypothesis: Differences between observations and a population are not due to random chance
Let's use simulation and statistics to reject the Null!
Simulation to Reject Null Hypothesis
Assume the Null Hypothesis model is valid
Simulate samples according to the Null Hypothesis
Compare to statistics for real-world observations
Consistent
The Null Hypothesis model’s assumptions are valid
Cannot Reject Null Hypothesis
Not Consistent
The Null Hypothesis model’s assumptions are not valid
Reject Null Hypothesis and Favor Alt.
Compute statistics for simulated samples
Null Hypothesis
"Observed data happened by chance"
Alt. Hypothesis
"Some effect beyond chance"
Jury panels in Swain vs. Alabama
Model: Null Hypothesis
Each panelist is drawn randomly from the population
Each panelist has 26% chance of being black
Observed Data
Swain jury: 8% black jury panelists
Simulated 100-person panels
Statistic
abs(percent black - 26)
What do the following represent?
What conclusions can you draw from this plot?
💡 Think-Pair-Share
Jury panels in Swain vs. Alabama
Model: Null Hypothesis
Each panelist is drawn randomly from the population
Each panelist has 26% chance of being black
Observed Data
Swain jury: 8% black jury panelists
Is the Null Hypothesis consistent with the observed data?
Simulated 100-person panels
No! Favors Alternative Hypothesis
Statistic
abs(percent black - 26)
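The Swain simulation above can be sketched in a few lines of NumPy. This is a minimal sketch, not the course's own code: it assumes NumPy is available and uses `rng.binomial` to count black panelists in each simulated 100-person panel in place of the course helper functions. The function name `simulate_swain_statistics` is hypothetical.

```python
import numpy as np

def simulate_swain_statistics(num_trials, panel_size=100, null_percent=26, seed=0):
    """Hypothetical helper: simulate abs(percent black - 26) for panels drawn under the null."""
    rng = np.random.default_rng(seed)
    # Under the null, each panelist is black with probability 0.26, so the
    # number of black panelists in one panel is Binomial(panel_size, 0.26).
    num_black = rng.binomial(panel_size, null_percent / 100, size=num_trials)
    percent_black = 100 * num_black / panel_size
    return np.abs(percent_black - null_percent)

observed_statistic = abs(8 - 26)  # Swain jury: 8% black panelists
simulated_statistics = simulate_swain_statistics(10_000)

print(observed_statistic)  # 18
print(simulated_statistics.max())
print(np.mean(simulated_statistics >= observed_statistic))
```

Very few (typically none) of the simulated statistics reach 18, which is why the observed panel is judged inconsistent with the null model.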
Jury panels in Alameda, CA
Is observed TVD consistent with simulated TVDs?
Observed Data
Simulated 1,453-person panels
Model: Null Hypothesis
Each panelist is drawn randomly from the population
Statistic
TVD from population: sum(absolute difference in proportion for each category) / 2
No! Favors Alternative Hypothesis
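The TVD statistic and a null simulation for 1,453-person panels can be sketched with NumPy as below. The four-category population proportions are hypothetical placeholders, not the actual Alameda figures. `tvd` and `sample_proportions` are stand-ins written here; unlike the course helper, this `sample_proportions` takes an explicit random generator.

```python
import numpy as np

def tvd(dist1, dist2):
    """Total variation distance: half the sum of absolute differences in proportions."""
    return np.sum(np.abs(np.asarray(dist1) - np.asarray(dist2))) / 2

def sample_proportions(sample_size, probabilities, rng):
    """Stand-in helper: category proportions in one random sample drawn under `probabilities`."""
    return rng.multinomial(sample_size, probabilities) / sample_size

# Hypothetical population proportions for four categories (placeholders, not real data)
null_distribution = np.array([0.40, 0.30, 0.20, 0.10])
panel_size = 1453

rng = np.random.default_rng(0)
simulated_tvds = np.array([
    tvd(sample_proportions(panel_size, null_distribution, rng), null_distribution)
    for _ in range(5_000)
])

print(tvd([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # approximately 0.1
print(simulated_tvds.mean())
```

Under this hypothetical null, simulated TVDs stay small, so an observed TVD far to the right of them would favor the alternative.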
Review: Simulating from the null hypothesis
Repeat many times:
- Simulate one sample
- Record the sample Statistic
Analyze sample statistics for all trials
simulate_sample_statistic
simulate_sample_statistic(make_one_sample,
                          sample_size,
                          compute_sample_statistic,
                          num_trials)
Measures "how close" a sample is to what is expected under the null hypothesis
From the null hypothesis
Functions differ depending on the data type.
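One way the `simulate_sample_statistic` loop could be written is sketched below. This is an assumption about its internals, not the course's implementation: it calls `make_one_sample(sample_size)` and `compute_sample_statistic(sample)` `num_trials` times and collects the results in a NumPy array. The Swain-style `make_one_sample` and `compute_sample_statistic` passed in are illustrative.

```python
import numpy as np

def simulate_sample_statistic(make_one_sample, sample_size,
                              compute_sample_statistic, num_trials):
    """Sketch: repeat (simulate one sample, record its statistic) num_trials times."""
    sample_statistics = np.empty(num_trials)
    for i in range(num_trials):
        one_sample = make_one_sample(sample_size)                    # drawn under the null
        sample_statistics[i] = compute_sample_statistic(one_sample)  # "how close" to the null
    return sample_statistics

rng = np.random.default_rng(0)
null_proportions = np.array([0.26, 0.74])  # [black, not black]

def make_one_sample(sample_size):
    # Illustrative: category proportions in one simulated panel
    return rng.multinomial(sample_size, null_proportions) / sample_size

def compute_sample_statistic(sample_props):
    # Illustrative: abs(sample_percent - null_percent) for the first category
    return abs(100 * sample_props[0] - 26)

stats = simulate_sample_statistic(make_one_sample, 100, compute_sample_statistic, 1_000)
print(len(stats))  # 1000
```

Passing the two functions as arguments keeps the loop the same for every data type; only `make_one_sample` and `compute_sample_statistic` change.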
Null Hypothesis Testing
Type of Data | Within make_one_sample | Within compute_sample_statistic |
Single Category (Null: 26% chance of a black panelist) | sample_proportions(observed_sample_size, null_proportions) | abs(sample_percent - null_percent) |
Multiple Categories (Null: Alameda jury panel ethnicities match population) | sample_proportions(observed_sample_size, null_distribution) | tvd(sample_distribution, null_distribution) |
Numeric | | |
Two groups | | |
Midterm Scores for Four Lab Sections
Section 3
Our average score is lower than everyone's...
The professor says it's just chance -- a randomly chosen group of students from the whole class could have an average like ours. Is that consistent with our observed average?
Was Section 3 Graded Differently?
1 row = 1 student
4 lab sections total
Section 3 Observed Data
13, 10, 20, 8, 22, ...
Statistic
?
Null Hypothesis
?
Alt. Hypothesis
?
Section 3 Observed Data
13, 10, 20, 8, 22, ...
Statistic
abs(sample mean - population mean)
Null Hypothesis
Section 3's average could arise from randomly choosing students from the whole class.
Alt. Hypothesis
Random chance alone doesn't explain Section 3's midterm average.
💡 Think-Pair-Share
1. Midterm Scores
Sampling with versus without replacement
Sampling with replacement
Sampling without replacement
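The difference between the two sampling schemes can be seen with NumPy's `Generator.choice`, whose `replace` argument plays the same role as the `with_replacement` argument in the table below. The five-student roster is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
students = np.array(["Ana", "Ben", "Cy", "Dee", "Eli"])  # hypothetical roster

# With replacement: a student can be drawn more than once,
# so we can even draw more students than the class contains.
with_repl = rng.choice(students, size=6, replace=True)

# Without replacement: each student appears at most once,
# like forming a real lab section from the class.
without_repl = rng.choice(students, size=5, replace=False)

print(with_repl)
print(without_repl)
```

Asking for 6 students from this roster with `replace=False` raises a `ValueError`, because there are only 5 distinct students to draw.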
Null Hypothesis Testing
Type of Data | Within make_one_sample | Within compute_sample_statistic |
Single Category (Null: 26% chance of a black panelist) | sample_proportions(observed_sample_size, null_proportions) | abs(sample_percent - null_percent) |
Multiple Categories (Null: Alameda jury panel ethnicities match population) | sample_proportions(observed_sample_size, null_distribution) | tvd(sample_distribution, null_distribution) |
Numeric (Null: midterm average in a section) | population.sample(observed_sample_size, with_replacement=False) | abs(sample_mean - null_mean) |
Two groups | | |
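The numeric row can be sketched with NumPy: draw a section-sized group of students without replacement from the whole class and record abs(sample mean - population mean). The class scores (generated randomly here) and the section size of 20 are hypothetical placeholders, not the actual course data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical midterm scores (0-25) for a whole class of 120 students
population_scores = rng.integers(0, 26, size=120)
population_mean = population_scores.mean()
section_size = 20  # hypothetical size of one lab section

def one_section_statistic():
    # Draw a random "section" without replacement: each student
    # (array position) is chosen at most once, even if scores repeat.
    section = rng.choice(population_scores, size=section_size, replace=False)
    return abs(section.mean() - population_mean)

simulated_statistics = np.array([one_section_statistic() for _ in range(5_000)])
print(simulated_statistics.mean())
```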
Rank examples 1-3 by how consistent the observed data are with the null model, from most to least consistent.
2. Alameda Juries
1. Swain Jury (sample size=10)
3. Midterm scores
💡 Think-Pair-Share
What is Statistically Significant?
Which observed values are consistent with Null Hypothesis?
Midterm scores
Let’s (finally) turn “consistent” into a quantitative value (a p-value!)
Evaluate the Tail Area
Favors Null
Favors Alt
Start at the test statistic for the observed data and look toward the values that favor the Alt. Hypothesis.
Large Tail Area
(Yellow Area)
Observed data is consistent with Null Hypothesis
Evaluate the Tail Area
Favors Null
Favors Alt
Start at the test statistic for the observed data and look toward the values that favor the Alt. Hypothesis.
Small Tail Area
(Yellow Area)
Observed data is not consistent with Null Hypothesis
Favors Alternative Hypothesis
p-value Definition
The chance, under the null hypothesis, that the test statistic is equal to the value observed in the data or is even further in the direction of the alternative hypothesis.
p-value is a proportion (not a percentage)
Yellow area approximates the p-value
Favors Null
Favors Alt
Empirical p-values: Calculate the Tail Area
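Once simulated statistics are in hand, the empirical p-value is the proportion of them that are at least as large as the observed statistic (here, larger values favor the alternative). A minimal sketch; the function name `empirical_p_value` and the numbers are illustrative.

```python
import numpy as np

def empirical_p_value(simulated_statistics, observed_statistic):
    """Proportion of simulated statistics >= observed (large values favor the alternative)."""
    simulated_statistics = np.asarray(simulated_statistics)
    return np.count_nonzero(simulated_statistics >= observed_statistic) / len(simulated_statistics)

simulated = [0.5, 1.0, 2.0, 3.0, 4.0]
print(empirical_p_value(simulated, 3.0))  # 0.4
print(empirical_p_value(simulated, 5.0))  # 0.0
```

The comparison uses `>=` rather than `>` because the p-value definition counts statistics equal to the observed value. If small values of a statistic favored the alternative, the comparison would flip to `<=`.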
2. Calculating p-values
Simulation to Reject Null Hypothesis
Assume the Null Hypothesis model is valid
Simulate samples according to the Null Hypothesis
Compare to statistics for real-world observations
Compute statistics for simulated samples
Null Hypothesis
"Observed data happened by chance"
Alt. Hypothesis
"Some effect beyond chance"
Consistent
The Null Hypothesis model’s assumptions are valid
Cannot Reject Null Hypothesis
Not Consistent
The Null Hypothesis model’s assumptions are not valid
Reject Null Hypothesis and Favor Alt.
Large p-value
Small p-value
Inconsistency with the null hypothesis
How small must the tail be to say the observed data is inconsistent with the null hypothesis?
p-value < 0.05 cutoff
(Conventionally) Observed data is
Statistically Significant
p-value < 0.01 cutoff
(Conventionally) Observed data is
Highly Statistically Significant
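The two conventional cutoffs can be written as a small decision function. This is an illustrative sketch: the name `describe_significance` is hypothetical, and the 0.05 and 0.01 thresholds are conventions, not fixed rules.

```python
def describe_significance(p_value):
    """Label a p-value using the conventional (not universal) 0.05 and 0.01 cutoffs."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be a proportion between 0 and 1")
    if p_value < 0.01:
        return "highly statistically significant"
    if p_value < 0.05:
        return "statistically significant"
    return "not statistically significant"

for p in [0.2, 0.03, 0.004]:
    print(p, describe_significance(p))
```

Both comparisons are strict, so a p-value of exactly 0.05 falls on the "not significant" side.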
Historical notes on the conventional 0.05 p-value cutoff
“It is convenient to take [the 5 percent point] as a limit in judging whether a deviation is to be considered significant or not.”
—Sir Ronald Fisher, 1925
“Personally, the author prefers to set a low standard of significance at the 5 percent point …”
—Sir Ronald Fisher, 1926
Takeaway: p-value cutoffs are still matters of subjective judgement
Simulation to Reject Null Hypothesis (0.05 p-value cutoff)
Assume the Null Hypothesis model is valid
Simulate samples according to the Null Hypothesis
Compare to statistics for real-world observations
Compute statistics for simulated samples
Null Hypothesis
"Observed data happened by chance"
Alt. Hypothesis
"Some effect beyond chance"
Consistent
The Null Hypothesis model’s assumptions are valid
Cannot Reject Null Hypothesis
Not Consistent
The Null Hypothesis model’s assumptions are not valid
Reject Null Hypothesis and Favor Alt.
p-value >= 0.05
p-value < 0.05
Suppose Alice and Bob each have a different observed sample.
p-value = 0.463
p-value = 0.031
Alice’s observation
Bob’s observation
💡 Think-Pair-Share
3. Impact of Sample Size on p-value
Impacts of sample size on p-value
For the same observed difference, the p-value tends to decrease as the sample size increases
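To see this effect, hold the observed percentage fixed at 8% (as in Swain) and vary the panel size. This sketch assumes the same null model (26% chance per panelist) and NumPy binomial draws. The panel sizes 25 and 50 are hypothetical, chosen so that 8% is a whole number of panelists. The function name `swain_style_p_value` is also hypothetical.

```python
import numpy as np

def swain_style_p_value(panel_size, observed_percent=8, null_percent=26,
                        num_trials=10_000, seed=0):
    """Hypothetical helper: empirical p-value for abs(percent black - 26)."""
    rng = np.random.default_rng(seed)
    # Simulate the number of black panelists in each panel under the null
    num_black = rng.binomial(panel_size, null_percent / 100, size=num_trials)
    simulated = np.abs(100 * num_black / panel_size - null_percent)
    observed = abs(observed_percent - null_percent)
    # Tail area: simulated statistics at least as large as the observed one
    return np.count_nonzero(simulated >= observed) / num_trials

for size in [25, 50, 100]:
    print(size, swain_style_p_value(size))
```

The same 18-point gap becomes harder to explain by chance as panels grow, because the simulated percentages cluster more tightly around 26%.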
Learning Objectives