
P-values

CSCI 104: Data Science and Computing for All

Williams College, Fall 2025

Announcements

  • Lab 6?
  • Prelab 7 is available

Learning Objectives

  • Create a hypothesis test for an example with numeric values
  • Calculate the empirical p-value
  • Use p-value cutoffs to make a conclusion about the null hypothesis


Review: Null and alternative hypotheses

Null Hypothesis: Differences between observations and a population are due to random chance.

Alt. Hypothesis: Differences between observations and a population are not due to random chance.

Let's use simulation and statistics to reject the Null!

Simulation to Reject Null Hypothesis

Null Hypothesis: "Observed data happened by chance"
Alt. Hypothesis: "Some effect beyond chance"

  1. Assume the Null Hypothesis model is valid.
  2. Simulate samples according to the Null Hypothesis.
  3. Compute statistics for the simulated samples.
  4. Compare them to the statistic for the real-world observations.
      • Consistent: the observed data fit the Null Hypothesis model's assumptions → Cannot reject the Null Hypothesis
      • Not consistent: the Null Hypothesis model's assumptions are likely not valid → Reject the Null Hypothesis and favor the Alt.

Jury panels in Swain vs. Alabama

Model (Null Hypothesis): Each panelist is drawn randomly from the population. Each panelist has a 26% chance of being black.

Observed data: Swain jury panel: 8% black jury panelists

Statistic: abs(percent black - 26)

[Figure: statistic for simulated 100-person panels]
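The null model on this slide can be simulated in a few lines. This is a minimal sketch, assuming plain Python with the standard random module standing in for the course's sample_proportions; the names simulate_panel_percent and swain_statistic are hypothetical. It builds the kind of data the plot summarizes: one statistic per simulated 100-person panel, plus the observed statistic abs(8 - 26) = 18.

```python
import random

NULL_CHANCE_BLACK = 0.26  # null model: each panelist is black with chance 0.26
PANEL_SIZE = 100          # size of each simulated panel
NUM_TRIALS = 10_000       # how many panels to simulate


def simulate_panel_percent(panel_size=PANEL_SIZE, chance=NULL_CHANCE_BLACK):
    """Draw one panel under the null model; return its percent of black panelists."""
    num_black = sum(1 for _ in range(panel_size) if random.random() < chance)
    return 100 * num_black / panel_size


def swain_statistic(percent_black):
    """The slide's statistic: distance from the null model's 26%."""
    return abs(percent_black - 26)


observed_statistic = swain_statistic(8)  # Swain panel: 8% black, so 18

random.seed(104)  # fixed seed so reruns give the same simulated values
simulated_statistics = [
    swain_statistic(simulate_panel_percent()) for _ in range(NUM_TRIALS)
]

# How many simulated panels were at least as far from 26% as Swain's panel?
num_as_extreme = sum(1 for s in simulated_statistics if s >= observed_statistic)
print(observed_statistic, num_as_extreme)
```

A histogram of simulated_statistics with observed_statistic marked on it is roughly the kind of plot discussed on the next slide.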

What do the following represent?

  • X-axis
  • Y-axis
  • Blue bars
  • Red dot

What conclusions can you draw from this plot?

💡 Think-Pair-Share

Jury panels in Swain vs. Alabama

Model (Null Hypothesis): Each panelist is drawn randomly from the population. Each panelist has a 26% chance of being black.

Observed data: Swain jury panel: 8% black jury panelists

Statistic: abs(percent black - 26)

[Figure: statistic for simulated 100-person panels]

Is the Null Hypothesis consistent with the observed data?

No! Favors the Alternative Hypothesis.

Jury panels in Alameda, CA

Model (Null Hypothesis): Each panelist is drawn randomly from the population.

Statistic: TVD from population = sum(absolute difference in proportion for each category) / 2

[Figures: observed data; TVDs for simulated 1,453-person panels]

Is the observed TVD consistent with the simulated TVDs?

No! Favors the Alternative Hypothesis.
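The TVD formula above translates directly into code. A minimal sketch, assuming each distribution is a plain Python list of proportions with categories in the same order; the function name tvd mirrors the course's, but this body and the three-category numbers are illustrative, not the Alameda data. Dividing by 2 is there because any proportion a panel gains in one category it must lose in another, so the plain sum counts every difference twice.

```python
def tvd(dist_a, dist_b):
    """Total variation distance between two categorical distributions.

    Both arguments list proportions for the same categories in the same order.
    TVD = sum(absolute difference in proportion for each category) / 2.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("distributions must cover the same categories")
    return sum(abs(a - b) for a, b in zip(dist_a, dist_b)) / 2


# Made-up proportions for three categories (illustration only, not Alameda data).
population_dist = [0.5, 0.3, 0.2]
panel_dist = [0.6, 0.3, 0.1]

distance = tvd(population_dist, panel_dist)  # about 0.1
```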

Review: Simulating from the null hypothesis

Repeat many times:

  - Simulate one sample
  - Record the sample statistic

Analyze the sample statistics for all trials.

simulate_sample_statistic(make_one_sample, sample_size,
                          compute_sample_statistic, num_trials)

  • make_one_sample: draws one sample from the null hypothesis
  • compute_sample_statistic: measures "how close" the sample is to what is expected under the null hypothesis
  • These functions differ depending on the data type.
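The repeat-many-times loop can be sketched in plain Python. This is a hypothetical re-implementation, not the course's helper: the slides show only the call, so details such as whether the real simulate_sample_statistic returns a NumPy array rather than a list are unknown here. The Swain plug-in functions (make_swain_panel, swain_statistic) are also hypothetical and use the standard random module.

```python
import random


def simulate_sample_statistic(make_one_sample, sample_size,
                              compute_sample_statistic, num_trials):
    """Hypothetical stand-in for the course helper of the same name.

    Repeats num_trials times: draw one sample under the null hypothesis,
    then record the statistic computed from that sample.
    """
    statistics = []
    for _ in range(num_trials):
        sample = make_one_sample(sample_size)                # from the null hypothesis
        statistics.append(compute_sample_statistic(sample))  # "how close" to the null
    return statistics


# Example plug-ins for the Swain null model (also hypothetical helpers).
def make_swain_panel(sample_size):
    """One simulated panel: True marks a black panelist (chance 0.26 each)."""
    return [random.random() < 0.26 for _ in range(sample_size)]


def swain_statistic(panel):
    """abs(percent black - 26) for one panel."""
    percent_black = 100 * sum(panel) / len(panel)
    return abs(percent_black - 26)


random.seed(0)
stats = simulate_sample_statistic(make_swain_panel, 100, swain_statistic, 1000)
```

Passing functions as arguments is what lets one loop serve every row of the table that follows: only the two plug-ins change with the data type.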

Null Hypothesis Testing

What goes within make_one_sample and within compute_sample_statistic, by type of data:

  • Single category (Null: 26% chance of a black panelist)
      Within make_one_sample: sample_proportions(observed_sample_size, null_proportions)
      Within compute_sample_statistic: abs(sample_percent - null_percent)
  • Multiple categories (Null: Alameda jury panel ethnicities match the population)
      Within make_one_sample: sample_proportions(observed_sample_size, null_distribution)
      Within compute_sample_statistic: tvd(observed_distribution, null_distribution)
  • Numeric
  • Two groups

Midterm Scores for Four Lab Sections

[Figure: midterm scores for four lab sections, with Section 3 labeled]

Section 3: "Our average score is lower than every other section's..."

The professor says it's just chance: a randomly chosen group of students from the whole class could have an average like ours. Is that consistent with our observed average?

Was Section 3 Graded Differently?

Data: 1 row = 1 student; 4 lab sections total

Section 3 observed data: 13, 10, 20, 8, 22, ...

💡 Think-Pair-Share: What should the statistic, the null hypothesis, and the alternative hypothesis be?

Statistic: abs(sample mean - population mean)

Null Hypothesis: Section 3's average could happen due to a random choice of students from the class.

Alt. Hypothesis: Random chance doesn't explain Section 3's midterm average.


1. Midterm Scores

Sampling with versus without replacement

Sampling with replacement

  • pail.sample(n)
  • After we take a marble, put it back in the pail before taking the next.

Sampling without replacement

  • pail.sample(n, with_replacement=False)
  • After we take a marble, do not put it back in the pail (set it aside) before taking the next.
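Python's standard library draws the same distinction: random.choices samples with replacement and random.sample samples without. This sketch uses those two functions in place of the datascience pail.sample(...) calls above; the pail of marbles is made up.

```python
import random

# Hypothetical pail of five distinct marbles.
pail = ["red", "blue", "green", "yellow", "purple"]

random.seed(1)

# With replacement: every draw chooses from the full pail, so marbles can repeat.
with_replacement = random.choices(pail, k=8)

# Without replacement: each drawn marble is set aside, so none can repeat
# and at most len(pail) marbles can be drawn.
without_replacement = random.sample(pail, k=5)

print(with_replacement)
print(without_replacement)
```

Eight draws with replacement from five distinct marbles must repeat at least one marble, while drawing all five without replacement only shuffles the pail; asking random.sample for six marbles raises ValueError.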

Null Hypothesis Testing

What goes within make_one_sample and within compute_sample_statistic, by type of data:

  • Single category (Null: 26% chance of a black panelist)
      Within make_one_sample: sample_proportions(observed_sample_size, null_proportions)
      Within compute_sample_statistic: abs(sample_percent - null_percent)
  • Multiple categories (Null: Alameda jury panel ethnicities match the population)
      Within make_one_sample: sample_proportions(observed_sample_size, null_distribution)
      Within compute_sample_statistic: tvd(observed_distribution, null_distribution)
  • Numeric (Midterm averages in a section)
      Within make_one_sample: population.sample(observed_sample_size, with_replacement=False)
      Within compute_sample_statistic: abs(observed_mean - null_mean)
  • Two groups
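The numeric row can be sketched with the standard library, using random.sample in place of population.sample(..., with_replacement=False). Everything here is an assumption for illustration: class_scores is a made-up class, and observed_section uses only the five Section 3 scores visible on the slide as if they were the whole section (the real section has more students, which the slide elides).

```python
import random
from statistics import mean

# Hypothetical scores for a whole class (made up for illustration).
class_scores = [13, 10, 20, 8, 22, 25, 18, 21, 16, 24,
                19, 23, 15, 20, 17, 22, 14, 25, 19, 21]
null_mean = mean(class_scores)  # the class (population) average

# Illustration only: treat the five scores shown on the slide as the section.
observed_section = [13, 10, 20, 8, 22]


def make_one_sample(sample_size):
    """Null model: a section is a random group of students drawn from the
    class without replacement (no student appears twice)."""
    return random.sample(class_scores, k=sample_size)


def compute_sample_statistic(sample):
    """abs(sample mean - population mean)."""
    return abs(mean(sample) - null_mean)


observed_statistic = compute_sample_statistic(observed_section)

random.seed(3)
simulated = [compute_sample_statistic(make_one_sample(len(observed_section)))
             for _ in range(5_000)]
```

Sampling without replacement matters here: a real section cannot contain the same student twice, so the null model draws distinct students.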

Rank examples 1-3 by how consistent the observed data are with the null model, from most to least consistent.

  1. Swain jury (sample size = 10)
  2. Alameda juries
  3. Midterm scores

💡 Think-Pair-Share

What is Statistically Significant?

Which observed values are consistent with the Null Hypothesis?

[Figure: Midterm scores]

Let’s (finally) turn “consistent” into a quantitative value (a p-value!)

Evaluate the Tail Area

[Figure: axis from "Favors Null" to "Favors Alt", with the tail area shaded yellow]

Start with the test statistic for the observed data and look toward values favoring the Alt. Hypothesis.

Large tail area (yellow area): the observed data is consistent with the Null Hypothesis.

Evaluate the Tail Area

[Figure: axis from "Favors Null" to "Favors Alt", with the tail area shaded yellow]

Start with the test statistic for the observed data and look toward values favoring the Alt. Hypothesis.

Small tail area (yellow area): the observed data is not consistent with the Null Hypothesis, so the data favor the Alternative Hypothesis.

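p-value Definition

The chance, under the null hypothesis, that the test statistic is equal to the observed value or even further in the direction of the alternative hypothesis.

A p-value is a proportion (between 0 and 1), not a percentage.

[Figure: axis from "Favors Null" to "Favors Alt"; the yellow tail area approximates the p-value]

Empirical p-values: calculate the tail area, i.e., the proportion of simulated statistics that are equal to the observed statistic or further in the direction of the Alt. Hypothesis.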
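Calculating the tail area is a counting problem. A minimal sketch, assuming the simulated statistics are in a plain Python list and that larger values favor the Alt. Hypothesis (true for the abs(...) distances and TVD used so far); empirical_p_value is a hypothetical name, and the ten simulated values are made up.

```python
def empirical_p_value(simulated_statistics, observed_statistic):
    """Tail-area estimate of the p-value.

    Assumes larger statistic values favor the alternative hypothesis, as with
    abs(...) distances and TVD; counts simulated values that are equal to the
    observed statistic or further toward the alternative.
    """
    if not simulated_statistics:
        raise ValueError("need at least one simulated statistic")
    as_extreme = sum(1 for s in simulated_statistics if s >= observed_statistic)
    return as_extreme / len(simulated_statistics)


# Made-up simulated statistics from 10 trials.
simulated = [0.5, 1.0, 2.0, 2.5, 4.0, 0.0, 1.5, 3.0, 6.0, 0.5]
print(empirical_p_value(simulated, 3.0))  # 3 of 10 are >= 3.0, so 0.3
```

Using >= rather than > matches the definition's "equal to the observed value or even further".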


2. Calculating p-values

Simulation to Reject Null Hypothesis

Null Hypothesis: "Observed data happened by chance"
Alt. Hypothesis: "Some effect beyond chance"

  1. Assume the Null Hypothesis model is valid.
  2. Simulate samples according to the Null Hypothesis.
  3. Compute statistics for the simulated samples.
  4. Compare them to the statistic for the real-world observations.
      • Large p-value, consistent: the observed data fit the Null Hypothesis model's assumptions → Cannot reject the Null Hypothesis
      • Small p-value, not consistent: the Null Hypothesis model's assumptions are likely not valid → Reject the Null Hypothesis and favor the Alt.

Inconsistency with the null hypothesis

How small must the tail be to say the observed data is inconsistent with the null hypothesis?

  • p-value < 0.05 cutoff: (conventionally) the observed data is statistically significant
  • p-value < 0.01 cutoff: (conventionally) the observed data is highly statistically significant
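The conventional cutoffs amount to a simple decision rule. A minimal sketch; describe_p_value and its wording are hypothetical, and the 0.05 and 0.01 defaults are the conventions from this slide, not fixed laws.

```python
def describe_p_value(p_value, cutoff=0.05, strong_cutoff=0.01):
    """Conventional reading of a p-value (conventions, not laws)."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value is a proportion between 0 and 1")
    if p_value < strong_cutoff:
        return "highly statistically significant: reject the null"
    if p_value < cutoff:
        return "statistically significant: reject the null"
    return "not statistically significant: cannot reject the null"


for p in [0.2, 0.03, 0.004]:
    print(p, describe_p_value(p))
```

Note the strict "<": a p-value of exactly 0.05 does not meet the 0.05 cutoff.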

Historical notes on the conventional 0.05 p-value cutoff

“It is convenient to take [the 5 percent point] as a limit in judging whether a deviation is to be considered significant or not.”

—Sir Ronald Fisher, 1925

“Personally, the author prefers to set a low standard of significance at the 5 percent point …”

—Sir Ronald Fisher, 1926

Takeaway: p-value cutoffs are still matters of subjective judgement

Simulation to Reject Null Hypothesis (0.05 p-value cutoff)

Null Hypothesis: "Observed data happened by chance"
Alt. Hypothesis: "Some effect beyond chance"

  1. Assume the Null Hypothesis model is valid.
  2. Simulate samples according to the Null Hypothesis.
  3. Compute statistics for the simulated samples.
  4. Compare them to the statistic for the real-world observations.
      • p-value >= 0.05, consistent: the observed data fit the Null Hypothesis model's assumptions → Cannot reject the Null Hypothesis
      • p-value < 0.05, not consistent: the Null Hypothesis model's assumptions are likely not valid → Reject the Null Hypothesis and favor the Alt.

Suppose Alice and Bob each have a different observed sample.

  1. Given our 0.05 p-value cutoff, would Alice reject (or fail to reject) the null hypothesis for her sample?
  2. What about Bob?

Alice’s observation: p-value = 0.463

Bob’s observation: p-value = 0.031

💡 Think-Pair-Share

  • Alice fails to reject the Null: Does Alice's observation mean that the Null Hypothesis is correct?
  • Bob rejects the Null: Is Bob's rejection of the Null Hypothesis ever wrong?

Alice’s observation: p-value = 0.463

Bob’s observation: p-value = 0.031

💡 Think-Pair-Share


3. Impact of Sample Size on p-value

Impacts of sample size on p-value

For the same observed difference from the null model, the p-value decreases as the sample size increases.
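A small simulation shows the effect for the Swain statistic. A sketch under stated assumptions: it holds the observed statistic fixed at 18 points (Swain's 8% vs. 26%) and changes only the panel size; that choice is ours for illustration, and the slides' "Swain Jury (sample size=10)" example may have been set up differently. swain_p_value is a hypothetical name, and the standard random module replaces the course's helpers.

```python
import random


def swain_p_value(panel_size, observed_statistic=18, chance=0.26,
                  null_percent=26, num_trials=10_000):
    """Empirical p-value for abs(percent black - 26) with panels of panel_size.

    Holds the observed statistic fixed (18 points, from Swain's 8% vs. 26%)
    so that only the sample size changes between calls.
    """
    as_extreme = 0
    for _trial in range(num_trials):
        num_black = sum(1 for _ in range(panel_size) if random.random() < chance)
        percent_black = 100 * num_black / panel_size
        if abs(percent_black - null_percent) >= observed_statistic:
            as_extreme += 1
    return as_extreme / num_trials


random.seed(104)
p_small = swain_p_value(10)    # 10-person panels
p_large = swain_p_value(100)   # 100-person panels
print(p_small, p_large)
```

With 10-person panels, an 18-point gap from 26% happens fairly often by chance (the exact tail probability is roughly 0.14), while with 100-person panels it almost never does, so the same gap yields a much smaller p-value.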

Learning Objectives

  • Create a hypothesis test for an example with numeric values
  • Calculate the empirical p-value
  • Use p-value cutoffs to make a conclusion about the null hypothesis
  • Next time: Permutation tests!