2 of 14

Announcements

Homework 7 is due Wednesday, 3/8
No lab notebook this week
Midterm on Friday at 7pm

Midterm Prep Guide, Past Exams
Midterm Review Session tomorrow, 3/7 6-9pm

Tutoring worksheets, walkthroughs, etc. available here!

3 of 14

Weekly Goals

Monday

Causation
Randomized Control Experiments

Wednesday (Today)

P-Value as an Error
Examples

Friday

Midterm review

4 of 14

P-Values and Error Probabilities

5 of 14

Discussion Question

There are 2000 students in Data 8. Each student tests

Null: The coin is fair

Alternative: The coin is unfair

based on 100 tosses of a coin,
the statistic | number of heads - 50 |,
and the 5% cutoff for the P-value.

Suppose all coins are fair. About how many students will conclude that their coins are unfair?

(Demo!)
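
A minimal sketch of this thought experiment in plain Python, not the course demo itself. The seed, the 10,000 null repetitions, and the names `statistic`, `null_stats`, and `p_value` are assumptions made for illustration. Each simulated student tosses a fair coin 100 times and rejects the null if the empirical P-value is at or below 5%.

```python
import random
from functools import lru_cache

random.seed(8)  # arbitrary seed, only so the run is repeatable

N_TOSSES, N_STUDENTS, CUTOFF = 100, 2000, 0.05

def statistic():
    """Toss a fair coin 100 times and return |number of heads - 50|."""
    heads = sum(random.random() < 0.5 for _ in range(N_TOSSES))
    return abs(heads - 50)

# Empirical distribution of the statistic under the null (fair coin).
null_stats = [statistic() for _ in range(10_000)]

@lru_cache(maxsize=None)
def p_value(observed):
    """Proportion of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in null_stats) / len(null_stats)

# Every student has a fair coin, so every rejection is a wrong conclusion.
rejections = sum(p_value(statistic()) <= CUTOFF for _ in range(N_STUDENTS))
print(rejections, "of", N_STUDENTS, "students conclude their fair coin is unfair")
```

The count should land in the neighborhood of 5% of 2000, and typically somewhat below 100, because the statistic only takes whole-number values and the test cannot hit the 5% cutoff exactly.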

6 of 14

Can the Conclusion be Wrong?

Yes.

	Null is true	Null is false
Test favors null	✅	❌
Test rejects null	❌	✅

The p-value cutoff is the probability of rejecting the null when it is actually true.

Not yet covered…power of a test.

Control this by collecting more data

7 of 14

An Error Probability

The cutoff for the P-value is an error probability.

If your cutoff is 5%
and the null hypothesis happens to be true,

then there is about a 5% chance that your test will reject the null hypothesis.
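
For the 100-toss coin test, the "about 5%" can be checked exactly instead of by simulation. The sketch below is an illustration under assumptions: the names `pmf`, `tail`, and `threshold` are hypothetical, and it uses the exact binomial distribution in place of an empirical one. It finds the smallest deviation from 50 heads whose P-value is within the cutoff, then reports the chance a fair coin produces a deviation that large.

```python
from math import comb

N_TOSSES, CUTOFF = 100, 0.05

# Exact probability of each possible number of heads for a fair coin.
pmf = [comb(N_TOSSES, k) / 2**N_TOSSES for k in range(N_TOSSES + 1)]

def tail(deviation):
    """P(|number of heads - 50| >= deviation) when the coin is fair."""
    return sum(p for k, p in enumerate(pmf) if abs(k - 50) >= deviation)

# Smallest deviation whose P-value is at or below the cutoff.
threshold = min(d for d in range(51) if tail(d) <= CUTOFF)

# Chance the test rejects the null even though the null is true.
false_rejection_prob = tail(threshold)
print(threshold, round(false_rejection_prob, 4))
```

Because the number of heads is a whole number, the exact error probability sits at or below the cutoff, not precisely on it.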

8 of 14

Sir Ronald Aylmer Fisher [1890-1962] Pioneer of Modern Statistics

“It is convenient to take this point [5%] as a limit in judging whether a deviation is to be considered significant or not.” [Fisher 1925]

Choosing the P-value Cutoff

Decide on it before seeing the results

Don’t change it!

Common values are 5% and 1%

Follow conventions in your area

“If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 percent point), or one in a hundred (the 1 percent point). Personally, the author [Fisher] prefers to set a low standard of significance at the 5 percent point …” [Fisher 1926]

Image https://en.wikipedia.org/wiki/Ronald_Fisher

9 of 14

P-value cutoff vs P-value

P-value cutoff (You Pick It)

Does not depend on observed data or simulation
“Acceptable” probability of rejecting the null hypothesis when it is true.

P-value (You Compute It)

Depends on the observed data and simulation
Probability under the null hypothesis that the test statistic is the observed value or more extreme

10 of 14

11 of 14

Discussion Question

Manufacturers of Super Soda run a taste test
91 out of 200 tasters prefer Super Soda over its rival

Question: Do fewer people prefer Super Soda, or is this just chance?

Null hypothesis: Equal proportions of the population prefer Super Soda and its rival.

Alternative hypothesis: Fewer people in the population prefer Super Soda than its rival.

Test statistic: Number of people (out of 200) who prefer Super Soda

How to compute p-value: Probability of seeing 91 or fewer people who prefer Super Soda in a sample of 200, assuming the null hypothesis is true.

(Demo)
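
One way the P-value for the taste test could be simulated, as a sketch and not the course demo. The seed, the 10,000 repetitions, and the names `simulate_count` and `simulated` are assumptions. Under the null, each taster is treated as equally likely to prefer either soda.

```python
import random

random.seed(8)  # arbitrary seed, only so the run is repeatable

SAMPLE_SIZE, OBSERVED, REPS = 200, 91, 10_000

def simulate_count():
    """Number of tasters (out of 200) who prefer Super Soda under the null."""
    return sum(random.random() < 0.5 for _ in range(SAMPLE_SIZE))

# Empirical distribution of the test statistic under the null.
simulated = [simulate_count() for _ in range(REPS)]

# The alternative says "fewer", so small counts are the extreme direction.
p_value = sum(count <= OBSERVED for count in simulated) / REPS
print(p_value)
```

The estimate should come out around 0.11, varying a little from run to run. Compared with a 5% cutoff, that is not enough evidence to reject the null.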

12 of 14

Guidelines for Writing Hypotheses

Null Hypothesis:

Null is meant to describe lack of an interesting pattern

Results are due to chance

Need to be able to simulate data under the null hypothesis

Alternative Hypothesis:

Should align with the question of interest

Null and alternative hypotheses can’t be true at the same time. (Reject the Null → Alternative)

13 of 14

Hypothesis Test Concerns

The outcome of a hypothesis test can be affected by:

The hypotheses you investigate: How do you define your null distribution?
The test statistic you choose: How do you measure a difference between samples?
The empirical distribution of the statistic under the null: How many times do you simulate under the null distribution?
The data you collected: Did you happen to collect a sample that is similar to the population?
The truth: If the alternative hypothesis is true, how extreme is the difference?

14 of 14

Hypothesis Test Effects

Number of simulations:

As large as possible: empirical distribution → true distribution
No new data needs to be collected (yay!)
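
A small illustration of the point about simulation count, reusing the taste-test numbers (91 of 200). The seed, the repetition counts, and the name `empirical_p_value` are assumptions. The sketch compares empirical P-values from 100, 1,000, and 10,000 simulations with the exact binomial value.

```python
import random
from math import comb

random.seed(8)  # arbitrary seed, only so the run is repeatable

SAMPLE_SIZE, OBSERVED = 200, 91

# Exact P(count <= 91) when each of the 200 tasters is 50/50.
exact = sum(comb(SAMPLE_SIZE, k) for k in range(OBSERVED + 1)) / 2**SAMPLE_SIZE

def empirical_p_value(reps):
    """Estimate the same probability from `reps` simulated samples."""
    hits = 0
    for _ in range(reps):
        count = sum(random.random() < 0.5 for _ in range(SAMPLE_SIZE))
        hits += count <= OBSERVED
    return hits / reps

estimates = {reps: empirical_p_value(reps) for reps in (100, 1_000, 10_000)}
for reps, estimate in estimates.items():
    print(reps, estimate, "exact:", round(exact, 4))
```

Estimates from more repetitions tend to sit closer to the exact value, and getting them costs only computer time, not new tasters.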

Number of observations:

A larger sample will lead you to reject the null more reliably if the alternative is in fact true.

Difference from the null:

If truth is similar to the null hypothesis, then even a large sample may not provide enough evidence to reject the null.

(Demo)
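
A sketch of how sample size and distance from the null affect the chance of rejecting, under assumptions: a one-sided test of "proportion = 0.5" against "proportion < 0.5" at a 5% cutoff, computed with exact binomial probabilities instead of simulation. The names `binom_pmf` and `rejection_chance`, the sample sizes, and the true proportions are all hypothetical choices for illustration.

```python
from math import comb

def binom_pmf(n, p):
    """Probabilities of 0..n successes in n independent trials with chance p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def rejection_chance(n, true_p, cutoff=0.05):
    """Chance the one-sided test rejects 'p = 0.5' when the truth is true_p."""
    # Largest count whose lower-tail probability under the null is within the cutoff.
    cumulative, critical = 0.0, -1
    for k, prob in enumerate(binom_pmf(n, 0.5)):
        if cumulative + prob > cutoff:
            break
        cumulative += prob
        critical = k
    # The test rejects whenever the observed count is at or below `critical`.
    return sum(binom_pmf(n, true_p)[: critical + 1])

for n in (50, 200, 800):
    for true_p in (0.49, 0.45, 0.40):
        print(n, true_p, round(rejection_chance(n, true_p), 3))
```

Reading across the output: for a fixed true proportion, larger samples reject more often, and when the truth is close to the null (0.49) even 800 observations reject only a small share of the time.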