Hypothesis Testing
Advanced High School Statistics
Slides developed by Mine Çetinkaya-Rundel of OpenIntro, modified by Leah Dorazio for use with AHSS.
The slides may be copied, edited, and/or shared via the CC BY-SA license
Some images may be included under fair use guidelines (educational purposes)
Remember when...
p̂males = 21 / 24 = 0.88
p̂females = 14 / 24 = 0.58
Possible explanations:
Result
Since it was quite unlikely to obtain results like the actual data or something more extreme in the simulations (the male promotion rate being 30 percentage points or more higher than the female rate), we decided to reject the null hypothesis in favor of the alternative.
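The simulation idea can be sketched in Python as follows. This is a minimal sketch, not the procedure used to produce the original result: it assumes the null hypothesis means promotions are unrelated to gender, so the 35 promotions are shuffled at random across the 48 files. The number of shuffles and the seed are arbitrary choices.

```python
import random

# Counts from the slide: 21 of 24 males and 14 of 24 females were promoted.
n_male, n_female = 24, 24
promoted_male, promoted_female = 21, 14
observed_diff = promoted_male / n_male - promoted_female / n_female

# Under H0 the 35 promotions are unrelated to gender, so shuffle them.
total_promoted = promoted_male + promoted_female
outcomes = [1] * total_promoted + [0] * (n_male + n_female - total_promoted)

rng = random.Random(1)  # fixed seed (arbitrary) so the run is reproducible
n_sims = 10_000         # number of shuffles (an assumption)
at_least_as_extreme = 0
for _ in range(n_sims):
    rng.shuffle(outcomes)
    sim_male = sum(outcomes[:n_male])      # first 24 files play the "male" group
    sim_female = total_promoted - sim_male
    sim_diff = sim_male / n_male - sim_female / n_female
    if sim_diff >= observed_diff - 1e-9:   # small tolerance for float comparison
        at_least_as_extreme += 1

simulated_p_value = at_least_as_extreme / n_sims
print(f"observed difference: {observed_diff:.3f}")
print(f"simulated p-value:   {simulated_p_value:.4f}")
```

A small simulated p-value means shuffled data rarely produce a difference as large as the observed one, which is the sense in which the result was "quite unlikely" under the null hypothesis.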
Recap: hypothesis testing framework
We start with a null hypothesis (H0) that represents the status quo.
We also have an alternative hypothesis (HA) that represents our research question, i.e. what we're testing for.
We conduct a hypothesis test under the assumption that the null hypothesis is true, either via simulation or traditional methods based on the central limit theorem (coming up next...).
If the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, we stick with the null hypothesis. If they do, then we reject the null hypothesis in favor of the alternative.
We'll formally introduce the hypothesis testing framework using an example on testing a claim about a population mean.
Testing hypotheses using confidence intervals
The associated hypotheses are:
H0: µ = 3: College students have been in 3 exclusive relationships, on average
HA: µ > 3: College students have been in more than 3 exclusive relationships, on average
Earlier we calculated a 95% confidence interval for the average number of exclusive relationships college students have been in to be (2.7, 3.7). Based on this confidence interval, do these data support the hypothesis that college students on average have been in more than 3 exclusive relationships?
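One way to read the interval, sketched in Python: if the null value falls inside the 95% confidence interval, it remains a plausible value for µ, so the data do not provide convincing evidence against H0. The variable names are illustrative only.

```python
# 95% confidence interval from the slide and the null value from H0.
ci_lower, ci_upper = 2.7, 3.7
null_value = 3

# The null value is plausible if it lies inside the interval.
null_is_plausible = ci_lower <= null_value <= ci_upper

if null_is_plausible:
    print("3 is inside the interval: no convincing evidence that the mean exceeds 3")
else:
    print("3 is outside the interval: the data are not consistent with a mean of 3")
```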
Number of college applications
A similar survey asked how many colleges students applied to, and 206 students responded to this question. This sample yielded an average of 9.7 college applications with a standard deviation of 7. The College Board website states that counselors recommend students apply to roughly 8 colleges. Do these data provide convincing evidence that the average number of colleges all Duke students apply to is higher than recommended?
Number of college applications - conditions
Which of the following is not a condition that needs to be met to proceed with this hypothesis test?
p-values
Number of college applications - p-value
P(x̄ > 9.7 | µ = 8) = P(Z > 3.4) = 0.0003
p-value: probability of observing data at least as favorable to HA as our current data set (a sample mean greater than 9.7), if in fact H0 were true (the true population mean was 8).
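The calculation behind this p-value can be sketched in Python using the summary statistics from the survey. This sketch assumes the sample mean is approximately normal and uses the unrounded standard error, so it gives a Z score slightly above the slide's 3.4 (which is what you get if the standard error is rounded to 0.5) and a p-value of the same order as 0.0003.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the slide.
n = 206
sample_mean = 9.7
sample_sd = 7
null_mean = 8  # value of mu under H0

# Standard error of the sample mean and the resulting Z score.
standard_error = sample_sd / sqrt(n)
z = (sample_mean - null_mean) / standard_error

# Upper-tail probability, since HA is mu > 8.
p_value = 1 - NormalDist().cdf(z)

print(f"SE = {standard_error:.3f}, Z = {z:.2f}, p-value = {p_value:.5f}")
```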
Number of college applications - Making a decision
p-value = 0.0003
Since the p-value is low (lower than 5%), we reject H0.
The data provide convincing evidence that Duke students apply to more than 8 schools on average.
The difference between the null value of 8 schools and the observed sample mean of 9.7 schools is unlikely to be due to chance or sampling variability alone.
Practice
A poll by the National Sleep Foundation found that college students average about 7 hours of sleep per night. A sample of 169 college students taking an introductory statistics class yielded an average of 6.88 hours, with a standard deviation of 0.94 hours. Assuming that this is a random sample representative of all college students (bit of a leap of faith?), a hypothesis test was conducted to evaluate if college students on average sleep less than 7 hours per night. The p-value for this hypothesis test is 0.0485. Which of the following is correct?
Two-sided hypothesis testing with p-values
If the research question were “Do the data provide convincing evidence that the average amount of sleep college students get per night is different from the national average?”, the alternative hypothesis would be different.
H0: µ = 7
HA: µ ≠ 7
Hence the p-value would change as well, since both tails now count as evidence against H0:
p-value = 2 × 0.0485 = 0.097
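The one-sided and two-sided p-values for the sleep example can be compared with a short Python sketch. It assumes the normal model for the sample mean and uses the summary statistics from the practice question.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the sleep example.
n = 169
sample_mean = 6.88
sample_sd = 0.94
null_mean = 7

standard_error = sample_sd / sqrt(n)
z = (sample_mean - null_mean) / standard_error

# One-sided test (HA: mu < 7): area in the lower tail only.
one_sided_p = NormalDist().cdf(z)

# Two-sided test (HA: mu != 7): area in both tails, i.e. double one tail.
two_sided_p = 2 * NormalDist().cdf(-abs(z))

print(f"Z = {z:.2f}")
print(f"one-sided p-value = {one_sided_p:.4f}")
print(f"two-sided p-value = {two_sided_p:.4f}")
```

At the 5% significance level the one-sided test rejects H0 while the two-sided test does not, so the choice of alternative hypothesis matters here.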
Decision errors
Hypothesis tests are not flawless.
Decision errors (cont.)
There are two competing hypotheses: the null and the alternative. In a hypothesis test, we make a decision about which might be true, but our choice might be incorrect.
A Type 1 Error is rejecting the null hypothesis when H0 is true.
A Type 2 Error is failing to reject the null hypothesis when HA is true.
We (almost) never know if H0 or HA is true, but we need to consider all possibilities.
Hypothesis Test as a trial
If we again think of a hypothesis test as a criminal trial then it makes sense to frame the verdict in terms of the null and alternative hypotheses:
H0: Defendant is innocent
HA: Defendant is guilty
Which type of error is being committed in the following circumstances?
Declaring the defendant innocent when they are actually guilty
Type 2 error
Declaring the defendant guilty when they are actually innocent
Type 1 error
Which error do you think is the worse error to make?
“better that ten guilty persons escape than that one innocent suffer”
- William Blackstone
Type 1 error rate
As a general rule we reject H0 when the p-value is less than 0.05, i.e. we use a significance level of 0.05, α = 0.05.
This means that, for those cases where H0 is actually true, we do not want to incorrectly reject it more than 5% of those times.
In other words, when using a 5% significance level there is about a 5% chance of making a Type 1 error if the null hypothesis is true.
P(Type 1 error | H0 true) = α
This is why we prefer small values of α -- increasing α increases the Type 1 error rate.
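The link between α and the Type 1 error rate can be checked with a simulation, sketched here in Python. Everything about the setup is an assumption for illustration: the population is taken to be normal with mean 7 and standard deviation 0.94 (borrowed from the sleep example), H0: µ = 7 is true by construction, and each simulated sample gets a two-sided Z test.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

rng = random.Random(0)  # fixed seed (arbitrary) so the run is reproducible
alpha = 0.05
null_mean = 7.0         # H0 is true: samples really come from mu = 7
population_sd = 0.94    # assumed population standard deviation
n = 169                 # sample size per simulated study
n_tests = 2000          # number of simulated studies (an assumption)

rejections = 0
for _ in range(n_tests):
    sample = [rng.gauss(null_mean, population_sd) for _ in range(n)]
    standard_error = stdev(sample) / sqrt(n)
    z = (fmean(sample) - null_mean) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    if p_value < alpha:
        rejections += 1  # H0 is true, so every rejection is a Type 1 error

type1_rate = rejections / n_tests
print(f"share of tests that wrongly rejected H0: {type1_rate:.3f}")
```

The share of wrong rejections should land close to α; rerunning with a different `alpha` moves the rate accordingly.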
Choosing a significance level
Choosing a significance level for a test is important in many contexts, and the traditional level is 0.05. However, it is often helpful to adjust the significance level based on the application.
We may select a level that is smaller or larger than 0.05 depending on the consequences of any conclusions reached from the test.
If making a Type 1 Error is dangerous or especially costly, we should choose a small significance level (e.g. 0.01). Under this scenario we want to be very cautious about rejecting the null hypothesis, so we demand very strong evidence favoring HA before we would reject H0.
If a Type 2 Error is relatively more dangerous or much more costly than a Type 1 Error, then we should choose a higher significance level (e.g. 0.10). Here we want to be cautious about failing to reject H0 when the null is actually false.
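To see how the choice of significance level changes the decision, this Python sketch applies three levels to the p-value from the sleep example (0.0485). The levels 0.01, 0.05 and 0.10 are the ones mentioned above.

```python
p_value = 0.0485  # p-value from the sleep example

# Apply the same decision rule at each significance level.
decisions = {}
for alpha in (0.01, 0.05, 0.10):
    decisions[alpha] = "reject H0" if p_value < alpha else "fail to reject H0"

for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
```

The same data lead to rejecting H0 at the 0.05 and 0.10 levels but not at 0.01, which is why the level should be chosen before looking at the data and with the costs of each error type in mind.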
Recap: Hypothesis testing framework
1. Set the hypotheses.
   For a single proportion this will look like:
   H0: p = null value
   HA: p < or > or ≠ null value
2. Check assumptions and conditions
3. Calculate a test statistic and a p-value
4. Make a decision, and interpret it in context
Explore more free resources at openintro.org/ahss.