AP Stats inference cram guide

Statistical Inference!

Now with 33% more pages!

created by Ikusahime22 (r/APStudents) April 2018/2019 | Discord: Violet Laplace#0290

Common Confidence & Significance Levels

Confidence Level	Significance Level	Z* Value	When to Use
90%	0.10	1.645	Not very serious topic / Girl Scout cookies
95%	0.05	1.96	When in doubt, use this / Court of law
99%	0.01	2.576	Very serious topic
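
The z* values in the table come straight from the standard normal distribution. A minimal Python sketch using only the standard library (the function name `z_star` is just an illustrative choice):

```python
from statistics import NormalDist

def z_star(confidence_level: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.95."""
    alpha = 1 - confidence_level              # matching significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z* = {z_star(level):.3f}")
```

The three printed values should match the Z* column above after rounding.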

Errors

Error	Definition
Type I	Reject the H0 when it’s actually true
Type II	Fail to reject the H0 when it’s actually false
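
To see the two error types as actual probabilities, here is a rough sketch for a one-sided z test (Ha: μ > μ0). All of the numbers (μ0 = 100, a "true" mean of 103, σ = 10) are hypothetical and only there to make the example run.

```python
from math import sqrt
from statistics import NormalDist

def error_rates(mu0, mu_true, sigma, n, alpha):
    """Type I and Type II error probabilities for a one-sided z test (Ha: mu > mu0)."""
    se = sigma / sqrt(n)
    cutoff = NormalDist(mu0, se).inv_cdf(1 - alpha)   # reject H0 when x-bar > cutoff
    type_1 = 1 - NormalDist(mu0, se).cdf(cutoff)      # H0 true but rejected (equals alpha)
    type_2 = NormalDist(mu_true, se).cdf(cutoff)      # H0 false but not rejected
    return type_1, type_2

# Hypothetical scenarios: baseline, larger sample, larger alpha
for n, alpha in ((25, 0.05), (100, 0.05), (25, 0.10)):
    t1, t2 = error_rates(100, 103, 10, n, alpha)
    print(f"n={n}, alpha={alpha}: P(Type I)={t1:.3f}, P(Type II)={t2:.3f}")
```

In this setup, both a larger sample and a larger alpha make the Type II error probability smaller.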

Other Things to Remember

Thing	What’s important about it
Confidence Intervals	*ARE NOT PROBABILITIES! The confidence level describes how often the method works in the long run, not the chance that one particular interval contains the parameter: “In the long run, X% of all confidence intervals constructed by this method will contain the true parameter.”
Sx vs. σ	T-distributions are used because Sx is only an estimate of the population standard deviation, which adds extra variability to the test statistic (as opposed to σ, which is exact)
Power	The formal definition of power is 1 - β, where β is the probability of a Type II error. You want the power of a significance test to be high. To increase power, increase the sample size. If you can’t, increase alpha (at the cost of more Type I errors).
S (Standard error of the residuals)	Estimate of the variability of the prediction of y based on x
Z test vs. T test	For means: Z test is used when σ is known. T test is used when σ is unknown. Proportions always use Z. Think about normality later.
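T-procedures	are robust against moderate departures from normality, especially with larger samples. They are NOT resistant to outliers, because x̄ and Sx are not.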
Degrees of freedom (df)	For 1-sample and paired t-tests/intervals: n-1. For chi-square goodness of fit: # of categories - 1. For chi-square homogeneity and independence: (r-1)(c-1). For linear regression inference: n-2. Round down if using Table D.
Margin of error and sample size	“Margin of error” includes z* or t*. The square root part is the standard error (standard deviation adjusted for sample size). Widths of confidence intervals are inversely proportional to the square root of the sample size (1/√n). For example, if the sample size increased by a factor of 9, the confidence interval would be ⅓ as wide as it was before. As a general rule, increasing the sample size decreases variability.
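
The 1/√n rule is easy to check numerically. A small sketch, assuming a z interval for a mean with a made-up σ of 12 and made-up sample sizes:

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence_level=0.95):
    """Margin of error = critical value * standard error (z interval for a mean)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z_star * sigma / sqrt(n)

base = margin_of_error(sigma=12, n=40)      # hypothetical starting sample
bigger = margin_of_error(sigma=12, n=360)   # 9 times the sample size
print(f"ratio of margins: {bigger / base:.4f}")   # about one third
```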

List of Tests & Intervals

Method	When to Use	Conditions	Test Hypotheses	Test Conclusion	Interval Conclusion

1-Sample Z Test

1-Sample Z Interval

σ is known

quantitative data (means) are involved

one population

Random:

Given

Independence:

N ≥ 10n

Normality:

Given OR
Approximate normality of sampling distribution (n ≥ 30 / CLT)

H0: μ = some number

Ha: μ ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true mean [context] is between [lower bound] and [upper bound].”
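
A minimal sketch of the 1-sample z test and interval, assuming a two-sided alternative. The summary numbers in the example call are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu0, sigma, n, confidence_level=0.95):
    """z statistic, two-sided p-value, and z interval for one mean (sigma known)."""
    se = sigma / sqrt(n)
    z = (x_bar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # Ha: mu != mu0
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z, p_value, (x_bar - z_star * se, x_bar + z_star * se)

# Hypothetical data: x-bar = 52, H0: mu = 50, sigma = 6, n = 36
z, p, ci = one_sample_z(52, 50, 6, 36)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

For a one-sided alternative, use only one tail instead of doubling.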

2-Sample Z Test

2-Sample Z Interval

σ is known

quantitative data (means) are involved

two populations (comparing two means)

Random:

Given

Independence:

Read context to determine if both samples are independent
N ≥ 10n

Normality:

Given OR
Approximate normality of sampling distributions (n ≥ 30 / CLT)

H0: μ1 = μ2 or μ1-μ2 = 0

Ha: μ1 ≠, <, > μ2 or μ1-μ2 ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true mean [context 1] is between [lower bound] and [upper bound] [higher/lower] than the true mean [context 2].”

1-Proportion Z Test

1-Proportion Z Interval

σ is not needed (the standard deviation is calculated from the proportion)

proportions are involved

one population

Random:

Given

Independence:

N ≥ 10n

Normality:

Given OR
Approximate normality of sampling distribution (np ≥ 10 and n(1-p) ≥ 10; use the hypothesized p for a test and p̂ for an interval)

H0: P = some number

Ha: P ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true proportion of [context] is between [lower bound] and [upper bound].”
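
A sketch of the 1-proportion z test and interval. Note the two different standard deviations: the test uses the hypothesized proportion, the interval uses p̂. The counts in the example are made up.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(successes, n, p0, confidence_level=0.95):
    """z statistic, two-sided p-value, and z interval for one proportion."""
    p_hat = successes / n
    # Test: standard deviation is built from the hypothesized p0
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # Ha: p != p0
    # Interval: standard error is built from the sample proportion
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, (p_hat - moe, p_hat + moe)

# Hypothetical data: 60 successes out of 100, H0: p = 0.5
z, p, ci = one_prop_z(60, 100, 0.5)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```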

2-Proportion Z Test

2-Proportion Z Interval

σ is not needed (the standard deviation is calculated from the proportions)

proportions are involved

two populations

Random:

Given

Independence:

Read context to determine if both samples are independent
N ≥ 10n

Normality:

Given OR
Approximate normality of sampling distributions (n1p̂1 ≥ 10, n1(1-p̂1) ≥ 10, n2p̂2 ≥ 10, and n2(1-p̂2) ≥ 10)

H0: P1 = P2 or P1-P2 = 0

Ha: P1 ≠, <, > P2 or P1-P2 ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true proportion of [context 1] is between [lower bound] and [upper bound] [higher/lower] than the true proportion of [context 2].”
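
A sketch of the 2-proportion z test and interval, assuming the usual convention that the test pools the two samples (because H0 says P1 = P2) while the interval does not. The counts are invented.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2, confidence_level=0.95):
    """Pooled z test (H0: p1 = p2) and unpooled z interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # combined p-hat under H0
    se_test = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_test
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # Ha: p1 != p2
    se_interval = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    diff = p1 - p2
    return z, p_value, (diff - z_star * se_interval, diff + z_star * se_interval)

# Hypothetical data: 60/100 in group 1, 45/100 in group 2
z, p, ci = two_prop_z(60, 100, 45, 100)
print(f"z = {z:.3f}, p = {p:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```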

1-Sample T Test

1-Sample T Interval

Sx is known / σ is unknown

quantitative data (means) are involved

one population

Random:

Given

Independence:

N ≥ 10n

Normality:

Given OR
Approximate normality of sampling distribution (n ≥ 30 / CLT)
Approximate normality of population (n < 30, check graph)

H0: μ = some number

Ha: μ ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true mean [context] is between [lower bound] and [upper bound].”
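
A sketch of the 1-sample t test and interval in plain Python. The standard library has no t-distribution, so the tail area here is approximated by numerically integrating the t density (Simpson's rule); that helper is an assumption of this sketch, not how a calculator does it. The data are made up, and t* is passed in as if read from Table D.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_sample_t(data, mu0, t_star):
    """t statistic, df, two-sided p-value, and t interval for one mean."""
    n = len(data)
    x_bar, s_x = mean(data), stdev(data)    # stdev divides by n-1, so this is Sx
    se = s_x / sqrt(n)
    t = (x_bar - mu0) / se
    df = n - 1
    p_value = 2 * t_upper_tail(abs(t), df)  # Ha: mu != mu0
    return t, df, p_value, (x_bar - t_star * se, x_bar + t_star * se)

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]   # hypothetical measurements
t, df, p, ci = one_sample_t(sample, mu0=5.0, t_star=2.365)  # t* for df = 7, 95%
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```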

Paired T Test

Paired T Interval

Sx of the differences is known / σ of the differences is unknown

quantitative data (means) are involved

one population, two treatments

Twins?
One test subject type, two treatments?

Random:

Given

*THE TWO MEASUREMENTS IN EACH PAIR ARE NOT INDEPENDENT! (Different pairs should still be independent of each other.)

Normality:

Given OR
Approximate normality of the sampling distribution of the mean difference (n ≥ 30 / CLT)
Approximate normality of population of differences (n < 30, check graph)

H0: μ(diff) = some number

Ha: μ(diff) ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true mean difference [context] is between [lower bound] and [upper bound].”
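
A paired t procedure is just a 1-sample t procedure on the differences. A short sketch with invented before/after scores; t* is supplied as if looked up in Table D.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after, t_star, mu_diff0=0.0):
    """t statistic, df, and t interval for the mean of the paired differences."""
    diffs = [a - b for a, b in zip(after, before)]   # one difference per pair
    n = len(diffs)
    d_bar, s_d = mean(diffs), stdev(diffs)
    se = s_d / sqrt(n)
    t = (d_bar - mu_diff0) / se
    return t, n - 1, (d_bar - t_star * se, d_bar + t_star * se)

before = [72, 80, 65, 90, 77, 84]    # hypothetical scores, same six subjects
after = [75, 82, 70, 91, 80, 85]
t, df, ci = paired_t(before, after, t_star=2.571)    # t* for df = 5, 95%
print(f"t = {t:.3f}, df = {df}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```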

2-Sample T Test

2-Sample T Interval

Sx1 and Sx2 are known / σ1 and σ2 are unknown

quantitative data (means) are involved

two populations (comparing two means)

Random sampling/assignment:

Given

Independence:

Read context to determine if both samples are independent

Normality:

Given OR
Approximate normality of sampling distributions (n ≥ 30 / CLT)
Approximate normality of populations (n < 30, check graph)

H0: μ1 = μ2 or μ1-μ2 = 0

Ha: μ1 ≠, <, > μ2 or μ1-μ2 ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context].”

“We are [confidence level]% confident that the true mean [context 1] is between [lower bound] and [upper bound] [higher/lower] than the true mean [context 2].”
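
A sketch of the unpooled 2-sample t statistic from summary statistics. Two df values are returned: the conservative min(n1, n2) - 1 and the Welch-Satterthwaite approximation. Which one a given class or calculator expects is an assumption to check; the numbers below are invented.

```python
from math import sqrt

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled t statistic for H0: mu1 = mu2, plus two common df choices."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2     # variance of each sample mean
    se = sqrt(v1 + v2)
    t = (x1 - x2) / se
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    df_conservative = min(n1, n2) - 1
    return t, df_welch, df_conservative

# Hypothetical summaries: group 1 (mean 82, Sx 6, n 20), group 2 (mean 78, Sx 8, n 25)
t, df_w, df_c = two_sample_t(82, 6, 20, 78, 8, 25)
print(f"t = {t:.3f}, Welch df = {df_w:.1f}, conservative df = {df_c}")
```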

Chi-Square: Goodness of Fit (GOF)

Extension of the 1-Proportion Z Test

One population

One row/column

Is there a significant difference between the expected and observed proportions?

Random

All expected values are greater than 1

Less than 20% of expected values are less than 5

*CHI-SQUARE DISTRIBUTIONS ARE NOT NORMAL!

H0: P1 = X, P2 = Y, P3 = Z…

Ha: There is a difference between the [observed] and the [expected] in at least one category.

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that there is no difference [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative that there is a difference in at least one category [context].”

N/A
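
A sketch of the goodness-of-fit calculation: expected counts, the chi-square statistic, df, and the two expected-count conditions listed above. The observed counts and hypothesized proportions are made up; the p-value would still come from a calculator or table.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square GOF statistic, df, expected counts, and a conditions check."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                               # categories - 1
    all_above_1 = all(e > 1 for e in expected)
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    conditions_ok = all_above_1 and share_below_5 < 0.20
    return stat, df, expected, conditions_ok

# Hypothetical data: four categories, H0 says each has proportion 0.25
stat, df, expected, ok = chi_square_gof([18, 22, 20, 40], [0.25] * 4)
print(f"chi-square = {stat:.2f}, df = {df}, conditions met: {ok}")
```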

Chi-Square: Homogeneity

Extension of the 2-Proportion Z Test

Multi-population

Multiple rows/columns

Is there a significant difference in at least one proportion between the categories?

Random

All expected values are greater than 1

Less than 20% of expected values are less than 5

H0: P1 = P2 = P3…

Ha: There is a difference in at least one category between [populations].

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that there is no difference [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative that there is a difference in at least one category [context].”

N/A
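
For a two-way table, each expected count is (row total × column total) / grand total. A sketch with an invented 2×3 table; the same arithmetic applies to a test of association/independence.

```python
def chi_square_table(table):
    """Chi-square statistic, df, and expected counts for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)        # (r-1)(c-1)
    return stat, df, expected

# Hypothetical counts: 2 populations (rows) by 3 response categories (columns)
stat, df, expected = chi_square_table([[20, 30, 50], [30, 30, 40]])
print(f"chi-square = {stat:.3f}, df = {df}")
print(expected)
```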

Chi-Square: Association/Independence

Is there a significant association between categorical variables?

Random

All expected values are greater than 1

Less than 20% of expected values are less than 5

H0: There is no association between [variable 1] and [variable 2].

Ha: There is an association.

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that there is no association [context].”

If less: “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative that there is an association [context].”

N/A

Linear Regression (Slope) T Test

Linear Regression (Slope) T Interval

Is there a linear relationship between quantitative variables?

Random

Observations are independent

Linearity/Residuals:

Residual plot shows no curved pattern (relationship is linear)
Standard deviation of y is the same about the true line (residuals evenly scattered)
Distribution of the residuals is approximately normal

H0: The slope of the true line of regression between [x + context] and [y + context] is 0.

Ha: The slope is ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”

If greater: “...we fail to reject the null hypothesis that the slope of the true line of regression [context] is 0.”

If less: “...we have significant evidence to reject the null hypothesis that the slope of the true line of regression [context] is 0 in favor of the alternative that there is [a/a positive/a negative] relationship [context].”

“We are [confidence level]% confident that the true slope of the regression line [context] is between [lower bound] and [upper bound].”

Formula: b ± t* × SEb (df = n-2)
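
A sketch of slope inference from raw data: the least-squares slope b, the standard error of the residuals s, SEb, the t statistic for H0: slope = 0, and the interval b ± t* × SEb. The (x, y) values are invented, and t* is passed in as if read from Table D with df = n-2.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y, t_star):
    """Slope b, SE of the slope, t statistic (H0: slope = 0), df, and t interval."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                                # least-squares slope
    a = y_bar - b * x_bar                        # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))                      # standard error of the residuals
    se_b = s / sqrt(sxx)
    t = b / se_b
    return b, se_b, t, n - 2, (b - t_star * se_b, b + t_star * se_b)

x = [1, 2, 3, 4, 5]                              # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b, t, df, ci = slope_inference(x, y, t_star=3.182)   # t* for df = 3, 95%
print(f"b = {b:.3f}, SEb = {se_b:.4f}, t = {t:.1f}, df = {df}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```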