Statistical Inference!

Now with 33% more pages!

created by Ikusahime22 (r/APStudents) April 2018/2019 | Discord: Violet Laplace#0290

Probability Study Guide

Common Confidence & Significance Levels

Confidence Level | Significance Level | Z* Value | When to Use
90% | 0.10 | 1.645 | Not very serious topic / Girl Scout cookies
95% | 0.05 | 1.96 | When in doubt, use this / Court of law
99% | 0.01 | 2.576 | Very serious topic
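As a quick check on the z* values listed here, each one can be recomputed from its confidence level. This is a minimal sketch using only Python's standard library; the function name `z_star` is made up for illustration.

```python
from statistics import NormalDist

def z_star(confidence_level: float) -> float:
    """Critical value z* that leaves (1 - C)/2 in each tail of the standard normal."""
    alpha = 1 - confidence_level              # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence -> alpha = {1 - c:.2f}, z* = {z_star(c):.3f}")
```

The printed values should match the table to three decimal places.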

Errors

Error | Definition
Type I | Reject the H0 when it’s actually true
Type II | Fail to reject the H0 when it’s actually false

Other Things to Remember

Thing

What’s important about it

Confidence Intervals

*ARE NOT PROBABILITY!

A confidence level describes the long-run behavior of the method, not the chance that one particular interval is right: “In the long run, X% of all confidence intervals constructed with this method will contain the true parameter.”
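The long-run interpretation can be illustrated with a small simulation. This is only a sketch under assumed values (a normal population with mean 50 and σ = 10, samples of size 25, 2000 repetitions); the function name `coverage` and all defaults are invented for the example.

```python
import random
from statistics import NormalDist

def coverage(n_intervals=2000, n=25, mu=50.0, sigma=10.0, conf=0.95, seed=1):
    """Fraction of simulated z intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z*
    margin = z * sigma / n ** 0.5                  # margin of error
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / n_intervals

print(coverage())
```

The result should land close to 0.95. Any single interval either contains μ or it doesn't; the 95% describes the whole collection.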

Sx vs. σ

T-distributions are used because Sx is only an estimate of the population standard deviation σ, which adds extra variability to the test statistic (as opposed to using σ, which is exact)

Power

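The formal definition of power is 1 - β, where β is the probability of a Type II error. In other words, power is the probability of correctly rejecting a false H0. You want the power of a significance test to be high. In order to increase power, increase the sample size. If you can’t, increase alpha (at the cost of a higher chance of a Type I error).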
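To see how sample size and alpha affect power, here is a rough sketch for one specific case: an upper-tailed one-sample z test (Ha: μ > μ0) with σ known. The function name `power_upper_tail` and the numbers are assumptions chosen for illustration.

```python
from statistics import NormalDist

def power_upper_tail(delta, sigma, n, alpha):
    """Power of an upper-tailed z test when the true mean is mu0 + delta."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)          # rejection cutoff on the z scale
    shift = delta * n ** 0.5 / sigma          # how far the true mean sits from mu0, in standard errors
    return 1 - std.cdf(z_alpha - shift)

base = power_upper_tail(delta=2, sigma=10, n=25, alpha=0.05)
bigger_n = power_upper_tail(delta=2, sigma=10, n=100, alpha=0.05)
bigger_alpha = power_upper_tail(delta=2, sigma=10, n=25, alpha=0.10)
print(round(base, 3), round(bigger_n, 3), round(bigger_alpha, 3))
```

Both a larger sample size and a larger alpha give higher power than the baseline.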

S (Standard deviation of the residuals)

Estimate of the typical distance between the observed y-values and the y-values predicted from x

Z test vs. T test

Z test is used when σ is known. T test is used when σ is unknown. Think about normality later.

T-distributions

are robust to moderate departures from normality (especially with larger samples), but t procedures are not resistant to outliers

Degrees of freedom (df)

  • For one-sample and paired t-tests/intervals: n-1
  • For chi-square goodness of fit: # of categories - 1
  • For chi-square homogeneity and association/independence: (r-1)(c-1)
  • For linear regression inference: n-2
  • Round down if using Table D
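The degrees-of-freedom rules in this list can be written as small helpers. This is a sketch; the function names are invented, and the list of table rows is an assumption about what a typical printed t-table offers (1 through 30, then 40, 50, 60, 80, 100, 1000), so check it against your own Table D.

```python
from bisect import bisect_right

def df_t(n):                 # one-sample t procedures
    return n - 1

def df_gof(categories):      # chi-square goodness of fit
    return categories - 1

def df_two_way(rows, cols):  # chi-square on a two-way table
    return (rows - 1) * (cols - 1)

def df_regression(n):        # inference for the slope
    return n - 2

# Assumed df rows of a printed t-table (hypothetical; verify against yours).
TABLE_ROWS = list(range(1, 31)) + [40, 50, 60, 80, 100, 1000]

def table_row(df):
    """Round df DOWN to the nearest row that appears in the table."""
    if df < 1:
        raise ValueError("df must be at least 1")
    return TABLE_ROWS[bisect_right(TABLE_ROWS, df) - 1]

print(df_t(48), table_row(df_t(48)))
```

Rounding down is the conservative choice because a smaller df gives a larger t* and therefore a wider interval.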

Margin of error and sample size

  • Margin of error = critical value (z* or t*) × standard error
  • The square root part of the formula is the standard error (standard deviation adjusted for sample size)
  • Widths of confidence intervals are inversely proportional to the square root of the sample size (1/root(n))
  • For example, if the sample size increased by a factor of 9, the width of the confidence interval would be ⅓ as wide as it was before
  • As a general rule, increasing the sample size decreases the variability of the sample statistic
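The factor-of-9 example can be checked numerically. A minimal sketch with made-up numbers (z* = 1.96, standard deviation 12, n going from 25 to 225); `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt

def margin_of_error(critical_value, sd, n):
    """Critical value times standard error (sd / sqrt(n))."""
    return critical_value * sd / sqrt(n)

m_small = margin_of_error(1.96, 12, 25)       # original sample size
m_large = margin_of_error(1.96, 12, 25 * 9)   # sample size multiplied by 9
print(m_small, m_large, m_large / m_small)
```

The ratio comes out to one third, since sqrt(9) = 3.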

List of Tests & Intervals

Method

When to Use

Conditions

Test Hypotheses

Test Conclusion

Interval Conclusion

1-Sample Z Test

1-Sample Z Interval

σ is known

means are involved (quantitative data)

one population

Random:

  • Given

Independence:

  • N ≥ 10n (the population is at least 10 times the sample size when sampling without replacement)

Normality:

  • Given OR
  • Approximate normality of sampling distribution (n  ≥ 30 / CLT)

H0: μ = some number

Ha: μ  ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true mean [context] is between [lower bound] and [upper bound]”
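A one-sample z test and interval might be computed like this. It is a sketch with invented numbers (x̄ = 52, μ0 = 50, σ = 10, n = 100); the function name `one_sample_z` and its `tail` argument are assumptions for the example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n, tail="two-sided", conf=0.95):
    """Return (z statistic, p-value, confidence interval) when sigma is known."""
    std = NormalDist()
    se = sigma / sqrt(n)                       # standard deviation of x-bar
    z = (xbar - mu0) / se
    if tail == "less":
        p_value = std.cdf(z)
    elif tail == "greater":
        p_value = 1 - std.cdf(z)
    else:
        p_value = 2 * (1 - std.cdf(abs(z)))
    z_star = std.inv_cdf(1 - (1 - conf) / 2)
    return z, p_value, (xbar - z_star * se, xbar + z_star * se)

z, p_value, interval = one_sample_z(52, 50, 10, 100)
print(z, round(p_value, 4), interval)
```

Compare the p-value to alpha, then fill in the conclusion template with context.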

2-Sample Z Test

2-Sample Z Interval

σ is known

means are involved (quantitative data)

two populations (comparing two means)

Random:

  • Given

Independence:

  • Read context to determine if both samples are independent
  • N ≥ 10n

Normality:

  • Given OR
  • Approximate normality of sampling distributions (n  ≥ 30 / CLT)

H0: μ1 = μ2 or μ1-μ2 = 0

Ha: μ1 ≠, <, > μ2 or μ1-μ2 ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true mean [context 1] is between [lower bound] and [upper bound] [higher/lower] than the mean [context 2]”

1-Proportion Z Test

1-Proportion Z Interval

standard deviation is computed from the proportion (no separate σ to estimate)

proportions are involved

one population

Random:

  • Given

Independence:

  • N ≥ 10n

Normality:

  • Given OR
  • Approximate normality of sampling distribution (for a test: np0 ≥ 10 and n(1-p0) ≥ 10, using the hypothesized proportion p0; for an interval: np̂ ≥ 10 and n(1-p̂) ≥ 10)

H0: P = some number

Ha: P ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true proportion of [context] is between [lower bound] and [upper bound]”
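For proportions, the test uses the hypothesized proportion in the standard deviation, while the interval uses p̂. The sketch below shows both with invented data (60 successes out of 100, H0: p = 0.5); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Two-sided 1-proportion z test; the standard deviation uses p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def one_prop_z_interval(successes, n, conf=0.95):
    """1-proportion z interval; the standard error uses p-hat."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

z, p_value = one_prop_z_test(60, 100, 0.5)
low, high = one_prop_z_interval(60, 100)
print(round(z, 3), round(p_value, 4), round(low, 3), round(high, 3))
```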

2-Proportion Z Test

2-Proportion Z Interval

standard deviation is computed from the proportions (no separate σ to estimate)

proportions are involved

two populations

Random:

  • Given

Independence:

  • Read context to determine if both samples are independent
  • N ≥ 10n

Normality:

  • Given OR
  • Approximate normality of sampling distributions (n1p̂1 ≥ 5, n1(1-p̂1) ≥ 5, n2p̂2 ≥ 5, and n2(1-p̂2) ≥ 5; some textbooks require 10 instead of 5, matching the 1-proportion rule)

H0: P1 = P2 or P1-P2 = 0

Ha:  P1 ≠, <, > P2 or P1-P2 ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true proportion of [context 1] is between [lower bound] and [upper bound] [higher/lower] than the proportion of [context 2]”
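Because H0 says the two proportions are equal, the 2-proportion z test usually pools the two samples to estimate the common proportion. A sketch with made-up counts (45 of 100 versus 30 of 100); `two_prop_z_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided 2-proportion z test using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # combined estimate under H0: p1 = p2
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(45, 100, 30, 100)
print(round(z, 3), round(p_value, 4))
```

The interval version does not pool; it uses each sample's own p̂ in the standard error.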

1-Sample T Test

1-Sample T Interval

Sx is known / σ is unknown

means are involved (quantitative data)

one population

Random:

  • Given

Independence:

  • N ≥ 10n

Normality:

  • Given OR
  • Approximate normality of sampling distribution (n  ≥ 30 / CLT)
  • Approximate normality of population (n < 30, check a graph of the sample data for strong skew or outliers)

H0: μ = some number

Ha: μ  ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true mean [context] is between [lower bound] and [upper bound]”
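Here is a sketch of the one-sample t calculations on a small invented data set. Python's standard library has no t-distribution, so t* is passed in from a table (2.262 is the usual table value for df = 9 at 95% confidence); the function name `one_sample_t` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0, t_star):
    """Return (t statistic, df, confidence interval) using the sample sd Sx."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / sqrt(n)                # standard error uses Sx, not sigma
    t = (xbar - mu0) / se
    return t, n - 1, (xbar - t_star * se, xbar + t_star * se)

data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]   # made-up sample, n = 10
t, df, interval = one_sample_t(data, mu0=12, t_star=2.262)
print(round(t, 3), df, interval)
```

The p-value would then come from a t-table or a calculator using the t statistic and df.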

Paired T Test

Paired T Interval

Sx of the differences is known / σ of the differences is unknown

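means are involved (quantitative data)

one population, two treatments (or two measurements on each subject)

  • Matched pairs, such as twins?
  • Each subject receives both treatments (for example, before and after)?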

Random:

  • Given

*THE TWO MEASUREMENTS IN EACH PAIR ARE NOT INDEPENDENT! (The pairs themselves should still be independent of one another.)

Normality:

  • Given OR
  • Approximate normality of the sampling distribution of the mean difference (n  ≥ 30 / CLT)
  • Approximate normality of population of differences (n < 30, check graph)

H0: μ(diff) = some number

Ha: μ(diff)  ≠, <, > some number

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true mean difference [context] is between [lower bound] and [upper bound]”
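A paired t test is a one-sample t test on the differences. The sketch below uses invented before/after values for five subjects; `paired_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second, mu_diff0=0.0):
    """t statistic and df for the mean of the paired differences (second - first)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)               # standard error of the mean difference
    t = (mean(diffs) - mu_diff0) / se
    return t, n - 1

before = [10, 12, 9, 11, 13]                  # made-up data
after = [12, 15, 10, 14, 14]
t, df = paired_t(before, after)
print(round(t, 3), df)
```

Note that n is the number of pairs, not the number of individual measurements.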

2-Sample T Test

2-Sample T Interval

Sx1 and Sx2 are known / σ1 and σ2 are unknown

means are involved (quantitative data)

two populations (comparing two means)

Random sampling/assignment:

  • Given

Independence:

  • Read context to determine if both samples are independent

Normality:

  • Given OR
  • Approximate normality of sampling distributions (n1 ≥ 30 and n2 ≥ 30 / CLT)
  • Approximate normality of populations (either n < 30, check graphs of both samples)

H0: μ1 = μ2 or μ1-μ2 = 0

Ha: μ1 ≠, <, > μ2 or μ1-μ2 ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]

“We are [confidence level]% confident that the true mean [context 1] is between [lower bound] and [upper bound] [higher/lower] than the mean [context 2]”
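The two-sample t statistic can be computed from summary statistics. This sketch uses the conservative df (the smaller of n1-1 and n2-1); calculators and software typically use a more complicated df formula and will report a somewhat different value. The numbers and the name `two_sample_t` are invented.

```python
from math import sqrt

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t statistic with the conservative df."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)    # standard error of xbar1 - xbar2
    t = (xbar1 - xbar2) / se
    df = min(n1, n2) - 1                      # conservative choice, not the software df
    return t, df

t, df = two_sample_t(20, 4, 16, 17, 3, 9)
print(round(t, 3), df)
```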

Chi-Square: Goodness of Fit (GOF)

Extension of the 1-Proportion Z Test

One population

One row/column

Is there a significant difference between the expected and observed proportions?

Random

All expected values are greater than 1

Less than 20% of expected values are less than 5

*CHI-SQUARE DISTRIBUTIONS ARE NOT NORMAL!

H0: P1 = X, P2 = Y, P3 = Z…

Ha: At least one of the proportions differs from its hypothesized value.

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that there is no difference [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative there is a difference in at least one category [context]

N/A
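The goodness-of-fit statistic adds up (observed - expected)² / expected over all categories. A sketch with invented counts and hypothesized proportions; the function name and return layout are assumptions, and the condition check follows the two rules listed for this test.

```python
def chi_square_gof(observed, null_proportions):
    """Return (chi-square statistic, df, expected counts, conditions met?)."""
    n = sum(observed)
    expected = [n * p for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    all_above_one = all(e > 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return statistic, df, expected, all_above_one and share_below_five < 0.20

statistic, df, expected, ok = chi_square_gof([30, 50, 20], [0.25, 0.50, 0.25])
print(statistic, df, expected, ok)
```

The p-value is the area to the right of the statistic under the chi-square curve with that df, found from a table or calculator.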

Chi-Square: Homogeneity

Extension of the 2-Proportion Z Test

Multi-population

Multiple rows/columns

Is there a significant difference in at least one proportion between the categories?

Random

All expected values are greater than 1

Less than 20% of expected values are less than 5

H0: P1 = P2 = P3…

Ha: There is a difference in at least one category between [populations].

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that there is no difference [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative there is a difference in at least one category [context]

N/A

Chi-Square: Association/Independence

Is there a significant association between categorical variables?

Random

All expected values are greater than 1

Less than 20% of expected values are less than 5

H0: There is no association between [variable 1] and [variable 2].

Ha: There is an association.

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that there is no association [context]

OR

we have significant evidence to reject the null hypothesis [context] in favor of the alternative that there is an association [context]

N/A
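For both two-way chi-square tests (homogeneity and association/independence), each expected count is (row total × column total) / grand total. A sketch on an invented 2×2 table; the function names are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Chi-square statistic and df = (r-1)(c-1) for a two-way table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

table = [[20, 30], [30, 20]]                  # made-up observed counts
statistic, df = chi_square_two_way(table)
print(expected_counts(table), statistic, df)
```

The arithmetic is identical for the two tests; only the hypotheses and the way the data were collected differ.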

Linear Regression (Slope) T Test

Linear Regression (Slope) T Interval

Is there a linear relationship between quantitative variables?

Random

Observations are independent

Linearity/Residuals:

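  • Residual plot shows no curved pattern (the relationship is linear)
  • Standard deviation of y is the same about the true line (residuals show even, random scatter)
  • Distribution of the residuals is approximately normal (check a histogram or normal probability plot)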

H0: The slope of the true line of regression between [x + context] and [y + context] is 0.

Ha: The slope is ≠, <, > 0

“Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...

we fail to reject the null hypothesis that the slope of the true line of regression [context] is 0”

OR

we have significant evidence to reject the null hypothesis that the slope of the true line of regression [context] is 0 in favor of the alternative that there is [a/a positive/a negative] relationship [context]

“We are [confidence level]% confident that the true slope of the regression line [context] is between [lower bound] and [upper bound]”

Formula: b ± t*(SEb), where t* uses df = n-2
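The slope interval can be worked out by hand on a tiny invented data set. In this sketch, s is the standard deviation of the residuals, SEb = s / sqrt(Σ(x - x̄)²), and t* comes from a table (3.182 is the usual table value for df = 3 at 95% confidence). The function name `slope_inference` is made up.

```python
from math import sqrt

def slope_inference(x, y, t_star):
    """Return (b, s, SEb, t statistic, df, interval) for the regression slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                              # least-squares slope
    a = y_bar - b * x_bar                      # least-squares intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    se_b = s / sqrt(sxx)                       # standard error of the slope
    t = b / se_b                               # test statistic for H0: slope = 0
    return b, s, se_b, t, n - 2, (b - t_star * se_b, b + t_star * se_b)

x = [1, 2, 3, 4, 5]                            # made-up data
y = [2, 4, 5, 4, 5]
b, s, se_b, t, df, interval = slope_inference(x, y, t_star=3.182)
print(round(b, 3), round(s, 3), round(se_b, 3), round(t, 3), df, interval)
```

On computer output, b, s, and SEb are usually given directly, so only the last step is needed.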