AP STATISTICS STUDY GUIDE

created by shayla huynh (*note: not everything from the course has been added yet*)

contact me if you have any questions

EMAIL

FACEBOOK

TWITTER

INSTAGRAM

HEY FOLKS. HERE’S THE ORG POST WITH  SOME INFO ABOUT THE AP TEST AND MORE RESOURCES 

IF YOU HAVE ANY QUESTIONS, OR WANT TO SEE THE NOTES I TOOK AND/OR EXAMPLES OF PROBLEMS, CONTACT ME

GOOD LUCK ON THE TEST. DRINK WATER AND SLEEP WELL.

SOCS

Use to describe a distribution

* A statistic is robust if it resists extreme values

(When there are outliers use: median & IQR instead of mean & range)

Item

About

Explanation

Sentence Structure

S

Shape

SHAPE

  • Left skewed
  • Most of the data is on the right
  • Right skewed
  • Most of the data is on the left
  • Roughly symmetrical

PEAKS

  • Unimodal
  • Bimodal
  • Multimodal

The distribution of [title] is [shape] and [mode] with a peak at [value of peak(s)]. There is [# of outliers] at [value of outlier(s)] because the data point .... The center of distribution is best described by the [mean/median], [statistic value]. The spread of distribution is best described by the [range/I.Q.R.], [statistic value].

EXAMPLE

The distribution of sibling data in AP Statistics is right skewed with a peak at one sibling. There is 1 outlier at 5 siblings because the data point exceeds the upper outlier fence of 3.5 siblings. The center of distribution is best described by the median, 1.5 siblings. The spread of distribution is best described by the interquartile range, 1 sibling.

NOTES

  • Points are given on the AP test for mentioning whether or not the distribution is linear.
  • Always provide numbers to support your claim whenever possible.

O

Outliers

Are there any? If so, use robust statistics.

Interquartile Range

  • Q3 - Q1

Upper Outlier Fence

  • Q3 + (1.5)(I.Q.R.)

Lower Outlier Fence

  • Q1 - (1.5)(I.Q.R.)

CALCULATOR

Insert Numbers into a list

  • 1-Var Stats

There are three ways to talk about outliers.

No Outliers

  • There are no outliers. The entire data set is within the L.O.F. of [value] and U.O.F. of  [value]

Upper Outlier Fence

  • There is/are [# of outliers] at  [value of outlier(s)] because the data point exceeds the upper outlier fence of [value] 

Lower Outlier Fence

  • There is/are [# of outliers] at  [value of outlier(s)] because the data point is below the lower outlier fence of [value] 
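The fence formulas can be checked with a short script. This is only a sketch: the function name `outlier_fences` and the sibling counts are made up for illustration (picked so they reproduce the numbers in the sibling example), and the quartiles are taken as the medians of the lower and upper halves, which is an assumption about the quartile method.

```python
from statistics import median

def outlier_fences(data):
    """Return Q1, Q3, IQR, both fences, and any outliers."""
    xs = sorted(data)
    half = len(xs) // 2          # the middle value is left out when n is odd
    q1 = median(xs[:half])       # median of the lower half
    q3 = median(xs[-half:])      # median of the upper half
    iqr = q3 - q1
    lof = q1 - 1.5 * iqr         # lower outlier fence
    uof = q3 + 1.5 * iqr         # upper outlier fence
    outliers = [x for x in xs if x < lof or x > uof]
    return {"Q1": q1, "Q3": q3, "IQR": iqr,
            "LOF": lof, "UOF": uof, "outliers": outliers}

# Hypothetical sibling counts, not the class's actual data
siblings = [0, 1, 1, 1, 1, 2, 2, 2, 3, 5]
print(outlier_fences(siblings))
```

Any value outside the two fences ends up in the `outliers` list, which is what goes into the outlier sentence.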

C

Center

Would the mean or median be a better description?

If there are outliers, use median.

S

Spread

Would the standard deviation or I.Q.R. be the better description?

If there are outliers, use I.Q.R.

INTERPRETING

Item

Variable

Explanation

Example

Slope

b

For every 1 additional [explanatory unit] of [explanatory variable], our model predicts that [response variable] increases by [b value + units] (decreases if b is negative).

b ≈ 0.385

For every 1 additional year of the subject’s age, our model predicts that glucose level increases by 0.385 units.

Y-intercept

a

When the [explanatory variable] is zero [explanatory units], our model predicts that [response variable] will be [y-intercept value] [response units].

NOTE: Is this an extrapolation in the data set?

a ≈ 65.141

When the subject’s age is zero years, our model predicts that glucose level will be 65.141 units.

Correlation Coefficient

r

[Explanatory Variable] and [Response Variable] have a [strength] [direction] [linear/nonlinear] relationship.

NOTE:

  • Strength
  • Weak: |r| = 0.0 - 0.3
  • Moderate: |r| = 0.4 - 0.6
  • Strong: |r| = 0.7+
  • Direction (Depends on Slope)
  • Positive = Increasing
  • Negative = Decreasing

SLOPE = Increasing; r ≈ 0.52

Subject’s age and glucose level have a moderate positive linear relationship.

Coefficient of Determination

r²

[r² converted to a percentage] of the variability in [response variable] is explained by [explanatory variable].

r² ≈ 0.2806

28.06% of the variability in glucose level is explained by subject’s age.

Standard Deviation of Residuals

s

The average prediction error is [s value].

s ≈ 10.861

The average prediction error is approximately 10.861 units.
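To see where a, b, r, r², and s come from, here is a minimal Python sketch of a least-squares fit. The function name `least_squares` and the four data points are hypothetical (this is not the glucose data set); on the test the calculator does this work for you.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return intercept a, slope b, correlation r, and residual SD s."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # y-intercept
    r = sxy / sqrt(sxx * syy)          # correlation coefficient
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))            # standard deviation of the residuals
    return a, b, r, s

# Hypothetical (x, y) data, not the glucose data set
a, b, r, s = least_squares([1, 2, 3, 4], [2, 4, 5, 7])
print(f"y-hat = {a:.3f} + {b:.3f}x, r = {r:.3f}, r^2 = {r * r:.3f}, s = {s:.3f}")
```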

SAMPLING STRATEGIES

TYPE

NAME

ABOUT

VALID

Simple Random Sample (SRS)

Every possible sample of a given size has an equal chance of being chosen

VALID

Stratified Random Sample

Divide the population into strata, then perform an SRS in each stratum proportionally

VALID

Cluster Sample

Divide population into clusters then randomly select a cluster. All individuals in the cluster are sampled

VALID

Systematic Random Sample

Choose a random starting point, then take every nth person to be in the sample

INVALID

Voluntary Response Sample

Respondents choose themselves

These are invalid methods of collecting a sample from a population because they may not represent the entire population accurately. The proportion may be under- or overestimated.

-------------------------

On the AP test, be able to explain in context why a certain strategy would be better than another or in context why an invalid strategy is flawed.

INVALID

Convenience Sample

Respondents are easy to sample

INVALID

Undercoverage Sample

Missing members of the population

INVALID

Nonresponse bias

Respondents refuse to answer

INVALID

Response bias

Respondents are “led” to answer in a particular way
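The four valid strategies can be sketched in Python with the standard `random` module. Everything here is hypothetical: the function names, the list of 100 student ID numbers, the junior/senior strata, and the homeroom clusters are just for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    # Every group of n individuals is equally likely to be chosen
    return rng.sample(population, n)

def stratified_sample(strata, fraction, rng):
    # Take an SRS from each stratum, proportional to the stratum's size
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, rng):
    # Randomly pick one cluster and keep everyone in it
    return list(rng.choice(clusters))

def systematic_sample(population, step, rng):
    # Random starting point, then every step-th individual
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(2024)                      # seeded so runs repeat
students = list(range(1, 101))                 # hypothetical ID numbers
grades = {"juniors": students[:60], "seniors": students[60:]}
homerooms = [students[i:i + 20] for i in range(0, 100, 20)]

print(simple_random_sample(students, 10, rng))
print(stratified_sample(grades, 0.10, rng))
print(cluster_sample(homerooms, rng))
print(systematic_sample(students, 10, rng))
```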

INFERENCES

Central Limit Theorem (C.L.T.) - even if a population distribution is not normal, the C.L.T. states that when n is large (n ≥ 30), the sampling distribution of x̅ is approximately normal.
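A quick simulation shows the idea. This sketch assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1); the function name `sample_means` and the choices n = 40 and 2000 repetitions are arbitrary. The sample means should center near 1 with a spread near 1/√40.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from a skewed population; return each x-bar."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(1)                 # seeded so runs repeat
means = sample_means(40, 2000, rng)
print("mean of the x-bars:", round(mean(means), 3))
print("SD of the x-bars:  ", round(stdev(means), 3))
print("sigma / sqrt(n):   ", round(1 / sqrt(40), 3))
```

A histogram of `means` would look roughly bell shaped even though the population is not.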

IMPORTANT TIP: A lot of tests have the same structure/set-up. (E.g., all the proportion procedures have the same conditions.)

NAME

STATE

CONDITION

DO

CONCLUDE

1 Proportion z Interval

DESCRIPTION

Confidence level

Context

Random: Data came from random sample

Normal: 

  •  np̂ ≥ 10
  • n(1-p̂) ≥ 10

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, the sample is at most 10% of its population (n ≤ 0.10N)

EQUATIONS

DEFINITIONS

p̂ = Point estimate

z* = Critical value

Standard Deviation =

  • √{ [ p̂(1-p̂) ] ÷ n }

Margin of Error =

  • z*√{ [ p̂(1-p̂) ] ÷ n }

CALCULATOR

STAT ⇒ TESTS ⇒ A: 1-PropZInt

I am __% confident that the population proportion of [context] is within the interval ( __ , __ )
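The interval is p̂ ± z*√{ [ p̂(1-p̂) ] ÷ n }. Here is a minimal Python sketch of that calculation; the function name `one_prop_z_interval` and the counts (520 successes out of 1000) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence=0.95):
    """Confidence interval for a population proportion."""
    p_hat = x / n                                          # point estimate
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)        # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 520 successes out of 1000
low, high = one_prop_z_interval(520, 1000)
print(f"({low:.4f}, {high:.4f})")
```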

1 Sample z Interval for Means

DESCRIPTION

Confidence level

Degrees of Freedom = n-1

Context

NOTES

Identifying

  • σ is known (rare)

DESCRIPTION

Confidence level

Degrees of Freedom = n-1

Context

NOTES

Identifying between z & t

  • z test
  • σ is known (rare)
  • comes from the population
  • t test
  • σ is unknown (common)
  • Comes from a sample

Random: Data came from random sample

Normal: 

For z

  • n ≥ 30

For t

  • n ≥ 30
  • n ≥ 15: without strong skewness or outliers
  • n < 15: without strong skewness or outliers, and the distribution is approx. normal

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, the sample is at most 10% of its population (n ≤ 0.10N)

EQUATIONS

x̅ ± z*( σ/ √n ) 

σx̅ = σ / √n

Sigma x bar equals sigma divided by the square root of n

CALCULATOR

STAT ⇒ TESTS ⇒ 7: ZInterval

I am __% confident that the population mean of [context] is within the interval ( __ , __ )

1 Sample t Interval for Means

EQUATIONS

x̅ ± t*( Sx ÷ √n )

SEx̅ = Sx / √n

CALCULATOR

STAT ⇒ TESTS ⇒ 8: TInterval
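Both mean intervals have the same shape: x̅ ± (critical value)(standard deviation ÷ √n). A small Python sketch, with hypothetical numbers (x̅ = 50, standard deviation 10, n = 25) and a made-up function name `mean_interval`. The t* value of 2.064 is the 95% table value for 24 degrees of freedom; Python's standard library has no t distribution, so it is typed in by hand.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(x_bar, sd, n, critical):
    """x-bar plus or minus critical value times sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

z_star = NormalDist().inv_cdf(0.975)        # 95% confidence, sigma known
print(mean_interval(50, 10, 25, z_star))    # z interval

t_star = 2.064                              # 95% confidence, df = 24 (from a t table)
print(mean_interval(50, 10, 25, t_star))    # t interval, using Sx for sd
```

The t interval comes out a little wider than the z interval because t* is larger than z*.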

1 Proportion z Test

DESCRIPTION

Significance Level

  •  Use α=0.05 if none given

Context

HYPOTHESIS

Ho: p = #.  

Ha: p ? #

  • Replace ? with either ≠, >, or <

Random: Data came from random sample

Normal: 

  • np0 ≥ 10 (use the hypothesized proportion p0 from Ho)
  • n(1-p0) ≥ 10

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, the sample is at most 10% of its population (n ≤ 0.10N)

GRAPH & EQUATIONS

IMG_2437.PNG

CALCULATOR

STAT ⇒ TESTS ⇒ 5: 1-PropZTest

P-VAL

DISTR (2ND VARS) ⇒ DISTR ⇒ 2: normalcdf(

  • Shading left: Lower = -1 × 10^99
  • Shading right: Upper = 1 × 10^99

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].
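As a check on the calculator, here is a minimal Python sketch of the test statistic z = (p̂ - p0) ÷ √{ [ p0(1-p0) ] ÷ n } and its p-value. The function name `one_prop_z_test`, the `tail` argument, and the sample (520 successes out of 1000 against p0 = 0.5) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, tail="two-sided"):
    """Return (z, p-value) for Ho: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # p0, not p-hat, in the SD
    cdf = NormalDist().cdf
    if tail == "less":                           # Ha: p < p0, shade left
        p_value = cdf(z)
    elif tail == "greater":                      # Ha: p > p0, shade right
        p_value = 1 - cdf(z)
    else:                                        # Ha: p != p0, both tails
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 520 successes out of 1000, Ho: p = 0.5
z, p_value = one_prop_z_test(520, 1000, 0.5)
print(f"z = {z:.3f}, p-val = {p_value:.4f}")
```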

1 Sample z Test

DESCRIPTION

Significance Level

  •  Use α=0.05 if none given

For t: Degrees of Freedom = n-1

Context

HYPOTHESIS

Ho: μ = #.  

Ha: μ ? #

  • Replace ? with either ≠, >, or <

NOTES

Identifying between z & t

  • z test
  • σ is known (rare)
  • comes from the population
  • t test
  • σ is unknown (common)
  • Comes from a sample

Random: Data came from random sample

Normal: 

For z

  • n ≥ 30

For t

  • n ≥ 30
  • n ≥ 15: without strong skewness or outliers
  • n < 15: without strong skewness or outliers, and the distribution is approx. normal

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, the sample is at most 10% of its population (n ≤ 0.10N)

GRAPH & EQUATIONS

FOR z 

IMG_2438.PNG

FOR t 

IMG_2439.PNG

CALCULATOR

Choose the appropriate one

STAT ⇒ TESTS ⇒ 1: Z-Test

STAT ⇒ TESTS ⇒ 2: T-Test

P-VAL

DISTR (2ND VARS) ⇒ DISTR ⇒ 2: normalcdf( for z, or tcdf( for t

  • Shading left: Lower = -1 × 10^99
  • Shading right: Upper = 1 × 10^99

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].

1 Sample t Test

Matched Pairs t Test

DESCRIPTION

Significance Level

  •  Use α=0.05 if none given
  • Degrees of Freedom = n-1

Context

HYPOTHESIS

Ho: μD = #.  

Ha: μD ? #

  • Replace ? with either ≠, >, or <

NOTES

Identifying

  • Two extremely similar subjects
  • Before/after on the same subjects

The test is identical to 1 sample t Test except for the following:

  • Testing against a population mean difference
  • Mark capital D subscripts on μD, x̅D, SD

Random: Data came from random sample

Normal: 

  • n ≥ 30
  • n ≥ 15: without strong skewness or outliers
  • n < 15: without strong skewness or outliers, and the distribution is approx. normal

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, the sample is at most 10% of its population (n ≤ 0.10N)

GRAPH & EQUATIONS

IMG_2439.PNG

*NOTE: BE SURE TO ADD SUBSCRIPT D ON MU, X BAR, AND S

CALCULATOR

STAT ⇒ TESTS ⇒ 2: T-Test

P-VAL

DISTR (2ND VARS) ⇒ DISTR ⇒ tcdf(

  • Shading left: Lower = -1 × 10^99
  • Shading right: Upper = 1 × 10^99

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].
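A matched pairs test is just a 1 sample t test on the list of differences. This Python sketch computes the test statistic and degrees of freedom; the function name `matched_pairs_t` and the before/after values are hypothetical. The p-value still comes from tcdf( or T-Test on the calculator, since Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def matched_pairs_t(before, after, mu_d0=0):
    """Return (t, df) for Ho: mu_D = mu_d0 using differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    x_bar_d = mean(diffs)                 # mean of the differences
    s_d = stdev(diffs)                    # sample SD of the differences
    t = (x_bar_d - mu_d0) / (s_d / sqrt(n))
    return t, n - 1

# Hypothetical before/after scores for the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 12, 15]
t, df = matched_pairs_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```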

2 Proportion z Interval

DESCRIPTION

Confidence Level

Use “difference”

Context

PARAMETER

Let p1 = population proportion of …

       p2 = population proportion of …

Random: Data came from random sample

Normal: 

  • n1p̂1 ≥ 10
  • n1(1-p̂1) ≥ 10
  • n2p̂2 ≥ 10
  • n2(1-p̂2) ≥ 10

“Both sample sizes are sufficiently large.”

Independent: 

Either

  • Both samples are independent
  • Both samples are less than 10% of their respective populations

EQUATIONS

Margin of error is everything after the ± symbol

CALCULATOR

STAT ⇒ TESTS ⇒ B: 2-PropZInt

NOTES

Make sure to use “difference”

I am __% confident that the difference in population proportions of [context] is within the interval ( __ , __ )
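The interval for a difference in proportions is (p̂1 - p̂2) ± z*√{ [ p̂1(1-p̂1) ] ÷ n1 + [ p̂2(1-p̂2) ] ÷ n2 }. A minimal Python sketch follows; the function name `two_prop_z_interval` and the counts (60 of 100 versus 45 of 100) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se                  # everything after the plus/minus
    diff = p1 - p2
    return diff - margin, diff + margin

# Hypothetical samples: 60 of 100 vs. 45 of 100
low, high = two_prop_z_interval(60, 100, 45, 100)
print(f"({low:.4f}, {high:.4f})")
```

If zero is not inside the interval, that is evidence of a difference between the two proportions.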

2 Proportion z Test

DESCRIPTION

Significance Level

  •  Use α=0.05 if none given

Use “difference”

Context

PARAMETER

Let p1 = population proportion of …

       p2 = population proportion of …

HYPOTHESIS

Ho: p1 - p2 = 0  

Ha: p1 - p2 ? 0

  • Replace ? with either ≠, >, or <

Random: Data came from random sample

Normal: 

  • n1p̂1 ≥ 10
  • n1(1-p̂1) ≥ 10
  • n2p̂2 ≥ 10
  • n2(1-p̂2) ≥ 10

“Both sample sizes are sufficiently large.”

Independent: 

Either

  • Both samples are independent
  • Both samples are less than 10% of their respective populations

GRAPH & EQUATIONS

IMG_2441.PNG

  • In a two tail test, the difference is the same distance on each side
  • Shade left if less than
  • Shade right if greater than
  • Parameter is equal to zero

POOLED SAMPLE STATISTIC (COMBINED)

CALCULATOR

STAT ⇒ TESTS ⇒ 6: 2-PropZTest

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].
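The pooled (combined) sample proportion is p̂c = (x1 + x2) ÷ (n1 + n2), and the test statistic uses it in the standard deviation because Ho says the two proportions are equal. A minimal Python sketch, with a made-up function name `two_prop_z_test` and hypothetical counts:

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, tail="two-sided"):
    """Return (z, p-value) for Ho: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf
    if tail == "less":                               # shade left
        p_value = cdf(z)
    elif tail == "greater":                          # shade right
        p_value = 1 - cdf(z)
    else:                                            # same distance on each side
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical samples: 60 of 100 vs. 45 of 100
z, p_value = two_prop_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p-val = {p_value:.4f}")
```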

2 Sample z Interval for Means

DESCRIPTION

Confidence level

For t: 

Degrees of Freedom =

  •  Use d.f. on calculator

Context

PARAMETER

Let μ1 = population mean __  of …

       μ2 = population mean __  of …

NOTES

Identifying between z & t

  • z test
  • σ is known (rare)
  • comes from the population
  • t test
  • σ is unknown (common)
  • Comes from a sample

Random: Data came from random sample

Normal: (do for both n1 and n2)

  • n ≥ 30
  • n ≥ 15: without strong skewness or outliers
  • n < 15: without strong skewness or outliers, and the distribution is approx. normal

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, each sample is at most 10% of its respective population (n ≤ 0.10N)

EQUATION

FOR z

FOR t

Margin of error is everything after the ± symbol

CALCULATOR

Choose the appropriate one

STAT ⇒ TESTS ⇒ 9: 2-SampZInt

STAT ⇒ TESTS ⇒ 0: 2-SampTInt

I am __% confident that the difference in population means of [context] is within the interval ( __ , __ )

2 Sample t Interval for Means

2 Sample z Test for Means

DESCRIPTION

Significance Level

  •  Use α=0.05 if none given

For t: 

Degrees of Freedom =

  •  Use d.f. on calculator

Context

PARAMETER

Let μ1 = population mean __  of …

       μ2 = population mean __  of …

HYPOTHESIS

Ho: μ1 - μ2 = 0

Ha: μ1 - μ2 ? 0

  • Replace ? with either ≠, >, or <

NOTES

Identifying between z & t

  • z test
  • σ is known (rare)
  • comes from the population
  • t test
  • σ is unknown (common)
  • Comes from a sample

Random: Data came from random sample

Normal: (do for both n1 and n2)

  • n ≥ 30
  • n ≥ 15: without strong skewness or outliers
  • n < 15: without strong skewness or outliers, and the distribution is approx. normal

Independent: 

Either

  • Individual observations are independent (for example, when sampling with replacement)
  • When sampling without replacement, each sample is at most 10% of its respective population (n ≤ 0.10N)

GRAPH & EQUATION

IMG_2442.PNG

FOR z

FOR t

CALCULATOR

Choose the appropriate one

STAT ⇒ TESTS ⇒ 3: 2-SampZTest

STAT ⇒ TESTS ⇒ 4: 2-SampTTest

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].

EXAMPLE

I fail to reject the Ho because the p-val, 0.08, is greater than α = 0.05. There is insufficient evidence to suggest that there is a difference in mean response time between the Northern and Southern fire stations.
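For the t version, the test statistic is t = (x̅1 - x̅2) ÷ √( S1² ÷ n1 + S2² ÷ n2 ). The sketch below also computes the degrees of freedom with the Welch-Satterthwaite formula, which is my assumption about where the calculator's non-whole-number d.f. comes from. The function name `two_sample_t` and the summary statistics are hypothetical (they are not the fire station data).

```python
from math import sqrt

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Return (t, df) for Ho: mu1 - mu2 = 0, unpooled."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # variance of each sample mean
    t = (x_bar1 - x_bar2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two independent samples
t, df = two_sample_t(10, 2, 16, 8, 3, 9)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The p-value then comes from tcdf( with this d.f., or straight from 2-SampTTest.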

2 Sample t Test for Means

Chi Squared Goodness of Fit Test

DESCRIPTION

Significance Level

  •  Use α=0.05 if none stated

Degrees of Freedom = k - 1

  •  k = # of categories

Context

HYPOTHESIS

Ho: p1 = __, p2 = __, p3 = __, ...

Ha: At least one pi is incorrect

Random Normal, 10% condition when necessary

Expected Counts: All expected counts are at least 5

  •  E[x] = np
  • Always in parentheses
  • Must show expected counts in the answers

Independence: Sample is less than 10% of its population

GRAPH

χ² = ∑ [ (Observed - Expected)² / Expected ]

CALCULATOR

STAT ⇒ TESTS ⇒ D: χ²GOF-Test

TIP: Press draw to see what the graph should look like

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].
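Here is a minimal Python sketch of the expected counts (np) and the chi squared statistic for a goodness of fit test. The function name `chi_square_gof` and the observed counts are made up for illustration; the p-value still comes from the calculator.

```python
def chi_square_gof(observed, proportions):
    """Return (expected counts, chi-squared statistic, df)."""
    n = sum(observed)
    expected = [n * p for p in proportions]          # E = np for each category
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                           # k - 1
    return expected, chi2, df

# Hypothetical counts in 4 categories, Ho: every category has p = 0.25
expected, chi2, df = chi_square_gof([18, 22, 20, 40], [0.25, 0.25, 0.25, 0.25])
print(expected, round(chi2, 2), df)
```

Check the expected counts condition on the `expected` list before trusting the statistic.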

Chi Squared Test for Homogeneity

DESCRIPTION

Significance Level

  •  Use α=0.05 if none stated

Degrees of Freedom =

  • (# of Rows - 1)(# of Columns - 1)

Context

HYPOTHESIS

Homogeneity

Ho: There is no difference in  __ when  __

Ha: There is a difference in  __ when  __

Independence

Ho: There is no association between __ and __

Ha: There is an association between __ and __

NOTES

Identifying

  • Homogeneity
  • Multiple samples and treatment groups
  • Independence
  • Cut into groups

Random Normal, 10% condition when necessary

Expected Counts: All expected counts are at least 5

  •  E[x] = ( Row Total ∗ Column Total) ÷ Grand Total
  • Always in parentheses
  • Must show expected counts in the answers

Independence: Sample is less than 10% of its population

GRAPH

χ² = ∑ [ (Observed - Expected)² / Expected ]

CALCULATOR

MATRIX (2ND x⁻¹) ⇒ EDIT ⇒ 1: [A]

  • Adjust dimensions accordingly
  • Insert in observed count values

STAT ⇒ TESTS ⇒ C: χ²-Test

  • Observed: [A]
  • Expected: [B]

MATRIX (2ND x⁻¹) ⇒ EDIT ⇒ 2: [B]

  • Get expected count values

TIP: Press draw to see what the graph should look like

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].
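The expected count for each cell of a two-way table is (Row Total ∗ Column Total) ÷ Grand Total, and the degrees of freedom are (rows - 1)(columns - 1). A minimal Python sketch, with a made-up function name `chi_square_two_way` and a hypothetical 2 by 2 table of observed counts:

```python
def chi_square_two_way(table):
    """Return (expected counts, chi-squared statistic, df) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Hypothetical observed counts (this plays the role of matrix [A])
observed = [[20, 30],
            [30, 20]]
expected, chi2, df = chi_square_two_way(observed)
print(expected, chi2, df)      # `expected` plays the role of matrix [B]
```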

Chi Squared For Association/ Independence 

Linear Regression t Interval for Slope

DESCRIPTION

Confidence level

Degrees of Freedom = n-2

Context

“LINEAR”

Linear: No pattern in the residual plot

Independent: Normal, 10% condition when necessary

Normal: Histogram of residuals appears approximately normal

Equal Variance: Roughly the same amount of scatter above and below the zero line all the way across the residual plot

Random: Normal

EQUATIONS

b ± t*SEb

SEb = S / [ Sx√(n-1) ]

DEFINITIONS

SEb = Standard Error of the Slope

S = Standard deviation of the residuals

CALCULATOR

STAT ⇒ TESTS ⇒ G:  LinRegTInt

I am __% confident that the population slope between [explanatory variable]  and [response variable] is within the interval ( __ , __ )

Linear Regression t Test for Slope

DESCRIPTION

Significance Level

  •  Use α=0.05 if none stated

Degrees of Freedom = n-2

HYPOTHESIS

Ho: β = 0

Ha: β ? 0

  • Replace ? with either ≠, >, or <

DEFINITIONS

β = Population slope (in context of regression)

ρ = Population correlation coefficient (r is the sample correlation coefficient)

EQUATIONS

IMG_2443.PNG

SEb = S / [ Sx√(n-1) ]

CALCULATOR

STAT ⇒ TESTS ⇒ F: LinRegTTest

I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words].
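With Ho: β = 0, the test statistic is t = b ÷ SEb, where SEb = S ÷ [ Sx√(n-1) ]. This Python sketch computes each piece from raw data; the function name `slope_t_statistic` and the four data points are hypothetical. The p-value comes from tcdf( with n - 2 degrees of freedom or from LinRegTTest.

```python
from math import sqrt

def slope_t_statistic(xs, ys):
    """Return (b, SE_b, t, df) for the least-squares slope with Ho: beta = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))            # standard deviation of the residuals
    s_x = sqrt(sxx / (n - 1))          # sample SD of the x values
    se_b = s / (s_x * sqrt(n - 1))     # standard error of the slope
    return b, se_b, b / se_b, n - 2

# Hypothetical (x, y) data
b, se_b, t, df = slope_t_statistic([1, 2, 3, 4], [2, 4, 5, 7])
print(f"b = {b:.3f}, SEb = {se_b:.4f}, t = {t:.3f}, df = {df}")
```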

P(Type I Error) = α

P(Type II Error) = β

Power = 1 - β

If type I error is worse, statisticians set α = 0.01

If type II error is worse, statisticians set α = 0.1

If no α is stated, set α = 0.05

Reject Ho

  • Ho True: Type I Error
  • Ho False: Correct Conclusion

Fail to Reject Ho

  • Ho True: Correct Conclusion
  • Ho False: Type II Error

KEY CONCEPTS

  • When α goes up, β goes down, and power goes up
  • When α goes down, β goes up, and power goes down
PROBABILITY RULES

NAME

ABOUT/EQUATION

NAME

ABOUT/EQUATION

Probability of Event A

P(A) = (# of outcomes corresponding to A) ÷ (Total # of outcomes in the sample space)

P(A|B)

Reads “the probability that event A happens, given that event B happens”

  • | symbol means “given”

Sample Space (S)

  • P(S) = 1
  • Every outcome that can occur for a given situation

Rule of addition

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Union

  • Means “Or”

Rule of subtraction

P(A^C) = 1 - P(A)

Intersection

  • Means “And”

Rule of multiplication

  • P(A ∩ B) = P(A) ∗ P(B) when A and B are independent
  • P(A ∩ B) = P(A) ∗ P(B|A) in general
  • P(A ∩ B) = 0 if events A and B are mutually exclusive

Complement

A^C

  • Means “Not A”
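These rules can be checked by listing out a sample space. The sketch below uses two fair dice as a hypothetical example, with event A = "first die is a 6" and event B = "the sum is at least 10"; exact fractions come from Python's `fractions` module.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely rolls of two dice
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    hits = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(hits, len(sample_space))

def A(outcome):            # first die is a 6
    return outcome[0] == 6

def B(outcome):            # sum is at least 10
    return sum(outcome) >= 10

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))      # intersection
p_a_or_b = prob(lambda o: A(o) or B(o))        # union
p_not_a = prob(lambda o: not A(o))             # complement
p_a_given_b = p_a_and_b / p_b                  # conditional probability

print(p_a_or_b, p_a + p_b - p_a_and_b)   # rule of addition, both sides
print(p_not_a, 1 - p_a)                  # rule of subtraction, both sides
print(p_a_given_b)
```

Here P(A|B) is not equal to P(A), so A and B are not independent and the general multiplication rule is the one that applies.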

RANDOM VARIABLES

IDENTIFYING

BINOMIAL RANDOM VARIABLE

GEOMETRIC RANDOM VARIABLE

B

Binary

B

Binary

I

Independent

I

Independent

N

Fixed Number of Trials

T

Counting Trials Until First Success

S

Fixed Probability of Success

S

Fixed Probability of Success

MORE

BINOMIAL RANDOM VARIABLE

GEOMETRIC RANDOM VARIABLE

EQNS

  • P(X = k) = (n choose k) ∗ p^k ∗ (1-p)^(n-k)
  • where (n choose k) = nCk = n! ÷ [ k!(n-k)! ]
  • μx = np
  • σx = √ [ np(1-p) ]

Think of Pascal’s triangle

! = Factorial

  • EX: 3! = 3 Factorial = 3 ∗ 2 ∗ 1 = 6
  • EX: 5! = 5 Factorial = 5 ∗ 4 ∗ 3 ∗2 ∗ 1 = 120

EQNS

  • P(Y = k) = (1-p)^(k-1) ∗ p
  • μy = 1/p
  • σy = √ [ (1-p) ÷ p² ]

ON THE CALCULATOR

DISTR (2ND VARS)

  • binompdf(n,p,k) = P(X = k)
  • binomcdf(n,p,k) = P(X ≤ k)

ON THE CALCULATOR

DISTR (2ND VARS)

  • geometpdf(p,k) = P(Y = k)
  • geometcdf(p,k) = P(Y ≤ k)
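The four calculator functions can be written in a few lines of Python straight from the binomial and geometric formulas. The Python functions here are my own sketch, named after the calculator commands, and the example values (n = 10, p = 0.5 and p = 0.2) are made up.

```python
from math import comb, sqrt

def binompdf(n, p, k):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomcdf(n, p, k):
    """P(X <= k): add up P(X = 0) through P(X = k)."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

def geometpdf(p, k):
    """P(Y = k): first success on trial k."""
    return (1 - p) ** (k - 1) * p

def geometcdf(p, k):
    """P(Y <= k): at least one success in the first k trials."""
    return 1 - (1 - p) ** k

# Hypothetical examples
print(binompdf(10, 0.5, 5), binomcdf(10, 0.5, 5))
print(geometpdf(0.2, 3), geometcdf(0.2, 3))
print("binomial mean and SD: ", 10 * 0.5, sqrt(10 * 0.5 * 0.5))
print("geometric mean and SD:", 1 / 0.2, sqrt((1 - 0.2) / 0.2 ** 2))
```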

Important statistic formulas: http://stattrek.com/statistics/formulas.aspx

Tbh You don’t rly need to memorize most of them. The important ones you should know are things like margin of error, expected value, etc.
