AP STATISTICS STUDY GUIDE created by shayla huynh (*note: not everything from the course has been added yet*) | contact me if you have any questions |
HEY FOLKS. HERE’S THE ORG POST WITH SOME INFO ABOUT THE AP TEST AND MORE RESOURCES
IF YOU HAVE ANY QUESTIONS, OR WANT TO SEE THE NOTES I TOOK AND/OR EXAMPLES OF PROBLEMS, CONTACT ME
GOOD LUCK ON THE TEST. DRINK WATER AND SLEEP WELL.
SOCS | Use to describe a distribution | * A statistic is robust if it resists extreme values (When there are outliers use: median & IQR instead of mean & range) | ||
Item | About | Explanation | Sentence Structure | |
S | Shape | SHAPE
PEAKS
| The distribution of [title] is [shape] and [mode] with a peak at [value of peak(s)]. There is [# of outliers] at [value of outlier(s)] because the data point .... The center of distribution is best described by the [mean/median], [statistic value]. The spread of distribution is best described by the [range/I.Q.R.], [statistic value]. EXAMPLE The distribution of sibling data in AP Statistics is right skewed with a peak at one sibling. There is 1 outlier at 5 siblings because the data point exceeds the upper outlier fence of 3.5 siblings. The center of distribution is best described by the median, 1.5 siblings. The spread of distribution is best described by the interquartile range, 1 sibling. NOTES
| |
O | Outliers | Are there any? If so, use robust statistics. Interquartile Range: IQR = Q3 - Q1
Upper Outlier Fence: Q3 + 1.5(IQR)
Lower Outlier Fence: Q1 - 1.5(IQR)
CALCULATOR Insert Numbers into a list
| There are three ways to talk about outliers. No Outliers
Upper Outlier Fence
Lower Outlier Fence
| |
C | Center | Would the mean or median be a better description? | If there are outliers, use median. | |
S | Spread | Would the standard deviation or I.Q.R. be the better description? | If there are outliers, use I.Q.R. | |
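Here is a minimal Python sketch of the outlier check from the SOCS table. The `siblings` list is made-up data, chosen only so that it reproduces the summary values in the example sentence (median 1.5, IQR 1, upper fence 3.5); it is not the actual class data. The function name `outlier_fences` is also just for illustration, and the quartiles are taken as medians of the lower and upper halves, which is an assumption about the convention (other software may give slightly different quartiles).

```python
from statistics import median


def outlier_fences(data):
    """Return Q1, Q3, IQR, both outlier fences, and any outliers."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Quartiles as medians of the lower and upper halves
    # (the middle value is left out of both halves when n is odd).
    q1 = median(xs[:half])
    q3 = median(xs[half + n % 2:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical sibling counts (NOT the real class data).
siblings = [0, 1, 1, 1, 1, 2, 2, 2, 3, 5]
summary = outlier_fences(siblings)
print("median:", median(siblings))
print(summary)
```

Since this data has an outlier, the median and I.Q.R. are the statistics to report, just like in the example sentence.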
INTERPRETING | |||
Item | Variable | Explanation | Example |
Slope | b | For every 1 [explanatory unit] of [explanatory variable], the predicted [response variable] increases/decreases by [b value + units]. | b ≈ 0.385 For every 1 additional year of subject’s age, their predicted glucose level increases by 0.385 units. |
Y-intercept | a | When the [explanatory variable] is zero [explanatory units], our model predicts that [response variable] will be [y-intercept value] [response units]. NOTE: Is this an extrapolation in the data set? | a ≈ 65.141 When the subject’s age is zero years, our model predicts that glucose level will be 65.141 units. |
Correlation Coefficient | r | [Explanatory Variable] and [Response Variable] have a [strength] [direction] [linear/nonlinear] relationship. NOTE:
| SLOPE = Increasing; r ≈ 0.53 Subject’s age and glucose level have a moderate positive linear relationship. |
Coefficient of Determination | r² | [r² converted to a percentage] of the variability in [response variable] is explained by [explanatory variable]. | r² ≈ 0.2806 28.06% of the variability in glucose level is explained by subject’s age. |
Standard Deviation of Residuals | s | The average prediction error is [s value] [response units]. | s ≈ 10.861 The average prediction error is approximately 10.861 units. |
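To see where b, a, r, r², and s come from, here is a small Python sketch that computes all five from raw data. The x and y values are invented for illustration (they are not the age/glucose data from the examples), and the function name `least_squares` is hypothetical.

```python
from math import sqrt


def least_squares(x, y):
    """Return slope b, intercept a, r, r^2, and s for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # y-intercept
    r = sxy / sqrt(sxx * syy)          # correlation coefficient
    # Sum of squared residuals (actual minus predicted).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))            # standard deviation of residuals
    return {"b": b, "a": a, "r": r, "r2": r ** 2, "s": s}


# Made-up data, only to show the calculation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = least_squares(x, y)
print(fit)
```

Notice that r² really is just r squared, and that s divides by n - 2, which matches the degrees of freedom used for regression inference.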
SAMPLING STRATEGIES | |||
TYPE | NAME | ABOUT | |
VALID | Simple Random Sample (SRS) | Every possible sample of a given size can be chosen | |
VALID | Stratified Random Sample | Divide population into strata then perform an SRS in each stratum proportionally | |
VALID | Cluster Sample | Divide population into clusters then randomly select one or more clusters. All individuals in the chosen cluster(s) are sampled | |
VALID | Systematic Random Sample | Choose a random starting point, then take every nth person to be in the sample | |
INVALID | Voluntary Response Sample | Respondents choose themselves | These are invalid methods of collecting a sample from a population because they may not represent the entire population accurately. The proportion may be under- or overestimated. ------------------------- On the AP test, be able to explain in context why a certain strategy would be better than another, or why an invalid strategy is flawed. |
INVALID | Convenience Sample | Respondents are easy to sample | |
INVALID | Undercoverage Sample | Missing members of the population | |
INVALID | Nonresponse bias | Respondents refuse to answer | |
INVALID | Response bias | Respondents are “led” to answer a certain way (for example, by the wording of the question) or give inaccurate answers | |
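The four valid strategies in the table can be sketched in Python like this. Everything here is illustrative: the population of 100 numbered students, the juniors/seniors strata, the ten "rooms" used as clusters, and all function names are assumptions made up for the example.

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every possible group of n individuals is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n, rng):
    """Take an SRS inside each stratum, sized proportionally to the stratum."""
    total = sum(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        k = round(n * len(group) / total)  # rounding may shift the total slightly
        sample.extend(rng.sample(group, k))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and sample everyone inside them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


def systematic_sample(population, step, rng):
    """Random starting point, then every step-th individual."""
    start = rng.randrange(step)
    return population[start::step]


rng = random.Random(0)  # fixed seed so the run is repeatable
students = list(range(1, 101))
strata = {"juniors": students[:60], "seniors": students[60:]}
clusters = {f"room {i}": students[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(strata, 10, rng)
clust = cluster_sample(clusters, 2, rng)
syst = systematic_sample(students, 10, rng)
print(srs, strat, clust, syst, sep="\n")
```

The stratified sample always ends up with 6 juniors and 4 seniors here, while the cluster sample takes everyone from the two rooms that were picked.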
INFERENCES | Central Limit Theorem (C.L.T.) - even if a population distribution is not normal, the C.L.T. states that when n is large (n ≥ 30 is a common rule of thumb), the sampling distribution of x bar is approximately normal. | IMPORTANT TIP: A lot of tests have the same structure/set-up to them. (E.g., all proportion procedures have the same conditions, etc.) | ||
NAME | STATE | CONDITION | DO | CONCLUDE |
1 Proportion z Interval | DESCRIPTION Confidence level Context | Random: Data came from random sample Normal:
Independent: Either
| EQUATIONS p̂ ± z*√( p̂(1 - p̂) / n ) DEFINITIONS p̂ = Point estimate z* = Critical value Standard Deviation = √( p̂(1 - p̂) / n )
Margin of Error = z*√( p̂(1 - p̂) / n )
CALCULATOR STAT ⇒ TESTS ⇒ A: 1-PropZInt | I am __% confident that the population proportion of [context] is within the interval ( __ , __ ) |
1 Sample z Interval for Means | DESCRIPTION Confidence level For t: Degrees of Freedom = n-1 Context NOTES Identifying between z & t
| Random: Data came from random sample Normal: For z
For t
Independent: Either
| EQUATIONS x̅ ± z*( σx / √n ) σx̅ = σx / √n (sigma x bar equals sigma x divided by the square root of n) CALCULATOR STAT ⇒ TESTS ⇒ 7: ZInterval | I am __% confident that the population mean of [context] is within the interval ( __ , __ ) |
1 Sample t Interval for Means | EQUATIONS x̅ ± t*( Sx / √n ) SEx̅ = Sx / √n CALCULATOR STAT ⇒ TESTS ⇒ 8: TInterval | |||
1 Proportion z Test | DESCRIPTION Significance Level
Context HYPOTHESIS Ho: p = #. Ha: p ? #
| Random: Data came from random sample Normal:
Independent: Either
| GRAPH & EQUATIONS CALCULATOR STAT ⇒ TESTS ⇒ 5: 1-PropZTest P-VAL DISTR (2ND VARS) ⇒ DISTR ⇒ 2: normalcdf(
| I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. |
1 Sample z Test | DESCRIPTION Significance Level
For t: Degrees of Freedom = n-1 Context HYPOTHESIS Ho: μ = #. Ha: μ ? #
NOTES Identifying between z & t
| Random: Data came from random sample Normal: For z
For t
Independent: Either
| GRAPH & EQUATIONS FOR z FOR t CALCULATOR Choose the appropriate one STAT ⇒ TESTS ⇒ 1: Z-Test STAT ⇒ TESTS ⇒ 2: T-Test P-VAL DISTR (2ND VARS) ⇒ DISTR ⇒ 2: normalcdf( for z, or 6: tcdf( for t
| I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. |
1 Sample t Test | ||||
Matched Pairs t Test | DESCRIPTION Significance Level
Context HYPOTHESIS Ho: μD = #. Ha: μD ? #
NOTES Identifying
The test is identical to 1 sample t Test except for the following:
| Random: Data came from random sample Normal:
Independent: Either
| GRAPH & EQUATIONS *NOTE: BE SURE TO ADD SUBSCRIPT D ON MU, X BAR, AND S CALCULATOR STAT ⇒ TESTS ⇒ 2: T-Test P-VAL DISTR (2ND VARS) ⇒ DISTR ⇒ 6: tcdf(
| I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. |
2 Proportion z Interval | DESCRIPTION Confidence Level Use “difference” Context PARAMETER Let p1 = population proportion of … p2 = population proportion of … | Random: Data came from random sample Normal:
“Both sample sizes are sufficiently large.” Independent: Either
| EQUATIONS Margin of error is everything after the ± symbol CALCULATOR STAT ⇒ TESTS ⇒ B: 2-PropZInt | NOTES Make sure to use “difference” I am __% confident that the difference in population proportions of [context] is within the interval ( __ , __ ) |
2 Proportion z Test | DESCRIPTION Significance Level
Use “difference” Context PARAMETER Let p1 = population proportion of … p2 = population proportion of … HYPOTHESIS Ho: p1 - p2 = 0 Ha: p1 - p2 ? 0
| Random: Data came from random sample Normal:
“Both sample sizes are sufficiently large.” Independent: Either
| GRAPH & EQUATIONS
POOLED SAMPLE STATISTIC (COMBINED) CALCULATOR STAT ⇒ TESTS ⇒ 6: 2-PropZTest | I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. |
2 Sample z Interval for Means | DESCRIPTION Confidence level For t: Degrees of Freedom =
Context PARAMETER Let μ1 = population mean __ of … μ2 = population mean __ of … NOTES Identifying between z & t
| Random: Data came from random sample Normal: (do for both n1 and n2)
Independent: Either
| EQUATION FOR z FOR t Margin of error is everything after the ± symbol CALCULATOR Choose the appropriate one STAT ⇒ TESTS ⇒ 9: 2-SampZInt STAT ⇒ TESTS ⇒ 0: 2-SampTInt | I am __% confident that the difference in population means of [context] is within the interval ( __ , __ ) |
2 Sample t Interval for Means | ||||
2 Sample z Test for Means | DESCRIPTION Significance Level
For t: Degrees of Freedom =
Context PARAMETER Let μ1 = population mean __ of … μ2 = population mean __ of … HYPOTHESIS Ho: μ1 - μ2 = 0 Ha: μ1 - μ2 ? 0
NOTES Identifying between z & t
| Random: Data came from random sample Normal: (do for both n1 and n2)
Independent: Either
| GRAPH & EQUATION FOR z FOR t CALCULATOR Choose the appropriate one STAT ⇒ TESTS ⇒ 3: 2-SampZTest STAT ⇒ TESTS ⇒ 4: 2-SampTTest | I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. EXAMPLE I fail to reject the Ho because the p-val, 0.08, is greater than α = 0.05. There is insufficient evidence to suggest that there is a difference in mean response time between the Northern and Southern fire stations. |
2 Sample t Test for Means | ||||
Chi Squared Goodness of Fit Test | DESCRIPTION Significance Level
Degrees of Freedom = k - 1
Context HYPOTHESIS Ho: p1 = __, p2 = __, p3 = __, ... Ha: At least one pi is incorrect | Random Normal, 10% condition when necessary Expected Counts: All expected counts are at least 5
Independence: Sample is less than 10% of its population | GRAPH χ² = ∑ [ (Observed - Expected)² / Expected ] CALCULATOR STAT ⇒ TESTS ⇒ D: χ²GOF-Test TIP: Press draw to see what the graph should look like | I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. |
Chi Squared Test for Homogeneity | DESCRIPTION Significance Level
Degrees of Freedom = (# of rows - 1)(# of columns - 1)
Context HYPOTHESIS Homogeneity Ho: There is no difference in __ when __ Ha: There is a difference in __ when __ Independence Ho: There is no association between __ and __ Ha: There is an association between __ and __ NOTES Identifying
| Random Normal, 10% condition when necessary Expected Counts: All expected counts are at least 5
Independence: Sample is less than 10% of its population | GRAPH χ² = ∑ [ (Observed - Expected)² / Expected ] CALCULATOR MATRIX (2ND x⁻¹) ⇒ EDIT ⇒ 1: [A] (enter observed counts)
STAT ⇒ TESTS ⇒ C: χ²-Test
MATRIX (2ND x⁻¹) ⇒ EDIT ⇒ 2: [B] (view expected counts)
TIP: Press draw to see what the graph should look like | I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. |
Chi Squared For Association/ Independence | ||||
Linear Regression t Interval for Slope | DESCRIPTION Confidence level Degrees of Freedom = n-2 Context | “LINEAR” Linear: No trend in the residual plot Independent: Normal, 10% condition when necessary Normal: Histogram of residuals appears approximately normal Equal Variance: Same amount of scatter above and below the zero line in the residual plot Random: Normal | EQUATIONS b ± t*SEb SEb = s / ( Sx√(n-1) ) DEFINITIONS SEb = Standard Error of the Slope s = Standard deviation of the residuals CALCULATOR STAT ⇒ TESTS ⇒ G: LinRegTInt | I am __% confident that the population slope between [explanatory variable] and [response variable] is within the interval ( __ , __ ) |
Linear Regression t Test for Slope | DESCRIPTION Significance Level
Degrees of Freedom = n-2 HYPOTHESIS Ho: β = 0 Ha: β ? 0
DEFINITIONS β = Population slope (in context of regression) ρ = population correlation coefficient | EQUATIONS SEb = s / ( Sx√(n-1) ) CALCULATOR STAT ⇒ TESTS ⇒ F: LinRegTTest | I reject/fail to reject the Ho because the p-val, [value], is less/greater than α = [value]. There is sufficient/insufficient evidence to suggest [Ha in words]. | |
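Since all the procedures above follow the same STATE / CONDITION / DO / CONCLUDE pattern, here is a rough Python sketch of the "DO" step for just the two one-proportion procedures. The numbers (60 successes out of 100, p0 = 0.5) are hypothetical, the function names are made up for illustration, and the conditions still have to be checked by hand before trusting the output.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence=0.95):
    """Confidence interval for a proportion: p-hat +/- z* x standard error."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z statistic and p-value; the standard deviation uses p0 from Ho."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    if alternative == "less":
        p_value = cdf
    elif alternative == "greater":
        p_value = 1 - cdf
    else:  # two-sided: double the smaller tail
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Hypothetical data: 60 successes in a random sample of 100.
low, high = one_prop_z_interval(60, 100, confidence=0.95)
z, p_value = one_prop_z_test(60, 100, p0=0.5)
print(f"95% interval: ({low:.4f}, {high:.4f})")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With α = 0.05, the p-value here is just under α, so the conclusion would be to reject Ho. The interval uses p̂ in the standard deviation, while the test uses the hypothesized p0.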
Ho True | Ho False | P(Type I Error) = α P(Type II Error) = β Power = 1 - β | If type I error is worse, statisticians set α = 0.01 If type II error is worse, statisticians set α = 0.1 If no α is stated, set α = 0.05 | KEY CONCEPTS | |||
Reject Ho | Type I Error | Correct Conclusion |
| ||||
Fail to Reject Ho | Correct Conclusion | Type II Error |
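To connect α, β, and power with actual numbers, here is a small Python sketch for an upper-tailed z test for a mean with known σ. The scenario (Ho: μ = 100, true μ = 105, σ = 15, n = 36) is entirely hypothetical, and `power_upper_tail_z` is just an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Return (power, beta) for Ho: mu = mu0 vs Ha: mu > mu0, sigma known."""
    se = sigma / sqrt(n)
    # Reject Ho when the sample mean lands above this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta = chance of failing to reject when the true mean is mu_true.
    beta = NormalDist(mu_true, se).cdf(cutoff)
    return 1 - beta, beta


power, beta = power_upper_tail_z(mu0=100, mu_true=105, sigma=15, n=36)
print(f"power = {power:.4f}, P(Type II error) = {beta:.4f}")

# If Ho is actually true, the chance of rejecting is just alpha.
power_if_h0_true, _ = power_upper_tail_z(mu0=100, mu_true=100, sigma=15, n=36)
print(f"P(Type I error) = {power_if_h0_true:.4f}")
```

Raising α or n makes the power go up (and β go down), which is why a larger α gets picked when a Type II error is the worse mistake.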
PROBABILITY RULES | |||
NAME | ABOUT/EQUATION | NAME | ABOUT/EQUATION |
Probability of Event A | P(A) = (# of outcomes corresponding to A) ÷ (Total # of outcomes in the sample space) | P(A|B) | Read as “the probability that event A happens, given that event B happens”: P(A|B) = P(A ∩ B) ÷ P(B)
|
Sample Space (s) |
| Rule of addition | P(A ∪ B) = P(A) + P(B) - P(A ∩ B) |
Union | ∪
| Rule of subtraction | P(Aᶜ) = 1 - P(A) |
Intersection | ∩
| Rule of multiplication | P(A ∩ B) = P(A) × P(B|A) |
|
Complement | Aᶜ
| ||
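The probability rules above can be checked on a tiny sample space. This Python sketch uses one roll of a fair die, with A = "even" and B = "greater than 3"; the events and the helper name `prob` are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                        # event A: roll is even
B = {4, 5, 6}                        # event B: roll is greater than 3


def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))


# Rule of addition: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Rule of subtraction: P(not A) = 1 - P(A)
p_complement = 1 - prob(A)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Rule of multiplication: P(A and B) = P(B) x P(A|B)
p_intersection = prob(B) * p_a_given_b

print(p_union, p_complement, p_a_given_b, p_intersection)
```

In the code, `A & B` is the intersection (∩) and `A | B` would be the union (∪) of the two sets.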
RANDOM VARIABLES | |||
IDENTIFYING | |||
BINOMIAL RANDOM VARIABLE | GEOMETRIC RANDOM VARIABLE | ||
B | Binary | B | Binary |
I | Independent | I | Independent |
N | Fixed Number of Trials | T | Counting Trials Until First Success |
S | Fixed Probability of Success | S | Fixed Probability of Success |
MORE | |||
BINOMIAL RANDOM VARIABLE | GEOMETRIC RANDOM VARIABLE | ||
EQNS |
Think of Pascal’s triangle. ! = Factorial
| EQNS |
|
ON THE CALCULATOR | DISTR (2ND VARS)
| ON THE CALCULATOR | DISTR (2ND VARS)
|
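For reference, here is a Python sketch of the standard binomial and geometric probability formulas. The function names and the (n, p, k) argument order are chosen for illustration, and the example values (5 trials, p = 0.5) are made up.

```python
from math import comb, sqrt


def binom_pmf(n, p, k):
    """P(X = k): exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """P(X <= k): at most k successes in n trials."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))


def geom_pmf(p, k):
    """P(X = k): first success happens on trial k."""
    return (1 - p) ** (k - 1) * p


def geom_cdf(p, k):
    """P(X <= k): first success happens on or before trial k."""
    return 1 - (1 - p) ** k


n, p = 5, 0.5
print("binomial P(X = 2): ", binom_pmf(n, p, 2))
print("binomial P(X <= 2):", binom_cdf(n, p, 2))
print("binomial mean, SD: ", n * p, sqrt(n * p * (1 - p)))
print("geometric P(X = 3): ", geom_pmf(p, 3))
print("geometric P(X <= 3):", geom_cdf(p, 3))
print("geometric mean:     ", 1 / p)
```

`comb(n, k)` is the "n choose k" count, which is the same number you would read off row n of Pascal's triangle.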
Important statistic formulas: http://stattrek.com/statistics/formulas.aspx
Tbh You don’t rly need to memorize most of them. The important ones you should know are things like margin of error, expected value, etc.