1 of 12

Introductory Statistics

MA207

Day 5 - Using sampling distributions as evidence

2 of 12

Understanding Sampling Distributions (#1)

A few Carroll students were discussing the prevalence of underage drinking on campus. What they see from their peers leads them to believe that roughly 50% of underage Carroll students drink. They later find out that a survey was given to 76 randomly selected underage Carroll students. Only 32 of the 76 reported having participated in underage drinking.

  • Identify the research question.
  • Null Hypothesis:
  • Alternative Hypothesis:

3 of 12

Understanding Sampling Distributions (#1)

A few Carroll students were discussing the prevalence of underage drinking on campus. What they see from their peers leads them to believe that roughly 50% of underage Carroll students drink. They later find out that a survey was given to 76 randomly selected underage Carroll students. Only 32 of the 76 reported having participated in underage drinking.

  • Let’s build a simulation so we have a better sense of how samples of size 76 behave.
  • Assumptions:
    • Each student surveyed has a 50% chance of saying “yes” (this assumption is based on the belief above that 50% of students drink).
    • 76 randomly selected students are surveyed. If a different set of 76 random students were surveyed, we would not expect identical results.

4 of 12

If each person in the population has a 50% chance of saying Yes to the drinking question, then this is modeled by spinning our 50/50 spinner 76 times.

Build this spinner and run it several times, with “repeat” set to 76 each time so that each run represents 76 people.

Be sure to plot your results.
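
For anyone who wants to try the same idea outside the spinner tool, here is a minimal Python sketch. It assumes the null model from the slides (each of 76 students says “yes” with probability 0.5). The function names, the 1,000 repetitions, and the seed are illustrative choices and are not part of the class simulator.

```python
import random
from collections import Counter

def spin_sample(rng, n_people=76, p_yes=0.5):
    """Spin the 50/50 spinner once per person and count the "yes" answers."""
    return sum(rng.random() < p_yes for _ in range(n_people))

def simulate(n_samples=1000, n_people=76, p_yes=0.5, seed=207):
    """Repeat the 76-person survey many times (seed is an arbitrary choice)."""
    rng = random.Random(seed)
    return [spin_sample(rng, n_people, p_yes) for _ in range(n_samples)]

counts = simulate()

# Rough text plot of the sampling distribution: one star per 5 samples.
tally = Counter(counts)
for yes_count in sorted(tally):
    print(f"{yes_count:3d} | {'*' * (tally[yes_count] // 5)}")
```

Each row of the printout is a possible number of “yes” answers out of 76, and the length of the bar shows how often the simulation produced it.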

5 of 12

Understanding Sampling Distributions

  • A sampling distribution shows us how spread out we expect randomly selected samples to be.
  • A sampling distribution gives us the power to recognize whether one sample (of size 76 in this case) is weird or typical.
  • If a sample is particularly weird, we should suspect that
    • a) It didn’t come from the population we thought it did ... [interesting]
    • b) Our null hypothesis might be wrong … [interesting]
    • c) Our participants are lying … [interesting]
    • d) The sample was unusual because variability is part of statistics … [This is possible, but it has a low probability of being the explanation]
  • If our sample is not very weird, its variability is probably just from random chance
  • Our job is to quantify “weird” in a mathematical / statistical way

6 of 12

Back to the question

A few Carroll students were discussing the prevalence of underage drinking on campus. What they see from their peers leads them to believe that roughly 50% of underage Carroll students drink. They later find out that a survey was given to 76 randomly selected underage Carroll students. Only 32 of the 76 reported having participated in underage drinking.

  • How unusual would it be to get a sample as low as 32 out of 76 if the drinking rate is 50%?
  • If you quantify your answer to the question above as a probability, you get the p-value for the sample. Describe what the p-value means in your own words and what conclusions you could draw from the study. (Talk with a neighbor.)
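
One way to put a number on “how unusual” is sketched below in Python. It assumes the null model of a 50% drinking rate and counts how often a simulated sample of 76 has 32 or fewer “yes” answers. The exact binomial calculation is added only as a cross-check on the simulation. The function names, 10,000 repetitions, and seed are illustrative assumptions.

```python
import random
from math import comb

def simulated_p_value(observed=32, n_people=76, p_null=0.5,
                      n_samples=10_000, seed=207):
    """Share of simulated samples with `observed` or fewer "yes" answers."""
    rng = random.Random(seed)
    as_low = 0
    for _ in range(n_samples):
        yes_count = sum(rng.random() < p_null for _ in range(n_people))
        if yes_count <= observed:
            as_low += 1
    return as_low / n_samples

def exact_p_value(observed=32, n_people=76, p_null=0.5):
    """Exact binomial probability of `observed` or fewer "yes" answers."""
    return sum(
        comb(n_people, k) * p_null**k * (1 - p_null)**(n_people - k)
        for k in range(observed + 1)
    )

simulated = simulated_p_value()
exact = exact_p_value()
print(f"simulated p-value: {simulated:.3f}")
print(f"exact p-value:     {exact:.3f}")
```

The two numbers should land close to each other; the simulated value moves a little from run to run if the seed is changed.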

7 of 12

A modified scenario

At a larger state school, a study from 2005 indicated that 64% of underage students drank. The school engaged in a sustained campaign to lower the rates of drinking across campus. This year, a survey was given to 150 randomly selected underage students at the school. Only 83 of the students reported drinking (55.3%).

  • What is the null hypothesis and how do you use it in your simulator?
  • What is the alternative hypothesis and why does it not appear in your simulator?
  • If the drinking rates haven’t changed since 2005, what’s the chance of getting a sample as low as (or lower than) the one in this study? That is, what portion of your randomly generated samples were this low?

When you have a working simulation, show me. Yes, you can just tweak the other simulation for the new scenario.
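
As a rough Python counterpart to tweaking the simulator, the sketch below changes only the inputs: 150 students per sample, a null rate of 64%, and an observed count of 83. Only the null hypothesis appears in the code, because the simulation generates samples as if nothing has changed since 2005. The function name, repetition count, and seed are illustrative assumptions.

```python
import random

def share_as_low(observed, n_people, p_null, n_samples=10_000, seed=207):
    """Share of null-model samples with a count at or below `observed`."""
    rng = random.Random(seed)
    as_low = 0
    for _ in range(n_samples):
        # One simulated survey: each student says "yes" with probability p_null.
        yes_count = sum(rng.random() < p_null for _ in range(n_people))
        if yes_count <= observed:
            as_low += 1
    return as_low / n_samples

# Null hypothesis: the drinking rate is still 64%, as in the 2005 study.
p_value = share_as_low(observed=83, n_people=150, p_null=0.64)
print(f"share of simulated samples with 83 or fewer: {p_value:.4f}")
```

The printed share is the simulated p-value for this scenario.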

8 of 12

A modified scenario

At a larger state school, a study from 2005 indicated that 64% of underage students drank. The school engaged in a sustained campaign to lower the rates of drinking across campus. This year, a survey was given to 150 randomly selected underage students at the school. Only 83 of the students reported drinking (55.3%).

  • What is the p-value?

  • Can we infer a cause-and-effect relationship from this study?

9 of 12

A modified scenario

At a larger state school, a study from 2005 indicated that 64% of underage students drank. The school engaged in a sustained campaign to lower the rates of drinking across campus. This year, a survey was given to 150 randomly selected underage students at the school. Only 83 of the students reported drinking (55.3%).

  • What is the p-value?

  • Can we infer a cause-and-effect relationship from this study?
    • No, because it is not a randomized controlled study.
    • If our p-value is below 0.05, we can say there is evidence that the drinking rate has dropped since 2005. But we cannot say the drop is because of the school campaign.

10 of 12

A new example - Accessing Health Services

At a typical 4-year college, about 25% of students are in each class. Build a sampler that has 25% Freshmen, 25% Sophomores, 25% Juniors, and 25% Seniors.

11 of 12

A new example - Accessing Health Services

At a typical 4-year college, about 25% of students are in each class. Build a sampler that has 25% Freshmen, 25% Sophomores, 25% Juniors, and 25% Seniors.

The number of students who access health services on campus should be the same across the 4 classes, but there is a concern that freshmen are less likely to seek help. In one week, out of 120 students who visited health services, only 20% were freshmen. Is this 20% in the range of normal variability, or does it represent a statistically extreme situation?

  • Build a simulation to find out how random samples of size 120 behave.
  • Create a sampling distribution and determine where the middle 95% of samples are.
  • How unusual is 20%?
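
The three steps above can be sketched in Python as follows. The sketch assumes each of the 120 visitors is equally likely to come from any of the four classes, which is the null model described on this slide. It then finds the range that holds the middle 95% of simulated freshman counts and checks where 24 freshmen (20% of 120) falls. The function names, 5,000 repetitions, and seed are illustrative assumptions.

```python
import random

CLASSES = ["Freshman", "Sophomore", "Junior", "Senior"]

def freshman_count(rng, n_visits=120):
    """Draw one week of visits and count how many visitors are freshmen."""
    return sum(rng.choice(CLASSES) == "Freshman" for _ in range(n_visits))

def sampling_distribution(n_samples=5000, n_visits=120, seed=207):
    """Sorted freshman counts from many simulated weeks."""
    rng = random.Random(seed)
    return sorted(freshman_count(rng, n_visits) for _ in range(n_samples))

counts = sampling_distribution()
n = len(counts)

# Cut off the lowest 2.5% and the highest 2.5% of simulated weeks.
low = counts[n * 25 // 1000]
high = counts[n * 975 // 1000 - 1]

observed = 24  # 20% of 120 visits
share_as_low = sum(c <= observed for c in counts) / n

print(f"middle 95% of samples: {low} to {high} freshmen out of 120")
print(f"share of samples with {observed} or fewer freshmen: {share_as_low:.3f}")
```

If the observed count sits inside the middle 95% range, it is consistent with ordinary sample-to-sample variability under this model.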

12 of 12

A delicious example

The M&Ms Mars company has a machine that fills the fun size bags of M&Ms, but due to natural variation in the machine, not every bag gets exactly the same weight of M&Ms. The table shows the number of bags in each weight category on a given day.

The advertised weight of the fun size M&Ms is 13.5 grams.

  • A friend buys a bag and finds that it weighs only 11.9 grams.
  • Null Hypothesis:
  • Alternative Hypothesis:
  • What is the p-value for finding a bag like your friend’s or more extreme? What conclusion can we make?

Weights      <12 g    12.1 to 13 g    13.1 to 14 g    14.1 to 15 g    >15 g
Frequency    23       120             357             175             18
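
Here the table of bag weights already plays the role of the sampling distribution, so no simulation is needed. The Python sketch below rests on two assumptions that the slide does not state: the friend’s 11.9 g bag is counted in the “<12 g” category, and “more extreme” means “as light or lighter”. Because the “<12 g” category also includes bags between 11.9 g and 12 g, the result is an upper bound on that one-sided p-value.

```python
# Frequencies copied from the table of bag weights.
bag_counts = {
    "<12 g": 23,
    "12.1 to 13 g": 120,
    "13.1 to 14 g": 357,
    "14.1 to 15 g": 175,
    ">15 g": 18,
}

total = sum(bag_counts.values())      # 693 bags in all
p_low = bag_counts["<12 g"] / total   # share of bags lighter than 12 g

print(f"total bags: {total}")
print(f"share of bags under 12 g: {p_low:.3f}")  # about 0.033
```

Under those assumptions, roughly 3% of bags are as light as the friend’s, which is below the 0.05 cutoff used earlier in these slides.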