1 of 24

AP Statistics Unit 4

Designing Studies

2 of 24

Agenda

01

Sampling and Surveys

02

Experiments

03

Practice

3 of 24

Sampling and Surveys

1

4 of 24

Definitions

Population: the entire group of individuals we want information about

Sample: a subset of individuals from which we actually collect data

A census collects data from every individual in the population.

Bias: statistics that don't provide an accurate representation of the population

The design of a statistical study shows bias if it would consistently underestimate or overestimate the value you want to know.

5 of 24

Sampling Methods

BAD - will ALWAYS produce bias

Convenience Sample

Voluntary Response Sample

GOOD

Simple Random Sampling

Stratified Random Sampling

Cluster Sampling

Systematic Sampling

6 of 24

Bad Sampling Methods

Convenience Sample: Sample selected by taking from the population individuals that are easy to reach

  • often produce unrepresentative data

Voluntary Response Sample: People decide whether to join a sample by responding to a general invitation. The sample selects itself.

  • leads to bias because the people who choose to respond would have strong feelings about the issue, and they often share the same opinion
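A small simulation can make the voluntary response problem concrete. This is only a sketch under made-up assumptions: the population of satisfaction scores and the `response_chance` values are hypothetical, chosen so that unhappy people are more likely to answer an open invitation.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 satisfaction scores from 1 (unhappy) to 5 (happy).
population = [rng.randint(1, 5) for _ in range(10_000)]

# Assumed chance that a person with each score answers a general invitation.
# Unhappy people are assumed to be far more likely to respond.
response_chance = {1: 0.50, 2: 0.30, 3: 0.05, 4: 0.05, 5: 0.05}

# Voluntary response sample: the sample selects itself.
voluntary = [score for score in population if rng.random() < response_chance[score]]

# Simple random sample of 200 for comparison.
srs = rng.sample(population, 200)

true_mean = statistics.mean(population)
voluntary_mean = statistics.mean(voluntary)
srs_mean = statistics.mean(srs)

print("population mean:        ", round(true_mean, 2))
print("voluntary response mean:", round(voluntary_mean, 2))
print("SRS mean:               ", round(srs_mean, 2))
```

Under these assumptions the voluntary response mean lands well below the population mean, while the SRS mean stays close to it.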

7 of 24

Simple Random Sampling (SRS)

Sample chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample

Examples: drawing names out of a hat; giving each individual a number and using a random digit table or technology to choose the numbers
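The "technology" option can be sketched in Python. The roster of 30 numbered students is hypothetical, and `simple_random_sample` is just an illustrative helper name wrapping the standard library's `random.sample`.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every group of n individuals is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # samples without replacement

# Hypothetical roster: students numbered 1 to 30.
roster = list(range(1, 31))
srs = simple_random_sample(roster, 5, seed=1)
print(srs)
```

Because sampling is without replacement, no student can be picked twice.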

8 of 24

Stratified Random Sampling

Steps

  1. Divide population into strata
    1. strata should be based on a variable that would affect the response (ex: gender, age, breed, grade level)
  2. Take an SRS from each stratum and combine them to make the sample.

*should ideally make estimate more precise*

A stratified sample should be “some individuals from all groups”

Groups are “similar within, different between”
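The two steps can be sketched in Python. The grade-level strata and student labels are hypothetical, and taking the same number from each stratum is only one possible choice.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from every stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: 40 students in each grade level.
strata = {
    grade: [f"{grade}-{i}" for i in range(1, 41)]
    for grade in ("9", "10", "11", "12")
}

strat = stratified_sample(strata, 5, seed=2)
print(strat)
```

The result contains some individuals from all groups: five from each grade, twenty in total.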

9 of 24

Cluster Sampling

Steps

  1. Classify the population into groups of individuals that are located near each other, called clusters.
    1. clusters already exist
    2. ideally, the clusters should represent the population
  2. Take an SRS of the clusters. All individuals in the chosen clusters become the sample.

A cluster sample should be “all individuals from some clusters”

Clusters are “different within, similar between”
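A sketch of cluster sampling in Python: whole clusters are chosen at random and everyone in them is kept. The homerooms and student labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical clusters that already exist: five homerooms of six students.
clusters = {
    f"Room {letter}": [f"{letter}{i}" for i in range(1, 7)]
    for letter in "ABCDE"
}

chosen, sample = cluster_sample(clusters, 2, seed=5)
print(chosen)
print(sample)
```

Compare this with stratified sampling, where every group contributes a few individuals instead of a few groups contributing everyone.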

10 of 24

Systematic Sampling

A type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed, periodic interval

Example: choosing a random starting point among the first 10 individuals, then picking every 10th individual after that
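A sketch of systematic sampling in Python, assuming the population is available as an ordered list. The list of 100 numbered individuals is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k individuals, then every k-th after."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0 to k-1
    return population[start::k]

# Hypothetical ordered list: individuals numbered 1 to 100.
population = list(range(1, 101))
sys_sample = systematic_sample(population, 10, seed=4)
print(sys_sample)
```

With 100 individuals and an interval of 10, the sample always has 10 members spaced exactly 10 apart.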

11 of 24

Errors in Sampling

  • A larger random sample is almost always better than a smaller one - its results vary less from sample to sample
  • Repeated samples will give different results. There is no guarantee that we are getting the exact population parameter
  • Even good sampling methods have problems
    • Undercoverage
      • some members of the population cannot be selected or are less likely to be selected
    • Nonresponse
      • the person chosen for the sample cannot be contacted or does not respond
    • Response errors
      • the person chosen for the sample gives an inaccurate answer (for example, lies)
      • EX: sensitive topics
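The first two points can be checked with a short simulation. This is a sketch with a made-up population (10,000 test scores centered near 70); it repeats the sampling many times and compares how much the sample means vary for n = 10 versus n = 100.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores.
population = [rng.gauss(70, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean across repeated SRSs of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

print("spread of sample means, n = 10: ", round(spread_small, 2))
print("spread of sample means, n = 100:", round(spread_large, 2))
```

Each repeated sample gives a different mean, and the means from the larger samples cluster more tightly. A larger sample does not repair undercoverage, nonresponse, or response errors.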

12 of 24

Experiments

2

13 of 24

Observational Study

vs

Experiment

14 of 24

Observational Study

A study that observes individuals and measures variables of interest but does not attempt to influence the responses

Ex: sample survey, watching behavior of animals or relationships between people

Goals of an Observational Study

  • to describe some group or situation
  • to compare groups
  • to examine relationships between variables

*not a good way to observe the effect that changes in one variable have on another variable*

15 of 24

Experiment

A study in which researchers deliberately impose treatments on individuals to measure their responses

Purpose of an Experiment

  • to determine whether the treatment causes a change in the response

*when our goal is to understand cause and effect, experiments are the only source of fully convincing data*

*if it is not an experiment, we cannot determine cause and effect*

16 of 24

Confounding

when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other

17 of 24

Ex

Study 200 volunteers (100 men and 100 women).

Observe that the people who exercise less gain more weight.

You can’t say for sure that lack of exercise causes the weight gain.

One confounding variable is how much people eat. It’s also possible that men eat more than women, which could make sex a confounding variable as well.

18 of 24

If asked to identify confounding variables…

  1. explain how it’s associated with the explanatory variable
  2. explain how it affects the response variable

Example: “Students who take an outside tutoring course may have been forced to by their parents. These students may be afraid of letting their parents down and therefore score better on the SAT.”

19 of 24

Notes

  • the confounding variable should affect the entire group

  • well-designed experiments take steps to control confounding; observational studies often cannot

20 of 24

4 Principles of Experiment Design

Comparison

  • Use a design that compares two or more treatments

Control

  • Keep other variables that might affect the response the same for all groups
  • control EVERYTHING ELSE

21 of 24

4 Principles of Experiment Design

Random Assignment

  • Use chance to assign experimental units to treatments. This helps prevent bias and creates roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups
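Random assignment can be sketched in Python by shuffling the experimental units and dealing them out to the treatments. The 20 volunteers and the two treatment names are hypothetical.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 20 volunteers, two treatments.
volunteers = [f"volunteer {i}" for i in range(1, 21)]
groups = randomly_assign(volunteers, ["new drug", "placebo"], seed=6)

for treatment, members in groups.items():
    print(treatment, len(members), members)
```

Chance alone decides who ends up in each group, so neither the researcher nor the volunteers can steer particular kinds of people toward one treatment.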

22 of 24

4 Principles of Experiment Design

Replication

  • Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups
  • there are enough experimental units to make a good conclusion

23 of 24

Practice

3

24 of 24

Linkys