1 of 70

W03: Teaching the Investigation Process to Improve Statistical Reasoning

Todd Swanson

Hope College

swansont@hope.edu

Allan Rossman

Cal Poly – San Luis Obispo

arossman@calpoly.edu

2 of 70

Acknowledgement

  • This workshop and potential follow-up activities are supported by a grant from the National Science Foundation’s IUSE (Improving Undergraduate STEM Education) program
  • Expanding and assessing the art and practice of statistical thinking (#2235355)
  • More information can be found at: https://sites.google.com/view/eaapost

3 of 70

Workshop goals

  • Help faculty to teach introductory statistics effectively in accordance with GAISE recommendations by:
    • Immersing participants in hands-on activities for introducing students to statistical concepts, particularly with regard to:
      • Statistical investigation process
      • Multivariable thinking
      • Simulation-based inference

4 of 70

GAISE

  • Guidelines for Assessment and Instruction in Statistics Education
  • Recommendations for teaching introductory statistics at college level
    • Comparable guidelines at PreK-12 level
  • Developed by American Statistical Association
    • Originally written in 2005, revised in 2016, currently under revision (S115 at 2:15pm today with Patti Lock)
  • www.amstat.org/education/gaise

5 of 70

GAISE recommendations

  1. Teach statistical thinking.
  2. Focus on conceptual understanding.
  3. Integrate real data with a context and purpose.
  4. Foster active learning.
  5. Use technology to explore concepts and analyze data.
  6. Use assessments to improve and evaluate student learning.

6 of 70

Our Schedule

  • Use simulation-based methods to compare two groups
    • Quantitative response
    • Categorical (binary) response
  • Multivariable thinking in an intro course
  • Take a short break
  • The statistical investigation process (6 steps) to compare two proportions
  • Investigating a multivariable situation

7 of 70

The six steps of the statistical investigation process

8 of 70

Comparing Two Means

Do dung beetles navigate by the stars?

9 of 70

Dung Beetles

  • Some species of dung beetles, known as “rollers,” find a pile of dung (that will be used as a food source) which they form into a ball, and then immediately roll away from the source in order to prevent other beetles from stealing it.
  • The goal is for the beetles to move the ball away as fast as possible.
  • The nocturnal African dung beetle uses the moon to help move along straighter (quicker) paths.
  • But what if the moon isn’t out? Do the beetles navigate by the stars?

10 of 70

Step 1: Ask a research question

  • In “Dung Beetles Use the Milky Way for Orientation,” Current Biology, 2013, researchers report on several experiments they conducted to document whether these dung beetles use stars to navigate.
  • Our research question is:

On a dark night (no moon) are dung beetles able to navigate using stars?

  • Let’s take a closer look at their study

  • https://www.sciencedirect.com/science/article/pii/S0960982212015072

11 of 70

Step 2: Design a study and collect data

  • 18 nocturnal African dung beetles were placed on top of a dung ball at the center of a circular wooden arena (2 m in diameter)
  • The researchers timed how many seconds it took each beetle to reach the edge of the arena during a clear and moonless night.

12 of 70

Step 2: Design a study and collect data

  • Some of the beetles were randomly assigned a small, black cardboard “cap,” which obscured their view of the sky but not of the edge of the platform.
  • The others were given a clear cap.

13 of 70

Hypotheses

  • Null hypothesis: There is no association between the type of cap the beetle is wearing and the time it takes to roll the dung ball to the edge of the arena.
  • Alternative hypothesis: There is an association between the type of cap the beetle is wearing and the time it takes to roll the dung ball to the edge of the arena such that the black-cap beetles take longer on average.

14 of 70

Parameters

  • The parameters of interest are: 
    • µblack = the long-run mean time for the beetles with black caps to roll the dung ball to the edge of the arena
    • µclear = the long-run mean time for the beetles with the clear caps to roll the dung ball to the edge of the arena

15 of 70

Parameters

  •  

16 of 70

Step 3: Explore the data

Here are all 18 times (in seconds) regardless of cap type

Mean = 84.66 sec

SD = 46.93 sec

17 of 70

  • I like to show the data not separated into groups first.
  • This way, students can see the effect of adding an explanatory variable (like cap type) and how it might help explain some of the variation in the response (the times).
  • This also gives a way for students to see the possibility of adding in another explanatory variable.

18 of 70

Step 3: Explore the data

times are in seconds

19 of 70

Does cap type help explain variability in times?

  • Clearly the cap type helps explain some of the variability in the times.
  • The SD of all 18 times is 46.93 sec, but it drops to 22.18 sec for just the black-cap times and 15.52 sec for just the clear-cap times.

20 of 70

R²

  • R² is the statistic that measures the proportion of total variation in the response variable that is explained by the explanatory variable.
  • In this example, R² = 0.843, meaning 84.3% of the variation in the beetle travel times can be explained by the cap type.
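As a rough check on these summaries, the sketch below recomputes the means, SDs, and R² from the 18 times listed on the data slide. It assumes R² is computed as 1 minus the within-group sum of squares divided by the total sum of squares; the helper name `sum_sq` is illustrative and not part of the workshop materials.

```python
from statistics import mean, stdev

# Times in seconds, as listed on the data slide for each cap type.
black = [152.21, 123.61, 112.78, 123.56, 156.99, 114.29, 131.54, 84.18, 139.77]
clear = [38.46, 34.20, 58.13, 43.77, 16.17, 70.70, 37.23, 49.50, 36.86]
all_times = black + clear


def sum_sq(values, center):
    """Sum of squared deviations of the values from a center."""
    return sum((v - center) ** 2 for v in values)


# Total variation ignores cap type; within-group variation is what
# remains after each time is compared with its own group mean.
ss_total = sum_sq(all_times, mean(all_times))
ss_within = sum_sq(black, mean(black)) + sum_sq(clear, mean(clear))
r_squared = 1 - ss_within / ss_total

print(f"all:   mean = {mean(all_times):.2f}, SD = {stdev(all_times):.2f}")
print(f"black: mean = {mean(black):.2f}, SD = {stdev(black):.2f}")
print(f"clear: mean = {mean(clear):.2f}, SD = {stdev(clear):.2f}")
print(f"R^2 = {r_squared:.3f}")
```

Students can run the same calculation with a second explanatory variable to see how much additional variation it accounts for.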

21 of 70

Step 4: Make inferences beyond the data

  • The mean for the black-capped beetles was 126.55 sec, compared with only 42.78 sec for the clear-capped beetles, a difference of 83.77 sec.
  • The sample mean was much larger for those wearing black caps. Does this indicate a genuine tendency?
  • Or could a higher mean just come from the random assignment?
  • Perhaps black caps were randomly assigned to slow beetles and clear caps to fast ones just by chance.

22 of 70

The Need for Inference

  • How likely is it to get a difference as large as 83.77 seconds if the time isn’t affected by the cap?
  • A p-value tells us how unlikely a difference this large (or larger) would be if the null hypothesis were true.

23 of 70

The 3-S Strategy

  • Let’s visually go through the 3-S Strategy:

    • Statistic (calculate the observed statistic from the data)
    • Simulate (simulate statistics that could have happened if the null is true)
    • Strength of Evidence (is the observed statistic unlikely to occur if the null is true)

  • We will first look at how the researchers found their statistic

24 of 70

18 dung beetles were used in the study.

They were randomly assigned to two groups: black caps were placed on 9 of them and clear caps were placed on the other 9.

25 of 70

Times in seconds:

Black Caps    Clear Caps
152.21        38.46
123.61        34.20
112.78        58.13
123.56        43.77
156.99        16.17
114.29        70.70
131.54        37.23
84.18         49.50
139.77        36.86

 

They were placed on top of a dung ball at the center of a circular arena and were timed to see how many seconds it took each beetle to reach the edge of the arena.

26 of 70

Simulate

  • If there is no association between the cap and time to reach the edge of the platform (the null hypothesis) then the times each beetle achieved would have happened regardless of the type of cap worn.
  • And getting a difference in means of 83.77 sec was just due to the random assignment of beetles to the two groups.
  • To develop statistics that could have happened if there was no association, we will simulate the same random assignment of the beetles to treatment groups that the researchers originally did but doing so with the times each beetle obtained.
  • Let’s see this happen 3 times
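The shuffling described above can be sketched in a few lines of Python. This is a minimal illustration rather than the applet's implementation: the function name `shuffled_difference`, the seed, and the choice of 1,000 shuffles are assumptions made for the example.

```python
import random
from statistics import mean

# Observed times in seconds for each cap type (from the data slide).
black = [152.21, 123.61, 112.78, 123.56, 156.99, 114.29, 131.54, 84.18, 139.77]
clear = [38.46, 34.20, 58.13, 43.77, 16.17, 70.70, 37.23, 49.50, 36.86]

# Statistic: observed difference in means (black minus clear).
observed = mean(black) - mean(clear)


def shuffled_difference(times, n_first, rng):
    """Re-deal the times into two groups at random; return the difference in means."""
    shuffled = times[:]
    rng.shuffle(shuffled)
    return mean(shuffled[:n_first]) - mean(shuffled[n_first:])


# Simulate: repeat the random assignment under the null hypothesis.
rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
n_shuffles = 1000
null_diffs = [
    shuffled_difference(black + clear, len(black), rng) for _ in range(n_shuffles)
]

# Strength of evidence: proportion of shuffles at least as large as observed.
p_value = sum(d >= observed for d in null_diffs) / n_shuffles
print(f"observed difference = {observed:.2f} sec, p-value = {p_value:.3f}")
```

Because every black-cap time is larger than every clear-cap time, only one of the C(18, 9) = 48,620 possible assignments gives a difference this large, so a simulated p-value of 0 is the typical result.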

27 of 70

[Animation, shuffle 1: the 18 observed times are shuffled and re-dealt at random to the Black Caps and Clear Caps groups, and the resulting difference in means is added to the plot of Shuffled Differences in Means.]

28 of 70

[Animation, shuffle 2: the 18 observed times are shuffled and re-dealt at random to the Black Caps and Clear Caps groups, and the resulting difference in means is added to the plot of Shuffled Differences in Means.]

29 of 70

[Animation, shuffle 3: the 18 observed times are shuffled and re-dealt at random to the Black Caps and Clear Caps groups, and the resulting difference in means is added to the plot of Shuffled Differences in Means.]

30 of 70

Strength of Evidence

Simulated differences in means (sec) shown on the plot: 20.1, -18.6, -5.6, -15.2, 30.0, 0.5, 4.6, -2.3, -2.0, -4.7, -6.9, -6.7, -10.2, -6.7, -1.2, -9.9, 5.6, -1.9, 12.9, 1.6, 1.3, 4.3, 2.0, 10.0, 0.2, 3.3, 6.9

Out of 30 simulated statistics, there aren’t any that are as large or larger than our observed difference in means of 83.77, hence our p-value for this null distribution is 0/30 = 0.

Shuffled Differences in Means

31 of 70

Multiple Means Applet

32 of 70

Strength of Evidence

  • Here are 1,000 simulated statistics from an applet (a null distribution).
  • We can see that our observed statistic of 83.77 sec (or larger) didn’t even occur once in 1,000 shuffles.
  • Therefore, our p-value is less than 1/1,000 or approximately 0.

33 of 70

Step 5: Formulate Conclusions

  • With a p-value of about 0, we have strong evidence that dung beetles with the black caps take longer, on average, than dung beetles with clear caps.
  • Perhaps this, along with other experiments the researchers did, shows that dung beetles use the stars to help navigate on a moonless night.

34 of 70

Step 5: Formulate Conclusions

Generalization: Was the sample randomly selected from a larger population?

  • No, the sample was not randomly sampled from a population of nocturnal African dung beetles. However, there is little reason to think that these beetles are all that different from others of the same species and location.

Causation: Were the observational units randomly assigned to treatments?

  • Yes, the observational units were presumably randomly assigned to the treatments. Therefore, because the p-value was small we can make a cause-and-effect conclusion (namely, that the black cap is causing the increase in the rolling times).

35 of 70

Step 6: Look back and ahead

  • While we have strong evidence that there is a difference, is the difference in these times impressive or meaningful?
  • The difference of 83.77 seconds is not only statistically significant, but it also represents the beetles rolling nearly three times as quickly when wearing the clear cap as when wearing the black cap (42.78 vs. 126.55 seconds).
  • It seems that these beetles not only use the stars for navigation, but that doing so greatly helps speed them along.

36 of 70

Step 6: Look back and ahead

  • Researchers also tested the beetles on a sand arena, in various types of sky conditions, and in a planetarium.
  • Future studies might look into the implications of star navigation on dung beetle competition for resources or other behaviors, or they could explore whether similar navigational behaviors are exhibited by other species.

37 of 70

38 of 70

Comparing Two Proportions

Are metal bands used for tagging harmful to penguins?

39 of 70

Banding Penguins

  • Researchers Saraux and colleagues (Nature, 2011) reported the results of a study investigating this question using a sample of 100 king penguins near Antarctica.
  • These penguins had already been tagged with RFID chips, and the researchers randomly assigned 50 of them to receive a metal band on their flippers in addition to the RFID chip.
  • The other 50 penguins did not receive a metal band.

  • https://www.nature.com/articles/nature09630

40 of 70

Research Question

  • The researchers thought that banding might make it more difficult for penguins to swim and thus more difficult to gather food.
  • Therefore, they thought that banding penguins reduces their survival rate.

  • Are metal bands used for tagging harmful to penguins?

41 of 70

Hypotheses

  • Null hypothesis: Banding penguins is not associated with survival.
  • Alternative hypothesis: Banding penguins reduces their chances of survival.

42 of 70

Hypotheses

  •  

43 of 70

Partial Results

  • After 4.5 years, researchers found that 47 of 100 penguins were still living.
  • If banding had no effect on penguins’ survival rate, how many of these 47 survivors do you think might be banded?
  • If banding made it less likely for the penguins to survive, how many of these 47 survivors do you think might be banded?

44 of 70

Results

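  • Of the 50 banded penguins, 16 survived (0.32); of the 50 unbanded penguins, 31 survived (0.62).
  • Difference in proportions (banded − unbanded): 0.32 − 0.62 = −0.30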

45 of 70

Why might a smaller proportion of banded penguins survive?

  • There are two possible explanations for an observed difference of −0.30.
    • A tendency for banded penguins to be less likely to survive (alternative hypothesis)
    • Banding has no effect on survival and just by random chance fewer healthy, younger, stronger, etc. penguins were assigned to the banded group (null hypothesis)

46 of 70

Simulate statistics

  • We can simulate values of the statistic by randomly assigning the 47 survivors and the 53 penguins that died to the two groups and then recomputing the statistic.

  • Let’s see the shuffling with a smaller sample size.

 

            Banded    Unbanded    Total
Survived    ?         ?           47
Died        ?         ?           53
Total       50        50          100
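Filling in the question marks at random is what each shuffle does. The sketch below is one way to simulate it, assuming the observed statistic is the difference of −0.30 reported on the slides, taken as banded minus unbanded; the function name `shuffled_diff`, the seed, and the 10,000 shuffles are illustrative choices, not the applet's settings.

```python
import random

n_banded, n_unbanded = 50, 50
# Fixed margins from the table: 47 survivors and 53 deaths overall.
outcomes = ["survived"] * 47 + ["died"] * 53
observed_diff = -0.30  # banded minus unbanded (assumed direction)


def shuffled_diff(rng):
    """Deal the 100 outcomes at random into the two groups; return the difference in survival proportions."""
    cards = outcomes[:]
    rng.shuffle(cards)
    banded, unbanded = cards[:n_banded], cards[n_banded:]
    return (banded.count("survived") / n_banded
            - unbanded.count("survived") / n_unbanded)


rng = random.Random(1)  # arbitrary seed, used only for reproducibility
n_shuffles = 10_000
null_diffs = [shuffled_diff(rng) for _ in range(n_shuffles)]

# One-sided p-value: shuffles with a difference of -0.30 or lower.
# The small tolerance guards against floating-point rounding.
p_value = sum(d <= observed_diff + 1e-9 for d in null_diffs) / n_shuffles
print(f"simulated p-value = {p_value:.4f}")
```

The simulated p-value varies a little from run to run, which is itself a useful point to discuss with students.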

47 of 70

[Animation with a smaller sample: 30 penguins (15 survived, 15 died) split into Banded and Unbanded groups of 15. Before shuffling, 66.7% survived in one group and 33.3% in the other. After one shuffle, the groups show 60.0% and 40.0% survived, for a simulated difference of 0.600 – 0.400 = 0.200, which is added to the plot of Difference in Simulated Proportions.]

48 of 70

Applet

  • Let’s see this shuffling in an applet
  • Also show normal approximation

https://www.rossmanchance.com/applets/2021/chisqshuffle/ChiSqShuffle.htm?penguins=1

49 of 70

Conclusion

  • With a p-value of about 0.002 (or a theory-based p-value of 0.0013), we have strong evidence against the null and strong evidence that a smaller proportion of banded penguins will survive after 4.5 years than unbanded penguins.
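The slides do not say which theory-based method produced 0.0013. The sketch below assumes a one-sided two-proportion z-test with a pooled standard error, using only numbers given on the slides (47 survivors out of 100, groups of 50, observed difference of −0.30).

```python
import math

n_banded = n_unbanded = 50
pooled = 47 / 100       # overall survival proportion
observed_diff = -0.30   # banded minus unbanded (assumed direction)

# Standard error of the difference under the null hypothesis (pooled).
se = math.sqrt(pooled * (1 - pooled) * (1 / n_banded + 1 / n_unbanded))
z = observed_diff / se

# Lower-tail area of the standard normal distribution at z.
p_value = 0.5 * math.erfc(-z / math.sqrt(2))
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

Comparing this value with the shuffle-based p-value gives students a concrete sense of how well the normal approximation works here.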

50 of 70

Generalization and Causation

  • Can we say that the banding of penguins will cause a lower survival rate?
    • Since this was a randomized experiment, and assuming everything else was identical between the groups, we have strong evidence that banding causes a lower survival rate.
  • Can we generalize to a larger population?
    • The sample was not random, but we can probably generalize the results to king penguins from the area where they were tested.

51 of 70

Multivariable Thinking in Intro Stats

52 of 70

2016 GAISE Guidelines

  • “Give students experience with multivariable thinking.”
  • “… students need to know that multivariable modeling exists but not all aspects of how it can be utilized.”
  • “This report recommends that students be introduced to multivariable thinking, preferably early in the introductory course and not as an afterthought at the end of the course.”


53 of 70

Where do we include multivariable thinking in our intro course?

  • We explicitly include multivariable thinking in five places in our intro course.
    • At the very beginning when we are covering some terminology that is used to explore data and variability
    • When we discuss confounding variables in observational studies
    • When we compare two proportions
    • When we compare two means
    • When we start exploring the relationship between two quantitative variables with correlation, scatterplots, and regression
  • Multivariable thinking will come up in other places where examples naturally lead us to talk about the inclusion of other variables.


54 of 70

Comparing two groups (quantitative response)

  • Do those who eat breakfast tend to have higher GPAs?
  • Students enrolled in introductory statistics were asked their current college GPA and if they ate breakfast on the day the survey was conducted.


55 of 70

Results for their GPAs

  • n = 98
  • x̄ = 3.57
  • SD = 0.35

  • Can knowing who ate breakfast help explain some of the variation in GPA?


56 of 70

Comparing GPAs (Breakfast and no Breakfast)

  • Yes: n = 61, x̄ = 3.63, SD = 0.31
  • No: n = 37, x̄ = 3.47, SD = 0.41
  • R² = 0.0469

  • There does seem to be a small but significant “breakfast effect.”
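When only group summaries are available, R² can still be approximated. The sketch below assumes R² is the between-group sum of squares divided by the total sum of squares, and it works from the rounded n, mean, and SD values on this slide, so it only approximately reproduces the reported value.

```python
# (n, mean GPA, SD) for each breakfast group, rounded as on the slide.
groups = {"yes": (61, 3.63, 0.31), "no": (37, 3.47, 0.41)}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group variation: how far each group mean is from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-group variation, recovered from each group's SD.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

r_squared = ss_between / (ss_between + ss_within)
print(f"grand mean = {grand_mean:.2f}, approximate R^2 = {r_squared:.3f}")
```

Any gap between this figure and the slide's value comes from rounding in the summary statistics.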


57 of 70

Let’s add a third variable

  • With observational studies like this, we need to be concerned about confounding variables.
  • Another way to think of this is to ask whether the breakfast and no-breakfast groups differ in some way other than eating breakfast.


58 of 70

Let’s add a third variable

  • Is the proportion of female students the same in each?
  • Female students tend to have higher GPAs and this could be the difference.


59 of 70

  • In an intro class, one way to explore this is to see if there is a “breakfast effect” on GPA with just the female students and again with just the male students.

60 of 70

GPA Results for Female Students (Breakfast and not)

  • Yes: n = 39, x̄ = 3.65, SD = 0.27
  • No: n = 19, x̄ = 3.63, SD = 0.39


61 of 70

GPA Results for Male Students (Breakfast or not)

  • Yes: n = 22, x̄ = 3.58, SD = 0.36
  • No: n = 18, x̄ = 3.30, SD = 0.36
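To make the comparison explicit, the sketch below computes the “breakfast effect” within each group of students and the share of female students among those who did and did not eat breakfast. It uses only the sample sizes and means from the two slides above; the dictionary layout and variable names are illustrative.

```python
# (n, mean GPA) by sex and breakfast status, from the two slides above.
summary = {
    "female": {"yes": (39, 3.65), "no": (19, 3.63)},
    "male": {"yes": (22, 3.58), "no": (18, 3.30)},
}

# Breakfast effect within each sex: mean GPA (yes) minus mean GPA (no).
effect = {
    sex: round(by_breakfast["yes"][1] - by_breakfast["no"][1], 2)
    for sex, by_breakfast in summary.items()
}

# Are the two breakfast groups balanced on sex?
share_female = {
    b: summary["female"][b][0] / (summary["female"][b][0] + summary["male"][b][0])
    for b in ("yes", "no")
}

for sex, diff in effect.items():
    print(f"{sex}: breakfast effect = {diff:.2f} GPA points")
for b, share in share_female.items():
    print(f"ate breakfast = {b}: {100 * share:.1f}% female")
```

The output shows a much smaller difference among female students than among male students, and a larger share of female students in the breakfast group.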


62 of 70

Comparing Two Groups: Binary Response

  • Has seat belt usage changed over time?
  • How do seat belt laws affect use and change in use over time?


63 of 70

Results from National Occupant Protection Use Survey

  • In 2008, about 83.0% of occupants wore seat belts, compared to about 89.6% in 2018.
  • We can use a mosaic plot to visually compare these proportions (and sample sizes)


64 of 70

Do State Laws have an Impact?

  • Another question of interest is whether the type of seat belt law makes a difference
  • Some states are “primary-enforcement states” where occupants can be ticketed simply for not wearing their belts.
  • Other states are “secondary-enforcement states” where drivers must be stopped for some other violation before occupants can be cited for not wearing a seat belt.
  • New Hampshire had no seat belt enforcement for adults


65 of 70

Adding a Third Variable (enforcement) to our Plot

  • We can see the percentages increased from 2008 to 2018
  • More so for the nonprimary-enforcement states (75% to 86%) than the primary-enforcement states (88% to 91%).


66 of 70

COVID and Vaccination Status

  • Does vaccination help reduce fatality rates for those infected with COVID?

  • Data obtained from “SARS-CoV-2 variants of concern and variants under investigation in England: Technical briefing 20,” published by Public Health England.


67 of 70


 

All ages

            Unvaccinated    Vaccinated      Total
Died        253 (0.167%)    481 (0.411%)    734
Survived    150,799         116,633         267,432
Total       151,052         117,114         268,166

These cases involve the Delta variant of SARS-CoV-2 in England from Feb 1, 2021 to Aug 2, 2021.

68 of 70


 

Under 50 years old

            Unvaccinated    Vaccinated     Total
Died        48 (0.033%)     21 (0.023%)    69
Survived    147,564         89,786         237,350
Total       147,612         89,807         237,419

69 of 70


 

50 years old or older

            Unvaccinated    Vaccinated     Total
Died        205 (5.96%)     460 (1.62%)    665
Survived    3,235           27,870         31,105
Total       3,440           28,330         31,770

70 of 70

Simpson’s Paradox

  • This, of course, is an example of Simpson’s Paradox: the direction of an association reverses when the data are aggregated compared with when they are broken into subgroups.
  • This happens because the vaccinated and unvaccinated groups are different in ways other than their vaccination status. In this case, they are different in terms of age.
  • In our data: 76.7% of the vaccinated group were “young” while 97.7% of the unvaccinated group were “young”
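The reversal can be checked directly from the three tables. The sketch below takes the deaths and case totals exactly as shown on the slides and compares fatality rates overall and within each age group; the dictionary layout and variable names are illustrative.

```python
# (deaths, total cases) copied from the three tables on the slides.
tables = {
    "All ages": {"Unvaccinated": (253, 151_052), "Vaccinated": (481, 117_114)},
    "Under 50": {"Unvaccinated": (48, 147_612), "Vaccinated": (21, 89_807)},
    "50 or older": {"Unvaccinated": (205, 3_440), "Vaccinated": (460, 28_330)},
}

# Fatality rate = deaths / cases for every table and vaccination group.
rates = {
    label: {group: deaths / cases for group, (deaths, cases) in table.items()}
    for label, table in tables.items()
}

for label, by_group in rates.items():
    lower = min(by_group, key=by_group.get)
    shown = ", ".join(f"{g}: {100 * r:.3f}%" for g, r in by_group.items())
    print(f"{label:12s} {shown} -> lower rate: {lower}")

# Age mix of each group: the lurking variable behind the reversal.
share_young = {
    group: tables["Under 50"][group][1] / tables["All ages"][group][1]
    for group in ("Unvaccinated", "Vaccinated")
}
for group, share in share_young.items():
    print(f"{group}: {100 * share:.1f}% under 50")
```

The vaccinated group has the higher fatality rate in the combined table but the lower rate within each age group, and the age mix of the two groups differs sharply.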
