1 of 40

Multivariable Thinking

Todd Swanson

Hope College

swansont@hope.edu

2 of 40

My Tasks

  • Looking at association between two categorical variables, then adding a third variable and talking about confounding
  • Categorical variables in mosaic plots
  • Adding a categorical variable to simple linear regression

3 of 40

Association and Confounding

Do COVID vaccines work?

4 of 40

Introduction

  • On Dec 8, 2020, 90-year-old Margaret Keenan from England was the first person in the world to receive a COVID-19 vaccine outside of a clinical trial.
  • By June 18, 2021, all adults in England were eligible to receive a first dose.
  • Second doses were available anywhere from 3 to 12 weeks after receiving the first dose.
  • The vaccine was expected to have:
    • A moderate effect on reducing the rate of infection
    • A much stronger effect on reducing the severity of the infection and thus decrease the death rate for those infected.
  • We will investigate this second point.

5 of 40

Introduction

  • We will look at results from a Public Health England report, specifically all reported cases of the Delta variant of the virus from February 1 to August 2, 2021.
  • From these cases, we will look at whether the patient was vaccinated and whether the patient lived or died.

https://assets.publishing.service.gov.uk/media/610d031f8fa8f506aab17866/Technical_Briefing_20.pdf

6 of 40

Observational Units and Variables

  1. Identify the observational units and variables in this study. Also classify each variable as categorical (and, if so, whether it is binary) or quantitative.

  • The observational units are the COVID patients who were infected with the Delta variant in England from Feb 1 to Aug 2, 2021.
  • The variables are whether the patient was vaccinated and whether the patient lived or died. Both of these are categorical and binary.

7 of 40

Explanatory and Response Variables

  • Explanatory and response variables are defined

  2. Which would you consider the explanatory variable in this study? Which is the response? (That is, what are the roles of these variables in this study?)

Vaccination status is the explanatory variable, and whether the person died or survived is the response variable.

8 of 40

The Data

  3. Did a smaller proportion of the vaccinated patients die than of the unvaccinated patients?

No, a larger proportion of the vaccinated patients died (481/117,114 = 0.004107) than of the unvaccinated patients (253/151,052 = 0.001675).

 

             Unvaccinated   Vaccinated     Total
Died                  253          481       734
Survived          150,799      116,633   267,432
Total             151,052      117,114   268,166

Table 1: The mortality and vaccination status for all COVID-19 cases in England involving the Delta variant from Feb to Aug 2021
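
The two conditional proportions can be checked with a few lines of Python. This is only a sketch: the counts are copied from Table 1, while the dictionary layout and the helper name `death_proportion` are illustrative choices, not part of the original activity.

```python
# Counts copied from Table 1 (Delta cases in England, Feb 1 to Aug 2, 2021).
table1 = {
    "Unvaccinated": {"died": 253, "survived": 150_799},
    "Vaccinated": {"died": 481, "survived": 116_633},
}


def death_proportion(counts):
    """Conditional proportion of deaths within one vaccination group."""
    total = counts["died"] + counts["survived"]
    return counts["died"] / total


# Hypothetical helper output: one proportion per vaccination group.
proportions = {group: death_proportion(c) for group, c in table1.items()}

for group, p in proportions.items():
    print(f"{group}: {p:.6f}")
```

Each proportion is computed within its own column of the table, which is what makes the two groups comparable despite their different sizes.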

9 of 40

Relative Risk

  4. Calculate the relative risk of death by dividing the proportion of deaths for those who were vaccinated by the proportion of deaths for those who were unvaccinated. Write a sentence interpreting this ratio. Does the value of this ratio strike you as noteworthy?

The relative risk is 0.004107/0.001675 = 2.45. This means that someone who was vaccinated was 2.45 times as likely to die as someone who was not vaccinated. This seems like a large difference and noteworthy.
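
As a rough check of this arithmetic, the ratio can be computed directly from the Table 1 counts rather than from rounded proportions. The function name `relative_risk` and its argument order are assumptions made for this sketch.

```python
def relative_risk(deaths_a, total_a, deaths_b, total_b):
    """Risk in group A divided by risk in group B."""
    return (deaths_a / total_a) / (deaths_b / total_b)


# Group A = vaccinated, group B = unvaccinated (counts from Table 1).
rr_overall = relative_risk(481, 117_114, 253, 151_052)
print(f"Relative risk (vaccinated vs. unvaccinated): {rr_overall:.2f}")
```

Working from the raw counts avoids compounding rounding error, although here it gives the same two-decimal value.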

10 of 40

Association

Definition: Two variables are associated or related if the value of one variable gives you information about the value of the other variable. When comparing two groups, association can be seen when the proportions or means take different values in the two groups.

  5. Do vaccination status and whether someone died appear to be associated (albeit in the opposite direction of what we might expect to find)?

Yes, because the two proportions we found are quite different.

11 of 40

Causation or not

There are two possible explanations for this odd finding that those that were vaccinated were more likely to die than those that were unvaccinated:

  • The vaccinations help cause more deaths to occur.
  • The vaccinations did not cause more deaths to occur, and some other issue (variable) explains why there was a larger proportion of deaths among the vaccinated. In other words, a third variable is at play, one that is related to both vaccination status and whether someone died.

(Of course, another explanation is random chance though we can safely rule this out.)

12 of 40

Plausible Alternative Variables

  6. Consider the second explanation. Suggest plausible alternative variables that would explain why those who were vaccinated were more likely to die than those who were unvaccinated. In other words, besides vaccination status, did the vaccinated patients differ from the unvaccinated patients in ways that could make them more likely to die?

The vaccinated group could have tended to be older, more likely to have chronic illnesses or other diseases, less active, or less able to get medical help.

13 of 40

Sources of Variation Diagram

Observed variation in: Death from COVID-19

Inclusion criteria and design: All reported cases of COVID-19 from the Delta variant in England from Feb 1 to Aug 2, 2021 where the vaccination status was known

Sources of explained variation: Vaccination status (at least one dose or not)

Sources of unexplained variation:

  7. Use your plausible alternative variables from #6 to list possible sources of unexplained variation in the following Sources of Variation diagram.
  • Age
  • Chronic illnesses
  • Other diseases
  • Activity level
  • Ability to obtain medical help

14 of 40

Age as a possible alternative variable

 

                Unvaccinated   Vaccinated     Total
Older (>50)            3,440       27,307    30,747
Younger (<50)        147,612       89,807   237,419
Total                151,052      117,114   268,166

Table 2: The age category and vaccination status for all COVID-19 cases in England involving the Delta variant from Feb to Aug 2021

  8. Which group, unvaccinated or vaccinated, included a larger proportion of older patients?

The vaccinated group had a larger proportion of older patients (27,307/117,114 = 0.2332) than the unvaccinated group (3,440/151,052 = 0.0228).

15 of 40

Mosaic Plot

16 of 40

Confounding Variable

  • Definition: A confounding variable is a variable that is related both to the explanatory and to the response variable in such a way that its effects on the response variable cannot be separated from the effects of the explanatory variable.
  • Confounding explains why you cannot draw a cause-and-effect conclusion from association alone: The groups defined by the explanatory variable could differ in more ways than just the explanatory variable when confounding is present.

17 of 40

Confounding Variable

18 of 40

Confounding?

  • We already saw that age is associated with vaccination status.
  • For age to matter or make a difference in death rates, it must also be associated with mortality. (If for some reason vaccinated people were much more likely to be left-handed but left-handedness had no effect on death rate, then left-handedness would not matter.)
  • Now let’s see if age and mortality are associated. Table 3 shows the number of patients that died and survived for each of our age groups.

19 of 40

Are age and mortality associated?

Table 3: The mortality and age category for all COVID-19 cases in England involving the Delta variant from Feb to Aug 2021

 

             Younger     Older     Total
Died              69       665       734
Survived     237,350    30,082   267,432
Total        237,419    30,747   268,166

  9. Which group, older or younger, had a higher proportion of deaths after contracting COVID?

The older group was much more likely to die (665/30,747 = 0.0216) than the younger group (69/237,419 = 0.0003).

20 of 40

Cause and Effect?

  10. Explain how your answers to #8 and #9 establish that age is a confounding variable that prevents drawing a cause-and-effect conclusion between vaccination status and death from the disease.

Because there appears to be an association between age and vaccination status as well as between age and mortality, age is a confounding variable. So, from the data we can’t determine whether being vaccinated or being older is causing more deaths. It could also be neither, with some other confounding variable being the cause.
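
The two conditions for confounding can be checked side by side. The sketch below uses the counts from Tables 2 and 3; the variable names are illustrative, and comparing raw proportions is only a descriptive check, not a formal test.

```python
# Table 2: older patients and total cases in each vaccination group.
older_cases = {"Unvaccinated": 3_440, "Vaccinated": 27_307}
group_totals = {"Unvaccinated": 151_052, "Vaccinated": 117_114}

# Table 3: deaths and total cases in each age group.
deaths_by_age = {"Younger": 69, "Older": 665}
cases_by_age = {"Younger": 237_419, "Older": 30_747}

# Condition 1: is age related to the explanatory variable (vaccination)?
share_older = {g: older_cases[g] / group_totals[g] for g in older_cases}

# Condition 2: is age related to the response variable (death)?
death_rate_by_age = {a: deaths_by_age[a] / cases_by_age[a] for a in deaths_by_age}

for group, share in share_older.items():
    print(f"Share older among {group}: {share:.4f}")
for age, rate in death_rate_by_age.items():
    print(f"Death rate among {age}: {rate:.4f}")
```

Both pairs of proportions differ substantially, which is the pattern the confounding definition requires.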

21 of 40

Digging Deeper

  • Initially we found that the mortality rate was higher for those who were vaccinated against COVID-19 than for those who weren’t.
  • We also saw that we can’t determine any sort of cause and effect from this because of the presence of confounding variables.
  • Namely, the vaccinated group and the unvaccinated group were different in terms of age.
  • Let’s dig a little deeper into the data from the report and try to make the vaccinated group and unvaccinated group a bit more similar.
  • To do this, we will focus on just the younger group and then just the older group.

22 of 40

Just Younger Group

Younger Group

 

             Unvaccinated   Vaccinated     Total
Died                   48           21        69
Survived          147,564       89,786   237,350
Total             147,612       89,807   237,419

Table 4a: The mortality and vaccination status for all COVID-19 cases in England involving the Delta variant from Feb to Aug 2021 for just the younger group

11a. In just the younger patients, which group, unvaccinated or vaccinated, had a smaller proportion of deaths?

A smaller proportion of the vaccinated patients died (21/89,807 = 0.0002338) than of the unvaccinated patients (48/147,612 = 0.0003252).

23 of 40

Just Older Group

Older Group

 

             Unvaccinated   Vaccinated     Total
Died                  205          460       665
Survived            3,235       26,847    30,082
Total               3,440       27,307    30,747

Table 4b: The mortality and vaccination status for all COVID-19 cases in England involving the Delta variant from Feb to Aug 2021 for just the older group

11b. In just the older patients, which group, unvaccinated or vaccinated, had a smaller proportion of deaths?

A smaller proportion of the vaccinated patients died (460/27,307 = 0.01685) than of the unvaccinated patients (205/3,440 = 0.05959).

24 of 40

Simpson’s Paradox

  12. Initially, you should have found that there was a larger proportion of deaths in the vaccinated group. Is this also true when you just look at the younger patients? Just the older patients?

No, the opposite occurred. Within each age group, the larger proportion of deaths occurred in the unvaccinated group.

Definition: Simpson’s Paradox occurs when an association or comparison that holds within each of several groups reverses direction when the data are merged to form a single group.
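
The reversal can be reproduced from the stratified counts in Tables 4a and 4b. This is a minimal sketch: the `strata` layout and helper names are assumptions for illustration. It also shows why the reversal happens: each pooled death rate is a weighted average of the age-specific rates, and the weights (the age mix) differ sharply between the vaccinated and unvaccinated groups.

```python
# (deaths, cases) by age group and vaccination status, from Tables 4a and 4b.
strata = {
    "Younger": {"Unvaccinated": (48, 147_612), "Vaccinated": (21, 89_807)},
    "Older": {"Unvaccinated": (205, 3_440), "Vaccinated": (460, 27_307)},
}


def rate(deaths, cases):
    return deaths / cases


def pooled_rate(status):
    """Death rate after merging the two age groups."""
    deaths = sum(groups[status][0] for groups in strata.values())
    cases = sum(groups[status][1] for groups in strata.values())
    return rate(deaths, cases)


def weighted_rate(status):
    """Same quantity, written as an age-mix-weighted average of stratum rates."""
    total = sum(groups[status][1] for groups in strata.values())
    return sum(
        (groups[status][1] / total) * rate(*groups[status])
        for groups in strata.values()
    )


# Is the vaccinated death rate lower within each age group? After pooling?
vaccinated_lower_within = {
    age: rate(*groups["Vaccinated"]) < rate(*groups["Unvaccinated"])
    for age, groups in strata.items()
}
vaccinated_lower_pooled = pooled_rate("Vaccinated") < pooled_rate("Unvaccinated")

print(vaccinated_lower_within)   # {'Younger': True, 'Older': True}
print(vaccinated_lower_pooled)   # False
```

About 23% of the vaccinated cases are in the older group versus about 2% of the unvaccinated cases, so the vaccinated pooled rate is pulled toward the much higher older-group rate.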

25 of 40

Relative Risk Revisited (1)

  13. Let’s take another look at relative risk.
    a. Calculate the relative risk of death for the younger group as well as for the older group. Remember to do this by dividing the larger conditional proportion by the smaller in each group.

Younger: 0.0003252/0.0002338 = 1.39

Older: 0.05959/0.01685 = 3.54

26 of 40

Relative Risk Revisited (2)

b. Explain what these relative risks mean in context.

  • For the younger group, the unvaccinated were 1.39 times as likely to die as the vaccinated.
  • For the older group, the unvaccinated were 3.54 times as likely to die as the vaccinated.
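
The two stratified relative risks can be recomputed from the raw counts in Tables 4a and 4b. In this sketch the unvaccinated risk is the numerator in both age groups, matching the interpretation above; the dictionary and function names are illustrative.

```python
# (deaths, cases) from Tables 4a and 4b.
counts = {
    "Younger": {"Unvaccinated": (48, 147_612), "Vaccinated": (21, 89_807)},
    "Older": {"Unvaccinated": (205, 3_440), "Vaccinated": (460, 27_307)},
}


def risk(deaths, cases):
    return deaths / cases


# Relative risk of death, unvaccinated relative to vaccinated, per age group.
rr_by_age = {
    age: risk(*groups["Unvaccinated"]) / risk(*groups["Vaccinated"])
    for age, groups in counts.items()
}

for age, rr in rr_by_age.items():
    print(f"{age}: {rr:.2f}")
```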

27 of 40

Relative Risk Revisited (3)

c. For which group does the vaccine seem to have the largest benefit?

The vaccine seems to have the largest benefit for the older group, because the relative risk of death for unvaccinated compared with vaccinated patients is higher in the older group (3.54) than in the younger group (1.39).

28 of 40

Causation?

  14. Now, based on what you found in #12, can you say that the vaccine is causing a reduction in the death rate for those with COVID-19? Why or why not?

While it certainly is an indication that vaccination helps reduce deaths, we still can’t conclude that it is causing the reduction, because other confounding variables could still be present.

29 of 40

When can we conclude causation?

  • So how can cause-and-effect be determined?
  • More specifically, how do researchers determine that a certain drug causes a reduction in death or a decrease in symptoms?
  • To do this, they need to create two very similar groups. One group is then given the drug and one is not. If they now see a difference in the response, they know it must have happened because of the drug.
  • You will explore this idea in more detail in other modules.
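
One common way to create two very similar groups is random assignment. The slides do not say how the groups are formed, so the simulation below is an assumption-laden sketch: the participants are hypothetical, the 11.5% share of older participants simply mirrors 30,747/268,166 from Table 3, and the sample size and seed are arbitrary.

```python
import random

rng = random.Random(2021)  # fixed seed so the sketch is repeatable

# Hypothetical participants: True means "older".
participants = [rng.random() < 0.115 for _ in range(10_000)]

# Random assignment: shuffle, then split into two equal-sized groups.
rng.shuffle(participants)
half = len(participants) // 2
treatment, control = participants[:half], participants[half:]

share_older_treatment = sum(treatment) / len(treatment)
share_older_control = sum(control) / len(control)

print(f"Share older, treatment: {share_older_treatment:.3f}")
print(f"Share older, control:   {share_older_control:.3f}")
```

With groups this large, chance alone tends to give both groups nearly the same age mix, unlike the observational data, where 23% of vaccinated cases but only 2% of unvaccinated cases were older.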

30 of 40

Mosaic Plots with Multiple Variables

  • Has seat belt usage changed over time?
  • How do seat belt laws affect use and change in use over time?

31 of 40

Results from National Occupant Protection Use Survey

  • In 2008, about 83.0% of the occupants wore seatbelts compared to about 89.6% in 2018
  • We can use a mosaic plot to visually compare these proportions (and sample sizes)

32 of 40

Do State Laws have an Impact?

  • Another question of interest is whether the type of seat belt law makes a difference
  • Some states are “primary-enforcement states” where occupants can be ticketed simply for not wearing their belts.
  • Other states are “secondary-enforcement states” where drivers must be stopped for some other violation before occupants can be cited for not wearing a seat belt.
  • New Hampshire had no seat belt enforcement for adults

33 of 40

Adding a Third Variable (enforcement) to our Plot

  • We can see the percentages increased from 2008 to 2018
  • More so for the nonprimary-enforcement states (75% to 86%) than the primary-enforcement states (88% to 91%).
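
The comparison of changes can be made explicit in percentage points. The percentages are the rounded values quoted above; the dictionary layout is an illustrative assumption.

```python
# Seat belt use (%) by enforcement type and year, as quoted on the slide.
belt_use = {
    "Primary enforcement": {2008: 88, 2018: 91},
    "Nonprimary enforcement": {2008: 75, 2018: 86},
}

# Change from 2008 to 2018, in percentage points.
change = {law: years[2018] - years[2008] for law, years in belt_use.items()}

for law, points in change.items():
    print(f"{law}: +{points} percentage points")
```

The nonprimary-enforcement states gained 11 points against 3 for the primary-enforcement states, though they still ended 2018 with the lower usage rate.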

34 of 40

Two Quantitative Variables (plus a categorical variable)

  • How do we determine body mass of extinct birds, where all that we have access to is fossil remains?
  • Martin-Silverstone et al. (2015) collected data on 487 birds to examine the association between total body mass and skeletal mass.
  • We will look at a subset (n = 36) of this dataset.

Martin-Silverstone E, Vincze O, McCann R, Jonsson CHW, Palmer C, Kaiser G, et al. (2015) Exploring the Relationship between Skeletal Mass and Total Body Mass in Birds. PLoS ONE 10(10): e0141794. https://doi.org/10.1371/journal.pone.0141794

35 of 40

Just look at response variable first

Total Mass

  • Mean = 622.88 g
  • SD = 357.07 g

  • Can skeletal mass help explain some of the variation in total mass?

36 of 40

Using Skeletal Mass to predict Total Mass

 

  • 69.2% of the variation in total mass can be explained by the linear association with skeletal mass

37 of 40

Flight type

  • Another variable the researchers collected was the type of flight each bird had.
  • There are four basic types of flight:
    • Burst-adaptive (like a quail)
    • Continuous Flapping (like a robin)
    • Flap Gliding (like a gull)
    • Soaring (like a hawk)
  • Can type of flight explain some of the variation in total mass?

38 of 40

Using type of flight to help explain variation in total mass

17.2% of the variation in total mass is explained by flight type

39 of 40

Using type of flight and skeletal mass to predict total mass

Now 81.3% of the variation in total mass is explained by skeletal mass and flight type.
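
The three percentages can be put on a common footing by splitting the total variation in body mass into explained and unexplained parts. This sketch assumes the reported SD (357.07 g) is the sample standard deviation of the n = 36 subset, computed with an n − 1 divisor; the slides do not state this. The R² values are those quoted on the slides.

```python
n = 36             # birds in the subset
sd_total = 357.07  # SD of total mass in grams (assumed to use an n - 1 divisor)

# Total sum of squares of total mass, recovered from the SD.
total_ss = (n - 1) * sd_total ** 2

r_squared = {
    "skeletal mass": 0.692,
    "flight type": 0.172,
    "skeletal mass + flight type": 0.813,
}

for model, r2 in r_squared.items():
    unexplained_ss = (1 - r2) * total_ss
    print(f"{model}: {r2:.1%} explained, unexplained SS = {unexplained_ss:,.0f} g^2")

# Extra variation explained by adding flight type to skeletal mass.
gain_from_flight = r_squared["skeletal mass + flight type"] - r_squared["skeletal mass"]
# Sum of the two single-predictor values, for comparison.
naive_sum = r_squared["skeletal mass"] + r_squared["flight type"]

print(f"Gain from adding flight type: {gain_from_flight:.1%}")
print(f"Sum of the separate R^2 values: {naive_sum:.1%}")
```

Adding flight type raises the explained share by about 12 percentage points, less than the 17.2% it explains on its own, which suggests that flight type and skeletal mass partly account for the same variation.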

40 of 40

Bird Data in Multivariable applet

  • Multivariable Applet

  • Data
    • https://tinyurl.com/232svswh