2 of 8

Introduction

What can we say about the cause of the pattern in the data?

E.g.: Cities with more McDonald’s restaurants record larger numbers of divorces

Can we conclude that eating Big Macs® increases the likelihood of divorce?
If not, why do cities that sell more Big Macs have more divorces?
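One plausible answer, assumed here purely for illustration, is that city size drives both counts: bigger cities have more restaurants and more divorces. The sketch below uses invented numbers for five hypothetical cities (the populations and per-capita rates are not real data) and compares the correlation of the raw counts with the correlation of the per-capita rates.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data for five hypothetical cities (not real figures).
population = [50_000, 120_000, 300_000, 800_000, 2_000_000]
restaurants_per_100k = [4, 5, 3, 5, 4]
divorces_per_1000 = [3.1, 2.9, 3.0, 3.2, 2.8]

# Raw counts scale with population, the assumed lurking variable.
restaurants = [p / 100_000 * r for p, r in zip(population, restaurants_per_100k)]
divorces = [p / 1000 * d for p, d in zip(population, divorces_per_1000)]

r_counts = pearson_r(restaurants, divorces)
r_rates = pearson_r(restaurants_per_100k, divorces_per_1000)

print(f"correlation of raw counts:       {r_counts:.2f}")
print(f"correlation of per-capita rates: {r_rates:.2f}")
```

In this made-up data the raw counts are almost perfectly correlated, while the per-capita rates are only weakly related. The strong association in the counts comes from population size, not from anything the restaurants do.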

Step 5: Formulate conclusions tells us how broadly the conclusion applies.

The main goal of this chapter is to explain when and why you can infer cause.

3 of 8

Example - Smoking and Cancer

Let’s examine the association between smoking and cancer.
Initially, scientists found that smokers had higher rates of lung cancer than did nonsmokers.

Smoking and cancer were associated.
Did that prove that smoking caused cancer?

No.
Some scientists thought there might be a gene that made people both more likely to smoke and more likely to develop lung cancer. The presence of the gene (yes/no) could be a confounding variable, or confounder.
To conclude that smoking causes lung cancer, you must be able to rule out the effect of possible confounders.

Is association evidence of possible causation?

Yes. If smoking causes cancer, there has to be an association.
Cancer rates will be higher for smokers. If the two are associated, one might cause the other.
Association is necessary, but association alone is not enough to prove cause and effect.

4 of 8

Explanatory and Response Variables

“Does smoking cause a person’s chance of developing cancer to increase?”

Let’s have labels for the two different roles of the variables:

The explanatory variable: Smoking Status (2 groups)
The response variable: Developing Cancer (2 groups)

5 of 8

Confounding Variable

What we are most concerned about are potential confounding variables, which prevent us from isolating the explanatory variable as the only influence on the response variable.
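A small simulation can make this concrete. The sketch below builds a toy population in which a hypothetical gene raises both the chance of smoking and the chance of cancer, while smoking itself has no effect at all. All probabilities are assumptions chosen for illustration, not estimates from real data.

```python
import random

def simulate_population(n=20000, seed=0):
    """Toy model: a hypothetical gene drives both smoking and cancer.

    Smoking has NO effect on cancer in this made-up world.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        gene = rng.random() < 0.3                          # assumed gene prevalence
        smokes = rng.random() < (0.7 if gene else 0.2)     # gene makes smoking more likely
        cancer = rng.random() < (0.20 if gene else 0.02)   # only the gene affects cancer risk
        people.append((gene, smokes, cancer))
    return people

def cancer_rate(people, smokes):
    """Proportion with cancer among smokers (smokes=True) or nonsmokers (smokes=False)."""
    group = [p for p in people if p[1] == smokes]
    return sum(p[2] for p in group) / len(group)

people = simulate_population()
rate_smokers = cancer_rate(people, True)
rate_nonsmokers = cancer_rate(people, False)

print(f"cancer rate among smokers:    {rate_smokers:.3f}")
print(f"cancer rate among nonsmokers: {rate_nonsmokers:.3f}")
```

Smokers show a clearly higher cancer rate even though smoking does nothing in this model. The gene is mixed up with the explanatory variable, so the comparison alone cannot tell the two influences apart.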

6 of 8

Types of Experiments

Randomized experiments can potentially lead to cause-and-effect conclusions between the explanatory and response variables.
What about experiments which manipulate the explanatory variable, but not randomly?

These are often called quasi-experiments.
Suppose we wanted to compare student learning gains under a new curriculum with those under an old curriculum. It would be difficult to randomly assign students to the class they take.
So the instructors might take the results from pre- and posttests from students who used the old curriculum one year and compare them with results from students who used the new curriculum in another year.

Also keep in mind that some explanatory variables of interest don’t lend themselves to randomized experiments. For example, the sex of the participant can’t be randomly imposed on individuals, and other variables, such as smoking behavior, would be unethical to manipulate!
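When random assignment is possible, the reason it supports cause-and-effect conclusions can be sketched in a few lines. In the simulation below, each person carries a hypothetical hidden trait (the 30% prevalence is an assumption), and a coin flip that ignores the trait decides who gets the treatment.

```python
import random

rng = random.Random(1)
n = 20000

# Hypothetical hidden trait that could influence the response (assumed 30% prevalence).
has_trait = [rng.random() < 0.3 for _ in range(n)]

# Random assignment: a coin flip decides each person's group, ignoring the trait.
treated = [rng.random() < 0.5 for _ in range(n)]

def trait_share(in_treatment_group):
    """Proportion of people with the hidden trait in the chosen group."""
    members = [trait for trait, t in zip(has_trait, treated) if t == in_treatment_group]
    return sum(members) / len(members)

share_treated = trait_share(True)
share_control = trait_share(False)

print(f"trait share in treatment group: {share_treated:.3f}")
print(f"trait share in control group:   {share_control:.3f}")
```

The two groups end up with nearly the same share of the hidden trait, so the trait cannot explain a sizable difference in the response between them. A quasi-experiment offers no such guarantee, because the groups are formed by something other than chance.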

8 of 8

No Random Assignment, No Cause-Effect?

That’s much too black and white.
Yes, random assignment is the ideal way to learn about cause and effect. But you can’t always do that in practice.

Think about smoking and lung cancer.
Q: The studies that linked them were observational studies, not randomized experiments. How did scientists conclude that smoking causes cancer?
A: Remember that a main purpose of randomization is to protect against confounding. It’s possible to design observational studies that also protect against confounding. It’s a lot harder, but it’s possible.
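One way an observational analysis can guard against a confounder that has been measured is to compare like with like, often called stratification. The sketch below is only an illustration of that idea, not a description of what the smoking researchers did. It uses a toy population, with assumed probabilities, in which a hypothetical gene affects both smoking and cancer and smoking is built to have no effect, so we can check whether the within-stratum comparison recovers what was built in.

```python
import random

rng = random.Random(2)
people = []
for _ in range(50000):
    gene = rng.random() < 0.3                          # assumed gene prevalence
    smokes = rng.random() < (0.7 if gene else 0.2)     # gene makes smoking more likely
    cancer = rng.random() < (0.20 if gene else 0.02)   # smoking has no effect in this toy model
    people.append((gene, smokes, cancer))

def cancer_rate(rows):
    """Proportion of rows with cancer."""
    return sum(c for _, _, c in rows) / len(rows)

def rate_gap(rows):
    """Cancer rate of smokers minus cancer rate of nonsmokers."""
    smokers = [r for r in rows if r[1]]
    nonsmokers = [r for r in rows if not r[1]]
    return cancer_rate(smokers) - cancer_rate(nonsmokers)

overall_gap = rate_gap(people)                                # ignores the gene
gap_with_gene = rate_gap([r for r in people if r[0]])         # gene carriers only
gap_without_gene = rate_gap([r for r in people if not r[0]])  # non-carriers only

print(f"overall gap:           {overall_gap:+.3f}")
print(f"gap among carriers:    {gap_with_gene:+.3f}")
print(f"gap among noncarriers: {gap_without_gene:+.3f}")
```

The overall comparison shows a sizable gap, but within each stratum the gap is close to zero, matching how the toy data were constructed. The catch is that this only works for confounders that were thought of and measured, which is part of why the observational route is a lot harder.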

It was through many of these more complicated observational studies, with more advanced statistical methods and with support from other types of studies linking carcinogens to cigarettes, that it became an accepted scientific truth that smoking causes lung cancer.

REMEMBER! As in all of statistics, the mathematical theory is an ideal, but reality is almost always more complicated. You don’t want to throw out the baby (the data) just because the bath water (design) is not ideal.