ECT Lesson Plan: Correlation vs Causation


Lesson plan at a glance...

Core subject(s)

Mathematics

Subject area(s)

Statistics and Probability

Suggested age

11 to 18 years old

Prerequisites

Standard deviation and summation (if calculating correlation coefficient)

Time

Preparation: 8 to 17 minutes

Instruction: 120 to 190 minutes

Standards

Core Subject: CCSS MATH

CS: CSTA, Australia

In this lesson plan…

Lesson Overview

A pattern in data may suggest that one variable causes another even when the two are not causally related. In this lesson, students will test the strength of a correlation and judge whether a law or conclusion can be drawn from that correlation. Students will see the commonly accepted thresholds for describing how strongly data correlates and test their own assumptions about causation. This lesson covers the following CT concepts: pattern recognition, pattern generalization, data analysis, and data collection.

Materials and Equipment

  • For the teacher:
      • Required: Presentation set-up
          • Internet-connected computer
          • Projector and projection screen or other flat projection surface
          • External speakers for audio playback
  • For the student:
      • Required: Internet-connected computers (one (1) computer per student recommended)
      • Required: Software

Preparation Tasks

  • Confirm that your computer is on and logged in: 1 to 3 minutes
  • Confirm that your projector is turned on and is projecting properly: 1 to 4 minutes
  • Confirm that all students’ computers are turned on and logged in: 3 to 5 minutes
  • Install or navigate to GeoGebra (http://www.geogebra.org/): 3 to 5 minutes

The Lesson

  • Warm-up Activity: Correlation does not imply causation (20 minutes)
  • Activity 1: Analyzing the correlation coefficient (20 minutes)
  • Activity 2: Exploring correlation with data (60 to 120 minutes)
  • Wrap-up Activity: Assessment (20 to 30 minutes)

Warm-up Activity: Correlation does not imply causation (20 minutes)

Activity Overview: In this activity, students will see if correlation and causation are connected. They will identify situations in which correlation is mistakenly called causation or when correlation implies causation. Students use pattern recognition to find situations in which the data points towards correlation and where the data actually points towards causation. They will use data analysis to distinguish between the two in each situation.

Notes to the Teacher:

While correlation does not imply causation, the two can be connected. It is also possible that an unknown factor gives the appearance of causality. One example of this is the placebo effect (http://wikipedia.org/wiki/Placebo), where patients who believe a medication will make them better do in fact get better. This makes it difficult to test how effective the drug actually is.
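The idea of an unknown factor can be illustrated with a small simulation. The sketch below is a made-up model, not real data: it assumes that a child's age drives both shoe size and reading score, and the numbers (age range, slopes, noise levels) are invented for illustration. The helper name pearson_r is also just a label chosen here.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)  # fixed seed so that the run is repeatable

# Assumed model: age (the hidden factor) influences both variables.
# Shoe size never appears in the formula for reading score.
ages = [random.uniform(6, 12) for _ in range(200)]
shoe_sizes = [0.8 * age + random.gauss(0, 0.7) for age in ages]
reading_scores = [10 * age + random.gauss(0, 8) for age in ages]

r_all = pearson_r(shoe_sizes, reading_scores)

# Hold the hidden factor roughly constant: keep only children aged 9 to 10.
same_age = [(s, t) for a, s, t in zip(ages, shoe_sizes, reading_scores)
            if 9 <= a < 10]
r_same_age = pearson_r([s for s, _ in same_age], [t for _, t in same_age])

print(f"All ages:       r = {r_all:.2f}")
print(f"Ages 9-10 only: r = {r_same_age:.2f}")
```

In this model the correlation across all ages is strong even though neither variable affects the other, and it shrinks once age is held roughly constant. Students can change the invented numbers to see how the effect responds.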

Activity:

Walk through some examples of where correlation suggests causation (though there may or may not be any):

  1. If a student texts more than 20 messages a week, they will have lower test scores.
  2. People who wear seat belts will survive a car crash.
  3. If a person smokes, they will develop lung cancer.
  4. If a student has a larger shoe size, their reading scores will be higher.
  5. If you are hungry, you will spend more money.

Ask students the following questions.

Q1: Share an example you have heard or make one up where correlation is implying causation.

Q2: In how many of the shared examples is there an actual cause-and-effect relationship?

Q3: Flip the examples around. Do the scenarios become more or less probable?

Q4: For each of the examples, write a situation (counterexample) that would prove that correlation does not always imply causation. 

Assessment:

A1: Answers will vary.

A2: Answers will vary.

A3:

  1. Students who have lower test scores text more than 20 messages a week. (less probable)
  2. If you survive a car crash, you were wearing a seatbelt. (no change)
  3. If you have lung cancer, it is because you smoke. (less probable)
  4. If a student has higher reading scores, their shoes will be larger. (no change)
  5. If you spent a lot of money, it was because you were hungry. (less probable)

A4: Answers will vary.


Activity 1: Analyzing the correlation coefficient (20 minutes)

Activity Overview: In this activity, students will calculate correlation coefficients for various data and explore the effect varying the data points has on the correlation coefficient.  Students use visual pattern recognition to understand how strongly data correlates and data analysis to answer questions about the implications of the data.

Notes to the Teacher:

If you would like to calculate the correlation coefficient with your students, see here: http://statistics.about.com/od/Descriptive-Statistics/a/How-To-Calculate-The-Correlation-Coefficient.htm. The calculation of r is a good example of how computers can take care of a tedious calculation and free up time to focus on analyzing the data, which is what we truly want to do.

The table of correlation strengths in the activity below is subjective and depends on the nature of the data and the method used to collect it. It is reproduced here to give students an opportunity to see how r visually conveys the strength of correlation.

There is a cultural association of negative with bad, and students may confuse negative correlation, which means that when x increases y decreases, with weak correlation, which means that the “tightness” of the data is not there. Weak correlation can be either positive or negative.
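For teachers who want to show what the calculation of r mentioned in these notes involves, here is a minimal Python sketch. It uses the form of the formula built from sample standard deviations and a summation of z-score products, which is one common way to write it. The five data points are made up for illustration and are not part of the lesson.

```python
from math import sqrt

# Made-up illustrative data (not from the lesson): hours studied and quiz score.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Step 1: the mean of each variable (a summation divided by n).
mean_x = sum(x) / n
mean_y = sum(y) / n

# Step 2: the sample standard deviation of each variable.
sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Step 3: turn each value into a z-score, multiply the paired z-scores,
# add the products up, and divide by n - 1.
r = sum(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
        for xi, yi in zip(x, y)) / (n - 1)

print(f"r = {r:.4f}")
```

For these five points r comes out at about 0.77. Doing the same steps by hand for a class-sized data set shows quickly why the lesson hands the arithmetic to a computer.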

Activity:

Read the following aloud to students and have them work through the activity.

The physical laws and understanding we have of our universe come as a result of analyzing data. Correlation allows us to see how strongly two variables are connected to each other. As mentioned above, this does not necessarily mean that one causes the other.

The actual calculation for Correlation (r) is a bit tedious and requires an understanding of other statistical concepts, such as Standard Deviation and summation. In this activity, we will use GeoGebra and the slope-intercept equation for a line to aid us in determining the strength of the relationship between two variables.  

  1. Share the GeoGebra Correlation file with your students.
  2. Tell students that we can calculate the correlation between variables to determine how closely related they are.
  3. Have students look at the GeoGebra file, which shows a spread of points with the Correlation Coefficient (r) above it.
  4. Have students move the points in order to increase or decrease the value of r and replicate the situations below.
  5. r ranges from -1.0 to 1.0, depending on the direction and strength of the correlation.
      a. If r = -1, the data is perfectly linear (can be described by y=mx+b) and y decreases as x increases.
      b. If r = 1, the data is perfectly linear and y increases as x increases.
      c. If r = 0, the data has no linear correlation at all.
      d. For all other values of r, the size of r describes how closely the data follows a line; that is, the strength of the correlation (see chart below).

Correlation (r)    Negative Slope    Positive Slope
None               −0.09 to 0.0      0.0 to 0.09
Weak               −0.3 to −0.1      0.1 to 0.3
Medium             −0.5 to −0.3      0.3 to 0.5
Strong             −1.0 to −0.5      0.5 to 1.0

Source: Wikipedia (http://wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)
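The chart can also be written as a small lookup function, which may help students check their readings in GeoGebra. This is a sketch only: the function name describe_correlation is hypothetical, and because neighbouring rows of the chart share endpoints (and 0.09 to 0.1 is not covered), the code assumes each boundary value belongs to the stronger category and treats anything below 0.1 in size as "none".

```python
def describe_correlation(r):
    """Label r using the chart's categories.

    Assumption: boundary values (0.1, 0.3, 0.5) count as the stronger
    category, and any |r| below 0.1 counts as "none".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")

    size = abs(r)
    if size < 0.1:
        return "none"
    if size < 0.3:
        strength = "weak"
    elif size < 0.5:
        strength = "medium"
    else:
        strength = "strong"

    # The sign of r gives the direction (slope), not the strength.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for value in (0.77, -0.2, 0.05, -0.4, -1.0):
    print(value, "->", describe_correlation(value))
```

Separating the size of r from its sign in the code mirrors the point in the teacher notes: a negative correlation can be just as strong as a positive one.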

Q1: Create different values for r by moving the points on the graph.

  a. Strong Positive
  b. Strong Negative
  c. Medium Positive
  d. Medium Negative
  e. Small Positive
  f. Small Negative
  g. None

Q2: Visually, what determines how close r is to -1 or 1?

Q3: The four graphs in the image below are known as Anscombe’s quartet (http://wikipedia.org/wiki/Anscombe%27s_quartet). They all have the same value for r, and yet the data is clearly saying something different in each graph. How is it possible that these graphs have the same r?

Q4: What is a critical step to avoid data sending conflicting messages?

Assessment:

A1: Answers may vary (examples below):

Strong Positive

Strong Negative

Medium Positive

Medium Negative

Small Positive

Small Negative

None

A2: The direction of the data (negative or positive) and how closely the points cluster around a straight line.


A3: The data sets are balanced so that their summary statistics come out the same, which shows the limitations of relying on statistics such as the mean alone.


A4: Graphing the data and using humans’ ability to see patterns beyond the calculation. A human would never say these 4 graphs are the same.
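The claim in Q3 can be checked numerically. The sketch below computes r and the mean of y for each of the four data sets. The values are retyped from the commonly published Anscombe's quartet table, so it is worth comparing them against the linked page before using them in class. The helper name pearson_r is chosen here for illustration.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Anscombe's quartet: data sets I to III share the same x values.
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Collect r and the mean of y for each data set.
results = {}
for name, (xs, ys) in quartet.items():
    results[name] = (pearson_r(xs, ys), sum(ys) / len(ys))
    print(f"Set {name}: r = {results[name][0]:.3f}, mean of y = {results[name][1]:.2f}")
```

All four sets print nearly identical numbers, which is why plotting the data is the step that reveals the difference between them.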


Activity 2: Exploring correlation with data (60 to 120 minutes)

Activity Overview: In this activity, students will explore examples of related variables by choosing and conducting experiments in order to predict and calculate the correlation between the variables. Students use data collection to record data from their chosen experiments, pattern generalization to predict a correlation coefficient and data analysis to calculate a correlation coefficient.

Activity:

Have students work through the following activity.

Q1: Before you begin to collect data, predict how strong/weak positive/negative you believe the correlation between your two variables will be.

  1. Have students choose and conduct experiments to determine whether or not two factors are related and how strong the correlation between those factors is.
      a. Students should choose experiments where the outcome is not already known.

Example:

  • Will the number of students who are absent vary according to the temperature?
  • Does the color of one’s car correlate with their income?
  • Will music help students study and if so what kind?

  2. Have students use data collection tools like Google Forms (http://www.google.com/google-d-s/forms/) and/or scientific sensors to gather the data, then calculate the strength of correlation to see if there is a relationship between the two variables.
  3. Google Spreadsheets
      a. Highlight the data and click Insert → Chart, click on the Chart tab and select Scatter as the type of plot.
      b. Calculate r: in a blank cell, type =CORREL(data_1, data_2), where data_1 and data_2 can be ranges of data or entire columns (e.g. B1:B5, or D:D for everything in column D).
  4. GeoGebra
      a. Open the spreadsheet by clicking View → Spreadsheet View.
      b. Add your data to the table.
      c. Highlight the data, right-click on it and select Create List of Points.
      d. In the command line at the bottom, type in CorrelationCoefficient[data], where data is the name of your list of points.
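For classes that would rather use a programming language than a spreadsheet, the same calculation as =CORREL can be sketched in Python. The example assumes the collected data has been exported as CSV text with two columns. The column names and every number below are made up for illustration, loosely following the temperature and absences example above.

```python
import csv
import io


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical export of a student's data collection (invented numbers).
SURVEY_CSV = """temperature_c,absent_students
12,9
15,7
18,8
21,5
24,6
27,3
30,4
"""

# Read the two columns into lists of numbers.
rows = list(csv.DictReader(io.StringIO(SURVEY_CSV)))
temperatures = [float(row["temperature_c"]) for row in rows]
absences = [float(row["absent_students"]) for row in rows]

r = pearson_r(temperatures, absences)
print(f"r = {r:.3f}")
```

With these invented numbers r is about -0.89: a strong negative correlation, meaning fewer recorded absences on warmer days. That alone would not show that temperature causes attendance.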

Q2: What type of data (continuous or discrete) can be visualized in a scatterplot?

Q3: Compare your results with your assumptions.

  1. If your result is different from your prediction, reflect on why that might be (including possible experimental errors).
  2. Predicting results before conducting your research may seem like introducing bias into the experiment, but it can be done properly and can help test the Null Hypothesis (http://wikipedia.org/wiki/Null_hypothesis).
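One way to make the null hypothesis concrete is a shuffle (permutation) test, sketched below. This technique is not part of the lesson and is offered only as an optional illustration. The null hypothesis here is "the two variables are unrelated", so shuffling one column should produce correlations as strong as the observed one fairly often. The function names and the data are made up for the example.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shuffle_test(xs, ys, trials=2000, seed=0):
    """Fraction of random shufflings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    return at_least_as_strong / trials


# Invented example data: temperature (°C) and number of absent students.
temperatures = [12, 15, 18, 21, 24, 27, 30]
absences = [9, 7, 8, 5, 6, 3, 4]

fraction = shuffle_test(temperatures, absences)
print(f"Observed r = {pearson_r(temperatures, absences):.3f}")
print(f"Shuffles at least as strong: {fraction:.3f}")
```

A small fraction suggests the observed correlation would be unusual if the variables were unrelated. It still says nothing about which variable, if either, causes the other.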

Assessment:

A1: Answers will vary.

A2: Discrete

A3: Answers will vary.


Wrap-up Activity: Assessment (20 to 30 minutes)

Activity Overview: In this activity, students will analyze an article’s attempt to explain causation through correlation, or they will practice predicting how closely certain data correlates. They will demonstrate the pattern recognition and data analysis skills practiced in this lesson.

Notes to the Teacher:

Students can be assessed on their ability to predict how closely data correlates (see Correlation Assessment example).

Activity:

Have students analyze an article from the additional resources below and figure out if and where there are attempts to imply causation through correlation. Have them find counterexamples that disprove the implication.

Learning Objectives and Standards

LO1: Students will compute (using technology) the correlation coefficient of a linear fit.

  • Common Core
      • CCSS.MATH.CONTENT.HSS.ID.C.9: Distinguish between correlation and causation.
  • Computer Science
      • AUSTRALIA 10.4 (Collecting, managing and analyzing data): Analyse and visualise data to create information and address complex problems; and model processes, entities and their relationships using structured data.
      • CSTA L3B.CT.9: Analyze data and identify patterns through modeling and simulation.

LO2: Students will identify the slope of the linear graph as the constant in the relationship y=kx and apply this principle to interpreting graphs constructed from data.

  • Computer Science
      • AUSTRALIA 10.4 (Collecting, managing and analyzing data)
      • CSTA L3B.CT.9

LO3: Students will collect data and determine how strong the correlation is between the variables.

  • Computer Science
      • CSTA L3B.CT.5: Use data analysis to enhance understanding of complex natural and human systems.

Additional Information and Resources

Lesson Vocabulary

Term: Correlation
Definition: A mathematical relationship between two variables shown by data (e.g. as the temperature of a system increases, the pressure tends to increase as well).
For Additional Information: http://en.wikipedia.org/wiki/Correlation_and_dependence

Term: Causation
Definition: The relationship in which one event (A) brings about another (B), so that B follows because of A (e.g. taking this medicine will make me feel better).
For Additional Information: http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation

Computational Thinking Concepts

  • Pattern Recognition: Observing patterns and regularities in data
  • Pattern Generalization: Creating models of observed patterns to test predicted outcomes
  • Data Analysis: Making sense of data by finding patterns or developing insights
  • Data Collection: Gathering information

Additional Resource Links

Extension Activities for Student Enrichment

  • Students’ experiments and research could be entered into the Google Science Fair (http://www.google.com/events/sciencefair/), where they can continue to explore the topic and present their findings to the world.

Administrative Details

Contact info

For more info about Exploring Computational Thinking (ECT), visit the ECT website (g.co/exploringCT)

Credits

Developed by the Exploring Computational Thinking team at Google and reviewed by K-12 educators from around the world.

Last updated on

07/02/2015

Copyright info

Except as otherwise noted, the content of this document is licensed under the Creative Commons Attribution 4.0 International License, and code samples are licensed under the Apache 2.0 License.

