ECT Lesson Plan: Surveys and Estimating Large Quantities


Lesson plan at a glance...

Core subject(s)

Mathematics

Subject area(s)

Statistics and Probability

Suggested age

11 to 18 years old

Prerequisites

Find the average of a set of numbers

Time

Preparation: 9 to 23 minutes

Instruction: 110 to 145 minutes

Standards

Core Subject: CCSS Math

CS: CSTA, UK, Australia

In this lesson plan…

Lesson Overview

In some situations there is so much data that it is difficult to get an exact count. We can use estimation to approximate the size of the data and to judge how much we can rely on it. By observing smaller sets and looking for patterns, we can make general predictions and even create algorithms that make approximations. This lesson covers the following CT concepts: decomposition, algorithms, and abstraction.

Materials and Equipment

  • For the teacher:
      • Required: Printed copies of the grid activity
      • Required: Software Development Environment and Internet-connected computer
  • For the student:
      • Required: Internet-connected computers (one (1) computer per student recommended)
      • Required: Software Development Environment
      • Required: Ruler

Preparation Tasks

Confirm that your computer is on and logged in

1 to 3 minutes

Confirm that all students’ computers are turned on and logged in

3 to 5 minutes

Download Python 2.x (https://www.python.org/downloads/) or navigate to Trinket (https://trinket.io/)

5 to 15 minutes


The Lesson

Warm-up Activity: Fermi problems and estimation

20 to 30 minutes

Activity 1: Estimating crowds with grids

20 minutes

Activity 2: Census, surveys, and sample error/bias

30 minutes

Activity 3: Estimating the number of pages on the web

30 to 45 minutes

Wrap-up Activity: Assessment

10 to 20 minutes

Warm-up Activity: Fermi problems and estimation (20 to 30 minutes)

Activity Overview: In this activity, students will be introduced to the idea of a Fermi problem. They will use decomposition to break down problems and estimate an answer.  

Notes to the Teacher:

A Fermi problem (http://wikipedia.org/wiki/Fermi_problem) involves estimating a quantity that is unknown or difficult to know exactly. The purpose of a Fermi problem is not to get a “correct” or exact answer, but to develop one’s ability to reason and estimate with large quantities.

Students will use what they already know about the universe and apply it to give a reasonable answer. Multiple answers are possible and creativity is encouraged. It is recommended that students do not use calculators unless absolutely necessary, as this will challenge them to make general estimates and practice working with orders of magnitude.

Fermi problems are also used in interviews for jobs involving computational thinking, for example at Microsoft (http://books.google.com/ebooks?id=hojAaIwFn9YC&dq=How+Would+You+Move+Mount+Fuji).

Activity:

Provide students with any of the following examples of a Fermi problem (or search online for Fermi problems and Google/Microsoft interview questions) to answer:

  1. How many types of food are there in the supermarket?
  2. How many jellybeans are in a jar?
  3. How many pennies would you place end-to-end from one end of your school to the other?
  4. How many websites are there online?
  5. How many fish are in the ocean?
  6. How many basketballs could you fit inside your classroom?
  7. How many people are using the Internet right now?
  8. How many stars are in the sky?
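To show students how a Fermi estimate decomposes into smaller, easier guesses, the sketch below works through question 6 (basketballs in a classroom) step by step. The room dimensions, the ball diameter, and the 64% packing fraction (a commonly quoted figure for randomly packed spheres) are all assumptions for illustration. Students should substitute their own guesses. The helper `round_to_one_sig_fig` is a hypothetical name.

```python
import math

def round_to_one_sig_fig(value):
    """Round a positive number to one significant figure, e.g. 16711 -> 20000."""
    magnitude = 10 ** math.floor(math.log10(value))
    return round(value / magnitude) * magnitude

# Assumed classroom size in metres (hypothetical; measure or guess your own room).
room_length, room_width, room_height = 9.0, 7.0, 3.0
room_volume = room_length * room_width * room_height  # cubic metres

# Assumed basketball diameter of about 0.24 m.
ball_radius = 0.24 / 2
ball_volume = 4.0 / 3.0 * math.pi * ball_radius ** 3

# Balls cannot fill the room completely; assume about 64% of the space is used.
packing_fraction = 0.64

estimate = room_volume / ball_volume * packing_fraction
order_of_magnitude = int(math.floor(math.log10(estimate)))

print("Raw estimate: {0:.0f} basketballs".format(estimate))
print("Rounded: about {0:.0f} (order of magnitude 10^{1})".format(
    round_to_one_sig_fig(estimate), order_of_magnitude))
```

The rounding step reflects the spirit of a Fermi problem. With assumptions this rough, only the order of magnitude (tens of thousands, here) is meaningful.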

Q1: There is no formula for solving these problems. How did you start thinking about ways to come up with a response?

Q2: How do you determine if the answer will be measured in hundreds, millions, billions, or more?

Q3: Write your own Fermi question involving something you are interested in.

Q4: Why might companies use these types of questions in interviews in addition to straightforward questions with an exact answer?


Activity 1: Estimating crowds with grids (20 minutes)

Activity Overview: In this activity, students will decompose problems by using a grid to help estimate numbers. Students draw abstractions from one situation and apply them to another.

Notes to the Teacher:

The dot diagram in the activity below was made with Python code and the grid lines were added in the vector graphics program Inkscape (http://inkscape.org). Use this code to make your own samples for student estimation:

from turtle import *
from random import *

hideturtle()  # Hide the turtle cursor
penup()  # Lift the pen so only dots are drawn, not lines
speed(0)  # Draw as fast as possible

used_points = set()  # Points already used, to avoid overlapping dots

num_points = int(input("How many points: "))

for point in range(num_points):
    x = randint(0, 200)  # Change these limits for a larger/smaller area
    y = randint(0, 200)
    while (x, y) in used_points:  # Pick again until the point is unique
        x = randint(0, 200)
        y = randint(0, 200)
    used_points.add((x, y))  # Remember this point
    setposition(x, y)  # Move the turtle to the new random position
    dot()  # Draw a dot there

done()  # Keep the drawing window open until it is closed

Activity:

In 1963, Dr. Martin Luther King Jr. delivered his “I Have a Dream” speech at the National Mall in Washington, D.C. (http://maps.google.com/maps?q=washington+monument). Here is an image of some of the large crowds in attendance. Explore with your students how they would go about estimating the number of people attending this event.

Another example is estimating attendance at a concert or sports game, where the audience is sometimes challenged to guess how many people are present. Discuss with your students how to calculate the number in attendance when you know the venue’s maximum capacity but the venue isn’t full.


One way crowds are estimated, especially for events like Dr. King’s speech where no tickets were sold, is to divide the crowd into a grid. Count the people in one square and multiply that count by the number of squares in the grid. Review this concept by showing how you would estimate the number using the images below.
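The grid method can also be simulated without drawing anything. The sketch below scatters a known number of simulated "people" over a 200 × 200 area, matching the 0 to 200 range in the dot-drawing program. It then counts the people in a few randomly chosen squares of a 10 × 10 grid and scales the average up to the whole grid. The crowd is randomly generated, not real data. The grid size, the "clustered" crowd shape, and the function names are assumptions for illustration.

```python
import random

SIDE = 200           # the area is SIDE x SIDE units
CELLS_PER_SIDE = 10  # a 10 x 10 grid of squares (assumed)

def make_crowd(n, rng, clustered=False):
    """Scatter n people over the area. If clustered, bunch them near one
    corner, as a crowd might gather near a stage."""
    people = []
    while len(people) < n:
        if clustered:
            x = rng.gauss(SIDE * 0.25, SIDE * 0.15)
            y = rng.gauss(SIDE * 0.25, SIDE * 0.15)
        else:
            x = rng.random() * SIDE
            y = rng.random() * SIDE
        if 0 <= x < SIDE and 0 <= y < SIDE:  # keep only people inside the area
            people.append((x, y))
    return people

def square_of(x, y):
    """Return the (column, row) of the grid square that contains (x, y)."""
    size = SIDE / float(CELLS_PER_SIDE)
    col = min(int(x // size), CELLS_PER_SIDE - 1)
    row = min(int(y // size), CELLS_PER_SIDE - 1)
    return (col, row)

def grid_estimate(people, squares_to_count, rng):
    """Count the people in a few randomly chosen squares, then scale the
    average count per square up to the whole grid."""
    all_squares = [(c, r) for c in range(CELLS_PER_SIDE) for r in range(CELLS_PER_SIDE)]
    chosen = set(rng.sample(all_squares, squares_to_count))
    counted = sum(1 for x, y in people if square_of(x, y) in chosen)
    average_per_square = counted / float(squares_to_count)
    return average_per_square * len(all_squares)

rng = random.Random(1)  # fixed seed so the run can be repeated
for label, clustered in [("even", False), ("clustered", True)]:
    crowd = make_crowd(1000, rng, clustered)
    one = grid_estimate(crowd, 1, rng)
    ten = grid_estimate(crowd, 10, rng)
    print("{0} crowd of {1}: 1 square -> {2:.0f}, 10 squares -> {3:.0f}".format(
        label, len(crowd), one, ten))
```

Changing the seed and rerunning typically shows that a single square can be far off, especially for the clustered crowd. Averaging more squares usually brings the estimate closer to the true count.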

Ask your students to answer the following questions:

Q1: What do you expect the challenges and possibilities for error to be with this method?

Q2: Is this data continuous or discrete?

Q3: What would make this method more accurate?

Q4: In 1995, the Million Man March (http://wikipedia.org/wiki/Million_Man_March) organizers estimated attendance to be around 800,000, while the National Park Police estimated attendance at 400,000.

  1. How is it possible that their estimates could be so far apart?
  2. It was recommended that in the future, private companies be used to estimate large crowds for these events as the Park Police lack the funds to carry out the analysis. You have been hired to estimate how many people are expected to come to the next demonstration at the National Mall. What would you do?

Q5: If you were to use this method to determine how many fish are in a lake, what might be some of the challenges?

Assessment:

A1: One challenge is that the crowd may not be evenly spread: some squares may be packed with people while others are nearly empty.

A2: Discrete, because people are counted in whole numbers, so there is an exact (if unknown) number of people.

A3: Counting larger squares, or more of them, so that more of the crowd is sampled. The challenge is finding the right balance between doing lots of counting and getting an accurate sample.

A4a: One group may have used differently sized sampling squares from the other and over- or underestimated the number. Politics may also have influenced the estimates: the organizers would tend to choose the upper bound of an estimate and the police the lower bound.

A4b: One resource may be to use Google Earth/Maps (http://google.com/maps) to determine the area of the space.

A5: Fish move around, and their birth and death rates are much higher than those of humans, so the number changes quickly.
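Answer A4b can be taken one step further. Once the area is measured on a map, it is multiplied by an assumed crowd density (people per square metre). The sketch below shows how two reasonable density assumptions, applied to the same area, produce estimates that differ by a factor of three. This is one way estimates like those in Q4 can end up far apart. The area, the fraction occupied, and both densities are made-up values, and `crowd_estimate` is a hypothetical helper.

```python
def crowd_estimate(area_m2, people_per_m2, fraction_occupied=1.0):
    """Estimate crowd size as occupied area times crowd density."""
    return area_m2 * fraction_occupied * people_per_m2

# Hypothetical measurements: replace them with an area measured on a map
# and your own judgment of how full the space is.
area_m2 = 1000 * 100     # an assumed 1,000 m by 100 m region
fraction_occupied = 0.8  # assume 80% of that region holds people

# Assumed densities (people per square metre) for a loose and a packed crowd.
for label, density in [("loose", 1.0), ("packed", 3.0)]:
    people = crowd_estimate(area_m2, density, fraction_occupied)
    print("{0} crowd: about {1:,.0f} people".format(label, people))
```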


Activity 2: Census, surveys, and sample error/bias (30 minutes)

Activity Overview: In this activity, students will calculate mean values from sample data, analyze and evaluate the data, and determine error and bias. Students recognize patterns in data upon which they can base conclusions.

Notes to the Teacher:

Students look for patterns within large amounts of data to see the potential problems that come from surveys. In addition, they see how sampling bias has a real effect on society and its citizens.

For time’s sake, the calculations are already finished and students can examine the data right away. If you want students to do the calculations themselves by following the directions below, first delete the cells highlighted in green.

Activity:

Have students open Survey Sample Data and save their own copy. They will go through the following steps.

  1. The Survey Sample Data spreadsheet has multiple sheets. The first sheet shows the results for different sample sizes, and the next three sheets contain the data from three different surveys.
  2. Each survey was given to 100 people from three races (A, B, C), who were asked questions about their income, education, age, etc.
  3. To calculate the mean age and income for each set of data, use the name of the survey and the sample size. The sample size tells you how many rows of data to average.
      a. For example, Survey 1 uses a sample size of 20% (20 people), so in cell C5 enter =AVERAGE('Survey 1'!C2:C21)
          i. 'Survey 1'! is the syntax used to reference another sheet.
          ii. C2:C21 is the first 20 people surveyed (20%).
      b. In cell D5 enter =AVERAGE('Survey 1'!D2:D21) to calculate Survey 1’s mean income.
      c. Continue to fill in the other green cells and answer the questions below to discover how surveys can be prone to bias and error.
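For students who prefer code, the spreadsheet step "average the first N rows" looks like this in Python. The ages below are randomly generated stand-ins, not the data in the Survey Sample Data spreadsheet, and `mean_of_first` is a hypothetical helper name.

```python
import random

def mean_of_first(values, percent):
    """Average the first `percent`% of the values, the way
    =AVERAGE('Survey 1'!C2:C21) averages the first 20 of 100 rows."""
    n = int(len(values) * percent / 100.0)
    sample = values[:n]
    return sum(sample) / float(len(sample))

rng = random.Random(7)
ages = [rng.randint(18, 80) for _ in range(100)]  # 100 made-up respondents

for percent in (20, 50, 75, 100):
    print("{0}% sample: mean age {1:.1f}".format(percent, mean_of_first(ages, percent)))
```

As in the spreadsheet, the smaller samples can drift from the 100% result. Taking the *first* rows is also not a random sample, which matters if the rows are ordered in some way.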

Q1: Why is the mean age for Survey 1 so much less than for the other surveys?

Q2: What do you notice about Survey 2’s data? Why is this a problem with surveys?

Q3: Survey 3 has approximately the same result for a sample size of 75% and 100%. Could the sample size be smaller? Why might it be useful to make the sample size as small as possible while still being accurate?

Q4: To estimate the results of elections or to allocate government resources, people are surveyed to try to gather this information. Why won’t this give exact results?

Q5: The US conducts a Census (http://www.census.gov/) every 10 years to find out about our current population, demographics, businesses, education, etc. Assuming not everyone can be surveyed, what are some other ways to find out how many people live in the US?

Assessment:

A1: One possible reason is that the sample size was only 20% and those sampled could have by chance been younger than the overall population.

A2: Although the mean age and income are similar to Survey 1’s, only 50% of Race A turned in their data, which means that Race A’s needs and situation would not be represented well in the survey.

A3: A smaller sample size means fewer resources are needed to collect the data, but it also increases the likelihood of sampling error or bias.

A4: Not everyone is able to be surveyed or will answer truthfully.

A5: Public records of births and deaths, home purchases, and employment rates. These provide indirect data that can be used to give a better overall picture and reduce sampling error. More on US Census Methodology (http://www.census.gov/popest/topics/methodology/2009-nat-meth.pdf).
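Answer A2 describes nonresponse bias: when one group returns fewer surveys, that group is underrepresented and the overall results shift. The simulation below makes this concrete. Every number in it is invented for illustration: the group sizes, the typical incomes, and the assumption that only half of group A responds. It does not model the lesson's spreadsheet.

```python
import random

rng = random.Random(3)

# Hypothetical population: 100 people in each of groups A, B and C.
# The income figures are invented purely for illustration.
typical_income = {"A": 30000, "B": 45000, "C": 60000}
population = [(group, rng.gauss(typical_income[group], 5000))
              for group in "ABC" for _ in range(100)]

# Assume everyone in B and C returns the survey, but only half of group A does.
return_rate = {"A": 0.5, "B": 1.0, "C": 1.0}
responses = [(g, income) for g, income in population if rng.random() < return_rate[g]]

def mean_income(people):
    """Average income of a list of (group, income) pairs."""
    return sum(income for _, income in people) / float(len(people))

def share_of(group, people):
    """Fraction of the list that belongs to the given group."""
    return sum(1 for g, _ in people if g == group) / float(len(people))

print("True mean income:   {0:,.0f}".format(mean_income(population)))
print("Survey mean income: {0:,.0f}".format(mean_income(responses)))
print("Group A share: {0:.0%} of population, {1:.0%} of responses".format(
    share_of("A", population), share_of("A", responses)))
```

Because the group that responds less often also has the lowest incomes in this made-up example, the survey overstates the average income and understates group A's share.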

Activity:

Have students answer the following questions:

Q6: Imagine you are going to bring everyone in your class a cupcake on your birthday. One day, you give everyone in class a card that says, “Would you like chocolate or vanilla?” On your birthday, you run out of chocolate cupcakes. What might have happened to cause this?

Q7: Not everyone is able to roll their tongue. If you asked your class how many of them could roll their tongue and 60% of them could, does this mean that 60% of your school, your nation, or the world can roll their tongue? How would you begin to estimate this?

Q8: When you flip a coin or roll a die, you assume that the flip or roll is fair. One of the assumptions in a survey is that everyone will receive and send back the survey. What is a problem with this logic?

Q9: The Census Bureau uses many tools to encourage people to respond and to detect bias in the census survey. As of 2010, the national return rate for the survey was 74%. Looking at this map (http://2010.census.gov/2010census/take10map/), was the return rate evenly distributed? How does your state compare? Why might some states or areas have lower survey return rates than others?

Q10: How would this bias the survey?

Assessment:

A6: It’s possible that students were absent when you gave the survey or surveys were not returned.

A7: Not necessarily, because one class is a small sample that may not represent the school, nation, or world. You would need to gather a larger sample until you were satisfied that the data was as close to ideal as possible. While surveyors want ideal results, money and resources limit what is possible.

A8: Not everyone receives mail, not everyone remembers or decides to send the survey back, not everyone responds truthfully.

A9: Answers will vary.

A10: People who return the survey will be represented and receive resources; those who do not are less likely to, and they might be the people most in need of government support.
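Q7 and the standard CCSS MATH.CONTENT.HSS.IC.B.4 both point toward using simulation to see how much a survey result can vary. The sketch below assumes a hypothetical population in which exactly 60% of people can roll their tongue. It repeatedly "surveys" samples of different sizes and reports the range covering the middle 95% of the results. The 60% rate, the sample sizes, and the function names are assumptions for illustration.

```python
import random

def sample_proportions(true_p, sample_size, trials, rng):
    """Simulate `trials` surveys of `sample_size` people each and return
    the proportion answering "yes" in each survey."""
    results = []
    for _ in range(trials):
        yes = sum(1 for _person in range(sample_size) if rng.random() < true_p)
        results.append(yes / float(sample_size))
    return results

def middle_95(values):
    """Return the (low, high) range covering the middle 95% of the results."""
    ordered = sorted(values)
    low = ordered[int(0.025 * len(ordered))]
    high = ordered[int(0.975 * len(ordered)) - 1]
    return low, high

rng = random.Random(5)
TRUE_P = 0.6  # hypothetical share of people who can roll their tongue

for size in (30, 300, 3000):  # roughly a class, a school, a large survey
    low, high = middle_95(sample_proportions(TRUE_P, size, 500, rng))
    print("sample of {0}: 95% of surveys gave {1:.0%} to {2:.0%}".format(size, low, high))
```

Class-sized samples typically spread over a much wider range than samples of thousands. Half the width of that range is a simulation-based margin of error.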


Activity 3: Estimating the number of pages on the web (30 to 45 minutes)

Activity Overview: In this activity, students will use decomposition and patterns to estimate the number of pages on the web. Students use an algorithm to calculate the number of links on a web page.

Notes to the Teacher:

Be sure to try this activity out on your own before doing it with students to make sure it will work with your computer and networking setup (e.g. proxy and firewall).

Activity:

This activity gives an approximation of the number of links on a page. Crawling the web, as it is known professionally, is an art and is difficult to do precisely. This activity has a lot in common with the previous activity, because some websites will have many links while others have very few. Go through the following steps with your students and have them answer the questions.

The web is a huge space, and many more links are created and discovered every second, so we can only estimate its size and develop better tools to improve our approximation. In 2008, Google reported that its systems had found 1 trillion (1,000,000,000,000) unique URLs on the web (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html). The web is growing at an incredible rate, and that number is much larger today.

Q1: How does one get from one website to another?

Q2: How would you estimate the total number of web pages on the Internet?

Assessment:

A1: Links/Hyperlinks

A2: Answers will vary.

Activity:

  1. Share the Links on Websites spreadsheet with your students.
  2. Assign a number to each student.
  3. Have each student choose 5 websites that no one else has chosen.
  4. Have them do the following:
      a. Enter the URLs for the websites into the spreadsheet.
      b. Open Python and press Ctrl-N (PC) or Command-N (Mac) to open a blank window for entering code.
      c. Paste the code below into the window, and on the line marked “Enter the site to test,” replace the website with the one to crawl for links.
      d. Press F5, or choose Run → Run Module from the menu, to run the code.
      e. Enter the results for each page into the spreadsheet.
      f. Look at the Results sheet for the aggregated results.

# Approximates the number of links on a webpage.

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

page = urlopen("http://www.example.com")  # Enter the site to test
contents = page.read().decode("utf-8", "ignore").lower()  # The page's HTML source, lowercased
count = 0  # Keep track of how many links there are

for i in range(len(contents)):  # Loop through every position in the page's code
    code_chunk = contents[i:i + 7]  # 7-character chunk of code to check
    if code_chunk == '<a href':
        count += 1

print(count)
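Answer A4 notes that the text-matching method only approximates. For example, it misses links written as `<a class="nav" href=...>` and counts repeated links more than once. One possible improvement, sketched below, uses Python's standard-library HTML parser to find every `<a>` tag that has an `href` attribute, however it is written. It also keeps the links so that duplicates can be removed. `LinkCounter` and `naive_link_count` are hypothetical names, and the small HTML sample is made up so the example runs without a network connection. To try it on a real page, feed it the downloaded HTML text instead.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2

class LinkCounter(HTMLParser):
    """Collect the href of every <a> tag, however its attributes are written."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def naive_link_count(html):
    """The lesson's method: count occurrences of the text '<a href'."""
    contents = html.lower()
    return sum(1 for i in range(len(contents)) if contents[i:i + 7] == "<a href")

sample_html = ('<p><a href="/home">Home</a> <A HREF="/about">About</A> '
               '<a class="nav" href="/home">Home again</a> '
               '<a name="top">Not a link</a></p>')

counter = LinkCounter()
counter.feed(sample_html)
counter.close()

print("Naive count: {0}".format(naive_link_count(sample_html)))
print("Links found: {0}".format(len(counter.links)))
print("Unique links: {0}".format(len(set(counter.links))))
```

On this sample, the text-matching method finds 2 links. The parser finds 3, of which 2 are unique.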

Q3: Did the class collect enough links for a good sample size? How many would you want to search?

Q4: What are some problems with using the above method to count links?  

Q5: How might a computer help?

Q6: How did you select the links that you tested? Was there bias involved?

Assessment:

A3:  Answers will vary.

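A4: It only approximates the number of links. It misses links written in other ways (for example, with another attribute before href), and it counts duplicate links more than once.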

A5: A computer program (a web crawler) can automate the process, following links from page to page and checking for duplicates.

A6: Possibly. If the websites were not chosen at random, they might have more links than the average web page.
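Answer A5 mentions web crawlers that automate link-following and check for duplicates. The sketch below shows that idea on a tiny made-up "web" stored as a dictionary, so it runs without a network connection. It uses a breadth-first search, which is one common crawl order but not the only one. A `seen` set ensures that each page is visited only once. A real crawler would download each page and extract its links rather than look them up in a dictionary. The page names, `fake_web`, and `crawl` are hypothetical.

```python
from collections import deque

# A tiny made-up "web": each page maps to the pages it links to.
fake_web = {
    "home": ["about", "news", "shop"],
    "about": ["home", "team"],
    "news": ["home", "story1", "story2"],
    "shop": ["home", "item1"],
    "team": ["about"],
    "story1": ["news", "story2"],
    "story2": ["news", "story1", "archive"],
    "item1": ["shop"],
    "archive": [],
}

def crawl(start, links_of, max_pages=1000):
    """Breadth-first crawl: visit each reachable page once, skipping duplicates."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for link in links_of.get(page, []):
            if link not in seen:  # duplicate check: never queue a page twice
                seen.add(link)
                queue.append(link)
    return order

visited = crawl("home", fake_web)
total_links = sum(len(fake_web[p]) for p in visited)
print("Pages found: {0}".format(len(visited)))
print("Links seen: {0}".format(total_links))
```

In this small web, the crawler finds 9 pages but sees 17 links, because many links point back to pages it has already found. Counting links is therefore not the same as counting pages.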


Wrap-up Activity: Assessment (10 to 20 minutes)

Activity Overview: In this activity, students will complete one of a variety of assessments to summarize what they have learned.

Activity:

Choose one or two of the following activities for your students to complete.

  • Have students generate their own Fermi problems and exchange them with other students. If you are able to assemble a large number of challenges, it will help the students become better at analysis and estimation. These types of problems can be adapted to any content area.
  • Give students a picture of a crowd, large group of houses from Google Maps (http://google.com/maps), or even just another group of dots, and have them estimate the number. Limit their time to ensure that they are using the grid method or something similar instead of counting each one.
  • Have students write up ways in which the US Census could be improved to get a better count of minority populations.

Learning Objectives and Standards

Learning Objectives

Standards

LO1: Students will make inferences and justify conclusions from sample surveys, experiments, and observational studies.

Core Subject

CCSS MATH.HSS-IC: Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

Computer Science

AUSTRALIA 6.7 (Creating digital solutions by: implementing): Implement digital solutions as simple visual programs involving branching, iteration (repetition), and user input.

CSTA L2.CT.8: Use visual representations of problem states, structures and data (e.g., graphs, charts, network diagrams, flowcharts).

LO2: Students will be able to decompose a problem related to estimating large numbers and make an informed estimate.

Core Subject

CCSS MATH.CONTENT.7.SP.A.2: Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be.

CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

Computer Science

CSTA L3A.CT.4: Compare techniques for analyzing massive data collections.

UK 4.2: Develop and apply their analytic, problem-solving, design, and computational thinking skills.

CSTA L2.CT.12: Use abstraction to decompose a problem into subproblems.

LO3: Students will identify possible reasons for inconsistent results, such as sources of error or uncontrolled conditions.

Core Subject

CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

Computer Science

CSTA L2.CT.15: Provide examples of interdisciplinary applications of computational thinking.

LO4: Students will calculate mean values and analyze the results to look for possible error and bias.

Core Subject

CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

CCSS MATH.CONTENT.6.SP.B.5.C: Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.

Computer Science

AUSTRALIA 6.7 (Creating digital solutions by: implementing)

CSTA L2.CT.8

LO5: Students will use decomposition and patterns to estimate the number of pages on the web.

Core Subject

CCSS MATH.CONTENT.7.SP.A.2: Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be.

CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

CCSS MATH.PRACTICE.MP7: Look for and make use of structure.

Computer Science

CSTA L2.CT.12

UK 4.2

Additional Information and Resources

Lesson Vocabulary

  • Fermi problem: An estimation problem designed to teach dimensional analysis, approximation, and the importance of clearly identifying one's assumptions. For additional information: http://en.wikipedia.org/wiki/Fermi_problem
  • Estimation: Predicting a result using logic and data when it is difficult or impossible to determine the exact number. For additional information: http://en.wikipedia.org/wiki/Estimation
  • Web crawler: An Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. For additional information: https://en.wikipedia.org/?title=Web_crawler

Computational Thinking Concepts

  • Algorithm: Creating an ordered series of instructions for solving similar problems
  • Abstraction: Reducing complexity to define the main idea
  • Decomposition: Breaking down tasks into smaller, manageable parts
  • Pattern Recognition: Identifying trends and commonalities between data points, groups, or sets

Additional Resource Links

Administrative Details

Contact info

For more info about Exploring Computational Thinking (ECT), visit the ECT website (g.co/exploringCT)

Credits

Developed by the Exploring Computational Thinking team at Google and reviewed by K-12 educators from around the world.

Last updated on

07/02/2015

Copyright info

Except as otherwise noted, the content of this document is licensed under the Creative Commons Attribution 4.0 International License, and code samples are licensed under the Apache 2.0 License.

