ECT Lesson Plan: Surveys and Estimating Large Quantities
Lesson plan at a glance...
| In this lesson plan… |
In some situations there is so much data that an exact count is difficult to obtain. We can use estimation to approximate the size of the data and to judge how much we can rely on it. By observing smaller sets and looking for patterns, we can make general predictions and even create algorithms capable of making approximations. This lesson will cover the following CT concepts: decomposition, algorithms, and abstraction.
Confirm that your computer is on and logged in | 1 to 3 minutes | |
Confirm that all students’ computers are turned on and logged in | 3 to 5 minutes | |
Download Python 2.x (https://www.python.org/downloads/) or navigate to Trinket (https://trinket.io/) | 5 to 15 minutes |
20 to 30 minutes | |
20 minutes | |
30 minutes | |
30 to 45 minutes | |
10 to 20 minutes |
Activity Overview: In this activity, students will be introduced to the idea of a Fermi problem. They will use decomposition to break down problems and estimate an answer.
Notes to the Teacher: A Fermi problem (http://wikipedia.org/wiki/Fermi_problem) involves estimating a quantity that is unknown or difficult to know exactly. The purpose of a Fermi problem is not to get a “correct” or exact answer, but to develop one’s ability to reason and estimate with large quantities. Students will use what they already know about the universe and apply it to give a reasonable answer. Multiple answers are possible and creativity is encouraged. It is recommended that students do not use calculators unless absolutely necessary, as this will challenge them to make general estimates and practice working with orders of magnitude. Fermi problems can be found in interviews for jobs involving computational thinking: Microsoft (http://books.google.com/ebooks?id=hojAaIwFn9YC&dq=How+Would+You+Move+Mount+Fuji) |
Activity: Provide students with any of the following examples of a Fermi problem (or search online for Fermi problems and Google/Microsoft interview questions) to answer:
Q1: There is no formula for solving these problems. How did you start thinking about ways to come up with a response? Q2: How do you determine if the answer will be measured in hundreds, millions, billions, or more? Q3: Write your own Fermi question involving something you are interested in. Q4: Why might companies use these types of questions in interviews in addition to straightforward questions with an exact answer? |
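To illustrate the decomposition behind Q1 and Q2, here is a minimal sketch of a Fermi estimate written as code. It uses the classic "how many piano tuners work in a large city?" question, which is not necessarily one of this lesson's own examples. Every number in it is a rough, assumed guess chosen for easy arithmetic, and the function names are hypothetical.

```python
import math

# All values below are illustrative guesses, not measured data.
population = 9000000            # people in a large city (assumed)
people_per_household = 2        # assumed average household size
households_per_piano = 20       # assume 1 household in 20 owns a piano
tunings_per_piano_per_year = 1  # assume each piano is tuned once a year
hours_per_tuning = 2            # assumed, including travel time
hours_per_day = 8
days_per_week = 5
weeks_per_year = 50


def estimate_piano_tuners():
    """Break the big question into small factors and combine them."""
    households = population / float(people_per_household)
    pianos = households / households_per_piano
    tunings_needed = pianos * tunings_per_piano_per_year
    hours_worked = weeks_per_year * days_per_week * hours_per_day
    tunings_per_tuner = hours_worked / float(hours_per_tuning)
    return tunings_needed / tunings_per_tuner


def order_of_magnitude(value):
    """Power of ten of the estimate: 2 means hundreds, 6 means millions."""
    return int(math.floor(math.log10(value)))


tuners = estimate_piano_tuners()
print("Estimated tuners: %d" % tuners)
print("Order of magnitude: 10^%d" % order_of_magnitude(tuners))
```

Changing any single guess by a factor of two moves the answer by a factor of two, but usually not by a factor of a hundred, which is why the order of magnitude is the useful result.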
Activity Overview: In this activity, students will decompose problems by using a grid to help estimate numbers. Students are able to draw abstractions from one situation and apply them to another.
Notes to the Teacher: The dot diagram in the activity below was made with Python code and the grid lines were added in the vector graphics program Inkscape (http://inkscape.org). Use this code to make your own samples for student estimation:
|
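A minimal sketch of how a dot sample could be generated, offered as a stand-in rather than the code used for the original diagram. It scatters dots uniformly at random and writes them as SVG, a format Inkscape can open so that grid lines can be added afterwards. The function names, canvas size, and dot radius are assumptions made for illustration.

```python
import random


def make_dots(count, width, height, seed=None):
    """Return `count` random (x, y) positions inside a width x height canvas."""
    rng = random.Random(seed)  # a fixed seed reproduces the same diagram
    return [(rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(count)]


def dots_to_svg(dots, width, height, radius=2):
    """Build an SVG document with one black circle per dot."""
    lines = ['<svg xmlns="http://www.w3.org/2000/svg" '
             'width="%d" height="%d">' % (width, height)]
    for x, y in dots:
        lines.append('<circle cx="%.1f" cy="%.1f" r="%d" fill="black"/>'
                     % (x, y, radius))
    lines.append('</svg>')
    return "\n".join(lines)


def save_svg(path, svg_text):
    """Write the SVG text to a file that can be opened in Inkscape."""
    with open(path, "w") as handle:
        handle.write(svg_text)
```

For example, `save_svg("dots.svg", dots_to_svg(make_dots(500, 400, 400, seed=1), 400, 400))` would produce a 500-dot sample. Because the dot count is chosen by the teacher, the true answer is known and student estimates can be compared against it.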
Activity: In 1963, Dr. Martin Luther King Jr. delivered his “I Have a Dream” speech on the National Mall in Washington, D.C. (http://maps.google.com/maps?q=washington+monument). Here is an image of some of the large crowds in attendance. Explore with your students how you would go about estimating the number of people attending this event. Another example is attendance at a concert or sports game, where the audience is sometimes challenged to guess how many people are present. Discuss with your students how to calculate the number in attendance when you know the maximum capacity of the venue but the venue isn’t full. One way crowds are estimated, especially for events like Dr. King’s speech where no tickets were sold, is to divide the crowd area into a grid, count the number of people within one box, and multiply that count by the number of boxes in the grid. Review this concept by showing how you would estimate the number using the images below. Ask your students to answer the following questions: Q1: What do you expect the challenges and possibilities for error to be with this method? Q2: Is this data continuous or discrete? Q3: What would make this method more accurate? Q4: In 1995, the organizers of the Million Man March (http://wikipedia.org/wiki/Million_Man_March) estimated attendance to be around 800,000, while the National Park Police estimated attendance at 400,000.
Q5: If you were to use this method to determine how many fish are in a lake, what might be some of the challenges? |
Assessment: A1: One example is that some boxes may have lots of people while others may not. A2: Discrete, because people are counted in whole numbers. A3: Larger boxes. Finding the right balance between lots of counting and an accurate sample. A4a: One group may have used different-sized sampling boxes from the other and over- or underestimated the number. Politics may also have driven the estimates, with the organizers choosing the upper bound of the estimate and the police the lower. A4b: One resource may be to use Google Earth/Maps (http://google.com/maps) to determine the area of the space. A5: Fish move, and their birth and death rates are much higher than those of humans. |
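The grid method, and the error described in A1, can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the positions are a tiny made-up "crowd", and the function names are hypothetical.

```python
def cell_of(point, width, height, rows, cols):
    """Return the (row, col) of the grid box that contains the point."""
    x, y = point
    col = min(int(x / float(width) * cols), cols - 1)
    row = min(int(y / float(height) * rows), rows - 1)
    return row, col


def count_per_cell(points, width, height, rows, cols):
    """Count how many points fall in each box of a rows x cols grid."""
    counts = [[0] * cols for _ in range(rows)]
    for point in points:
        row, col = cell_of(point, width, height, rows, cols)
        counts[row][col] += 1
    return counts


def estimate_total(counts, sampled_cells):
    """Average the sampled boxes, then multiply by the number of boxes."""
    rows, cols = len(counts), len(counts[0])
    sampled = sum(counts[r][c] for r, c in sampled_cells)
    return sampled / float(len(sampled_cells)) * rows * cols


# Made-up crowd on a 2 x 2 field: 16 people, most of them in one corner.
crowd = ([(0.5, 0.5)] * 3 + [(1.5, 0.5)] * 1 +
         [(0.5, 1.5)] * 2 + [(1.5, 1.5)] * 10)
counts = count_per_cell(crowd, 2, 2, rows=2, cols=2)

print(estimate_total(counts, [(0, 0)]))   # sparse box: underestimates
print(estimate_total(counts, [(1, 1)]))   # dense box: overestimates
print(estimate_total(counts, [(0, 0), (0, 1), (1, 0), (1, 1)]))  # all boxes
```

Sampling one sparse box or one dense box gives very different totals for the same crowd, while averaging over more boxes moves the estimate toward the true count at the cost of more counting.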
Activity Overview: In this activity, students will calculate mean values from sample data, analyze and evaluate the data, and determine error and bias. Students recognize patterns in data upon which they can base conclusions.
Notes to the Teacher: Students look for patterns within large amounts of data to see the potential problems that come from surveys. In addition they see how sampling bias has a real effect on society and its citizens. For time’s sake, the calculations are already finished and students can examine the data right away. If you are going to follow the directions below, delete the cells highlighted in green. |
Activity: Have students open Survey Sample Data and save their own copy. They will go through the following steps.
Q1: Why is the mean age for Survey 1 so much less than for the other surveys? Q2: What do you notice about Survey 2’s data? Why is this a problem with surveys? Q3: Survey 3 has approximately the same result for a sample size of 75% and 100%. Could the sample size be smaller? Why might it be useful to make the sample size as small as possible while still being accurate? Q4: To estimate the results of elections, or allocate government resources, people are surveyed to try and gather this information. Why won’t this give the exact results or information? Q5: The US conducts a Census (http://www.census.gov/) every 10 years to find out about our current population, demographics, businesses, education, etc. Assuming not everyone can be surveyed, what are some other ways to find out how many people live in the US? |
Assessment: A1: One possible reason is that the sample size was only 20% and those sampled could have by chance been younger than the overall population. A2: Although the mean age and income are similar to Survey 1, only 50% of Race A turned in their data, which means that Race A’s needs and situation would not be represented well in the survey. A3: A smaller sample size means fewer resources are needed to collect the data, but it also increases the likelihood of sampling error or bias. A4: Not everyone is able to be surveyed or will answer truthfully. A5: Public records of births/deaths, home purchases, employment rates. These provide indirect data that can be used to give a better overall picture and reduce sampling error. More on US Census Methodology (http://www.census.gov/popest/topics/methodology/2009-nat-meth.pdf). |
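The effect of sample size discussed in A1 and A3 can be explored with a short simulation. This is a minimal sketch, not the Survey Sample Data spreadsheet: the population of ages is randomly generated, and the seed, age range, and function names are assumptions.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


def sample_mean(population, fraction, rng):
    """Mean of a simple random sample covering `fraction` of the population."""
    size = max(1, int(round(len(population) * fraction)))
    return mean(rng.sample(population, size))


rng = random.Random(2015)  # fixed seed so reruns give the same samples
# Hypothetical population: 1,000 made-up ages between 1 and 90.
ages = [rng.randint(1, 90) for _ in range(1000)]

for fraction in (0.2, 0.5, 0.75, 1.0):
    print("%3d%% sample: mean age %.1f"
          % (fraction * 100, sample_mean(ages, fraction, rng)))
print("Population mean: %.1f" % mean(ages))
```

Rerunning with different seeds shows that small samples tend to wander further from the population mean than large ones, while a 100% sample always matches it exactly.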
Activity: Have students answer the following questions: Q6: Imagine you are going to bring everyone in your class a cupcake on your birthday. One day, you give everyone in class a card that says, “Would you like chocolate or vanilla?” On your birthday, you run out of chocolate cupcakes. What might have happened to cause this? Q7: Not everyone is able to roll their tongue. If you asked your class how many of them could roll their tongue and 60% of them could, does this mean that only 60% of your school, the nation, or the world can roll their tongue? How would you begin to estimate this? Q8: When you flip a coin or roll a die, you assume that the flip or roll is fair. One of the assumptions in a survey is that everyone will receive and send back the survey. What is a problem with this logic? Q9: The Census Bureau uses many tools to try to encourage responses and discover bias in the census survey. As of 2010, the national return rate for the survey was 74%. Looking at this map (http://2010.census.gov/2010census/take10map/), was this evenly distributed? How does your state compare? Why might some states or areas have lower survey return rates than others? Q10: How would this bias the survey? |
Assessment: A6: It’s possible that students were absent when you gave the survey or that surveys were not returned. A7: You would need to gather a larger sample size until you were satisfied that the data was as close to ideal as possible. While surveyors want to have ideal results, money and resources limit what is possible. A8: Not everyone receives mail, not everyone remembers or decides to send the survey back, and not everyone responds truthfully. A9: Answers will vary. A10: People who return the survey will be represented and receive resources; those who do not are less likely to be represented, even though they might be the most in need of government support. |
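The bias described in A10 can be made concrete with a small calculation. This is a minimal sketch under assumed numbers: the two groups, their sizes, and their return rates are hypothetical and are not taken from census data.

```python
def response_shares(group_sizes, return_rates):
    """Share of the returned surveys that comes from each group."""
    returned = {}
    for group in group_sizes:
        returned[group] = group_sizes[group] * return_rates[group]
    total = sum(returned.values())
    return dict((group, returned[group] / total) for group in returned)


# Hypothetical town: two equally sized groups with different return rates.
sizes = {"Group A": 500, "Group B": 500}
rates = {"Group A": 0.5, "Group B": 0.9}

shares = response_shares(sizes, rates)
for group in sorted(shares):
    print("%s: 50%% of the town, %.0f%% of the responses"
          % (group, shares[group] * 100))
```

Even though both groups are the same size, the group with the lower return rate makes up only about a third of the responses, so conclusions drawn from the returned surveys alone would underweight it.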
Activity Overview: In this activity, students will use decomposition and patterns to estimate the number of pages on the web. Students use an algorithm to calculate the number of links on a web page.
Notes to the Teacher: Be sure to try this activity out on your own before doing it with students to make sure it will work with your computer and networking setup (e.g. proxy and firewall). |
Activity: This activity will give a close approximation of the number of links on a page. Crawling the web, as it is known professionally, is an art and is difficult to do precisely. This activity has a lot in common with the previous activity, as some websites will have many links while others have very few. Go through the following steps with your students and have them answer the questions. The web is a huge space and many more links are created and discovered every second. We can only use estimation and develop better tools to improve our approximation. In 2008, Google approximated the web to have 1 trillion (1,000,000,000,000) unique URLs (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html). The web is growing at an incredible rate and that number is much larger today. Q1: How does one get from one website to another? Q2: How would you estimate the total number of web pages on the Internet? |
Assessment: A1: Links/Hyperlinks A2: Answers will vary. |
Activity:
Enter the URLs for the websites into the spreadsheet.
Q3: Did the class collect enough links for a good sample size? How many would you want to search? Q4: What are some problems with using the above method to count links? Q5: How might a computer help? Q6: How did you select the links that you tested? Was there bias involved? |
Assessment: A3: Answers will vary. A4: Only approximates the number of links, possible duplicates. A5: Automates the process (web crawlers), checks for duplicates. A6: It’s possible, if the links were not random, that these websites might have had more links than average. |
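As a follow-up to A4 and A5, here is a minimal sketch of how a computer could count the links on a page and check for duplicates. It assumes Python 3 and uses the standard library's `html.parser` module. The sample HTML and the class and function names are made up for illustration, and this is not necessarily the tool used in the activity.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def count_links(html_text):
    """Return (total links, unique links) found in the HTML text."""
    parser = LinkCounter()
    parser.feed(html_text)
    return len(parser.links), len(set(parser.links))


# Made-up page: three links, two of which point to the same address.
sample_page = """
<html><body>
  <a href="http://a.example/">First site</a>
  <a href="http://b.example/">Second site</a>
  <a href="http://a.example/">First site again</a>
  <a name="top">An anchor with no link</a>
</body></html>
"""

total, unique = count_links(sample_page)
print("Links found: %d (%d unique)" % (total, unique))
```

To try this on a live page, the HTML text could be fetched with `urllib.request.urlopen(url).read().decode("utf-8", "replace")`, subject to the proxy and firewall caveats mentioned in the notes to the teacher.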
Activity Overview: In this activity, students will use one of a variety of assessments to summarize learnings.
Activity: Choose one or two of the following activities for your students to complete.
|
Learning Objectives | Standards |
LO1: Students will make inferences and justify conclusions from sample surveys, experiments, and observational studies. | Core Subject CCSS MATH.HSS-IC: Understand statistics as a process for making inferences about population parameters based on a random sample from that population. Computer Science AUSTRALIA 6.7 (Creating digital solutions by: implementing): Implement digital solutions as simple visual programs involving branching, iteration (repetition), and user input. CSTA L2.CT.8: Use visual representations of problem states, structures and data (e.g., graphs, charts, network diagrams, flowcharts). |
LO2: Students will be able to decompose a problem related to estimating large numbers and make an informed estimate. | Core Subject CCSS MATH.CONTENT.7.SP.A.2: Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be. CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling. Computer Science CSTA L3A.CT.4: Compare techniques for analyzing massive data collections. UK 4.2: Develop and apply their analytic, problem-solving, design, and computational thinking skills. CSTA L2.CT.12: Use abstraction to decompose a problem into subproblems. |
LO3: Students will identify possible reasons for inconsistent results, such as sources of error or uncontrolled conditions. | Core Subject CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling. Computer Science CSTA L2.CT.15: Provide examples of interdisciplinary applications of computational thinking. |
LO4: Students will calculate mean values and analyze the results to look for possible error and bias. | Core Subject CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling. CCSS MATH.CONTENT.6.SP.B.5.C: Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered. Computer Science AUSTRALIA 6.7 (Creating digital solutions by: implementing) |
LO5: Students will use decomposition and patterns to estimate the number of pages on the web. | Core Subject CCSS MATH.CONTENT.7.SP.A.2: Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be. CCSS MATH.CONTENT.HSS.IC.B.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling. CCSS MATH.PRACTICE.MP7: Look for and make use of structure. Computer Science |
Term | Definition | For Additional Information |
Fermi problem | An estimation problem designed to teach dimensional analysis, approximation, and the importance of clearly identifying one's assumptions. | |
Estimation | Predicting a result using logic and data when it is difficult or impossible to determine the exact number. | |
Web crawler | A Web crawler is an Internet bot which systematically browses the World Wide Web, typically for the purpose of Web indexing. |
Concept | Definition |
Algorithm | Creating an ordered series of instructions for solving similar problems |
Abstraction | Reducing complexity to define main idea |
Decomposition | Breaking down tasks into smaller, manageable parts |
Pattern Recognition | Identifying trends and commonalities between data points, groups, or sets |
Contact info | For more info about Exploring Computational Thinking (ECT), visit the ECT website (g.co/exploringCT) |
Credits | Developed by the Exploring Computational Thinking team at Google and reviewed by K-12 educators from around the world. |
Last updated on | 07/02/2015 |
Copyright info | Except as otherwise noted, the content of this document is licensed under the Creative Commons Attribution 4.0 International License, and code samples are licensed under the Apache 2.0 License. |