1 of 20

API CAN CODE: Data in Learners’ Lives

Lesson 6: Building a Survey for Data Collection

This work was made possible through generous support from the National Science Foundation (Award # 2141655).

2 of 20

Lesson 1.5 Recap

  • We discussed evaluating data sources using the 5 V’s!

  • Value: the ability to extract meaningful insights
  • Velocity: the speed at which data is generated
  • Volume: the vast amount of data generated
  • Veracity: the accuracy, reliability, and trustworthiness of data
  • Variety: the diverse types and formats of data

3 of 20

Good Data Science Questions

Good Data Science Questions can be answered with Data!

Data can come from different places.

Collections of data, called datasets, reflect measurements from multiple places, multiple time points, or multiple people.

Think of a data science question about the local issue you’ve come up with, or revise an existing one.

(look at these examples for inspiration)

4 of 20

Project - Potential Primary Source

Think of a primary source for your project.

What question would you like to answer?

Who would you like to gather data from? (Is there a particular population of interest?)

What questions should you ask? What kinds of responses do you want to get back?

5 of 20

Messy Data

Write down your answer to: What grade are you in?

Check with the students around you.

  • Did you all write down the same answer?
  • And, was it formatted the same?

Discuss:

  • What problems could these differences in format cause as people respond to your survey?

6 of 20

Messy Data - Recap

  • How could we fix this issue of multiple answer formats and styles?
  • Plan one strategy that you could use to fix this problem if you had already collected the data, and one strategy to fix this problem if you were still creating the survey.
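As an illustration of the “already collected” case, here is a minimal Python sketch that standardizes answers to “What grade are you in?” into whole numbers. The answers in `raw_answers` and the `WORD_TO_GRADE` table are made-up examples (hypothetical, not from a real survey); any answer the rules can’t read comes back as `None` so it can be checked by hand.

```python
import re

# Hypothetical raw answers to "What grade are you in?"
raw_answers = ["9th", "9", "ninth grade", "Freshman", "10th Grade", "grade 11", "Senior"]

# Hypothetical lookup table for answers written as words instead of digits
WORD_TO_GRADE = {
    "ninth": 9, "freshman": 9,
    "tenth": 10, "sophomore": 10,
    "eleventh": 11, "junior": 11,
    "twelfth": 12, "senior": 12,
}

def clean_grade(answer):
    """Return the grade as an integer, or None if the answer can't be recognized."""
    text = answer.strip().lower()
    # If the answer contains digits (e.g. "9th", "grade 11"), use them
    digits = re.search(r"\d+", text)
    if digits:
        return int(digits.group())
    # Otherwise look for a known word such as "ninth" or "senior"
    for word, grade in WORD_TO_GRADE.items():
        if word in text:
            return grade
    return None

cleaned = [clean_grade(a) for a in raw_answers]
print(cleaned)  # [9, 9, 9, 9, 10, 11, 12]
```

Rules like these get harder to maintain with every new answer format people invent, so the table would need to grow with the real responses.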

7 of 20

Sampling Designs - SRS

[Figure: Simple Random Sampling (SRS). Individuals in the Population, numbered 1 to 12, are chosen at random to form the Sample (here 3, 11, and 5).]
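The same idea can be sketched in Python with the standard `random` module, assuming the 12-person population from the figure. `random.sample` picks without replacement, so every group of 3 is equally likely to be chosen; the fixed seed is only there to make the run repeatable.

```python
import random

# Population from the figure: 12 individuals numbered 1 to 12
population = list(range(1, 13))

# A seeded generator makes the example repeatable; drop the seed for real use
rng = random.Random(2024)

# sample() draws without replacement, so every group of 3 is equally likely
srs = rng.sample(population, k=3)
print("Simple random sample:", srs)
```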

8 of 20

Sampling Designs - Systematic

[Figure: Systematic sampling. The Population is listed from 12 down to 1, and every 3rd individual is selected, giving the Sample 11, 8, 5, 2.]
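A minimal Python sketch of systematic sampling, assuming the population is already in a list. `systematic_sample` is a hypothetical helper name used for illustration: it picks a random starting position among the first k people, then takes every k-th person after that. Starting at position 1 (the value 11) reproduces the sample in the figure.

```python
import random

def systematic_sample(population, k, rng=random):
    """Start at a random position among the first k members, then take every k-th one."""
    start = rng.randrange(k)
    return population[start::k]

# Population listed as in the figure, from 12 down to 1
population = list(range(12, 0, -1))

# Starting at position 1 (the value 11) gives the figure's sample
print(population[1::3])  # [11, 8, 5, 2]

# A random start, as the helper would choose it
print(systematic_sample(population, k=3, rng=random.Random(0)))
```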

9 of 20

Sampling Designs - Stratified

[Figure: Stratified sampling. The Population is divided into Strata, and a Random Sample is drawn from each stratum.]
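A minimal Python sketch of stratified sampling. The population of 40 students and their grade levels are hypothetical, and `stratified_sample` is an illustrative helper, not a library function. It groups people by stratum, then draws the same number at random from each group; other versions draw in proportion to each stratum’s size instead.

```python
import random

# Hypothetical population: 40 students, each tagged with a grade level (9 to 12)
population = [(f"student{i}", 9 + i % 4) for i in range(40)]

def stratified_sample(population, stratum_of, per_stratum, rng=random):
    """Split the population into strata, then take a random sample from each one.

    Assumes every stratum has at least `per_stratum` members.
    """
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)

    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Strata are grade levels; draw 2 students at random from each grade
sample = stratified_sample(population, stratum_of=lambda p: p[1],
                           per_stratum=2, rng=random.Random(1))
print(sample)
```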

10 of 20

Sampling Designs - Cluster

[Figure: Cluster sampling. The Population is divided into Clusters of numbered individuals, and 2 whole Clusters are selected as the Sample Group.]
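A minimal Python sketch of cluster sampling, assuming hypothetical homerooms as the clusters. Unlike stratified sampling, the randomness is in which clusters are picked; everyone in a chosen cluster is included. `cluster_sample` is an illustrative name, not a library function.

```python
import random

# Hypothetical clusters (e.g. homerooms), each a list of student ID numbers
clusters = {
    "Room A": [1, 2, 3],
    "Room B": [4, 5, 6],
    "Room C": [7, 8, 9],
    "Room D": [10, 11, 12],
    "Room E": [13, 14, 15],
}

def cluster_sample(clusters, n_clusters, rng=random):
    """Pick n_clusters whole clusters at random and include every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return chosen, sample

chosen, sample = cluster_sample(clusters, n_clusters=2, rng=random.Random(7))
print("Chosen clusters:", chosen)
print("Sample group:", sample)
```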

11 of 20

Some BAD Sampling Designs…

Convenience samples - based on ease of access, rather than representation. (Like asking your friends to fill out your survey, or giving a survey to the first 100 people to attend a school sports game!)

Voluntary response samples - participants decide for themselves whether to take part, often because they want to share their opinions. These surveys are typically advertised with a poster and QR code.

What issues of representation and bias are involved?

12 of 20

Why do sampling designs matter?

  • Samples are drawn for many studies and polls that attempt to represent a population
  • When these samples do a good job of representing the population, we get an accurate and meaningful picture of what’s really true in the population
  • When these samples are biased (usually because of a bad design!), the sample can misrepresent the population and lead to inaccurate conclusions

13 of 20

Food Deserts and Sampling Design

  • Next unit, we will explore “food deserts,” or areas in cities with limited access to plentiful, nutritious, and affordable food (e.g., areas that are far from grocery stores)

  • Look at this interactive map of food access. What do you notice?

14 of 20

Food Deserts and Sampling Design

What might bad sampling design look like if you were studying food deserts?

    • What kind of questions might you ask? What would make these questions better or worse?
    • How would you choose the people you were surveying? What could be a potential issue in the sample you choose?

15 of 20

Racial Profiling

  • Racial profiling is the discriminatory practice of suspecting or targeting a person based on their race, ethnicity, or nationality rather than on individual suspicion or available evidence
  • How does this connect to the concept of a bad sampling design? What would be the consequences of an investigation whose sample focuses on people of a particular race or other group?

16 of 20

Google Form Survey Design

Design a Google Form to collect data on your local issue and answer your question!

  • Also, design a sampling method for your data collection. Remember to think about good sampling designs to represent your population!

17 of 20

Response Validation

Use the response validation tool in Google Forms to require respondents to answer in a particular format (like a number!)
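To see what a validation rule does, here is a rough Python imitation of a whole-number range rule (for example, a grade from 9 to 12). This is not Google Forms code; `is_valid_grade` and its 9 to 12 limits are assumptions for illustration. An answer is accepted only if it is made of digits alone and falls within the range.

```python
def is_valid_grade(answer, low=9, high=12):
    """Accept only whole numbers from low to high, like a number-range validation rule."""
    text = answer.strip()
    # isdecimal() rejects answers such as "10th", "ten", "9.5", or an empty string
    if not text.isdecimal():
        return False
    return low <= int(text) <= high

for answer in ["10", "10th", "ten", "13"]:
    print(answer, "->", "accepted" if is_valid_grade(answer) else "rejected")
```

A rejected answer never reaches the spreadsheet, which is why validating while building the survey saves cleaning work later.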

18 of 20

Sharing Your Survey

  • In the top right corner, go to this menu and click “Add Collaborators”
  • Then, move down to General access and switch the settings so that “Anyone with the link” can access the form (so that your teacher can view it!)

19 of 20

Exit Ticket

The following scenarios show different people doing data science, such as political pollsters, students, administrators, and the manager of a movie theater. Name the sampling design used in each scenario:

  • Every 10th name in the phone book is selected for a political poll phone survey.
  • A student measures the heights of the trees in a patch in his backyard to estimate the average height of all the trees in his town.
  • Every student at Roosevelt High School is numbered 001 - 797. A DCPS researcher randomly selects 30 numbers within this range and sends a satisfaction survey to the students corresponding to those numbers.
  • A University of Maryland campus administrator wants to gauge the university’s popularity in the local area, so they interview the first 200 arrivals to an open house before the open house’s official start time.
  • A local movie theater puts up a poster with a QR code to a survey that moviegoers can fill out to rate their satisfaction with food & drink prices at the movie theater.

20 of 20

Thanks!

apicancode@umd.edu

API Can Code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.