1 of 18

API CAN CODE

Data in Learners’ Lives

Lesson 2: Data Collection and its Purpose & Impact

This work was made possible through generous support from the National Science Foundation (Award # 2141655).

2 of 18

Warmup

Brainstorm 5 sources of data in your life (or recall them from the last lesson!) to use for today’s lesson!

3 of 18

Lesson 1.1 Recap

  • Last class, we discussed that data can take many different forms!
  • Remember that some data is quantitative (numerical) while other data is qualitative (non-numerical). Further, some quantitative variables can only take certain separate values (discrete) while others can take on any value in a range (continuous); some qualitative variables cannot be ordered (nominal) while others can (ordinal).
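
As a quick illustration of the four variable types in this recap, here is a minimal Python sketch. The example variables (`steps_walked`, `t_shirt_size`, and so on) and the `describe` helper are hypothetical, made up for this recap rather than taken from the lesson.

```python
# Hypothetical examples of data a learner might create in a day.
# Each entry maps a variable name to (kind, subtype).
EXAMPLES = {
    "steps_walked": ("quantitative", "discrete"),         # counted in whole steps
    "screen_time_hours": ("quantitative", "continuous"),  # any value in a range
    "favorite_app": ("qualitative", "nominal"),           # categories with no order
    "t_shirt_size": ("qualitative", "ordinal"),           # categories with an order (S < M < L)
}


def describe(name):
    """Return a one-line description of an example variable's type."""
    kind, subtype = EXAMPLES[name]
    return f"{name}: {kind} ({subtype})"


for name in EXAMPLES:
    print(describe(name))
```

Try adding one of your own warmup data sources to `EXAMPLES` and deciding which of the four types it belongs to.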

4 of 18

Data Never Sleeps

Explore the Data Never Sleeps website!

Look at Data Never Sleeps 1.0. Then, go to Data Never Sleeps 12.0. How are they similar? How are they different?

  • What kinds of data are being created?

  • Did this website make you think about any kinds of data, or data sources, that you hadn’t thought about before?

5 of 18

Data Never Sleeps

Explore the Data Never Sleeps website!

Who might collect or care about some of these data streams?

  • Takeaways: Data comes in many forms, some of which you may not have thought about before! You also probably create and consume much more data every day than you realize!

6 of 18

Data Collectors

Return to your list of sources of data in your life from the warmup.

Come up with one or more “collectors” for each source of data you listed: someone who might be interested in collecting the data you create or consume.

7 of 18

Discussion - Data Around the World

  • Consider a Hollywood actor, a Kansas farmer, and a fisherman from Thailand.
  • How might each of these people use data?
  • What data might they consume and create?

8 of 18

Data and Privacy - Class Debate

Is there a privacy issue here? What is it?

TikTok accesses your location, contact info, browsing history, and other usage data and is able to share this data with other apps or websites.

9 of 18

Data and Privacy - Class Debate

Is there a privacy issue here? What is it?

One of your neighbors had something stolen off his porch, so he installed a Ring camera on his front door. The camera has a view of most of the street, and movement within a distance of 25 feet is usually recorded and collected by Amazon.

10 of 18

Discussion - Privacy in Data Collection & Consumption

  • Who collects your data? Why do they collect it? Are there any examples where this data collection violates your or others’ privacy?
  • When do you think data collection crosses a line into a privacy violation?

11 of 18

Discussion - Privacy in Data Collection & Consumption

  • In some cases (for example, if you’re participating in a research study), collectors need to specify the data they’re collecting. Do you think this should be true in every case?
  • What could collectors do to better protect user privacy, or better inform users when their data is being collected?

12 of 18

Representation in Data Collection

Political polls used to be done by calling people on landlines; now, they are done by calling cell phones, from a number the recipient probably doesn’t recognize.

Who is represented in these polls? Who is not?

What makes someone more or less likely to be represented?

13 of 18

Representation in Data Collection

Some polls are done through social media. Common sites are Facebook and Twitter.

Who is represented in these polls?

Who isn’t? Could this lead to any problems?

14 of 18

Representation in Data Collection

Early facial recognition software was trained predominantly on photos of white people.

Who is represented in the training set? Who isn’t?

Could this lead to any problems?

15 of 18

Representation in Data Collection

Read one of these articles:

First article: What does a “false positive” mean in a criminal investigation? How can bias in representation cause issues?

Second article: Who is overrepresented in photo-matching data? What problem might this cause?

16 of 18

Closing Discussion

  • Why is it important to think about stakeholders in data collection and consumption?
  • Why should we think about issues with privacy in data collection and use?
  • Why should we think about representation in data collection and use?

17 of 18

Exit Ticket

Imagine a movie production company wants to know which of two new movie ideas will sell more tickets. They collect data by going up to people who are leaving an AMC theater near a local college campus between 1 pm and 6 pm and asking them which idea they prefer.

Who is underrepresented in this survey?

How could the issue of representation bias the company’s conclusions?

18 of 18

Thanks!

apicancode@umd.edu

API Can Code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.