1 of 16

API CAN CODE: Data in Learners’ Lives

Lesson 4: Sources of Data

This work was made possible through generous support from the National Science Foundation (Award # 2141655).

2 of 16

Warmup

  • Refer to the local issues identified in the last lesson: share one of these issues with the class.

Brainstorm 2-3 sources of data for your chosen issue. These data could come from different kinds of sources: you might search the Internet for existing data, design a plan to collect data, or think of a hypothetical (ideal) source of data.


3 of 16

Lesson 1.3 Recap

We talked about the Data-Information-Knowledge-Wisdom (DIKW) model, which describes analysis as a progression from raw data to wisdom.

  • Data: Numbers and text without context
  • Information: Processed data with context
  • Knowledge: Information acquired by experience
  • Wisdom: Analysis of complex knowledge structures
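The first two DIKW levels can be illustrated with a small Python sketch. The temperature values, the `readings` dictionary, and the `summarize` helper are all hypothetical and exist only for this illustration: the bare list stands for data, and the same numbers with a unit and a place attached stand for information.

```python
from statistics import mean

# Data: raw numbers with no context (hypothetical values).
raw = [72, 75, 71, 90, 74]

# Information: the same numbers with context attached.
readings = {"unit": "degrees F", "place": "classroom", "values": raw}


def summarize(readings):
    """Turn contextualized readings into a short, interpretable summary."""
    values = readings["values"]
    return {
        "place": readings["place"],
        "unit": readings["unit"],
        "average": round(mean(values), 1),
        "highest": max(values),
    }


summary = summarize(readings)
print(summary)
```

Knowledge and wisdom are not something the code produces: they come from a person interpreting the summary, for example by asking why one reading is so much higher than the others.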

4 of 16

Sources of Data


Secondary Data

  • Internet
  • Social media
  • Financial reports
  • Journals & papers
  • Databases
  • Government publications
  • Open Data

Primary Data

  • Surveys
  • Interviews
  • Observations
  • Experiments
  • Internet of Things (devices with sensors that exchange data)

Primary data are collected directly from the source (the target population or system under study).

Secondary data have already been collected, processed, and made available for use by researchers, analysts, or the public.
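The two lists above can be written as a simple lookup in Python. This is only a sketch: the `classify_source` function and the two sets are hypothetical names, and the sets contain just the examples from this slide, so anything else comes back as "unknown".

```python
# Example source types taken from the slide (lowercased for matching).
PRIMARY = {
    "surveys",
    "interviews",
    "observations",
    "experiments",
    "internet of things",
}
SECONDARY = {
    "internet",
    "social media",
    "financial reports",
    "journals & papers",
    "databases",
    "government publications",
    "open data",
}


def classify_source(name):
    """Return 'primary', 'secondary', or 'unknown' for a source type."""
    key = name.strip().lower()
    if key in PRIMARY:
        return "primary"
    if key in SECONDARY:
        return "secondary"
    return "unknown"


for source in ["Surveys", "Open Data", "Podcast"]:
    print(source, "->", classify_source(source))
```

A lookup like this only works for source types someone has already sorted. For a new source, the question to ask is still whether you collected the data yourself or someone else collected it first.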

5 of 16

Secondary Source - Example

A financial report, whether published by the company itself or by a reporter presenting their findings, is an example of a secondary source.

  • How do we know this is a secondary source?


6 of 16

Secondary Source - Example

Datasets hosted on government database websites such as OpenDataDC, like this one on the locations of DCPS schools, are another example of a secondary source.

  • How can you distinguish this from a primary source?


7 of 16

Secondary Source - Example

API hubs such as RapidAPI, which we’ll use later in this course, are another example of a secondary source. They provide information directly from an app or website (and it can be real-time!).
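Many APIs send their data back as JSON text, which Python can turn into a dictionary. The sketch below makes no network call, and the response text and its field names (`city`, `temperature_f`, `observed_at`) are made up for illustration. A real API defines its own fields.

```python
import json

# Hypothetical response text; this is NOT the format of any real API.
response_text = (
    '{"city": "Washington", "temperature_f": 68, '
    '"observed_at": "2024-05-01T14:00:00Z"}'
)

# Parse the JSON text into a Python dictionary.
record = json.loads(response_text)

# Someone else collected and packaged this record before we received it,
# which is what makes it a secondary source for us.
print(record["city"], record["temperature_f"], record["observed_at"])
```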


8 of 16

It’s All About Coffee?

Imagine you are on the Starbucks research team trying to evaluate the popularity of a new drink.

What is an example of a primary data source to answer this question? How about a secondary data source?


9 of 16

Activity - Sorting Data Sources

Consider your ideas for local data sources that you generated at the beginning of the lesson.

Are they primary sources or secondary sources? How do you know?

Check with the small group around you. Do they agree with your answers?


10 of 16

Class Discussion - Post-Sorting

Review sorted data sources. Each student should share one or two examples, and explain how they sorted them!

  • Which sources were easy to sort?
  • Which sources were hard to sort?


11 of 16

SelfieCity

SelfieCity (https://selfiecity.net/) is a research project that investigates self-portrait (selfie) data from five cities around the world: Bangkok, Berlin, Moscow, New York, and São Paulo.


12 of 16

SelfieCity: Data Collection & Analysis

  • People (Primary): Individual participants take selfies.
  • Mechanical Turk (Secondary): A large-scale data collection website collects 120,000 selfies (20-30K per city).
  • Selfie Analysis: Automatic face analysis and experts’ validation.
  • Trends & Insights: Summaries of trends in selfies can be viewed with graphs.

13 of 16

What Can We Learn from Selfies?

Explore the data and graphs at https://selfiecity.net/.

You can scroll down on the first page, and also open the “Selfiexploratory” page on the top row!

  • What do you notice?
  • What do you wonder?


14 of 16

What Can We Learn from Selfies?

Explore the data and graphs at https://selfiecity.net/

Is this a primary or secondary source? What different trends can you recognize? What are the data, and what information is presented from them?

Compare graphs from different cities. BIAS: Who is included? Who is left out? How might you collect data differently to address this issue of bias?


15 of 16

Exit Ticket

Categorize the following study designs, both by whether they use a primary or secondary data source and by what kind of variable(s) they collect: quantitative (discrete/continuous) or qualitative (nominal/ordinal):

    • A researcher is interested in how academic grades affect SAT scores. They reach out to schools to collect the average GPA and average SAT score as data to answer this question.
    • Instagram would like to measure ad effectiveness. They collect the amount of time a user spends on the app on a particular day and the number of ads they click on during that day.
    • A student wants to know if different Starbucks drinks change a drinker’s heart rate. They give fellow students one of three chosen drinks, and measure their heart rate in bpm before and after drinking the drink.


16 of 16

Thanks!

apicancode@umd.edu


API Can Code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.