1 of 29

API CAN CODE: Doing Data Science

Final Project Part 1: Finding Data

This work was made possible through generous support from the National Science Foundation (Award # 2141655).

2 of 29

Warmup

What do you notice? What do you wonder?

3 of 29

The Final Project: DOING DATA SCIENCE

Step into the role of a data scientist and let your project showcase your skills

4 of 29

“Doing Data Science” Final Project

For the final project, you will take on the role of a data scientist!

This project will take several class periods, so try to choose a topic you wouldn’t mind working on for a while!

Research Questions

Identify questions about this dataset

Finding Data

Access and evaluate a relevant dataset

Data Refining

Filter, clean and trim the data

Data Visualization

Create and interpret data visualizations

Communicating Results

Share your conclusions and insights

5 of 29

Final Presentation Structure

For the final presentation, we will ask you to make a set of slides that includes:

    • Dataset Used and Trimming: the dataset you used, information about your EduBlocks program, and any filtering you applied
    • Questions for Investigation
    • Visual Analysis/Graphing Insights: 3-5 graphs
    • Conclusions: How do these graphs answer your questions about the dataset?

Presentations should be about 3 minutes in length.

6 of 29

API Can Code Q&A Document!

  • Using RapidAPI, EduBlocks, and/or CODAP, you may encounter errors that you haven’t seen before or don’t know how to fix.
  • Consider checking our Q&A document, which answers a number of common questions about each of these three technologies.

7 of 29

Today’s focus

Research Questions

Identify questions about this dataset

Finding Data

Access and evaluate a relevant dataset

Data Refining

Filter, clean and trim the data

Data Visualization

Create and interpret data visualizations

Communicating Results

Share your conclusions and insights

8 of 29

Accessing a Dataset [Review]

  • The main focus of today’s class will be to find and access an API dataset that you’re interested in.
  • Before finding your dataset, let’s review an example using EduBlocks to access an API dataset.
  • Our example involves Library Usage Data from OpenDataDC.

9 of 29

OpenDataDC: Library Usage Data

  • Open this example EduBlocks project, which draws in API data from OpenDataDC on Library Usage
  • Explore the program and run it. What do you notice? What do you wonder?

There’s a quick guide in your handouts for future use!

10 of 29

OpenDataDC: Library Usage Data

  • Note that the for loop uses for i in myJSON['features'] instead of the earlier for i in myJSON. What do you think this is for?
  • The print statement uses i['attributes']['NAME'] for a library’s name instead of i['NAME']. Why do you think this is?

11 of 29

OpenDataDC: Library Usage Data

  • Both the for loop and the print statement are specifically constructed to handle data nesting in this dataset.
  • Data nesting is when data in an API is organized in layers. In this example, all of the meaningful data is within the 'features' key, and each library’s meaningful data is within an 'attributes' key.
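
The two layers described above can be sketched in Python. This is a minimal illustration, not the actual OpenDataDC response: the library names and the VISITS field are made up, and only the 'features', 'attributes', and 'NAME' keys come from the example program.

```python
# Hypothetical sample shaped like the nested response described above.
# The library names and the VISITS field are invented for illustration.
myJSON = {
    "features": [
        {"attributes": {"NAME": "Example Library A", "VISITS": 1200}},
        {"attributes": {"NAME": "Example Library B", "VISITS": 950}},
    ]
}

names = []
# Layer 1: the list of records lives under the 'features' key.
for i in myJSON["features"]:
    # Layer 2: each record keeps its fields under the 'attributes' key.
    names.append(i["attributes"]["NAME"])
    print(i["attributes"]["NAME"])
```

Looping over myJSON directly would step through the top-level keys instead of the library records, which is why the loop names 'features' first.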

12 of 29

OpenDataDC: Library Usage Data

  • Now, clone the program and replace the for loop with a print statement that just says print(myJSON).
  • Don’t forget to enter your API key, which you can find on RapidAPI.
  • What do you see when you run the program?
  • Based on your understanding of data nesting, why is this happening?
  • Why would this be a problem in your final project?

13 of 29

Today’s Goals:

There are three main goals for Part 1 of the final project:

  • Find an API source of data

  • Design an EduBlocks program to call the data from the API

  • Evaluate the dataset

14 of 29

Find an API Source

  • Browse the list of free APIs for a source of data you think might be interesting to investigate.
  • You might have a question you want to answer with this data right now, or a question might develop as you explore the dataset.

15 of 29

Share with Your Classmates

  • Share with your classmates the API you found interesting and explain why.

16 of 29

Design an EduBlocks Program

  • Now, design an EduBlocks program to pull in the dataset you just found.
  • You can use this basic program to get started, but you’ll need to fill in essential info for your API source.

  • Test your program to make sure your API works! You may need to consider nesting or other issues.
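
If you are not sure how your data is nested, one option is to print the keys at each layer before writing your for loop. The sketch below is only a suggestion: describe is a hypothetical helper name, and sample stands in for whatever your own API returns.

```python
def describe(data, depth=0, max_depth=3):
    """Return lines listing the keys and value types found at each layer."""
    lines = []
    if depth > max_depth:
        return lines
    indent = "  " * depth
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(data, list) and data:
        # Only the first item is inspected; the rest usually share its shape.
        lines.append(f"{indent}[list, length {len(data)}, first item shown]")
        lines.extend(describe(data[0], depth + 1, max_depth))
    return lines


# Hypothetical stand-in for a parsed API response.
sample = {"features": [{"attributes": {"NAME": "Example Library A"}}]}

structure = describe(sample)
for line in structure:
    print(line)
```

Each extra level of indentation in the printout is one more key (or list index) your program needs before it reaches the values.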

17 of 29

Data Evaluation

  • Evaluate your data source using the 5Vs for K-12 Framework.
  • Use the following slides to guide this evaluation.

Respond to the questions in today’s worksheet.

18 of 29

Evaluating Data: The 5Vs for K-12

Velocity

The recency of data curation

Variety

The types and structure of the data

Veracity

Accuracy, reliability, completeness, and bias of the data

Volume

The amount of data available

Value

The ability to extract meaningful insights

Respond to these prompts in today’s worksheet.

19 of 29

Evaluating Data: Volume

  • How much data is included in the dataset? That is, how many data points, observations, or rows does it contain?
  • Is there enough data to answer the research questions?
  • Is there enough data to draw meaningful insights or conclusions?
  • Is there too much data or too many observations/rows? Or is some of it not relevant or not related to the main question?

  • Benchmark: Are there at least 50 rows, entries or observations in your dataset?
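
One quick way to check the benchmark above is to count the records in your parsed response. This sketch assumes the records sit in a list under a 'features' key, as in the library example; the 60 records are generated placeholders, and your dataset may keep its rows under a different key.

```python
# Hypothetical response with 60 placeholder records under 'features'.
myJSON = {
    "features": [
        {"attributes": {"NAME": f"Example Library {n}"}} for n in range(60)
    ]
}

# len() on the list of records gives the number of rows.
row_count = len(myJSON["features"])
meets_benchmark = row_count >= 50

print("Rows:", row_count)
print("At least 50 rows?", meets_benchmark)
```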

20 of 29

Evaluating Data: Velocity

  • When was the data curated or last updated?
  • Does it include real-time or recent data?
  • Is the dataset relevant to the investigation period?
  • How recent does the data need to be to be useful for the current investigation?

  • Benchmark: Is the dataset you found within a few years of the time period you’re interested in?

21 of 29

Evaluating Data: Variety

  • What data types are included in the dataset (e.g., numerical, categorical, text, images, etc.)?
  • Is the dataset structured and organized (e.g., tabular format, JSON, map)?
  • Is the dataset well-documented? Are there details about the variables included?
  • Is the data format consistent across observations and easy to interpret with the available tools?

  • Benchmark: Does the dataset you’re planning to use include at least 3 different types of variables (for example, nominal, ordinal, continuous, or discrete)?
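
As a starting point for the variety questions above, you could list the Python type stored in each field. Treat this as a rough sketch: the rows and field names are invented, and a Python type such as int or str is only a hint. You still need to decide for yourself whether a variable is nominal, ordinal, continuous, or discrete.

```python
# Hypothetical rows; in the library example these would be the
# 'attributes' dictionaries of each record.
rows = [
    {"NAME": "Example Library A", "VISITS": 1200, "HOURS_OPEN": 45.5, "HAS_WIFI": True},
    {"NAME": "Example Library B", "VISITS": 950, "HOURS_OPEN": 40.0, "HAS_WIFI": False},
]

# Map each field name to the set of Python type names seen in that field.
field_types = {}
for row in rows:
    for field, value in row.items():
        field_types.setdefault(field, set()).add(type(value).__name__)

# Collect every type name that appears anywhere in the data.
distinct_types = set()
for type_names in field_types.values():
    distinct_types.update(type_names)

for field, type_names in field_types.items():
    print(field, sorted(type_names))
```

A field that reports more than one type name may be formatted inconsistently across observations, which is worth noting in your evaluation.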

22 of 29

Evaluating Data: Veracity

  • Accuracy

  • Reliability

  • Completeness

  • Data biases

23 of 29

Evaluating Data: Accuracy

  • What do we know about where the data came from? For example, how was the data collected? By whom? For what purpose(s)?

  • Is the dataset accurate compared to known facts or an external source?

[Figure: four panels labeled "Accurate, Reliable", "Accurate, Unreliable", "Inaccurate, Reliable", and "Inaccurate, Unreliable"]

24 of 29

Evaluating Data: Reliability

  • Is the data source reliable? Is the data measured consistently?
  • Some measuring instruments and designs may reliably measure data; others may introduce measurement variability (such as a malfunctioning thermometer or a poorly worded question on a survey).

[Figure: four panels labeled "Accurate, Reliable", "Accurate, Unreliable", "Inaccurate, Reliable", and "Inaccurate, Unreliable"]

25 of 29

Evaluating Data: Completeness

  • Is there any missing data?
  • Check that the dataset does not have many missing data points, particularly if there are enough to make the sample small or if they are systematically missing for certain populations.
  • Benchmark: Does your dataset represent all sub-populations you’re interested in?
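
To get a first look at completeness, you could count how often each field is missing. The sketch below assumes that a missing value shows up as None, an empty string, or an absent key; your API may mark missing data differently (for example, with -1 or "N/A"). The rows and field names are made up.

```python
# Hypothetical rows with a few gaps.
rows = [
    {"NAME": "Example Library A", "VISITS": 1200},
    {"NAME": "Example Library B", "VISITS": None},  # value recorded as None
    {"NAME": "", "VISITS": 950},                    # empty string
    {"NAME": "Example Library D"},                  # key absent entirely
]

fields = ["NAME", "VISITS"]
missing = {field: 0 for field in fields}

for row in rows:
    for field in fields:
        # row.get() returns None when the key is absent.
        if row.get(field) in (None, ""):
            missing[field] += 1

for field in fields:
    print(field, "missing in", missing[field], "of", len(rows), "rows")
```

Counts like these show how much is missing, but not whether the gaps fall on particular sub-populations; that still takes a closer look at which rows are affected.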

26 of 29

Evaluating Data: Data Biases

  • Is the data potentially biased?
  • Are there potential biases in data sampling, reporting, or measurement? Are there any under- or over-represented populations? Are responses influenced by social context? Are the tools used systematically mismeasuring a phenomenon?

  • If there are biases, how might these biases potentially influence the results?

27 of 29

Evaluating Data: Value

  • Does the dataset contain the necessary information to address the scientific question or the phenomenon under study?
  • Can it be analyzed to derive meaningful insights or evidence-based claims?
  • Which variables are most helpful or insightful?
  • What additional data could make the dataset more useful?

28 of 29

Exit Ticket

  • Submit a link to the dataset you chose to work with
  • Answer the questions regarding the dataset evaluation
  • Paste a link to your working EduBlocks program, and upload a screenshot of the program.
  • Answer: Is your data "nested" in any particular way?

29 of 29

Thanks!

apicancode@umd.edu
