1 of 23

The Data Cycle

@BootstrapWorld

2 of 23

Telling Your Data Story

Are more animals fixed or unfixed?

  • Open your saved Animals Starter File, or make a new copy.
  • Working in pairs, turn to The Animals Dataset in your student workbooks, or open the Animals Spreadsheet (Google).
  • You and your partner are going to answer a simple question: are more animals fixed or unfixed?

3 of 23

Telling Your Data Story

Data Science is all about asking questions of data. Sometimes the answer is easy to compute. Sometimes the answer to a question is already in the dataset - no computation needed. And sometimes the answer just sparks more questions!

Data Scientists ask a ton of questions, and each question adds a chapter to their data story. Even if a question turns out to be a dead-end, it's valuable to share what the question was and what work you did to answer it!

4 of 23

Telling Your Data Story

The Data Cycle is a roadmap that guides us through the process of data analysis.

  • We Ask Questions that can be answered with data.
  • We Consider Data. This could be done by conducting a survey, observing and recording data, or finding a dataset.
  • We Analyze the Data, to produce data displays and new tables of filtered or transformed data, to identify patterns and relationships.
  • We Interpret the Data, answering questions and summarizing results. As we've already seen from the Animals Dataset, these interpretations often lead to new questions... and the cycle begins again.
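The four steps above can be sketched in a few lines of code. This is only an illustration written in Python, not the Pyret used in class, and the three rows are hypothetical stand-ins rather than values from the real Animals Dataset.

```python
# Hypothetical sample rows; the real Animals Dataset has more animals and columns.
animals = [
    {"name": "Animal A", "species": "cat", "fixed": True},
    {"name": "Animal B", "species": "dog", "fixed": True},
    {"name": "Animal C", "species": "rabbit", "fixed": False},
]

# 1. Ask Questions
question = "Are more animals fixed or unfixed?"

# 2. Consider Data: we need every row, but only the "fixed" column
fixed_column = [row["fixed"] for row in animals]

# 3. Analyze the Data: count how many animals fall in each group
fixed_count = fixed_column.count(True)
unfixed_count = fixed_column.count(False)

# 4. Interpret the Data: turn the counts into an answer
if fixed_count > unfixed_count:
    answer = "More animals are fixed."
elif fixed_count < unfixed_count:
    answer = "More animals are unfixed."
else:
    answer = "The same number of animals are fixed and unfixed."

print(question, answer)
```

With different rows the answer could change, which is one way an interpretation leads to a new question.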

5 of 23

Telling Your Data Story

Are more animals fixed or unfixed?

This was a pretty specific question, and it was straightforward to answer it. But the answers to even simple questions can lead to more interesting questions down the road!

What other questions might come from comparing the number of fixed and unfixed animals?

6 of 23

Ask Questions

How do we know what questions to ask?

There’s an art to asking the right questions, and good Data Scientists think hard about what kind of questions can and can’t be answered.

7 of 23

Ask Questions

Most questions can be broken down into one of four categories:

  • Lookup questions can be answered simply by looking up a single value in the table and reading it out. Once you find the value, you’re done!
  • Arithmetic questions can be answered by computing an answer across a single column.
  • Statistical questions are where things get interesting! If we asked, "How old are animals at the shelter?", there are lots of ways to answer! We could report the average age, the age that shows up most frequently, or the range of the ages. Which one is "right"? As you'll see in this class, it depends...
  • Questions we can't answer would need data that we don't have.
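One way to see the difference between the four categories is to answer one question of each type in code. The sketch below uses Python for illustration (not the Pyret used in class). The ages are made up, the names "Animal C" and "Animal D" are placeholders, and the `favorite_toy` column is hypothetical and deliberately missing.

```python
from statistics import mean, mode

# Made-up ages for illustration; these are not the real dataset values.
animals = [
    {"name": "Toggle", "age": 3},
    {"name": "Mittens", "age": 2},
    {"name": "Animal C", "age": 3},
    {"name": "Animal D", "age": 8},
]

# Lookup: "How old is Toggle?" Find one row and read one value.
toggle_age = next(row["age"] for row in animals if row["name"] == "Toggle")

# Arithmetic: "How old is the oldest animal?" One computation across one column.
ages = [row["age"] for row in animals]
oldest_age = max(ages)

# Statistical: "How old are animals at the shelter?" Several reasonable answers.
average_age = mean(ages)
most_common_age = mode(ages)
age_range = max(ages) - min(ages)

# Can't answer: "What is each animal's favorite toy?" The column does not exist.
can_answer_toy_question = "favorite_toy" in animals[0]
```

The lookup and arithmetic questions each produce one value, while the statistical question produces three different summaries of the same column.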

8 of 23

Ask Questions

Lookup, Arithmetic, Statistical, or Can't Answer?

  • What kind of question is "Are more animals fixed or unfixed?"
  • What kind of question is "How old is Toggle?"

9 of 23

Ask Questions

  • Turn to Which Question Type?, and fill out the "Type" column in the table at the bottom. For now, ignore the other columns.
  • Look at the Wonders you wrote on Questions and Column Descriptions. Are these Lookup, Arithmetic, or Statistical questions?
  • Optional: For more practice, complete Question Types: Animals, by coming up with examples of each type of question for the Animals Dataset.

10 of 23

Synthesize

  • How would you explain the difference between Lookup, Arithmetic, and Statistical questions?
  • When you looked back at your Wonders from the Animals Dataset, were they mostly Lookup questions? Arithmetic? Statistical?
  • What are some examples of statistical questions the owner of a sports team might ask? Or a researcher who is trying to see if a cancer drug is effective? Or a principal who wants to know what will help their students the most?

11 of 23

Consider Data

When considering data, we ask:

  • Which Rows do we need?
  • Which Column(s) do we care about?

12 of 23

Consider Data

Tables are made of Rows and Columns.

Each Row represents one member of our population. In the Animals Dataset, each row represents a single animal. In a dataset of temperature readings, each row might represent the temperature at a particular hour.

Columns, on the other hand, represent information about each row. Every animal, for example, has columns for their name, species, sex, age, weight, legs, whether they are fixed or unfixed, and how long it took to be adopted.

13 of 23

Consider Data

If we want to know which cat is the heaviest, we only care about rows for cats, and we only need the pounds column.

If we want to know how many fixed animals are rabbits, we only care about rows for fixed animals, and we only need the species column.
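Choosing rows and columns maps closely onto filtering and selecting in code. The following is a minimal Python sketch (not the Pyret used in class) of the two examples above. The rows and weights are hypothetical, and the `name` column is read only to report which cat was found.

```python
# Hypothetical rows for illustration; not values from the real Animals Dataset.
animals = [
    {"name": "Animal A", "species": "cat", "fixed": True, "pounds": 6.5},
    {"name": "Animal B", "species": "cat", "fixed": False, "pounds": 9.0},
    {"name": "Animal C", "species": "rabbit", "fixed": True, "pounds": 4.0},
    {"name": "Animal D", "species": "dog", "fixed": True, "pounds": 48.0},
]

# "Which cat is the heaviest?"
# Rows: only the cats. Column: pounds.
cats = [row for row in animals if row["species"] == "cat"]
heaviest_cat = max(cats, key=lambda row: row["pounds"])

# "How many fixed animals are rabbits?"
# Rows: only the fixed animals. Column: species.
fixed_animals = [row for row in animals if row["fixed"]]
fixed_rabbits = sum(1 for row in fixed_animals if row["species"] == "rabbit")

print(heaviest_cat["name"], fixed_rabbits)
```

In both cases the row choice becomes a filter and the column choice becomes the value read from each remaining row.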

14 of 23

Consider Data

  • If our question is "How old is Mittens?", what rows and column(s) do we need?
  • If our question is "Which animal is the heaviest?", what rows and column(s) do we need?
  • What rows and columns did we need to answer "Are more animals fixed or unfixed?"?

15 of 23

Consider Data

  • Return to Which Question Type?. For each question, which rows would you need to answer it? Which columns would you look at? Write your answers in the last two columns of the table at the bottom.
  • Complete Data Cycle: Consider Data.

16 of 23

Consider Data

Debrief your answers!

How does asking "Which rows? Which columns?" help us figure out what code to write?

17 of 23

Analyzing Data

Once we know what data we need, we can turn our attention to what we want to build with it!

  • Do we need to filter out certain rows and make a new table?
  • Do we need a pie chart? A scatter plot?

What kinds of displays can help us analyze whether there are more fixed or unfixed animals?

18 of 23

Analyzing Data

Are more animals fixed or unfixed?

We could use a bar chart or a pie chart to do this analysis, but since we care more about the ratio ("2x as many fixed as unfixed") than the actual counts ("20 fixed vs. 10 unfixed"), a pie chart is the better choice.
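The difference between counts and ratios can be made concrete with a short calculation. This Python sketch (an illustration, not the Pyret used in class) assumes a `fixed` column with 20 fixed and 10 unfixed animals, numbers chosen only to match the "2x" example.

```python
from collections import Counter

# Assumed column for illustration: 20 fixed animals and 10 unfixed animals.
fixed_column = [True] * 20 + [False] * 10

# Counts: the information a bar chart emphasizes
counts = Counter(fixed_column)

# Shares of the whole: the information a pie chart emphasizes
total = sum(counts.values())
shares = {value: count / total for value, count in counts.items()}

# Ratio of fixed to unfixed animals
ratio = counts[True] / counts[False]

print(counts[True], counts[False], ratio)
```

A table with 200 fixed and 100 unfixed animals would give the same shares and the same ratio, even though every count is ten times larger.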

19 of 23

Analyzing Data

Once we know that we want a pie-chart, and that we're using it to look at the fixed column, analyzing the data is as easy as reading the Contract!
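A Contract states a function's name, the types of its inputs, and the type of its output. A rough analogue in Python is a function signature with type hints. The function `pie_chart_shares` below is hypothetical: it is not the Pyret pie-chart function, and it returns the size of each slice as a number instead of drawing an image. The rows are made up for illustration.

```python
# A table is modeled here as a list of rows, each row a dictionary of columns.
Table = list[dict[str, object]]


def pie_chart_shares(table: Table, column: str) -> dict[object, float]:
    """Return the share of rows holding each distinct value in `column`."""
    counts: dict[object, int] = {}
    for row in table:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return {value: count / len(table) for value, count in counts.items()}


# Hypothetical rows; not values from the real Animals Dataset.
animals: Table = [
    {"name": "Animal A", "fixed": True},
    {"name": "Animal B", "fixed": True},
    {"name": "Animal C", "fixed": True},
    {"name": "Animal D", "fixed": False},
]

shares = pie_chart_shares(animals, "fixed")
print(shares)
```

The signature shows what to supply: a table and the name of one column. Those are the same two things identified in the Consider Data step.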

20 of 23

Analyzing Data

Turn to Data Cycle: Analyzing with Displays, and see if you can fill in the first 3 steps of the Data Cycle for a set of predefined questions. When you're finished, try to make the display in Pyret.

21 of 23

Analyzing Data

What did you learn from the displays you made?

22 of 23

Analyzing Data

In this case, we got a clear answer to our question. But perhaps that's not the end of the story! We might be curious about whether a higher percentage of dogs are spayed and neutered than cats, or whether it's even possible to "fix" a tarantula. All of this belongs in our data story!
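The follow-up question about dogs and cats sends us around the Data Cycle again: the rows are now limited to one species at a time, and the column is still `fixed`. Here is a minimal Python sketch (not the Pyret used in class), using hypothetical rows and a hypothetical helper named `percent_fixed`.

```python
# Hypothetical rows for illustration; not values from the real Animals Dataset.
animals = [
    {"species": "dog", "fixed": True},
    {"species": "dog", "fixed": False},
    {"species": "cat", "fixed": True},
    {"species": "cat", "fixed": True},
]


def percent_fixed(table, species):
    """Percentage of the given species that is fixed, or None if there are none."""
    rows = [row for row in table if row["species"] == species]
    if not rows:
        return None  # no animals of this species, so there is nothing to compare
    fixed = sum(1 for row in rows if row["fixed"])
    return 100 * fixed / len(rows)


dog_percent = percent_fixed(animals, "dog")
cat_percent = percent_fixed(animals, "cat")
tarantula_percent = percent_fixed(animals, "tarantula")

print(dog_percent, cat_percent, tarantula_percent)
```

Returning `None` for a species with no rows keeps "we have no data" separate from "zero percent are fixed".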

23 of 23

Analyzing Data

How do Contracts and the Data Cycle work together to help us figure out what program will answer our question?