The Data Cycle
@BootstrapWorld
Telling Your Data Story
Are more animals fixed or unfixed?
Data Science is all about asking questions of data. Sometimes the answer is easy to compute. Sometimes the answer to a question is already in the dataset - no computation needed. And sometimes the answer just sparks more questions!
Data Scientists ask a ton of questions, and each question adds a chapter to their data story. Even if a question turns out to be a dead-end, it's valuable to share what the question was and what work you did to answer it!
The Data Cycle is a roadmap that guides us through the process of data analysis.
Are more animals fixed or unfixed?
This was a pretty specific question, and it was straightforward to answer. But the answers to even simple questions can lead to more interesting questions down the road!
What other questions might come from comparing the number of fixed animals to the number of unfixed animals?
Ask Questions
How do we know what questions to ask?
There’s an art to asking the right questions, and good Data Scientists think hard about what kind of questions can and can’t be answered.
Most questions fall into one of four categories:
Lookup, Arithmetic, Statistical, or Can't Answer?
Synthesize
Consider Data
When considering data, we ask two questions: Which rows do we need? Which columns do we need?
Tables are made of Rows and Columns.
Each Row represents one member of our population. In the Animals Dataset, each row represents a single animal. In a dataset of temperature readings, each row might represent the temperature at a particular hour.
Columns, on the other hand, represent information about each row. Every animal, for example, has columns for its name, species, sex, age, weight, number of legs, whether it is fixed or unfixed, and how long it took to be adopted.
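To make rows and columns concrete, here is a minimal sketch in plain Python (not the Pyret used in this course). The three animals, their names, and their values are made up for illustration and are not taken from the real Animals Dataset.

```python
# Made-up rows for illustration only; not the real Animals Dataset.
# Each dictionary is one row, and each key names one column.
animals = [
    {"name": "Mittens", "species": "cat", "pounds": 9.0, "fixed": True},
    {"name": "Rex", "species": "dog", "pounds": 48.0, "fixed": True},
    {"name": "Clover", "species": "rabbit", "pounds": 4.5, "fixed": False},
]

# A row holds everything we know about one animal.
first_row = animals[0]

# A column holds the same kind of information for every animal.
species_column = [row["species"] for row in animals]

print(first_row)
print(species_column)
```

Reading across gives one animal; reading down gives one kind of information about all of them.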
If we want to know which cat is the heaviest, we only care about rows for cats, and we only need the pounds column.
If we want to know how many fixed animals are rabbits, we only care about rows for fixed animals, and we only need the species column.
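The two examples above can be sketched in code. This is a plain-Python illustration, not the Pyret used in this course, and the five animals are hypothetical. Each answer starts by choosing the rows, then reads only the one column it needs.

```python
# Hypothetical sample rows; not the real Animals Dataset.
animals = [
    {"name": "Mittens", "species": "cat", "pounds": 9.0, "fixed": True},
    {"name": "Rex", "species": "dog", "pounds": 48.0, "fixed": True},
    {"name": "Clover", "species": "rabbit", "pounds": 4.5, "fixed": True},
    {"name": "Shadow", "species": "cat", "pounds": 12.5, "fixed": False},
    {"name": "Thumper", "species": "rabbit", "pounds": 5.0, "fixed": False},
]

# Which cat is the heaviest?
# Which rows? Only the cats. Which column? pounds.
cats = [row for row in animals if row["species"] == "cat"]
heaviest_cat = max(cats, key=lambda row: row["pounds"])

# How many fixed animals are rabbits?
# Which rows? Only the fixed animals. Which column? species.
fixed_animals = [row for row in animals if row["fixed"]]
fixed_rabbits = sum(1 for row in fixed_animals if row["species"] == "rabbit")

print(heaviest_cat["name"], fixed_rabbits)
```

Notice that the comments answering "Which rows? Which columns?" line up one-to-one with the lines of code that follow them.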
Debrief your answers!
How does asking "Which rows? Which columns?" help us figure out what code to write?
Analyzing Data
Once we know what data we need, we can turn our attention to what we want to build with it!
What kinds of displays can help us analyze whether there are more fixed or unfixed animals?
Are more animals fixed or unfixed?
We could use a bar chart or a pie chart for this analysis. Since we care more about the ratio ("2x as many fixed as unfixed") than the actual counts ("20 fixed vs. 10 unfixed"), a pie chart is the better choice.
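The difference between counts and ratios can be seen in a short sketch. This is plain Python rather than Pyret, and the values in the fixed column are made up. A bar chart would draw the counts, while a pie chart would draw each value's share of the whole.

```python
from collections import Counter

# Made-up values for the fixed column; not the real Animals Dataset.
fixed_column = [True, True, False, True, False, True]

# Counts: what a bar chart would show.
counts = Counter(fixed_column)

# Shares of the whole: what a pie chart would show.
total = len(fixed_column)
shares = {value: count / total for value, count in counts.items()}

# The ratio of fixed to unfixed animals.
ratio = counts[True] / counts[False]

print(counts[True], counts[False], ratio)
```

Doubling every count would leave the shares and the ratio unchanged, which is why a display of shares suits a question about ratios.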
Once we know that we want a pie-chart, and that we're using it to look at the fixed column, analyzing the data is as easy as reading the Contract!
Turn to Data Cycle: Analyzing with Displays, and see if you can fill in the first 3 steps of the Data Cycle for a set of predefined questions. When you're finished, try to make the display in Pyret.
What did you learn from the displays you made?
In this case, we got a clear answer to our question. But perhaps that's not the end of the story! We might be curious about whether a higher percentage of dogs than cats are spayed or neutered, or whether it's even possible to "fix" a tarantula. All of this belongs in our data story!
How do Contracts and the Data Cycle work together to help us figure out what program will answer our question?