1 of 18

CSE 163

Data Literacy

Suh Young Choi

🎶 Listening to: Across the Spider-Verse Soundtrack

💬 Before Class: What parallel universe would you like to live in?

2 of 18

Last Time

  • Data Visualization
  • seaborn

This Time

  • Data Literacy
  • Data Context
  • Data Storytelling
  • Tips for Writing ☺

3 of 18

Announcements

  • THA 2 Peer Reviews due tonight!

  • THA 3: Education due tomorrow
    • Creative Components cannot be resubmitted

  • Project Group Finding Form due tomorrow
    • Only fill it out if you wish to work on a Project!
    • (You are choosing either the Project or the Portfolio—not both!)

  • Section codes: if not attending your registered section, indicate which section you have attended!

4 of 18

MultiIndex Recap

Filtering on a MultiIndex requires .loc with a tuple for the row indexer

# MultiIndex of (year, month, day)

# returns count for one day
earthquakes.loc[(2016, 7, 27), "count"]

# returns count for all days
earthquakes.loc[:, "count"]

# returns count for multiple days in the same year/month
# (label slices include both endpoints, so this covers days 10 through 15)
earthquakes.loc[(2016, 7, slice(10, 15)), "count"]

# returns count for multiple months
earthquakes.loc[(2016, [7, 8], 27), "count"]
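To try the selections above without the real dataset, here is a minimal sketch that builds a tiny stand-in for `earthquakes`. The counts are invented for illustration, and the index is sorted first because slicing on a MultiIndex generally expects a sorted index.

```python
import pandas as pd

# Hypothetical toy data: these counts are made up for illustration only.
index = pd.MultiIndex.from_tuples(
    [
        (2016, 7, 10),
        (2016, 7, 12),
        (2016, 7, 15),
        (2016, 7, 27),
        (2016, 8, 27),
    ],
    names=["year", "month", "day"],
)
earthquakes = pd.DataFrame({"count": [3, 1, 4, 2, 5]}, index=index).sort_index()

# One day: a full (year, month, day) tuple picks out a single value.
one_day = earthquakes.loc[(2016, 7, 27), "count"]

# Days 10 through 15 of July 2016: label slices include both endpoints.
mid_july = earthquakes.loc[(2016, 7, slice(10, 15)), "count"]

# The 27th of both July and August: a list selects several labels in one level.
two_months = earthquakes.loc[(2016, [7, 8], 27), "count"]

print(one_day)
print(mid_july)
print(two_months)
```

Note that the single-day lookup gives back one value, while the slice and list lookups give back a Series.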

5 of 18

Does code tell us anything about data? Sometimes!

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv('home/data.csv')
data = data[['name', 'fin_length', 'age']]  # suggests >3 columns
data = data.dropna()  # suggests missing values

# suggests quantitative and categorical variables
sns.relplot(data=data, kind='line', x='age', y='fin_length', hue='name')
plt.title('Shark Ages vs. Fin Length')
plt.xlabel('Age (months)')
plt.ylabel('Fin Length (in)')
plt.savefig('/home/plot.png')
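The hints on this slide are only guesses read off the code, so it can help to check them against the data itself. Below is a minimal sketch of such checks. The DataFrame is a hypothetical stand-in for the shark file (every value, and the extra `tagged_by` column, is invented), since the real file is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the shark data; every value here is invented.
data = pd.DataFrame(
    {
        "name": ["Bruce", "Bruce", "Anchor", "Chum"],
        "fin_length": [10.5, 12.0, None, 8.25],
        "age": [12, 24, 18, None],
        "tagged_by": ["A", "A", "B", "B"],
    }
)

# How many columns exist before we select a subset?
n_columns = data.shape[1]

# How many values are missing in each column?
missing_per_column = data.isna().sum()

# Which columns are numeric (quantitative), and which are not?
numeric_columns = data.select_dtypes(include="number").columns.tolist()
other_columns = data.select_dtypes(exclude="number").columns.tolist()

# How many rows would dropna() remove from the three columns we plot?
subset = data[["name", "fin_length", "age"]]
rows_dropped = len(subset) - len(subset.dropna())

print(n_columns, numeric_columns, other_columns, rows_dropped)
print(missing_per_column)
```

Checks like these describe the shape of the data, but they still say nothing about who collected it, when, or why. That is the job of data context.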

6 of 18

Data Context Guiding Questions

  • Who: the people represented in the data or responsible for its collection and usage
  • What: the information that is represented in the data, such as demographics or measurements
  • When: the timeframe associated with data collection or represented in the data
  • Where: the location of the data, virtual or otherwise
  • Why: the stated (or unstated) purpose of collecting the data
  • How: the methods used for data collection and storage

7 of 18

Where do we find context?

  • Data dictionaries

  • README.md or CHANGELOG.md files

  • Metadata sheets

  • Contacting the researchers/data collectors

8 of 18

Data Storytelling

Let’s talk about parallel universes…

9 of 18

Shark Multiverse…

  • A set of bar graphs illustrating different categorical qualities of the different shark species
  • A YouTube tutorial using this dataset to demonstrate how to drop or replace missing values
  • A neural network using all columns to predict whether the shark is endangered or not
  • A neural network using all columns to predict shark tail size
  • A single string representing the most common shark species in the dataset
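The last item in this list is the smallest "universe" to build. As a minimal sketch, assuming the dataset has a column named `species` (a hypothetical name, with invented values), it might look like this:

```python
import pandas as pd

# Hypothetical species column; the values are invented for illustration.
sharks = pd.DataFrame(
    {"species": ["great white", "hammerhead", "great white", "whale shark"]}
)

# value_counts() tallies each species; idxmax() returns the label
# with the largest tally, which is a single string.
most_common = sharks["species"].value_counts().idxmax()

print(most_common)
```

The same dataset supports every item in the list. What changes is the question being asked and the story being told.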

10 of 18

Narrative Plot Mountain

Exposition

Rising Action

Climax

Falling action

Resolution

11 of 18

Data Story Plot Mountain

Finding data

Posing research questions

Pre-processing

Coding

Analysis to answer questions

Testing

Visualizations

Writing reports

Interpreting results

Presenting findings

Communications

12 of 18

Finding Main Characters

  • Make sure the questions are relevant to your data, and the data is relevant to your questions.

  • Ask questions that don't have simple answers.

  • Use your domain knowledge to come up with interesting questions.

  • Pacing matters!

13 of 18

Writing Tips

  • We care more about what you have to say than how you say it.

  • Keep your writing relevant and to the point!

  • Some strategies for answering your research questions:
    • Answer the question, then explain that answer
    • Answer the question, think about a counterexample/counterargument, then refute it
    • Answer the question, pose a follow-up question, then answer the follow-up

14 of 18

Avoiding the Vacuum

Interpret any numbers and/or trends in context!

Do NOT leave free-floating numbers. ☹ (If you do, Suh Young will be a bit sad.)

Do not assume that your reader knows your code as well as you do! (For in-class assignments, you may assume that your reader knows any definitions that we give in the spec.)

Think about explaining your project/portfolio to someone who is not in this class.

15 of 18

Sentence-Level Details

A few mechanical things to think about:

  • Writing in the first-person ("I/We did X for my/our analysis to find Y" as opposed to "X was used to find Y")
  • Active voice vs. passive voice ("The results of A proved B" as opposed to "B was proven using A")
  • Present vs. past tense
  • Using abbreviations or shorthand instead of full names

16 of 18

Key Takeaways

Think about how your analysis might be used or interpreted

Consider biases in your data and analysis—even the ones that might come from you!

Consider the impact, ethics, and consequences of your analysis

Data science tells a story—who is our “main character”, and what do we want to focus on?

17 of 18

Group Work: Best Practices

When you first start working with this group:

  • Introduce yourself!
  • If possible, angle one of your screens so that everyone can discuss together

Tips:

  • Start by making sure everyone agrees to work on the same problem
  • Make sure everyone gets a chance to contribute!
  • Ask if everyone agrees and periodically ask each other questions!
  • Call TAs over for help if you need any!

18 of 18

Next Time

  • Humanistic Computing

Before Next Time

  • Complete Lesson 9
    • Remember: it is not for points, but it does go towards Weekly Tokens
  • Complete THA 2 Peer Review
  • THA 3 and Project Group Finding Form due tomorrow!
