CSE 163
Data Communication
Suh Young Choi
🎶 Listening to: Max LL
💬 Before Class: Happy June! Do you have a favorite book, movie, or piece of media?
Announcements
Project Notebooks due Monday, June 9
Mapping Interview grades released
Final resubmission window closes Friday, June 6
Code Interview Makeup/Retakes on Monday, June 9
This Time
Last Time
What can this code tell us about the data that’s used? Answer on Ed!
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('home/data.csv')
data = data[['name', 'fin_length', 'age']]
data = data.dropna()
sns.relplot(data=data, kind='line', x='age', y='fin_length', hue='name')
plt.title('Shark Ages vs. Fin Length')
plt.xlabel('Age (months)')
plt.ylabel('Fin Length (in)')
plt.savefig('/home/plot.png')
Data Settings Revisited
It’s all about the context!
Data Context Guiding Questions
Where do we find context?
Data dictionary example
Column name | Feature | Encodings | Type |
id | Unique identifier per shark | N/A | int |
name_code | Encoded name of shark species | 0 – great white; 1 – whale; 2 – mako; 3 – leopard | string |
name | Common name of shark species | N/A | string |
name_scientific | Scientific name of shark species | N/A | string |
age | Shark’s age at time of capture, in months | N/A | float |
fin_length | Length of pectoral fin, in inches | N/A | float |
tail_length | Length of caudal fin, in inches | N/A | float |
sex | Sex of the shark | 0 – female; 1 – male | int |
health | Treated by on-site veterinarian? | True – yes; False – no | Boolean |
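A data dictionary is most useful when its encodings are applied before anything is shown to a reader. The sketch below is one possible way to do that with plain Python. The names `ENCODINGS` and `decode_row` are hypothetical, the example row is made up for illustration, and the codes are assumed to arrive as integers and Booleans.

```python
# Hypothetical encoding tables, copied from the data dictionary above.
ENCODINGS = {
    "name_code": {0: "great white", 1: "whale", 2: "mako", 3: "leopard"},
    "sex": {0: "female", 1: "male"},
    "health": {True: "yes", False: "no"},
}


def decode_row(row):
    """Return a copy of row with encoded values replaced by readable labels."""
    decoded = dict(row)
    for column, mapping in ENCODINGS.items():
        if column in decoded:
            # Leave unknown codes untouched so that bad data stays visible.
            decoded[column] = mapping.get(decoded[column], decoded[column])
    return decoded


# Made-up row for illustration only; it is not taken from the real dataset.
example = {"id": 1, "name_code": 2, "age": 30.0, "sex": 0, "health": True}
print(decode_row(example))
```

Keeping the lookup tables in one place means plot legends and report text can share the same labels as the data dictionary.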
README example
# README
This repository contains data about sharks tagged by the Shark Stewards organization between June 2011 and August 2011.
## Folder layout:
.
├── data
│ ├── data.csv
│ ├── data.parquet
│ └── data.json
└── README.md
## Etc…
CHANGELOG example
# CHANGELOG
All notable changes to this repository will be tracked in this file.
## Added
## Fixed
## Removed
- v.2: removed .stdf file
Data Storytelling
Let’s talk about parallel universes…
Shark Multiverse…
Narrative Plot Mountain
Exposition
Rising Action
Climax
Falling Action
Resolution
Data Story Plot Mountain
Finding data
Posing research questions
Pre-processing
Coding
Testing
Visualizations
Writing reports
Presenting findings
Analysis to answer questions
Interpreting results
Communications
Suppose that this is the plot we created from the code earlier:
For what research question(s) might this plot be useful? Answer on Ed!
Finding Main Characters
Writing Tips
What are possible questions we might have after seeing this plot? Answer on Ed!
Maybe we’re missing some context…
Avoiding the Vacuum
Interpret numbers and/or trends in context!
Do NOT leave free-floating p-values or ML model evaluation metrics. (If you do, Suh Young will be a bit sad ☹)
Your reader may not know your code as well as you do!
Think about explaining your project/code to someone who is not in this class.
Interpreting Numbers
p-values
Pearson’s correlation coefficient (also called r)
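To make "interpret numbers in context" concrete, here is a minimal sketch that computes Pearson's r by hand and turns it into a sentence a reader can use. The helper names `pearson_r` and `describe_correlation` are hypothetical, the 0.3 and 0.7 cutoffs are only a rough rule of thumb (an assumption, not a standard), and the sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Compute Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def describe_correlation(r, x_label, y_label):
    """Turn r into a sentence; the strength cutoffs are a rough assumption."""
    if r == 0:
        return f"{x_label} and {y_label} show no linear relationship (r = 0.00)."
    size = abs(r)
    strength = "weak" if size < 0.3 else "moderate" if size < 0.7 else "strong"
    direction = "positive" if r > 0 else "negative"
    return (f"{x_label} and {y_label} show a {strength} {direction} "
            f"linear relationship (r = {r:.2f}).")


# Made-up values for illustration only; not taken from the shark dataset.
ages = [12, 24, 36, 48, 60]
fin_lengths = [10.0, 12.5, 14.0, 17.5, 19.0]
print(describe_correlation(pearson_r(ages, fin_lengths),
                           "Age (months)", "fin length (in)"))
```

The sentence still needs the surrounding story: which sharks, over what period, and why the relationship matters for the research question.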
Interpreting Numbers, cont’d.
Linear model coefficients
ML model evaluation metrics
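A linear model coefficient reads best when it is stated with its units. The sketch below fits a one-predictor least-squares line and reports the slope as a sentence. The functions `fit_line` and `describe_slope` are hypothetical helpers, and the numbers are made up for illustration, not results from the shark data.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when x is constant")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def describe_slope(slope, x_unit, y_label, y_unit):
    """State the slope with units so the number does not float free."""
    return (f"Each additional {x_unit} is associated with a change of "
            f"{slope:+.2f} {y_unit} in {y_label}, on average.")


# Made-up values for illustration only; not taken from the shark dataset.
ages = [12, 24, 36, 48, 60]
fin_lengths = [10.0, 12.5, 14.0, 17.5, 19.0]
slope, intercept = fit_line(ages, fin_lengths)
print(describe_slope(slope, "month of age", "fin length", "in"))
```

The same habit applies to ML evaluation metrics: report the number next to something the reader can compare it with, such as the units of the target or a simple baseline.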
Sentence-Level Details
A few mechanical things to think about:
Final Notes
Think about how your analysis might be used or interpreted
Consider biases in your data and analysis—even the ones that might come from you!
Consider the impact, ethics, and consequences of your analysis
No piece of information exists in a vacuum. Contextualize!
Your projects tell a story—who is your “main character”, and what do you want to focus on?
Before Next Time
Next Time