1 of 11

Workshop Series #3: Data Visualization

05.08.2024

2 of 11

Spring 2024 Workshop Series

  1. Data Cleaning & Processing
  2. Linear Regression & Machine Learning
  3. Data Visualization (today!)

3 of 11

Recap

Dataset: House Prices - Advanced Regression Techniques (Kaggle)

  • We did feature engineering and data processing
  • We built ML models (random forest and XGBoost) to predict housing prices
  • We quantified model performance with mean squared error (MSE)
  • How can we leverage visualizations in a data science project?
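As a refresher on the metric: mean squared error averages the squared differences between actual and predicted values, MSE = (1/n) * sum((y_i - yhat_i)^2). Below is a minimal pure-Python sketch; the helper name `mse` and the prices are hypothetical illustrative values, not results from the workshop.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared (actual - predicted) differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical sale prices and model predictions (illustrative numbers only)
actual = [200_000, 150_000, 310_000]
predicted = [210_000, 140_000, 300_000]

print(mse(actual, predicted))  # 100000000.0 (every prediction is off by $10,000)
```

Squaring penalizes large misses heavily and puts the result in squared units (dollars squared here), which is why its square root, RMSE, is often reported instead. scikit-learn provides an equivalent `sklearn.metrics.mean_squared_error(y_true, y_pred)`.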

4 of 11

Visualizations have 3 main uses

(for data scientists)

  1. Exploratory data analysis (EDA)
    • We saw this with heatmaps, bar charts, and line graphs
    • Boxplots and scatter plots are helpful here too
  2. Model evaluation
    • Residual plots (regression), ROC curves (classification)
  3. Communicating results/dashboards
    • Tell the story the data supports; the right visuals depend on the case
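The two model-evaluation plots named above can be sketched with matplotlib and scikit-learn. This is a minimal example that assumes synthetic data: the predicted/actual prices and the binary labels/scores are randomly generated stand-ins, not the workshop's House Prices results. It draws a residual plot (left) and an ROC curve (right) and saves them to an arbitrarily named `model_evaluation.png`.

```python
import matplotlib
matplotlib.use("Agg")  # write to a file; drop this line in a notebook to show plots inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Regression: synthetic predictions and "actual" prices with random noise
y_pred = rng.uniform(100_000, 400_000, size=200)
y_true = y_pred + rng.normal(0, 20_000, size=200)
residuals = y_true - y_pred

# Classification: synthetic 0/1 labels; positives get higher scores on average
labels = rng.integers(0, 2, size=200)
scores = labels * 0.5 + rng.uniform(0, 1, size=200)
fpr, tpr, _ = roc_curve(labels, scores)
auc = roc_auc_score(labels, scores)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residual plot: residuals vs. predictions, with a reference line at zero
ax1.scatter(y_pred, residuals, alpha=0.5)
ax1.axhline(0, color="red", linestyle="--")
ax1.set_xlabel("Predicted price")
ax1.set_ylabel("Residual (actual - predicted)")
ax1.set_title("Residual plot")

# ROC curve: true positive rate vs. false positive rate across thresholds
ax2.plot(fpr, tpr, label=f"Model (AUC = {auc:.2f})")
ax2.plot([0, 1], [0, 1], color="gray", linestyle="--", label="Random guess")
ax2.set_xlabel("False positive rate")
ax2.set_ylabel("True positive rate")
ax2.set_title("ROC curve")
ax2.legend()

fig.tight_layout()
fig.savefig("model_evaluation.png")
```

For a well-specified regression model, residuals form a structureless band around zero; a funnel or curve suggests missing features or a transform worth trying. On the ROC plot, a curve bowing toward the top-left corner (AUC near 1) indicates good separation, while the dashed diagonal is what random guessing achieves.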

5 of 11

Types of Visualizations
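A minimal matplotlib sketch of common chart types, covering those mentioned on the previous slide (bar, line, scatter, boxplot, heatmap) plus a histogram, with each panel titled by the question it typically answers. All data is randomly generated for illustration, and the output filename `visualization_types.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # write to a file; drop this line in a notebook to show plots inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
fig, axes = plt.subplots(2, 3, figsize=(12, 7))

# Bar chart: compare a value across categories
axes[0, 0].bar(["A", "B", "C", "D"], [23, 17, 35, 29])
axes[0, 0].set_title("Bar: compare categories")

# Line graph: trend over an ordered axis such as time
months = np.arange(1, 13)
axes[0, 1].plot(months, 100 + np.cumsum(rng.normal(0, 5, size=12)))
axes[0, 1].set_title("Line: trend over time")

# Scatter plot: relationship between two numeric variables
x = rng.normal(size=100)
axes[0, 2].scatter(x, 2 * x + rng.normal(size=100), alpha=0.6)
axes[0, 2].set_title("Scatter: relationship")

# Histogram: distribution of a single numeric variable
axes[1, 0].hist(rng.normal(size=500), bins=30)
axes[1, 0].set_title("Histogram: distribution")

# Boxplot: compare spread and spot outliers across groups
groups = [rng.normal(loc, 1, size=100) for loc in (0, 1, 2)]
axes[1, 1].boxplot(groups)
axes[1, 1].set_xticklabels(["G1", "G2", "G3"])
axes[1, 1].set_title("Boxplot: spread and outliers")

# Heatmap: correlation matrix of several numeric features
data = rng.normal(size=(200, 4))
data[:, 1] += data[:, 0]  # make features 0 and 1 correlated
corr = np.corrcoef(data, rowvar=False)
im = axes[1, 2].imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=axes[1, 2])
axes[1, 2].set_title("Heatmap: correlations")

fig.tight_layout()
fig.savefig("visualization_types.png")
```

Picking the type from the question (comparison, trend, relationship, distribution, correlation) rather than from habit is usually the quickest way to a readable chart.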

6 of 11

Considerations

  1. Visualization type
    • Depends on what data you want to show (categories, trends over time, distributions, relationships)
  2. Readability
    • Legends, axis labels, titles, etc.
  3. Desired impact on the viewer
    • Is your model good? What does the data show? What conclusions can be drawn?
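The readability checklist can be baked into a small helper so every chart ships with a title, axis labels (with units), and a legend. This is a minimal matplotlib sketch; the helper name `labeled_line_chart`, the neighborhoods, and the prices are hypothetical and randomly generated for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # write to a file; drop this line in a notebook to show plots inline
import matplotlib.pyplot as plt
import numpy as np

def labeled_line_chart(x, series, title, xlabel, ylabel):
    """Plot named series with the readability basics: title, axis labels, legend."""
    fig, ax = plt.subplots(figsize=(7, 4))
    for name, y in series.items():
        ax.plot(x, y, marker="o", label=name)  # each label becomes a legend entry
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)  # include units so values are interpretable
    ax.legend()
    ax.grid(alpha=0.3)  # faint grid aids reading values without clutter
    fig.tight_layout()
    return fig, ax

# Synthetic data: two hypothetical neighborhoods over 12 months
rng = np.random.default_rng(1)
months = np.arange(1, 13)
series = {
    "Neighborhood A": 180 + np.cumsum(rng.normal(1, 3, size=12)),
    "Neighborhood B": 220 + np.cumsum(rng.normal(0, 3, size=12)),
}
fig, ax = labeled_line_chart(
    months,
    series,
    title="Median sale price by month (synthetic data)",
    xlabel="Month",
    ylabel="Median price ($1,000s)",
)
fig.savefig("readable_chart.png")
```

Keeping these choices in one function also keeps a set of charts consistent, which matters when several of them end up side by side in a report or dashboard.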

7 of 11

Software/Coding

  1. Python - huge variety of libraries and functionality
  2. R - built for statistics
  3. Excel - used everywhere, especially in finance, accounting, and consulting
  4. Tableau - drag-and-drop, simple, and widely used in industry
    • Free one-year Tableau license for students: https://www.tableau.com/academic/students
  5. Microsoft Power BI - dashboarding within the Microsoft ecosystem

8 of 11

Classes to Take

  • STAT 451: Visualizing Data (Python or R)
    • Prerequisite: one of CSE 123, CSE 163, or STAT 302
  • CSE 412: Data Visualization
    • Kinda mid but getting revamped
  • HCDE 308: Visual Communication in Human Centered Design and Engineering
    • Heard this is cool
  • INFO 474: Interactive Information Visualization
    • Prerequisites: INFO 340 or CSE 154; one of CSE 123, CSE 143, or CSE 163; and one of QMETH 201, Q SCI 381, STAT 220, STAT 221/CS&SS 221/SOC 221, STAT 290, STAT 311, or STAT 390

9 of 11

What’s next?

Use what you've learned to build your own projects or dig deeper into specific models and techniques

  • Full-stack: Kaggle is an amazing ecosystem
    • Kaggle datasets and competitions are great boosters for your skills and portfolio (and easier to get into with GPT)
  • If you're interested in big tech, consider exploring the cloud (AWS, Azure, GCP)
    • Certifications carry a lot of weight there
  • Research in ML or data science at UW: leverage its industry focus

10 of 11

Survey

Please fill it out!

11 of 11

Thank you for coming!