1 of 64

05-898B Mini Data Science for Product Managers

Sherry Tongshuang Wu

sherryw@cs.cmu.edu

2025/03/10

2 of 64

Meet your instructor! (Me)

Sherry Wu (She/Her/Hers)

Assistant Prof. @ CMU HCII

Work on HCI and NLP!

I study how humans interact with AI systems.

Office hour: Talk to me after class

Email: sherryw@cs.cmu.edu

3 of 64

Teaching Assistants

Jaehee Kim

Senior in School of Computer Science

AI major w/ concentration in HCI

Email: jaeheek@andrew.cmu.edu

Office hour: Wednesdays, 2:30 - 3:30PM, GHC 7501

4 of 64

Teaching Assistants

Yeonji Baek

Junior studying Statistics and ML

Minor in Business Analytics + HCI

Office hour: Fridays, 2-3PM, GHC 7101

5 of 64

6 of 64

What is Data Science and why does it matter?

CLASS QUESTION

7 of 64

What is Data Science?

8 of 64

What is Data Science?

“The sexiest job of the 21st century”

Harvard Business Review

9 of 64

What is Data Science?

A data scientist is a statistician who lives in San Francisco.

Jeremy Jarvis

10 of 64

What is Data Science?

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Josh Wills

11 of 64

What is Data Science?

“The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, … because now we really do have essentially free and ubiquitous data.”

Hal Varian, Google’s Chief Economist - The McKinsey Quarterly, Jan 2009

12 of 64

What is Data Science?

Manipulate & hack data, understand operations

Have in-depth questions you want to answer / hypotheses you want to test!

Know basic analytics & can interpret data!

Source: Drew Conway

13 of 64

Why do we care?

CLASS QUESTION

14 of 64

Why do we care?

Data is everywhere! 2017: 2.5 exabytes (quintillion bytes) of data per day, largely unstructured —DOMO

Business? Data-centered & computational!

Biology? Data-centered & computational!

Physics? Data-centered & computational!

Medicine? Data-centered & computational!

Social Sciences? Data-centered & computational!

15 of 64

How can we leverage data?

Improve your fitness by targeted training

Improve your product by targeting your audience

Make better decisions (e.g., choose the right medication, pick a good restaurant)

Predict elections, events, crowd behavior, etc.

Many more applications...

16 of 64

Why do we (as PMs) care?

CLASS QUESTION

17 of 64

Let’s preview the course with an in-class breakout activity!

You will have many of these throughout the class.

Class Challenge

18 of 64

5mins!

Form a group of 2-3 people.

Help each other sign up for the Slack workspace.

Navigate to the #lecture channel.

This is how we take class attendance.

Post your answer to #lecture on Slack & tag all group members.

[Slack invitation link on canvas]

https://bit.ly/2025s-pmds-slack

CLASS CHALLENGE

19 of 64

Look at the visualization on the left (from a data scientist on your team), and discuss with your neighbor:

  • What insight did you get from the viz?
  • What do you like about the viz?
  • What would you want changed?

Post your answer to #lecture!

CLASS CHALLENGE

20 of 64

CLASS CHALLENGE

21 of 64

CLASS CHALLENGE

22 of 64

“I trained a model and it has 98% accuracy”

-Data Scientist on your team

CLASS QUESTION
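
One reason a PM might probe this claim, offered here as an illustration rather than as the slide's intended answer: when one class is rare, a model that always predicts the common class can also reach 98% accuracy while finding none of the cases you care about. The sketch below uses made-up numbers (1,000 hypothetical users, 20 of whom churn).

```python
# Hypothetical data: 1,000 users, only 20 of whom churn (label 1).
labels = [1] * 20 + [0] * 980

# A "model" that always predicts the majority class (no churn).
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    found = sum(p == positive for p in predicted_for_positives)
    return found / len(predicted_for_positives)


print(f"accuracy: {accuracy(labels, predictions):.2f}")  # 0.98
print(f"recall:   {recall(labels, predictions):.2f}")    # 0.00
```

A useful follow-up question for the data scientist is therefore "98% compared to what baseline, and on which metric?"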

26 of 64

Why Data Science for Product Managers?

You can improve your product with data

  • Targeting your audience
  • Automating aspects of your product
  • Engaging your audience with data

But if your product relies on data, do YOU have the skills to interrogate it effectively? Can you interpret the data and the analysis your teammates give you, and spot the errors they made?

27 of 64

The Data Pipeline

28 of 64

Question

What problem do you want to solve?

Is it the right question?

Is it answerable?

Question

Collection

Cleaning

Integration

Analysis

Visualization

Presentation

Dissemination

29 of 64

Collection: How to collect data?

Is it the right data for the question? How hard/easy to collect?

30 of 64

Cleaning: How dirty is real data?

Jan 19, 2020

January 19, 20

1/19/20

2020-01-19

19/1/20

What flaws exist in the data?

How do we address them?
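
The five spellings above all appear to denote the same day. A minimal sketch of one way to normalize them, assuming an English locale, that two-digit years mean 20xx, and that month-first is preferred when a slash-separated date is ambiguous (`FORMATS` and `normalize_date` are illustrative names, not part of the course material):

```python
from datetime import date, datetime

# Candidate formats, tried in order. Order matters for ambiguous strings:
# "1/2/20" would match month-first before day-first (an assumption).
FORMATS = [
    "%b %d, %Y",  # Jan 19, 2020
    "%B %d, %y",  # January 19, 20
    "%Y-%m-%d",   # 2020-01-19
    "%m/%d/%y",   # 1/19/20
    "%d/%m/%y",   # 19/1/20
]


def normalize_date(text):
    """Return a datetime.date for the first format that parses, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


samples = ["Jan 19, 2020", "January 19, 20", "1/19/20", "2020-01-19", "19/1/20"]
for raw in samples:
    print(f"{raw!r:>18} -> {normalize_date(raw)}")
```

Returning `None` instead of guessing keeps unparseable values visible, so they can be counted and reviewed rather than silently dropped.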

31 of 64

Integration

How do you combine data from multiple sources to provide the user with a unified view?
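
As a rough illustration of combining sources, the sketch below joins two hypothetical datasets (account records and usage logs) on a shared user ID and reports records that fail to match. All names and values are invented for the example.

```python
# Hypothetical source 1: account records keyed by user ID.
accounts = {
    "u1": {"name": "Ana", "plan": "free"},
    "u2": {"name": "Ben", "plan": "pro"},
    "u3": {"name": "Chen", "plan": "pro"},
}

# Hypothetical source 2: usage events from a separate logging system.
usage_events = [
    {"user_id": "u1", "minutes": 12},
    {"user_id": "u2", "minutes": 30},
    {"user_id": "u2", "minutes": 45},
    {"user_id": "u9", "minutes": 5},  # no matching account
]


def integrate(accounts, usage_events):
    """Left-join usage totals onto accounts; return (view, unmatched_ids)."""
    totals = {}
    for event in usage_events:
        uid = event["user_id"]
        totals[uid] = totals.get(uid, 0) + event["minutes"]

    view = [
        {"user_id": uid, **info, "total_minutes": totals.get(uid, 0)}
        for uid, info in accounts.items()
    ]
    unmatched = sorted(set(totals) - set(accounts))
    return view, unmatched


view, unmatched = integrate(accounts, usage_events)
for row in view:
    print(row)
print("usage with no matching account:", unmatched)
```

Tracking the unmatched IDs matters as much as the join itself: they often point to a cleaning or collection problem upstream.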

32 of 64

Analysis: How will you analyze the data?

Classification: Predict which of a set of classes an entity belongs to

Regression: Predict the numerical value of some variable for an entity

Similarity Matching: Find similar entities based on what we know about them

Clustering: Group entities together by their similarity

And lots more (co-occurrence grouping, pattern mining, link prediction, data reduction, etc.)
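
To make three of these task types concrete, here is a small sketch on invented customer data. It uses 1-nearest-neighbor for similarity matching and classification and ordinary least squares for regression; these are simple stand-ins chosen for brevity, not the methods the course prescribes, and clustering is left out.

```python
import math

# Hypothetical customers: (monthly_visits, monthly_spend) and a known segment.
customers = {
    "a": ((2, 10), "casual"),
    "b": ((3, 15), "casual"),
    "c": ((20, 200), "loyal"),
    "d": ((25, 260), "loyal"),
}


def most_similar(point):
    """Similarity matching: the known customer closest to `point`."""
    return min(customers, key=lambda name: math.dist(customers[name][0], point))


def classify(point):
    """Classification (1-nearest-neighbor): copy the closest customer's segment."""
    return customers[most_similar(point)][1]


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


new_customer = (22, 210)
print("most similar:", most_similar(new_customer))  # c
print("segment:", classify(new_customer))           # loyal

visits = [features[0] for features, _ in customers.values()]
spend = [features[1] for features, _ in customers.values()]
slope, intercept = fit_line(visits, spend)
print(f"predicted spend at 10 visits: {slope * 10 + intercept:.1f}")
```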

33 of 64

Visualization

“The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.”

34 of 64

Presentation

Papers

Talks

Videos

Blog Posts

Interactive Notebooks

Explorables

35 of 64

Dissemination

Source Code

Web Applications

Products

36 of 64

37 of 64

COVIDcast

Question: What data are good indicators of COVID-19?

Collection: APIs, Datasets

Cleaning: Messy geographic data, backfills

Integration: Merge data from all of the indicators

Analysis: Forecasting, hotspot detection

Visualization: Map, Small Multiples, Animation

Presentation: Blogs, Social Media, Notebooks, Source code

Dissemination: Web application, Public APIs

38 of 64

Questions?

39 of 64

Which ones will we focus on?

Question

Collection

Cleaning

Integration

Analysis

Visualization

Presentation

Dissemination

40 of 64

THIS CLASS: The Content

41 of 64

Preview: Data Quality and Wrangling

“In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data.”

42 of 64

Preview: Data Collection, Biases, and Provenance

“Garbage-in, garbage-out!”

43 of 64

Preview: Exploratory Data Analysis

"Exploratory data analysis is detective work. It involves looking at data in many different ways, digging deep, and uncovering hidden insights."

44 of 64

Preview: Visualization

“The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.”

45 of 64

Preview: Feature Engineering and Stats

Find dimensions that matter to the questions being asked.

46 of 64

Preview: Machine Learning

Find dimensions that matter to the questions being asked.

47 of 64

Preview: Communication

Data storytelling

Interpretation methods

48 of 64

THIS CLASS: The logistics

49 of 64

General information

  • This is a core course for the Masters in Product Management.
  • We have masters-level expectations for workload and quality.
  • You will spend 12 hours a week on the course.
  • The nature of the work requires interim weekly deliverables - work is distributed throughout the week.

50 of 64

Syllabus and Class Structure

05-898 B4, Spring 2025, 6 units mini course

Mondays/Wednesdays 12:30-1:50pm

Syllabus on course webpage (link on Canvas)

Slides posted after each lecture

Check the schedule of topics (may change!)

Be familiar with the class rules

https://bit.ly/2025s-pmds-syllabus

51 of 64

Active lecture

Case study driven

Discussions highly encouraged

Regular in-class activities, breakouts

Set up the ability to read/post to Slack during lecture

Contribute your own experience!

Discussions over definitions

52 of 64

Recordings and Attendance

Try to attend lecture -- discussions are important to learning, especially for this topic

Participation is part of your grade (more on this later!)

Slides will be released after class

No lecture recordings by default

Contact me for accommodations (illness, interview travel, unforeseen events) or have your advisor reach out. I try to be flexible.

53 of 64

Communication

Assignments, quizzes, and grades will be posted to Canvas

Course schedule and slides will be posted on the webpage.

All announcements through Slack #announcements

Post questions on Slack: Please use #general or #assignments and post publicly if possible; your classmates will benefit from your Q&A!

Invite link on Canvas

We also just set it up today!

54 of 64

Grading

A+ (97-100%) Professional level work, showing highest level of achievement

A (93–96.9%) Extraordinarily high achievement and command of subject matter

A- (90–92.9%) Excellent and thorough knowledge of the subject matter

B+ (87–89.9%) Full understanding of material; quality work

B (83–86.9%) Above average fulfillment of all course requirements

B- (80–82.9%) Fulfillment of all course requirements, acceptable work

C+ (77–79.9%) Satisfactory quality of work

C (73–76.9%) Minimally acceptable performance and quality of work

C- (70–72.9%) Unacceptable work, does not demonstrate mastery

D+ (65–69.9%) Unacceptable work

Below 65 Failure

55 of 64

Grade Breakdown

Quizzes 10%

Participation 10%

Assignments 80%

(Bonus points) 5%

There won’t be a final exam.
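
A minimal sketch of how the breakdown above and the letter-grade cutoffs from the Grading slide could combine into a final grade. It assumes each component is scored 0-100, that bonus points are added as percentage points, and that the total is capped at 100; the slides do not state these details, so treat them as guesses.

```python
# Weights from the Grade Breakdown slide.
WEIGHTS = {"quizzes": 0.10, "participation": 0.10, "assignments": 0.80}

# Lower bounds from the Grading slide, highest first.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (65, "D+"),
]


def final_percentage(scores, bonus=0.0):
    """Weighted sum of component scores (each 0-100) plus bonus points.

    Assumption: the bonus is added as percentage points and the total is
    capped at 100.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return min(100.0, total + bonus)


def letter_grade(percentage):
    """Map a percentage to the letter whose lower bound it meets."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "Failure"


# Hypothetical student.
scores = {"quizzes": 90, "participation": 75, "assignments": 88}
print(letter_grade(final_percentage(scores)))           # B
print(letter_grade(final_percentage(scores, bonus=5)))  # A-
```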

56 of 64

Quizzes, bi-weekly

Very easy, mostly multiple-choice questions that test your understanding of simple concepts; the answers are all in the lectures

57 of 64

Participation

Participation != Attendance

Grading:

100%: Participates actively at least once in most lectures by (1) asking or responding to questions or (2) contributing to breakout discussions

75%: Participates actively at least once in two thirds of the lectures

50%: Participates actively at least once in over half of the lectures

25%: Participates actively at least once in one quarter of the lectures

58 of 64

Assignments, (almost) weekly

A series of assignments that build on each other

Essentially, 4 steps in data science

HW0 is a preview on a simple dataset and will help you set up the necessary environment. Come to this Wed class!

Be careful of error propagation, fix things early!

Will have a final presentation

59 of 64

Research in this Course

We want to know what makes an effective/happy human-AI pair for DS tasks!

You will be able to use GenAI across all homeworks, on Google Colab.

The PhD student will help you set up the homework environment on Wed and explain more. Bring your laptop!

All data will be anonymized.

You can opt out & the instructing team won’t know & it won’t affect your grade.

If you sign up for a 60-90 min think-aloud interview with the researcher, you can also earn bonus points (+5). First come, first served!

60 of 64

Assignments - Late Policy

Each day late is 10% off (up to maximum of 50%) - Automatic

If you have questions, contact me or the TA early (not after the assignment is due!)

Submitted via Canvas - make sure you can log in!
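
The late policy can be read as simple arithmetic. The sketch below assumes whole days and that the deduction is taken as a percentage of the raw score; the slide does not say whether partial days round up or whether the deduction applies to earned or maximum points.

```python
def late_penalty_percent(days_late):
    """10 percentage points per day late, capped at 50."""
    return min(10 * max(days_late, 0), 50)


def score_after_penalty(raw_score, days_late):
    """Assumption: the penalty is deducted as a percentage of the raw score."""
    return raw_score * (100 - late_penalty_percent(days_late)) / 100


for days in (0, 1, 3, 7):
    print(days, "days late:", score_after_penalty(90, days))
```

Under this reading, an assignment that would have earned 90 is worth 63 when three days late and never less than 45.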

61 of 64

Assignment #0 – Pre-survey

Due: Wed (March 12) at 12:30 PM Eastern Time

Already posted on Canvas

HOMEWORK

62 of 64

Academic honesty

See web page

In a nutshell: do not copy from other students, do not lie, do not share or publicly release your solutions

If you feel overwhelmed or stressed, please come and talk to us (see syllabus for other support opportunities)

63 of 64

Introductions

Before the next lecture, introduce yourself in Slack channel #social:

  • Your (preferred) name
  • In 1-2 sentences, your background and goals (e.g., coursework, internships, work experience)
  • One topic you are particularly interested in learning during this course
  • A hobby or a favorite activity outside school

64 of 64

See you Wed!

Check Canvas access

Join the Slack workspace