1 of 109

CS 448B Progress Presentations I

Monday Nov 12

2 of 109

Jen Cardona

3 of 109

Diet in the Context of Carbon Emissions

Jen Cardona

4 of 109

How does your diet impact the environment?

Estimated Percentage of Global Emissions from Agriculture:

  • 10-12% (excluding fuel, fertilizer, and land-use change) (Smith et al., 2007)
  • 30% (including fuel, fertilizer, and land-use change) (Bellarby et al., 2008)

Project Aims:

  • Present information about greenhouse gas (GHG) emissions from foods
  • Provide an interactive diet planner to compare the impact of different food choices

5 of 109

Existing Tools: Static Visualizations

6 of 109

Existing Tools: Interactive Visualizations

7 of 109

Design Plan: Visual Explainer

  • Compare different food types
  • Compare items within the same food group
  • Include interactivity here

8 of 109

Design Plan: Interactive Diet Planner

Features:

  • Bar plot showing emissions for each added food item, plus the total
  • Reference lines for comparison with other common sources of emissions
  • A separate bar showing calories, with a reference line for daily intake
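The planner's bookkeeping behind these bars can be sketched as follows. This is a minimal sketch assuming each food item carries an emission factor (kg CO2e per kg of food) and an energy density (kcal per 100 g); the class and function names are hypothetical, the numbers in the usage line are placeholders rather than real emission factors, and the 2,000 kcal default is an assumed daily reference that the planner could replace.

```python
from dataclasses import dataclass


@dataclass
class FoodItem:  # hypothetical structure for one item added to the plan
    name: str
    grams: float           # portion size added by the user
    kg_co2e_per_kg: float  # emission factor for this food (input data, assumed available)
    kcal_per_100g: float   # energy density of this food


def plan_summary(items, daily_kcal_reference=2000.0):
    """Return per-item emissions, total emissions, total kcal, and kcal as a share of the reference."""
    # One (name, kg CO2e) pair per bar; a list keeps duplicates of the same food distinct.
    per_item = [(item.name, item.grams / 1000 * item.kg_co2e_per_kg) for item in items]
    total_co2e = sum(kg for _, kg in per_item)  # the "total" bar
    # The separate calorie bar, compared against the daily reference line.
    total_kcal = sum(item.grams / 100 * item.kcal_per_100g for item in items)
    return per_item, total_co2e, total_kcal, total_kcal / daily_kcal_reference


# Usage with placeholder numbers (not real data):
plan = [FoodItem("item A", 200, 10.0, 150), FoodItem("item B", 500, 1.0, 80)]
per_item, total_co2e, total_kcal, share = plan_summary(plan)
print(per_item, total_co2e, total_kcal, share)
```

Reference lines for other common emission sources would simply be extra constants drawn on the same axis as `total_co2e`.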

9 of 109

Questions

How specific would you want the foods to be?

What references would be relevant/important for you?

10 of 109

Alison Chen, Carmelle Millar, Sydney Hutton

11 of 109

Alison // Carmelle // Sydney

America through a Film Lens

12 of 109

Problem & Dataset

  • What do people choose to watch and how does that reflect their culture or beliefs? How do Americans react differently to media across the country?
  • We plan to visualize movie preferences across the US
  • We will investigate this by mapping a dataset of 474 movies released between 2010 and 2014, categorized by genre, using Google Trends data to see where people search for which movies.

13 of 109

Relevant Work

  • Mental Floss’s “The Most Popular Pixar in Each State” analyzed Google Trends data to find each state’s favorite Pixar movie

  • We were inspired by the dynamic comparison maps in pudding.cool’s “Gastronomic Borders of the US”

14 of 109

The Vision

  • We intend to broaden the scope and specificity of past work to look at each metro area’s preference for movie genre, rating, sequels and budget
  • We will implement dynamic comparison maps, using The Pudding’s approach as inspiration
  • We want to offer a chronological filtering option, so users can track trends over time

15 of 109

Current Progress

  • Setting up a geographic map of the US in D3
  • Scraping Google Trends data for the 474 American movies released between 2010 and 2014
  • Prototyping user interaction maps

16 of 109

Current Progress

Comparisons of popularity between different movie genres

17 of 109

Current Progress

Popularity of a single movie over time

18 of 109

Questions

  • What other film metrics can we investigate?
  • How can we better understand regional trends and dynamics in the United States so we can interpret our data in relation to culture?
  • How can we most accurately define the popularity of a movie?

19 of 109

Travis Chen, Michelle Lam, Lucy Wang

20 of 109

THE REDDIT REVIEW

How do Reddit users feel about ______?

Travis Chen, Michelle Lam, Lucy Wang

CS 448B Autumn 2018

21 of 109

THE PROBLEM

How do Reddit users feel about certain topics?

Dataset of 1.7 billion Reddit comments

22 of 109

PRIOR WORK

23 of 109

PRIOR WORK

24 of 109

PROGRESS

GLOBAL VIEWS

25 of 109

PROGRESS

COMPARISON VIEWS

26 of 109

PROGRESS

SYNTHESIS (I)

27 of 109

PROGRESS

SYNTHESIS (II)

28 of 109

PROGRESS

SYNTHESIS (II)

29 of 109

QUESTIONS

  • Are people interested in this topic? What insights do people seek w.r.t. Reddit data?
  • Does this design make sense to people who don’t use Reddit?
  • How might we foster greater exploration across subreddits and topics?
  • What other information would be interesting to see about subreddit sentiment?
  • How would you define a “similar” topic?
  • Does the user flow make sense? How might we make it more intuitive/usable?

30 of 109

Garrick Fernandez

31 of 109

Anti-Eviction Mappings

Garrick Fernandez

32 of 109

Idea: map eviction data for tenants in SF (DataSF)

  • Building on work done by the Anti-Eviction Mapping Project
  • Why? Evictions often go unnoticed, and mapping them (countermapping) can be used as a tool for political resistance and policymaking
    • Eviction Lab data reportedly misses about half of all evictions in SF

33 of 109

Good mappings are important!

34 of 109

Current Work

  • Manissa M. Maharawal & Erin McElroy: “The Anti-Eviction Mapping Project: Counter Mapping and Oral History toward Bay Area Housing Justice.”
    • Methodologies, purposes, and existing visualization work
  • Jeffrey Heer, Maneesh Agrawala, Wesley Willett: “Generalized Selection via Interactive Query Relaxation”
    • Possible avenues for developing a more accessible visualization

35 of 109

What’s different this time?

  • Benchmarks for accessibility, ease-of-use, and persuasiveness

Incorporate query relaxation methods to make the data easier to explore. Add tools such as brushing and Voronoi mappings.

36 of 109

What’s different this time?

  • Benchmarks for accessibility, ease-of-use, and persuasiveness

Switch between views, with selections persisting across view switches

37 of 109

What’s different this time?

  • Benchmarks for accessibility, ease-of-use, and persuasiveness

Martini-glass narrative: introduce the data and key trends before opening the visualization up to the user

38 of 109

Current progress… :-)

  • Drafting a design document for the viz and the narrative
  • Emailed Erin McElroy of AEMP for guidance, data, and possible directions
    • Would like to make a positive contribution to the space!
  • Revisiting SF restaurant finder to extract concepts I’ll need for this project
    • Data formatting, dynamic filtering, point selection methods

39 of 109

Questions

What questions or trends are we most interested in finding in the data?

How should we express uncertainty in the data (collection methods, etc.)?

How do we make a visualization persuasive and exhortative? Ties to activism

What am I missing? Am I missing anything?

40 of 109

James Hong

41 of 109

Interactive Tool for Labeling Identities in 70,000 Hours of TV News

James Hong

42 of 109

Who is in the news? How are they portrayed?

Images of Syed Rizwan Farook as shown on CNN, MSNBC, and FOX

43 of 109

How much screen time did Donald Trump and Hillary Clinton receive in 2016?

44 of 109

Example Workflow

  • (1) Start with one or more example images
  • (2) Find similar faces in the dataset
  • (3) Have a human validate a subset
  • (4) Train a model
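Step (2), finding similar faces, could be sketched as a nearest-neighbor search. This assumes every face in the dataset has already been turned into a fixed-length embedding vector by some face-recognition model (the slides do not say which); the function name is hypothetical, the query is taken as the mean of the example embeddings, and similarity is measured with cosine similarity.

```python
import numpy as np


def find_similar_faces(example_embeddings, dataset_embeddings, k=5):
    """Return indices of the k dataset faces most similar to the examples (cosine similarity to their mean)."""
    query = np.mean(np.asarray(example_embeddings, dtype=float), axis=0)
    query /= np.linalg.norm(query)  # unit-length query vector
    data = np.asarray(dataset_embeddings, dtype=float)
    data = data / np.linalg.norm(data, axis=1, keepdims=True)  # unit-length rows
    scores = data @ query  # cosine similarity of each face to the query
    return np.argsort(-scores)[:k].tolist()  # highest similarity first


faces = [[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]  # toy 2-D "embeddings"
print(find_similar_faces([[1, 0]], faces, k=2))  # → [0, 2]
```

The top-ranked faces would then go to a human for validation (step 3) before a classifier is trained (step 4). At the scale of millions of images, an approximate nearest-neighbor index would likely replace this brute-force scan.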

45 of 109

Works great! … until

46 of 109

Project: A Better Labeling Tool

Goals:

  • Rapidly generate labels and models
  • Interactive refinement of both
  • Prioritization of user’s attention

Methods:

  • Visualizations + algorithms to propose examples
  • Tableau-style exploration (to find correlated errors)

Related work: data programming, crowd-sourcing, ImageNet, etc.

47 of 109

Mock-Up

  • Labeling interface
  • Interactive charts tied to images and models (for scrubbing quickly and debugging)
  • Filters on metadata

Metadata Filters

- channel

- show

- gender

- time

- size

- sharpness

- in_commercial

- ...

Wolf Blitzer

Distribution of model predictions; Estimated precision and recall curves

Controls and shortcuts

Other thoughts:

  • Intelligently choose images for human to label (hard examples)
  • Model training continuously in real-time in the background
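One common way to choose "hard examples" is uncertainty sampling: ask the human to label the items the current model is least sure about. This is a minimal sketch under assumptions, not the project's method; it presumes a binary classifier that outputs a probability per image (the slides do not specify the model), and the function name and the 0.5 decision threshold are assumptions.

```python
def pick_hard_examples(probs, labeled, budget=10):
    """Uncertainty sampling: return unlabeled indices whose predicted probability is closest to 0.5."""
    labeled = set(labeled)
    candidates = [i for i in range(len(probs)) if i not in labeled]  # skip images already labeled
    candidates.sort(key=lambda i: abs(probs[i] - 0.5))  # most uncertain first; stable for ties
    return candidates[:budget]


probs = [0.9, 0.55, 0.1, 0.48, 0.5]  # model outputs for five images
print(pick_hard_examples(probs, labeled={4}, budget=2))  # → [3, 1]
```

With a model retraining in the background, `probs` would be refreshed after each round of labels, so the queue of hard examples keeps changing.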

48 of 109

Questions

  • What other charts/metrics for evaluating ML models could be made interactive in a way that helps a labeler?
  • Generality of this approach to other datasets with metadata?
    • E.g., validating model performance on various traffic cameras
  • Other related work that comes to mind?
  • Good scalable library and framework suggestions?
    • ~20 million images (with dozens of CPU cores and lots of RAM)
  • Programmatic interfaces?
    • To what extent does it make sense to embed this in a Jupyter notebook?
    • Abstractions/APIs?

49 of 109

Chris Yoon, Karen Huynh, Kashif Nazir

50 of 109

WAYSfindr

Karen Huynh

Kashif Nazir

Chris Yoon

51 of 109

Problem Area

52 of 109

In 2013, Stanford introduced a new general education requirement, Ways of Thinking/Ways of Doing (WAYS), which replaced its predecessor, the General Education Requirements (GERs). The biggest difference between the two systems is the flexibility in which courses students can take. WAYS focuses more on developing a “well-rounded toolkit of intellectual skills,” while the GERs focused on expanding the breadth of courses students took. We want to explore how the new WAYS requirements affect students’ course planning and visualize the difference. Specifically, we want to explore whether WAYS really did increase the flexibility of course choices for students.

53 of 109

GERs vs. WAYS

WAYS:

  • One course each in:
    • Applied Quantitative Reasoning (AQR)
    • Creative Expression (CE)*
    • Engaging Diversity (ED)
    • Ethical Reasoning (ER)
    • Formal Reasoning (FR)
  • Two courses each in:
    • Aesthetic and Interpretive Inquiry (AII)
    • Scientific Method and Analysis (SMA)
    • Social Inquiry (SI)

GER:

  • Thinking Matters
  • Disciplinary Breadth
    • Engineering and Applied Sciences
    • Humanities
    • Mathematics
    • Natural Sciences
    • Social Sciences
  • Education for Citizenship
    • Ethical Reasoning
    • The Global Community
    • American Cultures
    • Gender Studies

54 of 109

Previous Work

55 of 109

56 of 109

We are convinced that by focusing less on the specific content of courses and more on the purposes and goals that such courses are designed to serve, we can create a system far better than the current one—more coherent, more transparent in its rationale and learning goals, and more responsive to the needs, interests, and aspirations of individual students.

- The Study of Undergraduate Education at Stanford University (2012)

57 of 109

ExploreCourses

WAYS Filter

58 of 109

Current Progress

59 of 109

Dataset

60 of 109

Exploratory Data Analysis

Number of Students Enrolled

Number of Classes Offered

61 of 109

Prototype Sketch

62 of 109

Questions for Feedback

  • Does the proposed visual answer the question? (Did WAYS increase the flexibility of course choices for students over GERs?)
  • Do you think the proposed visual and filters could help students find interesting WAYS classes?
  • How intuitive do you think the color pairing between visuals is (without explaining the transition)?
  • Do you think that there’s a better way to incorporate departments? (There are A LOT, around 150)

63 of 109

Darby Schumacher, Ian Jones

64 of 109

More than Blue Jeans, Trucks, and Beer?: An exploration of country music lyrics

Darby Schumacher & Ian Jones

65 of 109

The Question

  • What lyrics are the “most country”?
  • We want to discover which words are most common in country music relative to popular music in general
  • Are there misconceptions about what country music is generally about?
  • Do people simplify and stereotype country music too much?
  • These are all questions we hope to answer over the next few weeks
  • Could change people’s perception of the country genre in general
  • Interesting to find trends & differences between gender, age, region, etc.

66 of 109

Relevant Prior Work

  • Inspired by The Pudding article “The Words That Are ‘Most Hip Hop’”
  • NLP techniques have been used to find the words most characteristic of each genre, with the end goal of classifying songs by genre; we want to focus on interesting patterns in these words rather than on classification
  • The Pudding’s article is a stronger analysis because it uses a relevant comparison corpus, whereas other analyses use comparison corpora that are not drawn from lyrics

67 of 109

Fig. 1 Graph from The Pudding article depicting “Most Hip Hop” words

Fig. 2 Graph from The Pudding article comparing “Hip Hop” words to all other genres

68 of 109

Making our Data Set

  • To do this analysis, we have to make our own data set
  • Scraping song title and artist from the Billboard Hot 100 and Hot Country Songs charts for each week over the past 3-5 years
  • Country songs that also appear on the Hot 100 are removed from the Hot 100 data set
  • Using Beautiful Soup 4, we are scraping the lyrics embedded in Genius’s HTML for each song in the Hot 100 and Hot Country data sets
  • Each data set is a dictionary of dictionaries: each song maps to a dictionary of words and their number of occurrences
  • All individual song dictionaries are combined into a chart-wide data set for Hot 100 and Hot Country
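The per-song and chart-wide word counts described above could look like the following sketch. It assumes the lyrics have already been scraped into plain strings keyed by (title, artist); the scraping step itself depends on Genius's page markup and is not shown. The tokenizer (lower-casing plus a simple regular expression) and all function names are assumptions, and `collections.Counter` is used because it is a `dict` subclass, so the result is still a dictionary of dictionaries.

```python
import re
from collections import Counter


def word_counts(lyrics):
    """Lower-case the lyrics, split them into words (assumed tokenizer), and count occurrences."""
    return Counter(re.findall(r"[a-z'’]+", lyrics.lower()))


def build_chart_dataset(songs):
    """songs: {(title, artist): lyrics} -> ({(title, artist): Counter}, chart-wide Counter)."""
    per_song = {key: word_counts(text) for key, text in songs.items()}
    chart_total = Counter()
    for counts in per_song.values():
        chart_total.update(counts)  # adds each song's counts into the chart-wide totals
    return per_song, chart_total


def remove_overlap(hot100, country):
    """Drop songs that also appear in the country data set from the Hot 100 data set."""
    return {key: value for key, value in hot100.items() if key not in country}
```

Calling `remove_overlap` on the scraped song lists before `build_chart_dataset` keeps crossover hits from being counted in both corpora.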

69 of 109

Progress

  • Started writing scripts to scrape the data
    • We originally found a 250 GB dataset, but it is too large to work with, so we need to scrape our own data
  • Implemented a word cloud generator with D3
    • Word clouds are generally discouraged in data visualization because word length influences interpretation, but as long as it is not our primary means of communicating our findings, we think a word cloud will still be interesting to include
  • Math behind “determining country-ness”
    • Log of the ratio of a word’s frequency in country music (occurrences per 10,000 words) to its frequency in the Hot 100
  • Math behind characterizing specific artists
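    • TF-IDF-style scores: (count of a given word used by the artist) / (count of artists who use the word); standard TF-IDF instead multiplies the count by log(total artists / artists who use the word)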
  • Math behind determining similar artists
    • Cosine similarity: the cosine of the angle between two artists’ word vectors, a measure of how “close” they are
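The three measures listed above can be sketched over word-count dictionaries such as `collections.Counter` objects. Two assumptions to flag: the country-ness score adds a smoothing constant so that words never seen in the Hot 100 do not cause division by zero (the per-10,000 scaling cancels in the ratio but is kept to mirror the definition), and the TF-IDF function uses the standard tf × log(N / df) weighting rather than a plain count / df ratio; swap in the simpler ratio if that is what the analysis needs. All function names are hypothetical.

```python
import math
from collections import Counter


def countryness(word, country, hot100, smoothing=1.0):
    """log(rate per 10,000 words in country / rate per 10,000 words in Hot 100), with assumed additive smoothing."""
    country_rate = 10_000 * (country[word] + smoothing) / sum(country.values())
    hot100_rate = 10_000 * (hot100[word] + smoothing) / sum(hot100.values())
    return math.log(country_rate / hot100_rate)  # > 0 means "more country" than the Hot 100


def tf_idf(artist_counts):
    """artist_counts: {artist: Counter of words}. Standard tf * log(N / df) weighting."""
    n_artists = len(artist_counts)
    doc_freq = Counter()
    for counts in artist_counts.values():
        doc_freq.update(counts.keys())  # each artist counts once per word
    return {
        artist: {w: c * math.log(n_artists / doc_freq[w]) for w, c in counts.items()}
        for artist, counts in artist_counts.items()
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


country = Counter({"truck": 9, "love": 1})
hot100 = Counter({"truck": 1, "love": 9})
print(countryness("truck", country, hot100), countryness("love", country, hot100))
```

For the artist-similarity feature, two artists' TF-IDF dictionaries from `tf_idf` can be passed directly to `cosine_similarity`.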

70 of 109

Moving forward

  • Finish scraping the data
  • Run our analyses to determine the “most country” words
  • Extend our analysis to compare artists by gender
    • This was not done in The Pudding article because hip hop is so male-dominated
    • Divide dataset by gender of artist and re-do analysis
  • Build a fully interactive website that lets users search for two country artists and view their lyric similarity

71 of 109

Questions we want feedback on

  • Are there demographics other than gender that we should look into for differences in lyric usage?
  • Would a ranking of the “most country” songs/artists (those with the highest frequency of “most country” words) be interesting to see?

72 of 109

Sidra Ijaz, Rebecca Lane

73 of 109

Colleges That Are Worth It

Rebecca Layne & Sidra Ijaz

74 of 109

Motivation

What factors are most important in selecting a college?

  • Academics
  • Social Life
  • Diversity
  • Tuition
  • Prestige/Selectivity
  • etc.

What about Post-Graduation Starting Salary?

75 of 109

Problem

Our Goal

  • Post-graduation starting salary is a factor that is often overlooked in the college search process
  • To create a webpage with an interactive visualization tool that incorporates colleges’ post-graduation salaries along with other factors that will help students and other users make more informed judgements
  • Most relevant prior work consists of comparison charts or lists rather than visualizations, and median starting salary is usually ignored
    • E.g., College Factual and College Board

76 of 109

Data Set

  • Approximately 300 colleges/universities in the United States
  • Variables include:
    • Region
    • School type
    • Starting median salary
  • We added:
    • City
    • State
    • Latitude/longitude
    • Acceptance rate
    • Number of undergrads
    • Tuition
    • Average cost of attendance

77 of 109

Current Progress - Exploratory Data Analysis

78 of 109

Design Ideas

79 of 109

80 of 109

81 of 109

Zoom in on Different School Types

82 of 109

83 of 109

84 of 109

85 of 109

86 of 109

Additional Features

  • Add a search bar for schools
  • For each of these visualizations, we can add filters by:
    • Distance to a certain location
    • Acceptance Rate
  • Size of points represents number of undergraduates
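Since latitude/longitude was added to the data set, the "distance to a certain location" filter could use the haversine great-circle distance. This is a minimal sketch; the field names (`name`, `lat`, `lon`, `acceptance_rate`) and function names are hypothetical, and the acceptance rate is assumed to be stored as a fraction between 0 and 1.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (6,371 km)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def filter_schools(schools, lat, lon, max_miles, max_acceptance=None):
    """Names of schools within max_miles of (lat, lon), optionally capped by acceptance rate."""
    result = []
    for school in schools:
        if haversine_miles(lat, lon, school["lat"], school["lon"]) > max_miles:
            continue  # too far from the chosen location
        if max_acceptance is not None and school["acceptance_rate"] > max_acceptance:
            continue  # less selective than requested
        result.append(school["name"])
    return result
```

In the web version, the same arithmetic would run in the browser whenever the user moves the location or distance slider.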

87 of 109

Feedback Questions

  • What are the most important factors you considered when applying to colleges? What other features should we include?
  • Thoughts on the main visual?
  • Thoughts on breaking up the main visual into separate visuals based on school type?
  • Ideas for school type breakdown categories? (Currently, we have Ivy League, State, Engineering, Liberal Arts, Other)

88 of 109

Chenchen Pan, Flora Wang

89 of 109

Housing Price Trends

Chenchen Pan and Flora Wang

90 of 109

Motivation

Much work has been done on predicting housing prices.

  • Zillow House Price Index
  • “Predicting House Prices Using Machine Learning” (Salvador, et al., 2018)
    • Ames, Iowa

“...much of the dynamics of house prices can be explained by fundamentals such as demographic change” (Cvijanovic, 2012)

91 of 109

Problem

Dataset: U.S. Census Bureau

  • How are demographics associated with house prices?
  • How does the Zillow House Price Index compare with the actual sale price of the house?
  • How are the seasons associated with house prices?

92 of 109

Average House Price over the Last 8 Years (chart: USD by year)

93 of 109

94 of 109

95 of 109

Preliminary Sketches

96 of 109

Feedback

  • Which questions should we focus on? Is our project too broad?
  • Is our display of the information and story intuitive and clear? If not, how can we make it more cohesive?
  • How should we visualize data geographically?
  • Are there counties that you are interested in?

97 of 109

Thank you

98 of 109

Allison Park

99 of 109

Where do people choose to spend their last days?

CDC Mortality Data: 1999 - 2016

Allison Park

100 of 109

Place of Death

Motivation

Previous Work

Preliminary EDA

Questions

101 of 109

Motivation

102 of 109

Previous Work

Source: National Vital Statistics System.

Communicates a story, but lacks granularity

103 of 109

Preliminary EDA

104 of 109

Preliminary EDA

105 of 109

Preliminary EDA

106 of 109

Preliminary EDA

107 of 109

Next Steps

  • Incorporate different dimensions: state, race, gender
  • Choose a direction: exploration vs. exploitation
  • Augment the dataset
    • Number of hospice care experts / training facilities
    • Number of geriatricians
    • Data that may help explain discrepancies among different states / race / gender

108 of 109

Questions

  • Exploration vs. Exploitation
  • Interactive plot vs. Animation
    • Describe the change thoroughly in different aspects

vs.

    • Focus on one or two places of death and evaluate current infrastructure with regard to this change

109 of 109

Thank you!