1 of 56

Week 12: Networks, Text, Maps reprise

Introduction to Data Visualization

W4995.010 Spring 2020

2 of 56

3 of 56

00 Quiz

01 Networks: node-link, matrix, enclosure

02 Text: analysis, network, timeseries

03 Maps (reprise)

04 Final Project Announcements

4 of 56

01

Networks

5 of 56

10min: groups of 4 visualize relationships in Hamilton

Unit: Number of lines in song

Post to following slides

        Alexander Hamilton  Aaron Burr  George Washington  Eliza Schuyler  Other
Song 1                   6          27                  7               4     51
Song 2                  28          14                  0               0     52
Song 3                  50           5                  0               0    104
Song 4                   7           0                  0               0     25
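One way to turn the line counts above into network data is a weighted character-to-song edge list, plus a character-to-character weight for a matrix view. This is a minimal sketch: the function names are hypothetical, and counting "songs in which both characters have at least one line" is only one possible definition of a shared relationship, chosen here as an assumption.

```python
from itertools import combinations

# Line counts per song, copied from the exercise data above.
CHARACTERS = ["Alexander Hamilton", "Aaron Burr", "George Washington",
              "Eliza Schuyler", "Other"]
LINES = {
    "Song 1": [6, 27, 7, 4, 51],
    "Song 2": [28, 14, 0, 0, 52],
    "Song 3": [50, 5, 0, 0, 104],
    "Song 4": [7, 0, 0, 0, 25],
}


def character_song_edges(lines, characters):
    """Return (character, song, line_count) edges, skipping zero counts."""
    return [(name, song, count)
            for song, counts in lines.items()
            for name, count in zip(characters, counts)
            if count > 0]


def co_appearance(lines, characters):
    """Assumed pair weight: number of songs where both characters have lines."""
    weights = {}
    for i, j in combinations(range(len(characters)), 2):
        shared = sum(1 for counts in lines.values()
                     if counts[i] > 0 and counts[j] > 0)
        weights[(characters[i], characters[j])] = shared
    return weights


edges = character_song_edges(LINES, CHARACTERS)
pairs = co_appearance(LINES, CHARACTERS)
```

The edge list suits a node-link or chord view (thickness = line count); the pair weights suit a heat map.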

6 of 56

Breakout Room 1

7 of 56

Breakout Room 2

Characters: Alexander Hamilton, Eliza Schuyler, George Washington, Other, Aaron Burr

Songs: Song 4, Song 1, Song 2

8 of 56

Breakout Room 3: Relationship heat map (opacity = # shared lines)

Rows and columns: Hamilton, Burr, GW, Eliza

9 of 56

Breakout Room 4: the breakout room where it happens

10 of 56

Breakout Room 5

Characters: Washington, Hamilton, Burr, Eliza, Other

Songs: 1, 2, 3, 4

Chord graph encodings:

  • Chord line: character to song link
  • Line thickness: number of lines in the song
  • Color: character name

11 of 56

Breakout Room 6

Characters: Alex, George, Eliza, errbody else

Legend: Song 1, Song 2, Song 3, Song 4

12 of 56

Networks vs. Trees

Network = graph of relationships between discrete objects

Tree = network with hierarchical structure

Munzner

13 of 56

Three Main Ways to Visualize

Munzner

14 of 56

Common Applications

  • Computer Networks
  • Social Networks
  • Concept Networks
  • Gene Networks

Via Marti Hearst / Jeff Heer

15 of 56

First

Munzner

16 of 56

Tree Layout

Via Jeffrey Heer

17 of 56

Common Layout: Reingold-Tilford “Tidy” algorithm

  • Clearly encode depth, no edge crossings
  • Ordering & symmetry preserved, compact layout
  • ~Linear time (binary and n-ary trees)

Via Jeffrey Heer
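To make the layout goals concrete, here is a simplified leaf-placement layout. It is not the Reingold-Tilford algorithm itself: it gives depth on one axis, no edge crossings, preserved child order, and linear time, but it skips the subtree-contour compaction that makes the tidy layout compact. The function name and the dict-of-children tree format are assumptions for illustration.

```python
def layout_tree(tree, root):
    """Place nodes of a rooted tree.

    tree: dict mapping each node to an ordered list of its children.
    Returns a dict mapping node -> (x, depth).
    Leaves take consecutive x slots; each parent is centered over the
    span between its first and last child.
    """
    positions = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        children = tree.get(node, [])
        if not children:
            x = float(next_leaf_x)
            next_leaf_x += 1
        else:
            child_xs = [place(child, depth + 1) for child in children]
            x = (child_xs[0] + child_xs[-1]) / 2
        positions[node] = (x, depth)
        return x

    place(root, 0)
    return positions


# Hypothetical example tree.
example_tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
tree_positions = layout_tree(example_tree, "root")
```

Each node is visited once, so the cost grows linearly with tree size. The price of skipping compaction is width: the layout is always as wide as the number of leaves.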

18 of 56

D3 tidy tree

Via Jeffrey Heer

19 of 56

Radial = hierarchical tree in polar coordinates

Via Jeffrey Heer
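The polar-coordinate idea can be sketched as a coordinate transform: the horizontal position of a tree layout becomes an angle and the depth becomes a radius. The function name, the `ring_spacing` parameter, and the assumption that x runs from 0 up to (but not including) the number of leaves are all illustrative choices, not a specific library's API.

```python
import math


def to_radial(positions, n_leaves, ring_spacing=1.0):
    """Map (x, depth) tree positions to Cartesian points on concentric rings.

    x in [0, n_leaves) is mapped to an angle in [0, 2*pi);
    depth is mapped to the ring radius.
    """
    radial = {}
    for node, (x, depth) in positions.items():
        angle = 2 * math.pi * x / n_leaves
        radius = depth * ring_spacing
        radial[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return radial


# Hypothetical (x, depth) positions from some tree layout.
example_positions = {"root": (1.25, 0), "a": (0.0, 1), "b": (1.0, 2)}
radial_positions = to_radial(example_positions, n_leaves=4)
```

The root lands at the center regardless of its x value, because its radius is zero.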

20 of 56

Force Directed Layout

  • Nodes repel each other
  • Edges act as springs
  • Friction to ease motion
  • Techniques to dampen “jitter”

Via Marti Hearst
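The three ingredients above (repulsion, springs, friction) fit in a short simulation loop. This is a minimal sketch, not D3's implementation: the inverse-square repulsion, linear springs, parameter values, and function name are all assumptions, and the only anti-jitter measure is velocity damping plus a minimum-distance clamp.

```python
import math
import random


def force_layout(nodes, edges, steps=500, repulsion=1.0, stiffness=0.1,
                 rest_length=1.0, friction=0.85, seed=0):
    """Return node -> (x, y) after a simple force simulation."""
    rng = random.Random(seed)  # seeded so repeated runs give the same layout
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    vel = {n: [0.0, 0.0] for n in nodes}

    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels (inverse-square falloff).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 0.1)  # clamp avoids huge forces
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist

        # Each edge acts as a spring pulling toward rest_length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = stiffness * (dist - rest_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist

        # Friction: shrink velocity every step so motion settles.
        for n in nodes:
            vel[n][0] = (vel[n][0] + force[n][0]) * friction
            vel[n][1] = (vel[n][1] + force[n][1]) * friction
            pos[n][0] += vel[n][0]
            pos[n][1] += vel[n][1]

    return {n: (p[0], p[1]) for n, p in pos.items()}


# A three-node path: a - b - c.
path_layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

On the path graph, the unconnected endpoints end up farther apart than either connected pair, which is the behavior that makes linked nodes read as "close". The all-pairs repulsion loop is quadratic in the number of nodes, one reason scale becomes a problem.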

21 of 56

Bostock on force layout encoding opportunity

22 of 56

But scale?

Via Marti Hearst, Jeffrey Heer

23 of 56

More Nodes, More Problems…

  • Tree breadth often grows exponentially
  • Even with tidy layout, quickly run out of space

Possible Solutions

  • Alpha
  • Filtering
  • Focus+Context
  • Scrolling, Panning, Zooming

Via Jeffrey Heer

24 of 56

Hairballs… or maybe not?

Left: Apple, Right: Google, Periscopic via Fast Co. Design

25 of 56

Chord Diagram with Hover: Uber Rides

Bostock, Block

26 of 56

A different layout approach

Bloomberg Graphics, 2016

27 of 56

Also a network

28 of 56

29 of 56

Alternatively...

Munzner

30 of 56

Node-link vs. Adjacency Matrix

Cliques: every node is connected to every other node

Biclique: every vertex of the first subset is connected to every vertex of the second subset

Cluster: a graph whose connected components are cliques
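A small sketch of the matrix representation and the clique definition above, for an undirected graph. The function names and the four-node example graph are hypothetical.

```python
from itertools import combinations


def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix for an undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


def is_clique(matrix, members):
    """True if every pair of the given row/column indices is connected."""
    return all(matrix[i][j] for i, j in combinations(members, 2))


# Hypothetical graph: A, B, C form a triangle; D hangs off C.
graph_nodes = ["A", "B", "C", "D"]
graph_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
graph_matrix = adjacency_matrix(graph_nodes, graph_edges)
```

When clique members are ordered next to each other, the clique appears in the matrix as a filled square block along the diagonal, which is why row and column ordering matters so much for matrix views.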

31 of 56

Adjacency Matrix: Les Misérables

Via Marti Hearst

32 of 56

One more way: Arc Diagram

Heer, J. “Visualization Zoo”

33 of 56

Alternatively...

Munzner

34 of 56

Enclosure/Treemaps: filesystem, Shneiderman ‘91

http://www.cs.umd.edu/hcil/treemap-history/
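The enclosure idea can be sketched with the slice-and-dice scheme associated with the original treemap work: split a rectangle among children in proportion to their size, alternating the split direction at each level. The nested-dict tree format, the field names `name`, `size`, and `children`, and the example file sizes are assumptions for illustration.

```python
def node_size(node):
    """Size of a leaf, or the summed size of all descendants."""
    if "children" not in node:
        return node["size"]
    return sum(node_size(child) for child in node["children"])


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, x, y, width, height) rectangles."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    total = node_size(node)
    if total == 0:
        return out
    offset = 0.0  # fraction of the parent already handed out
    for child in node.get("children", []):
        share = node_size(child) / total
        if depth % 2 == 0:
            # Even depth: place children side by side (split the width).
            slice_and_dice(child, x + offset * w, y, share * w, h,
                           depth + 1, out)
        else:
            # Odd depth: stack children vertically (split the height).
            slice_and_dice(child, x, y + offset * h, w, share * h,
                           depth + 1, out)
        offset += share
    return out


# Hypothetical file tree with sizes in arbitrary units.
file_tree = {"name": "root", "children": [
    {"name": "docs", "children": [{"name": "a.txt", "size": 2},
                                  {"name": "b.txt", "size": 2}]},
    {"name": "src", "size": 4},
]}
rects = {name: (rx, ry, rw, rh)
         for name, rx, ry, rw, rh in slice_and_dice(file_tree, 0, 0, 8, 4)}
```

Every leaf's area is proportional to its size. Slice-and-dice tends to produce thin slivers on deep or unbalanced trees, which later layouts such as squarified treemaps try to avoid.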

35 of 56

Enclosure/Treemaps: Map of Market, Wattenberg ‘98

Smartmoney.com

36 of 56

02

Text

37 of 56

What does this say?

38 of 56

Text is not preattentive

SUBJECT PUNCHED QUICKLY OXIDIZE TCEJBUS DEHCNUP YLKCIUQ EZIDIXO

CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM

SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC

GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM

CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM

GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM

SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC

SUBJECT PUNCHED QUICKLY OXIDIZE TCEJBUS DEHCNUP YLKCIUQ EZIDIXO

CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM

SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC

Via Marti Hearst

39 of 56

Tag Clouds

Pro: Can help with “gist” and initial query formation.

Cons

  • Sub-optimal visual encoding (size vs. position)
  • Inaccurate size encoding (long words are bigger)
  • May not facilitate comparison (unstable layout)
  • Term frequency may not be meaningful (“the”)
  • Does not show the structure of the text

Via Jeffrey Heer

40 of 56

Simple alternatives are often better

Via Marti Hearst

41 of 56

Added Context: Parallel Tag Clouds, Collins ‘06

Collins, Viégas and Wattenberg, IBM Research

42 of 56

Text Analysis Techniques

  • Word frequency
    • lists of words and their frequencies
  • Concordance
    • the contexts of a given word or set of words
  • N-grams
    • common two-word, three-word, and longer phrases
  • Entity recognition
    • identifying names, places, time periods, etc.
  • Keyword extraction
    • e.g. Term Frequency-Inverse Document Frequency (TF-IDF)
  • Text classification
    • topics, sentiment, etc.

Via Jeffrey Heer
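Four of the techniques listed above (word frequency, concordance, n-grams, TF-IDF keyword scoring) are small enough to sketch with the standard library. The tokenizer, function names, and toy documents are assumptions; the TF-IDF variant used here is term count divided by document length, times log(N / document frequency), which is one common formulation among several. Entity recognition and classification need trained models and are not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(tokens):
    return Counter(tokens)


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def concordance(tokens, keyword, window=2):
    """Keyword-in-context lines: `window` tokens on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def tf_idf(token_lists):
    """Return one {term: score} dict per document."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores.append({term: (count / len(tokens))
                             * math.log(n_docs / doc_freq[term])
                       for term, count in counts.items()})
    return scores


# Toy documents, invented for illustration.
documents = ["The cat sat on the mat.",
             "The dog sat on the log.",
             "The cats and the dogs."]
token_lists = [tokenize(d) for d in documents]
keyword_scores = tf_idf(token_lists)
```

In this toy corpus "the" is the most frequent word in every document, yet its TF-IDF score is zero because it appears in all of them. That is the sense in which raw term frequency may not be meaningful.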

43 of 56

Concordance: Word Tree, Wattenberg et al. ‘07

44 of 56

Entity and relationships: Gorg ‘07 Jigsaw

45 of 56

Topic modeling: Underwood, PMLA journal, 1924–2006

46 of 56

Visualizing NASA Research, 1958–2008

OCR. The Whole Brilliant Enterprise (2004)

47 of 56

Quantifying “cultural impact” via media coverage

OCR. The Whole Brilliant Enterprise (2004)

48 of 56

“Heatmap” (character mentions over time)

Via Marti Hearst
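The data behind a mentions-over-time heat map is a matrix of counts: one row per character, one column per slice of the text. A minimal sketch, assuming the text is already tokenized and that "time" means position in the token stream split into equal bins; the function name and sample tokens are hypothetical.

```python
def mention_matrix(tokens, names, n_bins):
    """Count how often each name occurs in each of n_bins equal text slices."""
    matrix = {name: [0] * n_bins for name in names}
    for i, token in enumerate(tokens):
        if token in matrix:
            bin_index = i * n_bins // len(tokens)  # position -> bin
            matrix[token][bin_index] += 1
    return matrix


# Invented token stream, for illustration only.
sample_tokens = ["hamilton", "burr", "burr", "eliza",
                 "hamilton", "eliza", "eliza", "burr"]
mentions = mention_matrix(sample_tokens, ["hamilton", "burr", "eliza"], n_bins=4)
```

Each cell then maps to a color or opacity; rows can be reordered (for example by first appearance) to make patterns easier to see.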

49 of 56

Darwin’s Origin of Species

50 of 56

(Hamilton, by Shirley Wu & Pudding)

51 of 56

03

Maps, reprise

52 of 56

Questions?

Next Class…

53 of 56

Next class: Ethics, What’s Next (in industry/research)

  • Readings x 2
    • boyd + Crawford, “Six Provocations for Big Data”
    • Hullman, “What Is Visualization Research?”

  • Guest lecture: Prof. Wu on dataviz research

54 of 56

Final Project Showcase

  • 5/5 Tuesday 6–8pm

Lightning round (Zoom) + science fair (in rooms)

Invite all your friends! I will send an invite you can forward.

  • Day-Of: final project page + 3 min. lightning talk

  • Later: Documentation due 5/11
    • Still Monday 11:59pm
    • Format is important, follow instructions.

55 of 56

Past years’

56 of 56

STAGE

ENTRANCE

FINAL PROJECT SHOWCASE

Spring 2019 W4995 Intro to Data Visualization

columbiaviz.github.io

Thanks to Center for Data, Media & Society

Brown Center for Media Innovation

Computer Science Department

Designing the Perfect Board Game

The Effects of 911

Citibike and Cab Demand in NYC

Understanding the United States Opioid Crisis

How bad is climate change? Much worse than you think

Evolution of Terrorism

Does College Provide All Students with the Same Economic Opportunities?

Diversity in NYC’s Specialized High Schools
