1 of 32

Visualization and Data Mining

2 of 32

Outline

  • Graphical excellence and lie factor
  • Representing data in 1, 2, and 3-D
  • Representing data in 4+ dimensions
    • Parallel coordinates
    • Scatterplots
    • Stick figures

3 of 32

Visualization Role

  • Support interactive exploration
  • Help in result presentation
  • Disadvantage: requires human eyes
  • Can be misleading

4 of 32

Bad Visualization: Spreadsheet

Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124

What is wrong with this graph?

5 of 32

Bad Visualization: Spreadsheet with Misleading Y-axis

[Chart: Sales by Year, 1999 to 2003 (2,110; 2,105; 2,120; 2,121; 2,124)]

Y-axis scale gives the WRONG impression of a big change

6 of 32

Better Visualization

[Chart: Sales by Year, 1999 to 2003 (2,110; 2,105; 2,120; 2,121; 2,124)]

A Y-axis scaled from 0 to 2000 gives the correct impression of a small change
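
The difference between the two charts on slides 5 and 6 can be checked with a little arithmetic. The sketch below is not from the slides: it assumes a bar chart whose Y-axis starts at a hypothetical baseline of 2,100 (the slide does not state the actual value) and compares it with a baseline of 0. The helper name `drawn_height_ratio` is illustrative.

```python
# Sales figures from the slides (year -> sales).
sales = {1999: 2110, 2000: 2105, 2001: 2120, 2002: 2121, 2003: 2124}

def drawn_height_ratio(values, baseline):
    """Ratio of the tallest to the shortest bar when bars start at `baseline`."""
    heights = [v - baseline for v in values]
    return max(heights) / min(heights)

# Hypothetical truncated axis starting at 2,100 (assumed, not given on the slide).
truncated = drawn_height_ratio(sales.values(), baseline=2100)
# Axis starting at 0, as on the "Better Visualization" slide.
zero_based = drawn_height_ratio(sales.values(), baseline=0)

print(f"truncated axis: tallest bar is {truncated:.1f}x the shortest")
print(f"zero-based axis: tallest bar is {zero_based:.3f}x the shortest")
```

Under these assumptions the truncated axis makes the best year look several times larger than the worst, while the zero-based axis shows the two as nearly equal, which is what the data says.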

7 of 32

Lie Factor

Tufte requirement: 0.95 < Lie Factor < 1.05

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
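
Tufte defines the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal sketch of that ratio and of the 0.95 to 1.05 check follows. The function names are illustrative, and the numbers are taken from the fuel-economy example in the cited book (a line growing from 0.6 to 5.3 inches for a standard rising from 18 to 27.5 miles per gallon).

```python
def relative_change(first, second):
    """Size of effect: relative change from `first` to `second`."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))

def is_acceptable(factor, low=0.95, high=1.05):
    """Tufte's requirement from the slide: 0.95 < lie factor < 1.05."""
    return low < factor < high

factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(f"lie factor = {factor:.1f}, acceptable: {is_acceptable(factor)}")
```

A graphic that scales its marks in proportion to the data gives a factor of 1 and passes the check.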

8 of 32

Tufte’s Principles of Graphical Excellence

  • Give the viewer
    • the greatest number of ideas
    • in the shortest time
    • with the least ink in the smallest space.

  • Tell the truth about the data!

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

9 of 32

Visualization Methods

  • Visualizing in 1-D, 2-D and 3-D
    • well-known visualization methods
  • Visualizing more dimensions
    • Parallel Coordinates
    • Other ideas

10 of 32

1-D (Univariate) Data

  • Representations

[Figures: a histogram, and a Tukey box plot marking low, high, mean, and the middle 50%]
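
Both representations reduce to simple summaries of one variable. The sketch below uses only Python's standard library on made-up values (the slide's data is not recoverable) to compute histogram bin counts and the quantities a box plot marks: low, high, mean, and the quartiles that bound the middle 50%. The bin width of 5 is an arbitrary choice.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical sample; not taken from the slide.
values = [2, 3, 3, 5, 7, 8, 8, 9, 12, 14, 15, 19]

# Histogram: count how many values fall into each bin of width 5.
bin_width = 5
histogram = Counter((v // bin_width) * bin_width for v in values)
for start in sorted(histogram):
    print(f"{start:2d}-{start + bin_width - 1:2d} | {'#' * histogram[start]}")

# Box plot summary: extremes, mean, and the quartiles around the middle 50%.
q1, median, q3 = quantiles(values, n=4)
summary = {"low": min(values), "q1": q1, "median": median,
           "q3": q3, "high": max(values), "mean": mean(values)}
print(summary)
```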

11 of 32

2-D (Bivariate) Data

  • Scatter plot, …

[Figure: scatter plot of price vs. mileage]

12 of 32

3-D Data (projection)

[Figure: 3-D projection plot; one axis is price]

13 of 32

Lie Factor=14.8

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

14 of 32

3-D image (requires 3-D blue and red glasses)

Taken by Mars Rover Spirit, Jan 2004

15 of 32

Visualizing in 4+ Dimensions

  • Scatterplots
  • Parallel Coordinates
  • Chernoff faces
  • Stick Figures

16 of 32

Multiple Views

Give each variable its own display

    A  B  C  D  E
1   4  1  8  3  5
2   6  3  4  2  1
3   5  7  2  4  3
4   2  6  3  1  5

[Figure: one display per variable (A to E), each showing records 1 to 4]

Problem: does not show correlations
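
As a rough illustration of the idea, the sketch below splits the slide's table into one text display per variable. Reading the first number of each row as a record ID is an interpretation of the extracted table, and the function name `variable_views` is hypothetical.

```python
# Table from the slide: records 1 to 4 with variables A to E
# (first number of each row read as the record ID).
records = {
    1: {"A": 4, "B": 1, "C": 8, "D": 3, "E": 5},
    2: {"A": 6, "B": 3, "C": 4, "D": 2, "E": 1},
    3: {"A": 5, "B": 7, "C": 2, "D": 4, "E": 3},
    4: {"A": 2, "B": 6, "C": 3, "D": 1, "E": 5},
}

def variable_views(records):
    """Return one column of values per variable, keyed by variable name."""
    views = {}
    for rec_id, row in records.items():
        for name, value in row.items():
            views.setdefault(name, {})[rec_id] = value
    return views

views = variable_views(records)
for name, column in views.items():
    print(f"Variable {name}")
    for rec_id, value in column.items():
        print(f"  {rec_id} | {'*' * value}")
```

Each display is easy to read on its own, but nothing in the output relates one variable to another, which is the problem the slide points out.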

17 of 32

Scatterplot Matrix

Represent each possible pair of variables in its own 2-D scatterplot (car data)

Q: Useful for what?

A: linear correlations (e.g. horsepower & weight)

Q: Misses what?

A: multivariate effects
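
A scatterplot matrix needs one panel per pair of variables, and the linear correlation each panel reveals can also be computed directly. The sketch below enumerates the pairs and computes Pearson's r for each. The car values are made up for illustration and are not the data set shown on the slide.

```python
from itertools import combinations
from math import sqrt

# Hypothetical car data; the values are made up for illustration.
cars = {
    "horsepower": [70, 90, 110, 150, 200],
    "weight":     [2000, 2300, 2700, 3300, 4100],
    "mpg":        [38, 33, 27, 21, 15],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One panel per unordered pair of variables: k variables give k*(k-1)/2 panels.
panels = list(combinations(cars, 2))
for a, b in panels:
    print(f"{a} vs {b}: r = {pearson(cars[a], cars[b]):+.2f}")
```

Because every panel looks at only two variables at a time, an effect that depends on three or more variables together does not show up in any single panel.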

18 of 32

Parallel Coordinates

  • Place one vertical axis per variable, side by side along a horizontal row
  • The position on each vertical axis gives the value of that variable

Dataset in Cartesian coordinates

Same dataset in parallel coordinates

Invented by Alfred Inselberg while at IBM, 1985

19 of 32

Example: Visualizing Iris Data

Iris setosa

Iris versicolor

Iris virginica

20 of 32

Flower Parts

Petal, a non-reproductive part of the flower

Sepal, a non-reproductive part of the flower

21 of 32

Parallel Coordinates

[Figure: a single vertical axis, Sepal Length, with the value 5.1 marked]

22 of 32

Parallel Coordinates: 2-D

[Figure: two parallel axes, Sepal Length = 5.1 and Sepal Width = 3.5]

23 of 32

Parallel Coordinates: 4-D

[Figure: four parallel axes, Sepal Length = 5.1, Sepal Width = 3.5, Petal Length = 1.4, Petal Width = 0.2]
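
The construction on the last three slides can be sketched in code: each variable gets an axis, the value is rescaled to that axis, and the vertices are joined into one line. The axis ranges below are assumptions (roughly the extremes of the Iris measurements in cm) and the function name `to_polyline` is illustrative; only the data point comes from the slides.

```python
# Axis order and value ranges. The ranges are assumed here (roughly the
# extremes of the Iris measurements in cm), not taken from the slides.
axes = [
    ("sepal length", 4.3, 7.9),
    ("sepal width", 2.0, 4.4),
    ("petal length", 1.0, 6.9),
    ("petal width", 0.1, 2.5),
]

def to_polyline(point, axes):
    """Map one data point to (x, y) vertices, one vertex per parallel axis.

    x is the axis index; y is the value rescaled to [0, 1] on that axis.
    """
    vertices = []
    for x, (value, (_, low, high)) in enumerate(zip(point, axes)):
        vertices.append((x, (value - low) / (high - low)))
    return vertices

# The data point used on the slides.
polyline = to_polyline((5.1, 3.5, 1.4, 0.2), axes)
for (name, _, _), (x, y) in zip(axes, polyline):
    print(f"axis {x} ({name}): y = {y:.2f}")
```

Drawing straight segments between consecutive vertices gives the line that represents this flower in the parallel coordinates plot.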

24 of 32

Parallel Visualization of Iris data

[Figure: parallel coordinates plot of the Iris data, with the point (5.1, 3.5, 1.4, 0.2) labeled]

25 of 32

Parallel Visualization Summary

  • Each data point is a line
  • Similar points correspond to similar lines
  • Lines crossing over correspond to negatively correlated attributes
  • Interactive exploration and clustering

  • Problems: order of axes, limit to ~20 dimensions
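
The link between crossing lines and negative correlation can be made concrete. In the sketch below, two lines cross between adjacent axes when the records' order on the left axis is reversed on the right axis. The data and the function name `crossings` are hypothetical.

```python
from itertools import combinations

def crossings(left, right):
    """Count pairs of lines that cross between two adjacent axes.

    `left[i]` and `right[i]` are record i's values on the two axes. Two
    lines cross when their order on the left axis is reversed on the right.
    """
    return sum(
        1
        for i, j in combinations(range(len(left)), 2)
        if (left[i] - left[j]) * (right[i] - right[j]) < 0
    )

# Hypothetical values for three attributes of four records.
a = [1, 2, 3, 4]
b = [2, 4, 6, 8]   # rises with a: lines between a and b do not cross
c = [9, 7, 4, 1]   # falls as a rises: every pair of lines crosses

print(crossings(a, b))
print(crossings(a, c))
```

Only adjacent axes are compared this way, which is one reason the order of the axes matters: a relationship between two attributes is hard to see when their axes are far apart.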

26 of 32

Chernoff Faces

Encode different variables’ values in characteristics of a human face

Cute applets:
  • http://www.cs.uchicago.edu/~wiseman/chernoff/
  • http://hesketh.com/schampeo/projects/Faces/chernoff.html
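
A minimal sketch of the encoding step, under loud assumptions: the four facial features, their drawing ranges, and the function name `face_parameters` are all invented for illustration and are not Chernoff's original parameter set. Each variable is assumed to be already rescaled to [0, 1].

```python
# Hypothetical assignment of variables to facial features, each with an
# assumed (min, max) drawing range. Not Chernoff's original parameter set.
FEATURES = {
    "face_width": (0.6, 1.0),
    "eye_size": (0.05, 0.20),
    "mouth_curvature": (-1.0, 1.0),   # -1 = frown, +1 = smile
    "eyebrow_slant": (-30.0, 30.0),   # degrees
}

def face_parameters(normalized):
    """Map values in [0, 1], one per feature, onto each feature's range."""
    if len(normalized) != len(FEATURES):
        raise ValueError("need exactly one value per facial feature")
    return {
        name: low + value * (high - low)
        for value, (name, (low, high)) in zip(normalized, FEATURES.items())
    }

params = face_parameters([0.5, 1.0, 0.0, 0.75])
print(params)
```

A drawing routine would then render one face per record from these parameters, so that records with similar values produce similar-looking faces.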

27 of 32

Interactive Face

28 of 32

Chernoff faces, example

29 of 32

Stick Figures

  • Two variables are mapped to X, Y axes
  • Other variables are mapped to limb lengths and angles
  • Texture patterns can show data characteristics
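
The mapping in the first two bullets can be sketched as geometry. In the code below, two variables place the figure and each remaining variable pair becomes a limb's length and angle. The function name `stick_figure` and the sample record are hypothetical, and every limb starts at the figure's position, a simplification; a fuller design might attach limbs to a body segment.

```python
from math import cos, sin, radians

def stick_figure(x, y, limbs):
    """Return line segments for one record's stick figure.

    (x, y) comes from the two variables mapped to the plot axes; `limbs` is
    a list of (length, angle_in_degrees) pairs derived from other variables.
    Every limb is drawn from (x, y), which is a simplification.
    """
    segments = []
    for length, angle in limbs:
        end = (x + length * cos(radians(angle)),
               y + length * sin(radians(angle)))
        segments.append(((x, y), end))
    return segments

# Hypothetical record: two positional variables plus two limb encodings.
segments = stick_figure(2.0, 3.0, [(1.0, 0.0), (0.5, 90.0)])
for start, end in segments:
    print(start, "->", (round(end[0], 3), round(end[1], 3)))
```

When many such figures are drawn close together, regions with similar records form the texture patterns mentioned in the last bullet.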

30 of 32

Stick figures, example

Census data showing age, income, sex, education, etc.

Closed figures correspond to women and we can see more of them on the left.

Note also a young woman with high income

31 of 32

Visualization software

Free and Open-source

  • GGobi
  • XmdvTool

  • Many more - see www.KDnuggets.com/software/visualization.html

32 of 32

Visualization Summary

  • Many methods
  • Visualization is possible in more than 3-D
  • Aim for graphical excellence
