Visualization and Data Mining
Outline
Visualization Role
Bad Visualization: Spreadsheet
Year  Sales
1999  2,110
2000  2,105
2001  2,120
2002  2,121
2003  2,124
What is wrong with this graph?
Bad Visualization: Spreadsheet with misleading Y-axis
[Chart: Sales by Year, 1999 to 2003 (2,110; 2,105; 2,120; 2,121; 2,124)]
The Y-axis scale gives the WRONG
impression of a big change
Better Visualization
[Chart: Sales by Year, 1999 to 2003 (2,110; 2,105; 2,120; 2,121; 2,124)]
An axis scaled from 0 to 2000 gives the
correct impression of a small change
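The effect of the axis choice can be checked numerically. The sketch below compares the tallest and shortest drawn bars for the sales data on these slides under two baselines. The truncated baseline of 2,100 is an assumption made for illustration; the slides do not state where the misleading axis starts.

```python
# Sales figures from the slides (year -> sales).
sales = {1999: 2110, 2000: 2105, 2001: 2120, 2002: 2121, 2003: 2124}


def bar_height_ratio(values, baseline):
    """Ratio of the tallest to the shortest drawn bar when the axis starts at `baseline`."""
    heights = [v - baseline for v in values]
    if min(heights) <= 0:
        raise ValueError("baseline must lie below every value")
    return max(heights) / min(heights)


# Hypothetical truncated axis starting at 2,100 (not stated on the slide).
truncated = bar_height_ratio(sales.values(), baseline=2100)
# Axis starting at 0, as on the "Better Visualization" slide.
zero_based = bar_height_ratio(sales.values(), baseline=0)

print(f"truncated axis: tallest bar is {truncated:.2f}x the shortest")
print(f"zero-based axis: tallest bar is {zero_based:.3f}x the shortest")
```

With the truncated baseline the bars differ several-fold in height, while with a zero baseline they differ by less than one percent, which matches the actual change in the data.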
Lie Factor
Tufte requirement: 0.95 < Lie Factor < 1.05
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
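Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal sketch of that ratio follows, applied to the 1999 and 2003 sales figures from the earlier slides. The function names and the truncated baseline of 2,100 are assumptions for illustration, not part of the slides.

```python
def relative_change(first, second):
    """Relative change from `first` to `second`, as a fraction of `first`."""
    return abs(second - first) / abs(first)


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    return relative_change(shown_first, shown_second) / relative_change(data_first, data_second)


def within_tufte_bounds(factor):
    """True when the factor satisfies 0.95 < Lie Factor < 1.05."""
    return 0.95 < factor < 1.05


# Bar heights if the y-axis started at 2,100 (hypothetical baseline).
truncated = lie_factor(2110 - 2100, 2124 - 2100, 2110, 2124)
# Bar heights if the y-axis starts at 0: drawn sizes equal the data values.
honest = lie_factor(2110, 2124, 2110, 2124)

print(f"truncated axis: {truncated:.1f}, acceptable: {within_tufte_bounds(truncated)}")
print(f"zero-based axis: {honest:.1f}, acceptable: {within_tufte_bounds(honest)}")
```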
Tufte’s Principles of Graphical Excellence
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
Visualization Methods
1-D (Univariate) Data
[Figure: Tukey box plot (marking low, high, mean, and the middle 50%) and histogram]
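The quantities behind both 1-D displays are simple to compute. The sketch below derives the box plot ingredients named on the slide (low, high, mean, and the quartiles bounding the middle 50%) and a histogram with fixed-width bins. The sample values and the bin width are hypothetical, since the slide's data is not recoverable, and the box plot is simplified: it uses the plain minimum and maximum rather than Tukey's outlier fences.

```python
import statistics
from collections import Counter

# Hypothetical sample; the slide's own data is not recoverable.
values = [1, 3, 3, 5, 5, 5, 7, 7, 9, 20]

# Box plot ingredients: extremes, quartiles (the box spans the middle 50%), and the mean.
low, high = min(values), max(values)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
mean = statistics.mean(values)

# Histogram: count how many values fall into each bin of width 5.
bin_width = 5
histogram = Counter((v // bin_width) * bin_width for v in values)

print(f"low={low} q1={q1} median={median} q3={q3} high={high} mean={mean}")
for start in sorted(histogram):
    print(f"{start:>2} to {start + bin_width - 1:>2}: {'#' * histogram[start]}")
```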
2-D (Bivariate) Data
[Figure: scatterplot of price vs. mileage]
3-D Data (projection)
[Figure: 3-D projection; one axis labeled price]
Lie Factor=14.8
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
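The value 14.8 can be reproduced by hand. The sketch below assumes the slide shows Tufte's fuel-economy chart; the slide text itself does not say so. The numbers are the ones reported in Tufte's book: a line 0.6 inches long stands for 18 miles per gallon in 1978, and a line 5.3 inches long stands for 27.5 miles per gallon in 1985.

```python
# Figures as reported in Tufte's book for the fuel-economy chart
# (assumed, not confirmed, to be the example on this slide).
data_1978, data_1985 = 18.0, 27.5   # miles per gallon
line_1978, line_1985 = 0.6, 5.3     # drawn line length in inches

# Size of the effect in the data and in the graphic, as relative changes.
data_effect = (data_1985 - data_1978) / data_1978
graphic_effect = (line_1985 - line_1978) / line_1978

factor = graphic_effect / data_effect
print(f"data effect: {data_effect:.0%}")
print(f"graphic effect: {graphic_effect:.0%}")
print(f"lie factor: {factor:.1f}")
```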
3-D image (requires 3-D blue and red glasses)
Taken by Mars Rover Spirit, Jan 2004
Visualizing in 4+ Dimensions
Multiple Views
Give each variable its own display
   A  B  C  D  E
1  4  1  8  3  5
2  6  3  4  2  1
3  5  7  2  4  3
4  2  6  3  1  5
[Figure: one display per variable (A to E), each plotting rows 1 to 4]
Problem: does not show correlations
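A text-only sketch of the idea, using the table from this slide: each variable is rendered as its own small bar display. The function name and the `#` bar rendering are illustrative choices, not something the slides prescribe.

```python
# Table from the slide: four rows, variables A to E.
names = ["A", "B", "C", "D", "E"]
rows = [
    [4, 1, 8, 3, 5],
    [6, 3, 4, 2, 1],
    [5, 7, 2, 4, 3],
    [2, 6, 3, 1, 5],
]


def variable_view(name, column):
    """Render one variable as its own display: one text bar per row."""
    lines = [f"Variable {name}"]
    for row_id, value in enumerate(column, start=1):
        lines.append(f"{row_id} {'#' * value}")
    return "\n".join(lines)


# Transpose the table so each variable gets a separate view.
columns = list(zip(*rows))
views = [variable_view(name, column) for name, column in zip(names, columns)]

print("\n\n".join(views))
```

Because every view is drawn independently, nothing in the output relates one variable to another, which is the limitation noted on the slide.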
Scatterplot Matrix
Represent each possible
pair of variables in its
own 2-D scatterplot
(car data)
Q: Useful for what?
A: linear correlations
(e.g. horsepower & weight)
Q: Misses what?
A: multivariate effects
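A scatterplot matrix has one panel per pair of variables, and the linear correlation each panel reveals can be summarized by Pearson's coefficient. The sketch below enumerates the pairs and computes that coefficient for each. The car values are hypothetical (the slide's car data is not recoverable), and the `pearson` helper is written out here rather than taken from a library.

```python
from itertools import combinations
from math import sqrt

# Hypothetical car data, for illustration only.
cars = {
    "horsepower": [70, 95, 110, 150, 200],
    "weight": [2000, 2400, 2800, 3400, 4100],
    "mpg": [38, 31, 27, 20, 15],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# One entry per panel of the scatterplot matrix (each unordered pair once).
pairs = {
    (a, b): pearson(cars[a], cars[b])
    for a, b in combinations(cars, 2)
}

for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Each coefficient describes only two variables at a time, so effects that involve three or more variables jointly remain invisible, as the slide notes.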
Parallel Coordinates
Dataset in Cartesian coordinates
Same dataset in parallel coordinates
Invented by
Alfred Inselberg
while at IBM, 1985
Example: Visualizing Iris Data
Iris setosa
Iris versicolor
Iris virginica
Flower Parts
Petal, a non-reproductive part of the flower
Sepal, a non-reproductive part of the flower
Parallel Coordinates
Sepal Length: 5.1
Parallel Coordinates: 2-D
Sepal Length: 5.1
Sepal Width: 3.5
Parallel Coordinates: 4-D
Sepal Length: 5.1
Sepal Width: 3.5
Petal Length: 1.4
Petal Width: 0.2
Parallel Visualization of Iris data
[Figure: parallel-coordinates plot of the Iris data; highlighted example: 5.1, 3.5, 1.4, 0.2]
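In parallel coordinates each record becomes a polyline: one vertex per axis, placed at a height proportional to the record's value on that axis. The sketch below computes those vertices for the example flower (5.1, 3.5, 1.4, 0.2). The axis ranges are assumed approximate minima and maxima for the Iris measurements, and the function name is illustrative.

```python
# Axes in left-to-right order, as on the 4-D slide.
axes = ["sepal length", "sepal width", "petal length", "petal width"]

# Assumed axis ranges (approximate min and max of each Iris measurement).
axis_ranges = {
    "sepal length": (4.3, 7.9),
    "sepal width": (2.0, 4.4),
    "petal length": (1.0, 6.9),
    "petal width": (0.1, 2.5),
}

# The example flower shown on the slides.
flower = {
    "sepal length": 5.1,
    "sepal width": 3.5,
    "petal length": 1.4,
    "petal width": 0.2,
}


def to_polyline(record):
    """Return (axis position, normalized height) vertices for one record."""
    points = []
    for position, name in enumerate(axes):
        low, high = axis_ranges[name]
        height = (record[name] - low) / (high - low)  # 0 at axis bottom, 1 at top
        points.append((position, height))
    return points


polyline = to_polyline(flower)
for (position, height), name in zip(polyline, axes):
    print(f"axis {position} ({name}): height {height:.2f}")
```

Drawing line segments between consecutive vertices gives the polyline for one flower; repeating this for every record gives the full plot.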
Parallel Visualization Summary
Chernoff Faces
Encode different variables’ values in characteristics
of a human face
Cute applets:
http://www.cs.uchicago.edu/~wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
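The encoding step can be sketched as a mapping from data variables to facial feature parameters. Everything specific below is an assumption for illustration: the feature names, their numeric ranges, the variable ranges, and the assignment of variables to features. A drawing routine would then consume the resulting parameters.

```python
# Hypothetical facial features and the range each drawing parameter may take.
FEATURE_RANGES = {
    "face_width": (0.6, 1.0),
    "eye_size": (0.05, 0.20),
    "nose_length": (0.1, 0.4),
    "mouth_curvature": (-1.0, 1.0),  # -1 is a frown, +1 is a smile
}

# Assumed value ranges of the data variables (approximate Iris ranges).
VARIABLE_RANGES = {
    "sepal length": (4.3, 7.9),
    "sepal width": (2.0, 4.4),
    "petal length": (1.0, 6.9),
    "petal width": (0.1, 2.5),
}

# Hypothetical assignment: which variable drives which feature.
ASSIGNMENT = {
    "sepal length": "face_width",
    "sepal width": "eye_size",
    "petal length": "nose_length",
    "petal width": "mouth_curvature",
}


def face_parameters(record):
    """Map each variable's value linearly onto its assigned facial feature."""
    params = {}
    for variable, feature in ASSIGNMENT.items():
        low, high = VARIABLE_RANGES[variable]
        t = (record[variable] - low) / (high - low)  # position within the data range
        f_low, f_high = FEATURE_RANGES[feature]
        params[feature] = f_low + t * (f_high - f_low)
    return params


record = {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4, "petal width": 0.2}
params = face_parameters(record)
for feature, value in params.items():
    print(f"{feature}: {value:.3f}")
```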
Interactive Face
Chernoff faces, example
Stick Figures
Stick figures, example
Census data showing age, income, sex, education, etc.
Closed figures correspond to women and we can see more of them on the left.
Note also a young woman with high income.
Visualization software
Free and Open-source
Visualization Summary