Data Visualization with Tableau

James L. Adams

Data and Visualization Librarian

Dartmouth College

James.L.Adams@dartmouth.edu

Data: bit.ly/neasist-tableau

Slides: dartgo.org/2019-01-11-tableau-slides

Tidy Data

https://en.wikipedia.org/wiki/Hadley_Wickham

Data Structure and Semantics

person        treatment_a  treatment_b
John Smith    -            2
Jane Doe      16           11
Mary Johnson  3            1

treatment  John_Smith  Jane_Doe  Mary_Johnson
a          -           16        3
b          2           11        1

Tidy Data

  • Values
  • Variables
  • Observations

Tidy Data

  • Each variable is a column
  • Each observation is a row
  • Each “cell” is a value

person        treatment  result
John Smith    a          -
Jane Doe      a          16
Mary Johnson  a          3
John Smith    b          2
Jane Doe      b          11
Mary Johnson  b          1
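The reshaping from the wide table to the tidy one can be sketched in a few lines of plain Python. This is only an illustration, not part of the Tableau workflow: the `melt` helper is a hypothetical name, and `None` is an assumed stand-in for John Smith's missing treatment a result.

```python
# Wide ("not tidy") table: one column per treatment.
# None stands in for the missing John Smith / treatment a result (assumption).
wide = [
    {"person": "John Smith", "treatment_a": None, "treatment_b": 2},
    {"person": "Jane Doe", "treatment_a": 16, "treatment_b": 11},
    {"person": "Mary Johnson", "treatment_a": 3, "treatment_b": 1},
]


def melt(rows, id_column, value_columns, prefix):
    """Turn each value column into its own set of rows (hypothetical helper)."""
    tidy_rows = []
    for column in value_columns:
        for row in rows:
            tidy_rows.append({
                id_column: row[id_column],
                "treatment": column[len(prefix):],  # "treatment_a" -> "a"
                "result": row[column],
            })
    return tidy_rows


tidy = melt(wide, "person", ["treatment_a", "treatment_b"], "treatment_")
for row in tidy:
    print(row)
```

Each variable (person, treatment, result) ends up as a key, and each observation as one dictionary, which mirrors the "column" and "row" rules above.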

Data Structure and Semantics

Not Tidy

person        treatment_a  treatment_b
John Smith    -            2
Jane Doe      16           11
Mary Johnson  3            1

treatment  John_Smith  Jane_Doe  Mary_Johnson
a          -           16        3
b          2           11        1

Tidy

person        treatment  result
John Smith    a          -
Jane Doe      a          16
Mary Johnson  a          3
John Smith    b          2
Jane Doe      b          11
Mary Johnson  b          1

Think about each column as a potential axis in your visualization: “John_Smith” is not an axis.

Messy Data

  • Column headers are values, not variable names
  • Multiple variables stored in one column
  • Variables stored in both rows and columns
  • Multiple types of observational units in the same table
  • Single observational unit stored in multiple tables

Messy Data

  • Column headers are values, not variable names

person        treatment_a  treatment_b
John Smith    -            2
Jane Doe      16           11
Mary Johnson  3            1

Messy Data

  • Multiple variables stored in one column

penguin  location  age_sex
1        slopes    2_m
2        plain     4_f
3        slopes    3_f
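One possible fix for the penguin table is to split `age_sex` into two columns. The sketch below assumes the underscore separates an age from a sex code, as the values suggest; `split_age_sex` is a hypothetical helper name.

```python
# Penguin table from the slide, with age and sex packed into one column.
penguins = [
    {"penguin": 1, "location": "slopes", "age_sex": "2_m"},
    {"penguin": 2, "location": "plain", "age_sex": "4_f"},
    {"penguin": 3, "location": "slopes", "age_sex": "3_f"},
]


def split_age_sex(row):
    """Return a new row with age_sex split into separate age and sex keys."""
    age, sex = row["age_sex"].split("_")  # assumes "<age>_<sex>" format
    return {
        "penguin": row["penguin"],
        "location": row["location"],
        "age": int(age),
        "sex": sex,
    }


tidy_penguins = [split_age_sex(row) for row in penguins]
for row in tidy_penguins:
    print(row)
```

With age and sex as separate columns, either one could serve as an axis, color, or filter on its own.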

Messy Data

  • Variables stored in both rows and columns

id  name        variable   value
1   John Smith  treatment  a
1   John Smith  result     -
2   John Smith  treatment  b
2   John Smith  result     2
3   Jane Doe    treatment  a
3   Jane Doe    result     16
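Here the `variable` column holds variable names, so one way to tidy the table is to pivot those names back into columns, one row per `id`. A minimal sketch in plain Python, with `None` as an assumed stand-in for the missing result:

```python
# Table from the slide: variable names ("treatment", "result") stored as values.
long_rows = [
    {"id": 1, "name": "John Smith", "variable": "treatment", "value": "a"},
    {"id": 1, "name": "John Smith", "variable": "result", "value": None},
    {"id": 2, "name": "John Smith", "variable": "treatment", "value": "b"},
    {"id": 2, "name": "John Smith", "variable": "result", "value": 2},
    {"id": 3, "name": "Jane Doe", "variable": "treatment", "value": "a"},
    {"id": 3, "name": "Jane Doe", "variable": "result", "value": 16},
]

# Collect one observation per id, turning each variable name into a key.
observations = {}
for row in long_rows:
    obs = observations.setdefault(
        row["id"], {"id": row["id"], "name": row["name"]}
    )
    obs[row["variable"]] = row["value"]

tidy_rows = list(observations.values())
for row in tidy_rows:
    print(row)
```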

Messy Data

  • Multiple types of observational units in the same table

year  artist   time  track                    week  rank
2000  2 Pac    4:22  Baby Don’t Cry           1     87
2000  2 Pac    4:22  Baby Don’t Cry           2     82
2000  2 Pac    4:22  Baby Don’t Cry           3     72
2000  2Ge+her  3:15  The Hardest Part of ...  1     91
2000  2Ge+her  3:15  The Hardest Part of ...  2     87
2000  2Ge+her  3:15  The Hardest Part of ...  3     92
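This table mixes two kinds of observation: facts about a song (artist, track, time) and facts about a song in a given week (rank). One way to separate them is sketched below; the `song_id` key is a hypothetical identifier introduced only to link the two tables, and keeping `year` with the song is an assumption.

```python
# Chart table from the slide (track title truncated as in the source).
chart = [
    {"year": 2000, "artist": "2 Pac", "time": "4:22", "track": "Baby Don’t Cry", "week": 1, "rank": 87},
    {"year": 2000, "artist": "2 Pac", "time": "4:22", "track": "Baby Don’t Cry", "week": 2, "rank": 82},
    {"year": 2000, "artist": "2 Pac", "time": "4:22", "track": "Baby Don’t Cry", "week": 3, "rank": 72},
    {"year": 2000, "artist": "2Ge+her", "time": "3:15", "track": "The Hardest Part of ...", "week": 1, "rank": 91},
    {"year": 2000, "artist": "2Ge+her", "time": "3:15", "track": "The Hardest Part of ...", "week": 2, "rank": 87},
    {"year": 2000, "artist": "2Ge+her", "time": "3:15", "track": "The Hardest Part of ...", "week": 3, "rank": 92},
]

songs = {}  # (artist, track) -> song row, stored once per song
ranks = []  # one row per song per week

for row in chart:
    key = (row["artist"], row["track"])
    if key not in songs:
        songs[key] = {
            "song_id": len(songs) + 1,  # hypothetical linking key
            "year": row["year"],
            "artist": row["artist"],
            "track": row["track"],
            "time": row["time"],
        }
    ranks.append({
        "song_id": songs[key]["song_id"],
        "week": row["week"],
        "rank": row["rank"],
    })

song_table = list(songs.values())
print(song_table)
print(ranks)
```

The song details are now stored once instead of being repeated on every weekly row.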

Messy Data

  • Single observational unit stored in multiple tables

person      treatment  result
John Smith  a          -
John Smith  b          2

person        treatment  result
Mary Johnson  a          3
Mary Johnson  b          1

person    treatment  result
Jane Doe  a          16
Jane Doe  b          11
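Because the three tables share the same columns, they can simply be stacked into one. A minimal sketch, again using `None` as an assumed stand-in for the missing result:

```python
# The three per-person tables from the slide, all with the same columns.
tables = [
    [
        {"person": "John Smith", "treatment": "a", "result": None},
        {"person": "John Smith", "treatment": "b", "result": 2},
    ],
    [
        {"person": "Mary Johnson", "treatment": "a", "result": 3},
        {"person": "Mary Johnson", "treatment": "b", "result": 1},
    ],
    [
        {"person": "Jane Doe", "treatment": "a", "result": 16},
        {"person": "Jane Doe", "treatment": "b", "result": 11},
    ],
]

# Stack the rows of every table into a single list of observations.
combined = [row for table in tables for row in table]
for row in combined:
    print(row)
```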

Tidy vs. Untidy

Not Tidy

person        treatment_a  treatment_b
John Smith    -            2
Jane Doe      16           11
Mary Johnson  3            1

treatment  John_Smith  Jane_Doe  Mary_Johnson
a          -           16        3
b          2           11        1

Tidy

person        treatment  result
John Smith    a          -
Jane Doe      a          16
Mary Johnson  a          3
John Smith    b          2
Jane Doe      b          11
Mary Johnson  b          1

Think about each column as a potential axis in your visualization: “John_Smith” is not an axis.

Data Visualization

Effectiveness & Clarity

“Above all else show the data”

- Edward Tufte

Tableau

Tableau - Activity 1

Using the iris data and what you’ve learned so far:

  • Create a scatter plot comparing the Lengths and Widths of the Sepals
  • Change the shape of the markers to a filled circle

Tableau - Activity 2

Using the financial data and what you’ve learned so far:

  • Create a graph that can help us see which months have the most units sold
    • Try this graph as one that ignores year, and as one that includes year
      • Which one is better, and why?
  • Create a graph that shows which countries provide the most profits
    • Try breaking this out by segment

Tableau - Activity 3

Using the gapminder data and what you’ve learned so far:

  • Create an area graph that shows average life expectancy changing over time, broken out by continent
    • Is this effective? Why or why not?
    • Try this as a line graph. Is it more effective?
  • Create a map that shows GDP Per Capita for each country
    • Can you break this out year-by-year?
    • What is good and bad about this representation?

Sources

Bryan, J. (2017). gapminder: Excerpt from the Gapminder data, as an R data package and in plain text delimited form. R. Retrieved from https://github.com/jennybc/gapminder (Original work published 2014)

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x

Hart, M., & Saxton, A. (n.d.). Download the Financial Sample workbook for Power BI - Power BI. Retrieved January 8, 2018, from https://docs.microsoft.com/en-us/power-bi/sample-financial-download

Interest in storage-space vs cloud [OC] • r/dataisbeautiful. (2017, September 8). Retrieved January 12, 2018, from https://www.reddit.com/r/dataisbeautiful/comments/6yty6j/interest_in_storagespace_vs_cloud_oc/

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). Retrieved from http://dx.doi.org/10.18637/jss.v059.i10

Yau, N. (2009, November 26). Fox News Makes the Best Pie Chart. Ever. Retrieved January 12, 2018, from https://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/
