1 of 27

Exploratory Data Analysis: Plotting

Johns Hopkins Bloomberg School of Public Health

Instructor: Jeff Leek

YouTube: http://www.youtube.com/JHSPHAppliedStat

2 of 27

An important book

3 of 27

Exploratory data analysis

  1. Suggest hypotheses
  2. Assess assumptions
  3. Help with model selection
  4. Suggest further data collection

4 of 27

table()

  • Useful for comparing variables
  • cut2() [Hmisc package] helps with quantitative variables by binning them into intervals
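
A minimal sketch of the idea above, using simulated data. The variable names are made up for illustration, and base cut() with quantile breaks stands in for cut2(age, g = 4) so that the example runs without installing Hmisc.

```r
# Simulated data; all variable names here are illustrative assumptions.
set.seed(1)
age   <- round(runif(200, 20, 80))
smoke <- sample(c("never", "former", "current"), 200, replace = TRUE)

# Bin the quantitative variable at its quartiles
# (a base-R stand-in for Hmisc::cut2(age, g = 4)).
age_grp <- cut(age,
               breaks = quantile(age, probs = seq(0, 1, 0.25)),
               include.lowest = TRUE)

# Cross-tabulate the binned variable against the categorical one.
tab <- table(age_grp, smoke)
print(tab)
```

Each cell counts the observations in one age bin and one smoking category, which makes it easy to compare the two variables at a glance.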

5 of 27

quantile()

  • Useful for comparing distributions
  • Important parameters
    • probs
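
A short sketch of comparing two distributions through the probs argument. The two samples are simulated and the chosen probabilities are only one reasonable option.

```r
set.seed(2)
x <- rnorm(1000)   # simulated symmetric sample (assumption)
y <- rexp(1000)    # simulated right-skewed sample (assumption)

# Ask for the same quantiles from both samples so they line up in one table.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
qx <- quantile(x, probs = probs)
qy <- quantile(y, probs = probs)
print(rbind(x = qx, y = qy))
```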

6 of 27

summary()

  • Simple version of quantile()
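
To see how the two relate, the sketch below runs both on a small made-up vector. For a numeric vector, summary() reports the default quantiles plus the mean.

```r
x <- c(2, 4, 4, 5, 7, 9, 30)   # made-up values for illustration

s <- summary(x)    # min, quartiles, median, mean, max
q <- quantile(x)   # default probs = 0, 0.25, 0.5, 0.75, 1

print(s)
print(q)
```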

7 of 27

Suggestion

When performing exploratory data analysis you will often have the choice between running a statistical test/summary and making a plot. Make the plot. That doesn't mean you can stop thinking!

Choosing how to make plots and using them to convince yourself/others that trends are real is an important skill.

8 of 27

Scale matters!

9 of 27

Correlation can be hard to interpret

r^2

1 - (1 - r^2)^(1/2)
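
A small worked comparison of the two scales, taking the second expression on the slide to be 1 - (1 - r^2)^(1/2). That reading of the exponent is an assumption. The values of r are arbitrary.

```r
r  <- c(0.2, 0.5, 0.8, 0.95)   # arbitrary example correlations

r2 <- r^2                      # squared correlation
w  <- 1 - sqrt(1 - r^2)        # assumed form of the second scale

print(round(cbind(r, r2, w), 3))
```

Under this reading, the second scale sits below r^2 for moderate correlations and only approaches it as r nears 1, which is one way to see why a number like r = 0.5 can look weaker in a scatterplot than it sounds.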

10 of 27

Common scales are important

11 of 27

Judging ratios of slopes is hard

12 of 27

Position is better

13 of 27

plot(),points(),lines()

  • Really flexible
  • Important parameters in help for par()
  • Important parameters (subset)
    • ask, mar, mfrow, cex, cex.axis (etc.), col, pch, lwd, lty, las
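
A minimal sketch of how these pieces fit together: plot() starts a figure, points() and lines() add to it, and par() sets the layout. The data are simulated and the colors and sizes are arbitrary choices.

```r
set.seed(3)
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.3)   # simulated noisy curve (assumption)

# Two panels side by side, tighter margins, horizontal axis labels.
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), las = 1)

# Panel 1: scatterplot, then overlay the underlying curve.
plot(x, y, pch = 19, col = "steelblue", cex = 0.8, cex.axis = 0.9,
     main = "Points and a curve")
lines(x, sin(x), lwd = 2, lty = 2, col = "darkred")

# Panel 2: empty frame first, then add each group with its own symbol.
plot(x, y, type = "n", main = "Two groups")
points(x[y > 0],  y[y > 0],  pch = 1, col = "darkgreen")
points(x[y <= 0], y[y <= 0], pch = 4, col = "purple")

par(old)   # restore the previous graphical settings
```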

14 of 27

qqplot()

  • Quantile-quantile plot
  • Good for comparing distributions
  • abline(c(0,1)) often a useful reference
  • Interpretation can be tricky
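
A short sketch with two simulated samples. If the two distributions matched, the points would fall along the reference line; here the heavier tails of the t sample should bend the ends away from it.

```r
set.seed(4)
x <- rnorm(500)        # simulated normal sample (assumption)
y <- rt(500, df = 3)   # simulated heavier-tailed sample (assumption)

# qqplot() plots the sorted values of one sample against the other.
qq <- qqplot(x, y, xlab = "Normal sample", ylab = "t (3 df) sample")

# Intercept 0, slope 1: the line the points follow when distributions agree.
abline(c(0, 1), col = "red")
```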

15 of 27

boxplot()

  • Another useful choice for comparing distributions
  • Important parameters
    • varwidth,col
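
A minimal sketch with three simulated groups of very different sizes. With varwidth = TRUE, the box widths are drawn proportional to the square roots of the group sizes, so small groups are visibly narrower.

```r
set.seed(5)
# Three groups with 20, 100, and 400 observations (made-up sizes).
grp <- factor(rep(c("A", "B", "C"), times = c(20, 100, 400)))
val <- rnorm(length(grp), mean = as.integer(grp))

bp <- boxplot(val ~ grp, varwidth = TRUE,
              col = c("tomato", "gold", "skyblue"),
              xlab = "Group", ylab = "Value")
```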

16 of 27

hist()

  • Another distributional plot
  • Important parameters
    • breaks,freq,col
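
A short sketch on simulated data. Note that a single number passed to breaks is treated as a suggestion, so the number of bars may differ slightly, and freq = FALSE plots densities so that the bar areas sum to 1.

```r
set.seed(6)
x <- rgamma(1000, shape = 2)   # simulated right-skewed data (assumption)

h <- hist(x, breaks = 30, freq = FALSE, col = "grey80",
          main = "Histogram on the density scale", xlab = "x")
```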

17 of 27

image(),heatmap()

  • 3d histograms
  • heatmap() clusters
  • image()
    • image(t(dat)[,nrow(dat):1])
  • Important parameters
    • x,y,z,zlim,col,breaks
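
The transpose-and-reverse idiom above is easier to follow on a tiny example. image() draws the rows of its matrix along the x-axis starting from the bottom-left, so a matrix passed directly appears rotated relative to how it prints. The sketch below uses a made-up 3 x 4 matrix.

```r
dat <- matrix(1:12, nrow = 3)   # made-up matrix: 3 rows, 4 columns
print(dat)

# Transpose, then reverse the column order, so that dat[1, 1]
# ends up in the top-left corner of the picture.
flipped <- t(dat)[, nrow(dat):1]

image(flipped, col = heat.colors(12), axes = FALSE,
      main = "Same orientation as the printed matrix")
```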

18 of 27

matplot(),matpoints(),matlines()

  • Plots columns of matrix
  • Often useful for timecourse/longitudinal data
  • Important parameters
    • type,pch,lty,lwd, etc.
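
A minimal sketch for time-course data: each column of the matrix is one simulated subject and each row is one time point. The random-walk trajectories are made up for illustration.

```r
set.seed(7)
times <- 1:10
# 10 time points (rows) x 5 subjects (columns), simulated random walks.
traj <- sapply(1:5, function(i) cumsum(rnorm(10)))

# One line per column, then add the observed points on top.
matplot(times, traj, type = "l", lty = 1, lwd = 2, col = 1:5,
        xlab = "Time", ylab = "Measurement")
matpoints(times, traj, pch = 19, cex = 0.6, col = 1:5)
```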

19 of 27

smoothScatter() [graphics; originally in geneplotter]

  • Good for plotting lots of points
  • See also: hexbin() [hexbin package]
  • Important parameters
    • nbin,bandwidth,nrpoints,colramp
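
A short sketch with enough simulated points that an ordinary scatterplot would be a solid blob. This assumes a current R installation, where smoothScatter() ships in the graphics package and relies on the recommended KernSmooth package.

```r
set.seed(8)
n <- 20000
x <- rnorm(n)
y <- x + rnorm(n)   # simulated correlated pair (assumption)

# Shade by estimated point density; still draw the 100 most isolated points.
smoothScatter(x, y, nbin = 128, nrpoints = 100,
              colramp = colorRampPalette(c("white", "navy")))
```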

20 of 27

xyplot() [lattice package]

  • Totally different way of thinking about plotting
  • Similar in spirit to ggplot2
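
A minimal sketch of the formula-based approach, using the built-in mtcars data as a stand-in example. The plot is described as an object (y ~ x | conditioning variable) and drawn when printed.

```r
library(lattice)

# mpg against weight, one panel per number of cylinders.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

print(p)   # lattice plots are drawn when the object is printed
```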

21 of 27

qplot() [ggplot2 package]

  • Similar to lattice
  • Way slicker
  • Make sure you have the latest version of R

22 of 27

d3.js

  • JavaScript library
  • Interactive graphics
  • Wave of the future?

23 of 27

SVGAnnotation

  • Makes interactive plots like d3.js
  • In R, but you still need to learn the syntax
  • Wave of the future?

24 of 27

dev.copy2pdf()

  • Saves plot on the screen to a pdf

25 of 27

pdf(),png(),jpeg()

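  • pdf() - vector format; scales cleanly, but files get bigger when there are many points
  • png(),jpeg() - bitmap formats; usually smaller files, fixed resolution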
  • Important parameters
    • height,width,file
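
A minimal sketch of the open-plot-close pattern these devices share. The file names are hypothetical and written to a temporary directory. Note that pdf() measures width and height in inches while png() defaults to pixels, and that the file argument is spelled filename in png(). The png step is skipped if this R build lacks PNG support.

```r
# Hypothetical output paths in a temporary directory.
pdf_file <- file.path(tempdir(), "scatter.pdf")
png_file <- file.path(tempdir(), "scatter.png")

# Vector output: size in inches.
pdf(file = pdf_file, width = 6, height = 4)
plot(cars, pch = 19, main = "Speed vs. stopping distance")
dev.off()   # nothing is written reliably until the device is closed

# Bitmap output: size in pixels by default.
if (capabilities("png")) {
  png(filename = png_file, width = 600, height = 400)
  plot(cars, pch = 19, main = "Speed vs. stopping distance")
  dev.off()
}

print(file.exists(pdf_file))
```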

26 of 27

Figures in papers

  1. Necessary
  2. Informative
  3. Attractive
  4. Clearly labeled (large axis labels and fonts; clear colors)
  5. Aware of color blindness
  6. Include detailed figure captions

Try not to end up on Broman's List

27 of 27

Something to aspire to