1 of 124

Comparing means among groups

Working with continuous data: One-way ANOVAs and multiple comparisons, introducing the linear model, t-test connections, and non-parametric options

J. Stephen Gosnell

Baruch College

1

2 of 124

Goals

  • Extend tests of continuous data to comparing groups
    • ANOVA as first linear model
    • more multiple comparisons
  • Historical connections: t-tests
  • Non-parametric options

2

3 of 124

Frequency Distributions Or Histograms

  • A frequency distribution or histogram can be used to graphically analyze variability
  • Show skewness and shape
  • Fisher’s famous sepal length dataset
  • How often did each value occur in a sample?
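A minimal base-R sketch of such a histogram, using the built-in iris data (the class files may use different plotting options):

```r
# Frequency distribution of sepal length from R's built-in iris dataset
data(iris)
hist(iris$Sepal.Length,
     main = "Sepal length frequencies",
     xlab = "Sepal length (cm)")
```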

3

These graphs look simple because they are ones YOU can replicate in R. See class files!

From earlier lecture!

4 of 124

4

Iris virginica, Frank Mayfield, CC BY-SA 2.0 <https://creativecommons.org/licenses/by-sa/2.0>, via Wikimedia Commons

Flower morphology. Pearson Scott Foresman, Public domain, via Wikimedia Commons

5 of 124

Transition to Hypothesis Testing

  • What is a hypothesis we might want to test?

5


7 of 124

What about multiple groups?

  • What do we test now (and how?)

7


11 of 124

What about multiple groups?

  • What do we test now (and how?)
  • We still focus on distributions!
    • Asking if the data are “equal” to something is not helpful!
    • focus on means!

11

12 of 124

Wide vs long aside

  • Note we have one “unit” per row
    • may have multiple measurements!

13 of 124

13

14 of 124

Explaining the data

  • Group means + noise
  • Overall average + noise

14

15 of 124

Null Hypothesis, Visualized

  • Sample from a single population and assign to each group
  • Under the null, the group means would equal the grand mean, apart from sampling error

15

16 of 124

Testing the Null Hypothesis

  • What is the test statistic?
    • With 3+ groups, we can't just subtract
  • Instead, we can calculate the variance within and among groups and compare
  • Under the null, both estimate the same σ²
  • So we can calculate both and take their ratio

16

17 of 124

Variance Estimated Within Each Group: Mean Square Error (MSE)

17

18 of 124

Variance Estimated Among Each Group

  • Multiply by n to get an estimate of σ²
  • mean square treatment
    • MST

18

19 of 124

We Can Approximate The Distribution Under The Null Hypothesis Using The F Distribution And Error Estimates

  • If you divide these two estimates of variance when H0 is true, the ratio should be close to 1
    • In other words, variance among groups = variance within groups

19

20 of 124

What are the ways we can find p-values?

20

21 of 124

P-value Via Simulation

  • The observed ratio is >119 in our data!

21

22 of 124

But this requires sampling

  • and is slow
  • Given our earlier calculations, it turns out we can approximate this distribution using a ratio of χ2 variables
    • remember those?
      • a standard Z squared!
  • but now, the numerator df is # groups − 1 and the denominator df is n − # of groups
    • F ratio!
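A sketch of the F ratio and its p-value in R, assuming the class's Sepal.Length ~ Species model for the iris data:

```r
# F ratio = MST / MSE for the one-way ANOVA of sepal length by species
data(iris)
fit <- lm(Sepal.Length ~ Species, data = iris)
aov_table <- anova(fit)
F_ratio <- aov_table$`F value`[1]        # ~119.26
# numerator df = # groups - 1 = 2; denominator df = n - # groups = 147
p_value <- pf(F_ratio, df1 = 2, df2 = 147, lower.tail = FALSE)
```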

22

23 of 124

P-value Via Distribution

  • The observed ratio is >119 in our data!

23

24 of 124

Sums Of Squares Notes: Divide Variance Among Parts Of The Model

  • Big idea is we can break up the total sum of squares into parts!
    • Total squared distance from the grand mean (summed over points) equals the squared distance of each point from its group mean plus the squared distance of each group mean from the grand mean
    • These add up
  • We then divide these by degrees of freedom/parts to get mean sums of squares
    • not additive!
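The additive partition can be checked directly in R (a sketch using the iris data as the running example):

```r
# SS_total = SS_groups + SS_error: verify the partition by hand
data(iris)
grand_mean  <- mean(iris$Sepal.Length)
group_means <- ave(iris$Sepal.Length, iris$Species)  # group mean, per row
SS_total <- sum((iris$Sepal.Length - grand_mean)^2)
SS_group <- sum((group_means - grand_mean)^2)        # among groups
SS_error <- sum((iris$Sepal.Length - group_means)^2) # within groups
all.equal(SS_total, SS_group + SS_error)             # the parts add up
```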

24

Is there more variance among or within groups?

25 of 124

Sums Of Squares Notes: Divide Variance Among Parts Of The Model

  • We looked at difference between estimating variance within and among groups under the null
  • Realize we can always break the data points into these components
    • deviation between observation and the mean of its group (called “error”)
    • deviation between observation’s group mean and the grand mean
    • These add up to total error (deviation from grand mean!)

25

Is there more variance among or within groups?

26 of 124

Welcome to the linear model

  • First, make the model (minus intercept here) and check assumptions
  • Notice the formula interface and need to save to an object
  • Residuals/errors are basis for assumptions
  • Visual inspection
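A sketch of that workflow, assuming the iris example:

```r
# Fit the model with the formula interface and save it to an object,
# then plot the residual diagnostics to check assumptions visually
data(iris)
iris_fit <- lm(Sepal.Length ~ Species, data = iris)
par(mfrow = c(2, 2))  # show all 4 diagnostic plots at once
plot(iris_fit)
```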

26

27 of 124

27

28 of 124

28

29 of 124

Remember q-q plots!

https://rpubs.com/mbh038/725314

29

30 of 124

30

31 of 124

31

32 of 124

Visually Checking Assumptions

  • 4 plots from R
  • Note all red lines are fairly flat and variance appears similar around each group in Residuals vs Fitted plot
  • Note dots in qq-plot are close to line, which indicates normality

32

  • E are identically distributed
  • E follow a normal distribution
  • No outliers
  • Bands are ok! Why?

33 of 124

Assumptions

  • Explaining 4 plots from R
    • Check residuals vs fitted for any structure
    • Q-q plot should show basic normality
    • Scale-location plot is another way of looking at residuals vs fitted; check for increase with fitted values
    • Residuals vs leverage shows how far a point is from fitted points vs how much influence it has (leverage).
      • Cook's Distance compares the fitted response of the regression which uses every data point, against the fitted response of the regression where a particular data point has been dropped from the analysis (and then sums this difference across all data points).
      • Very influential data points (on the parameter estimates) are identified, and are labeled in this plot.
        • important if they fall outside dashed red Cook Distance marker (upper or lower right corners), but you may not even see this!

33

34 of 124

Check Model Outcomes

  • Note R sets first group (alphabetically) as intercept for others
  • This is what you want to use (model with intercept)!

34

35 of 124

Problems with LM approach

  • We get a p-value for each level of the variable, but not an overall p-value
    • we could use model level for now, but not long-term/consistent answer
  • But other info here is useful!

35

36 of 124

Other lm Benefits

  • Full p-value for model is also available
    • Same as variable level p-value fit for one-way ANOVA!

36

37 of 124

Other benefits of LM: P-value Isn’t Everything!

  • You may have significant differences among groups; but how much variance does it explain?
    • R2 answers this
  • R2 = SSgroups/SStotal
  • Relies on partitioning of variance that we noted earlier!
  • If R² were near 0, all of the variability would be within groups (rather than among), and the group means would all be similar
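The variance partition makes R² easy to compute by hand (iris example assumed):

```r
# R^2 = SS_groups / SS_total, matching summary()'s Multiple R-squared
data(iris)
fit <- lm(Sepal.Length ~ Species, data = iris)
ss <- anova(fit)$`Sum Sq`
R2 <- ss[1] / sum(ss)
all.equal(R2, summary(fit)$r.squared)  # same value either way
```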

37

Must use model with intercept! Otherwise R sets the overall mean to 0 for the null model and inflates F and R2 greatly!

38 of 124

Can Be Used To Compare Models As Well….

  • But we prefer smaller models
  • The adjusted R2 penalizes larger models so you can compare them
    • More on this later!
  • For single model analysis/description, just use the multiple R-squared value

38

39 of 124

Model without Intercept Issues

  • Compare to model without intercept
  • Provides group means, but otherwise causes error for follow-up analysis
    • Overall F and R² are incorrect, as the comparison is to a null model with a zero intercept!
  • Can also get groups means from summarySE function to avoid this

39

40 of 124

Problems with LM approach

  • We get a p-value for each level of the variable, but not an overall p-value
    • we could use model level for now, but not long-term/consistent answer

40

41 of 124

How to Build Anova Tables

  • Traditional way of viewing outcomes
  • Provides overall p-value for if your treatment was significant
    • Same as overall model fit for one-way ANOVA!
  • In the traditional table, note SS add up
  • Make sure fit for model with intercept!

41

Reporting:

F2,147=119.26, p < .001; reject null hypothesis

42 of 124

What Is Type “III”?

  • Partitioning variance is key to model comparison overall
    • How much is explained by null model?
    • How much “extra” is explained by larger model?
    • These questions can be answered by partitioning variance among different nested models and comparing fit using an F distribution
  • In other words, the big picture of linear model analysis is we can determine what adding a particular factor into a model does given other factors are already included in the model
  • We do this by considering sums of squares, but these can be calculated in multiple ways
  • For one-way ANOVA (what we are doing), they are all the same, but later on it matters
  • We’ll come back to this when we need it, but good tip is just to get in habit of specifying type III
    • and maybe something about contrasts
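A sketch of requesting type III sums of squares with the car package (assuming it is installed; for a one-way ANOVA the table matches base anova()):

```r
# Type III sums of squares via car::Anova; identical to type I here
# because there is only one factor in the model
library(car)
fit <- lm(Sepal.Length ~ Species, data = iris)
Anova(fit, type = "III")
```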

42

43 of 124

Post-hoc Tests

  • If we have a significant p-value from the Anova, we need to see what caused that to happen
  • For tests with more than 2 groups, any difference can invalidate the null hypothesis

43

44 of 124

Post-hoc Tests

  • If we have a significant p-value from the Anova, we need to see what caused that to happen
  • For tests with more than 2 groups, any difference can invalidate the null hypothesis
  • Which one?
    • Running individual tests will inflate the type 1 error rate!
      • Each test has a 5% chance of a false positive if α=.05
      • So multiple tests, independent or not, inflate this rate
    • Answer?
      • Control the family-wise error rate by adjusting α

44

45 of 124

https://xkcd.com/882

45


49 of 124

Options (all seen before!)

  • Bonferroni
    • Divide alpha by total number of tests you need to complete
    • Each test must meet this new alpha level to be “significant”
    • For this problem, there are a lot of comparisons!
      • 10! Let’s list them!
    • So the new α is .05/10 = .005
      • Small!
  • Adjusted/Sequential Bonferroni (Holm’s in R)
    • Rank post-hoc tests by p-value, smallest to largest
    • For each p-value, starting with smallest, compare to alpha/(number of remaining tests)
  • FDR
    • Maximize α
    • Rank post-hoc tests by p-value, smallest to largest
    • For each p-value, calculate a q-value ((total number of tests/rank of p-value)*p-value) and compare to alpha
  • Many more!
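These corrections are all available through base R's p.adjust(); a sketch with hypothetical raw p-values:

```r
# Hypothetical raw p-values from 4 post-hoc comparisons
p_raw <- c(0.001, 0.012, 0.030, 0.040)
p.adjust(p_raw, method = "bonferroni")  # multiply each by 4 (capped at 1)
p.adjust(p_raw, method = "holm")        # sequential Bonferroni
p.adjust(p_raw, method = "fdr")         # Benjamini-Hochberg q-values
```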

49

50 of 124

Multcomp Lets Us Do Post Hoc Tests With Lm And Extensions

  • Tukey’s method to compare all means
    • not exactly TukeyHSD
  • glht command just needs model and to specify which variable you are interested in
    • “Species” here
    • note need to save as object
  • Reject all null hypotheses under your original α level
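A sketch of the glht call for the iris model (assuming the multcomp package is installed):

```r
# All pairwise (Tukey-style) comparisons among species means
library(multcomp)
fit <- lm(Sepal.Length ~ Species, data = iris)
post_hoc <- glht(fit, linfct = mcp(Species = "Tukey"))  # save as object
summary(post_hoc)  # adjusted p-values for each pairwise comparison
```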

50

51 of 124

51

52 of 124

52

53 of 124

53

54 of 124

54

55 of 124

Options

  • Bonferroni
    • Divide alpha by total number of tests you need to complete
    • Each test must meet this new alpha level to be “significant”
    • For this problem, there are 3 comparisons
      • Let’s list them!
    • So the new α is .05/3 = .0167
      • 3 is the number of comparisons we are making!
      • Small!
  • Adjusted/Sequential Bonferroni (Holm’s in R)
    • Rank post-hoc tests by p-value, smallest to largest
    • Apportion α as needed to meet these until you run out
  • FDR
    • Maximize α
    • Rank post-hoc tests by p-value, smallest to largest
    • Find “largest” p-value you can accept; all before (smaller than) that are significant!
  • Many more!

55

56 of 124

Multcomp Lets Us Do Post Hoc Tests With Lm And Extensions

  • Can also specify contrasts

56

57 of 124

Multcomp Lets Us Do Post Hoc Tests With Lm And Extensions

  • Can also specify contrasts
  • Set control method

57


59 of 124

A Little Linear Algebra Goes A Long Way

  • The general linear model! This is what unites t-tests, anovas, ancovas, regression, and can be extended in a number of ways!

59

60 of 124

A Little Linear Algebra Goes A Long Way

  • Y = XB + E
  • Y = response
    • N x 1 matrix
  • X = explanatory variables
    • N x #of explanatory variables
  • B = matrix of coefficients
    • Think of as slopes (numerical) or adjustments for groups (factors)
    • # of explanatory variables x 1
  • E is our error (nx1) matrix

60

  • Remember, multiply row by columns to get an answer for matrices
    • Matrices must match dimensions to multiply!
  • Build the linear model for our iris example!
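A sketch of building X and solving for B by hand for the iris model:

```r
# Design matrix X and the least-squares solution B = (X'X)^-1 X'Y
data(iris)
X <- model.matrix(Sepal.Length ~ Species, data = iris)  # 150 x 3
Y <- iris$Sepal.Length
B <- solve(t(X) %*% X) %*% t(X) %*% Y
B  # matches coef(lm(Sepal.Length ~ Species, data = iris))
```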

61 of 124

Model/design matrix

61

62 of 124

Coefficient/Beta matrix

62

63 of 124

Residuals

63

64 of 124

Why Do This?

  • Explains
    • Singularity messages
      • When a column is a linear combination of others
      • connected to a non-invertible matrix
        • we don’t divide matrices; instead we multiply by the inverse
          • So to solve for B, we use B = (X’X)⁻¹X’Y
        • But the determinant is 0 when columns aren’t independent, so no inverse exists (X’ is the transpose of X)
    • Similar logic is eventually used to find leverage values via the H (hat) matrix

64

https://www.mathsisfun.com/algebra/matrix-inverse.html

65 of 124

Why Do This?

    • Degrees of freedom
      • how many parameters did you estimate?
      • equal to columns of X matrix (or rows of B matrix)
    • similarity among tests
      • Shared assumptions
        • Linear relationship among the response and predictor variables
        • Errors are identically and independently distributed and follow a normal distribution
          • NOTE THIS IS THE ERRORS/RESIDUALS, NOT THE MEANS!
    • Allows contrasts
      • Multiple comparisons actually test for differences among Betas!

65

Errors iid ~ N(0, σ2) is all you need!

66 of 124

Back To Linear Algebra: Orthogonal Contrasts Don’t Need Adjusting

  • User-defined contrasts may be useful for setting orthogonal contrasts
  • Orthogonal contrasts are ones that are independent, meaning you can’t form them from other columns in the X portion of the matrix
    • the sum of the products of their coefficients is 0
    • for k groups, k − 1 orthogonal contrasts exist
  • Means you don’t have to correct for FWER
  • Can be specified in multcomp and then not corrected
      • If you combine groups, use fractions if you want estimates of differences
      • must add to 0
        • Makes no difference for p-values
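A sketch of user-defined contrasts in multcomp; the contrast matrix here is a hypothetical orthogonal set for the three iris species:

```r
library(multcomp)
fit <- lm(Sepal.Length ~ Species, data = iris)
# Rows: setosa vs the average of the other two; versicolor vs virginica.
# The products of the coefficients sum to 0, so the set is orthogonal
K <- rbind("setosa vs others"        = c(1, -0.5, -0.5),
           "versicolor vs virginica" = c(0,  1,   -1))
summary(glht(fit, linfct = mcp(Species = K)), test = adjusted("none"))
```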

66

67 of 124

Orthogonal Contrasts

  • The sum of the products of the contrast coefficients is 0
  • Everything compared once!

67


69 of 124

Orthogonal Contrasts

  • The sum of the products of the contrast coefficients is 0
  • Everything compared once!
    • set totals to 1 or -1 for appropriate estimates

69

70 of 124

Graphical Representation Of Results

  • Groups which cannot be distinguished share the same letter
  • Done by adding column to output from summarySE function

70


72 of 124

Other options

72

73 of 124

t-test connections

  • Often used as bridge from 1 to many samples
    • easy statistics

73

  • Let D = the difference between the mean of the first group and the mean of the second group

74 of 124

Deriving The Test Statistics For 2 Samples

  • We have to estimate noise again
  • Null hypothesis is that they come from a single population, so draw from one urn
  • Variance estimate still does not matter!

74


75 of 124

t-test connections

  • 2-sample t-test is special case of ANOVA
  • since numerator is 1, only report denominator df

75

76 of 124

t-test connections

  • 2-sample t-test is special case of ANOVA
  • since numerator is 1, only report denominator df

76

77 of 124

Differences: Estimate Of Pooled Variance

  • If we assume both populations have the same variance, we can simply weight our estimates of the variance
    • Assumes that larger sample yields better estimate, so weight it more
  • Still controls for degrees of freedom

77

This simplifies to

Pooled variance formula

78 of 124

Differences: Estimate Of Pooled Variance

  • However, we often don’t know this, and using overall mean of variance is highly biased if H0 is false
  • Instead, we calculate each individual estimate and weight them
  • Known as the Welch or Behrens-Fisher t-test
    • Leads to an approximate following of the t-distribution
    • Degrees of freedom can be non-integer (decimal) and less than n1 + n2 - 2
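In R, t.test() uses the Welch correction by default; a sketch comparing two iris species (droplevels drops the unused third species level):

```r
# Welch t-test (R's default) vs the pooled-variance version
two_sp <- droplevels(subset(iris, Species != "setosa"))
t.test(Sepal.Length ~ Species, data = two_sp)                    # Welch
t.test(Sepal.Length ~ Species, data = two_sp, var.equal = TRUE)  # pooled
```

The Welch version reports a decimal df that is at most n1 + n2 − 2.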

78

Corrected variance formula – default used in R

Remember:

79 of 124

Degrees Of Freedom Odd For Unbalanced Design!

Cavity data

General formula (Welch modification for df)

79

Degrees of freedom can be non-integer (decimal) and less than n1 + n2 - 2

80 of 124

Welch/Behrens-Fisher t-test

80

81 of 124

The Signal/Noise Ratio

  • We usually assume no difference among means, but we can test any shift!
  • Note our graph shows +2.213
    • just reversed the means

81

82 of 124

What If We Violate Normality Assumption?

  • Wilcoxon/Mann-Whitney U test
    • compares the central tendencies of two groups using ranks.
    • Assumes distributions are the same shape
  • Sign test
    • not option for unpaired data
  • Bootstrapping
    • Needs large enough sample size
  • Permutation
    • Needs large enough sample size
    • not option for paired data

82

83 of 124

Performing A Mann-Whitney U Test

  • First, rank all individuals from both groups together in order (for example, smallest to largest)
  • Sum the ranks for all individuals in one of the groups
    • R1 or R2
  • Calculate the test statistic, U
    • U1 is the number of times an individual from pop. 1 has a lower rank than an individual from pop. 2, out of all pairwise comparisons. (How many pairwise comparisons are possible?)
    • Use larger U value

83

U2 = n1n2 – U1

84 of 124

Assumptions Of Mann-Whitney U Test

  • Both samples are random samples
  • Both populations have the same shape of distribution
  • Mann-Whitney test is quite sensitive to data violating this 2nd assumption

84

85 of 124

85

86 of 124

86

87 of 124

87

88 of 124

88

89 of 124

89

90 of 124

Example: Garter Snake Resistance To Newt Toxin

90

Rough skinned newt

By Don Loarie (Rough-skinned Newt Taricha granulosa) [CC BY 2.0 (https://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

Photo by Jessica Bolser/USFWS.

91 of 124

Comparing Snake Resistance To TTX (Tetrodotoxin)

Resistance is known to not be normally distributed within populations

91

Locality     Proportion of snakes resistant
Benton       0.29
Benton       0.77
Benton       0.96
Benton       0.64
Benton       0.70
Benton       0.99
Benton       0.34
Warrenton    0.17
Warrenton    0.28
Warrenton    0.20
Warrenton    0.20
Warrenton    0.37

Geffeney, S., E.D. Brodie, Jr., P.C. Ruben, and E.D. Brodie III. 2002. Mechanisms of adaptation in a predator-prey arms race: TTX-resistant sodium channels. Science 297: 1336-1339.

92 of 124

Hypotheses

  • H0: The TTX resistance for snakes from Benton is the same as for snakes from Warrenton
  • HA: The TTX resistance for snakes from Benton is different from snakes from Warrenton

92

93 of 124

Calculating The Ranks

  • Rank sum for Warrenton:
    • R = 1+4+2.5+2.5+7 = 17

93

Locality     Proportion of snakes resistant   Rank
Benton       0.29                             5
Benton       0.77                             10
Benton       0.96                             11
Benton       0.64                             8
Benton       0.70                             9
Benton       0.99                             12
Benton       0.34                             6
Warrenton    0.17                             1
Warrenton    0.28                             4
Warrenton    0.20                             2.5
Warrenton    0.20                             2.5
Warrenton    0.37                             7

94 of 124

Calculating U1 And U2

94

U2 = n1n2 – U1 = 5(7) – 33 = 2

  • For a two-tailed test, we choose the larger of U1 or U2: U = 33 and compare to critical U value (determined by sample sizes)
    • Large-sample approximation to normal exists (but not necessary with computers!)

95 of 124

Compare U To The U Table

  • Critical value for U for n1 = 5 and n2 = 7 is 30
  • 33 > 30, so we can reject the null hypothesis
  • Snakes from Benton have a different distribution of resistance to TTX than the Warrenton snakes
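The same comparison can be run with wilcox.test(), which reports the U statistic as W (a warning about tied values is expected here):

```r
# Mann-Whitney U test on the snake TTX-resistance data
benton    <- c(0.29, 0.77, 0.96, 0.64, 0.70, 0.99, 0.34)
warrenton <- c(0.17, 0.28, 0.20, 0.20, 0.37)
wilcox.test(benton, warrenton)  # W = 33, matching the hand calculation
```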

95

96 of 124

How To Deal With Ties In Rankings

  • Determine the ranks that the values would have gotten if they were slightly different
  • Average these ranks, and assign that average to each tied individual
  • Count all those individuals when deciding the rank of the next largest individual

96

Group   Y    Rank
2       12   1
2       14   2
1       17   3
1       19   4.5
2       19   4.5
1       24   6
2       27   7
1       28   8

97 of 124

In R

  • Same format as single-sample tests
  • Formula notation used here

97

Reporting:

W = 526, p < .01; reject null hypothesis

98 of 124

Nonparametric Version Of ANOVA

  • Kruskal-Wallis test
  • Uses the ranks of the data points (rather than their magnitudes)
    • Best for ranked variables
  • Assumes:
    • Group samples are randomly sampled from populations
    • Distribution of variable must have same shape for all groups
      • Data still must be fairly homoscedastic (have same variance/shape) but doesn’t need to be normal
  • Does not use correction!
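A sketch of the Kruskal-Wallis test on the iris example:

```r
# Rank-based alternative to the one-way ANOVA
kruskal.test(Sepal.Length ~ Species, data = iris)
```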

98

99 of 124

In R

  • Same format as single-sample tests
  • Formula notation used here

99


101 of 124

In R

  • Post-hoc tests

101

102 of 124

Bootstrapping Option

  • If we can’t assume anything, bootstrapping is still an option
    • Deals with heteroscedastic data
      • Means groups can have different variances
    • Useful if your initial plot of assumptions show a “funnel” shape where variance increases with fitted value
      • Another option is to log transform the data
  • May use “trimmed” data
    • Remove top and bottom x% to minimize impact of outliers
    • In WRS2 package, not shown here
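A base-R sketch of the bootstrapping idea for two groups (the WRS2/MKinfer functions wrap this kind of resampling; the species chosen here are illustrative):

```r
# Bootstrap CI for the difference in mean sepal length of two species
set.seed(42)  # reproducible resampling
x <- iris$Sepal.Length[iris$Species == "versicolor"]
y <- iris$Sepal.Length[iris$Species == "virginica"]
boot_diff <- replicate(9999,
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE)))
quantile(boot_diff, c(0.025, 0.975))  # 95% percentile interval
```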

102

103 of 124

In R

  • Using MKinfer package for ease
  • Many other options!

103


105 of 124

In R

  • t1waybt for 3+ groups
  • Many other options!

105


107 of 124

Permutation is Same Idea, But You Sample Without Replacement and Combine the Data!

  • Remember permutations?
  • Logically, if groups don’t matter, you can just re-assign groups randomly among the data and calculate a signal
    • These should really be called (and sometimes are) combination or randomization tests
  • Do this lots of time to consider sampling distribution (noise)
  • Compare to your signal to get a p-value
  • Can be “exact” if you carry out every permutation!
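The logic can be sketched in base R before reaching for the coin package:

```r
# Permutation test: shuffle group labels, recompute F, repeat
data(iris)
obs_F <- anova(lm(Sepal.Length ~ Species, data = iris))$`F value`[1]
set.seed(42)
perm_F <- replicate(999, {
  shuffled <- sample(iris$Species)  # re-assign groups without replacement
  anova(lm(iris$Sepal.Length ~ shuffled))$`F value`[1]
})
mean(c(perm_F, obs_F) >= obs_F)  # p-value; observed counts as one permutation
```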

107

108 of 124

Permutation Test in R

  • Use the coin package

108


111 of 124

Now That You’ve Seen These, Consider What They Require

  • Permutation tests also require that you assume the data come from a similar distribution
    • Same as Mann-Whitney U test!
  • Bootstrapping only assumes independent data!

111

112 of 124

Summary

  • ANOVA allows for comparisons of means among groups
    • Significant finding requires post-hoc testing for follow-up
    • Main assumptions are based on residuals
  • Special case of linear model
  • Generalized form of t-test
  • Bootstrapping and Kruskal-Wallis test are options for data that don’t meet these assumptions
  • We’ll return to paired designs next week

112

113 of 124

Notes On Actually Doing These

  • Plots at beginning of this lecture show several ways to visualize data
    • Note you can add letters to identify specific differences after post-hoc tests using included code as well
  • To get results
    • Start with a linear model (lm) unless data is highly skewed
      • Initial plots of the data are good but only identify potential outliers; remember all the assumptions are based on residuals
      • Can also use summarySE command to get initial look at variance (sd) among groups to check for similarity
    • After creating an object with the lm command, plot it to check assumptions
      • If it passes, look at overall F-value and do post-hoc tests if needed
        • Make sure you note correction method used for post-hoc tests to control FWER!
      • If not, use bootstrapping techniques if sample size is large or Kruskal-Wallis test if you are dealing with ranks (or prefer) and then proceed

113

114 of 124

What To Report

  • Method used and why
    • ANOVA (lm), Kruskal-Wallis, Bootstrapping
  • Overall p-value
    • Usually denoted
      • Fdf groups, df error = F statistic, plus the observed p-value, for lm
      • Chi-squared statistic, df, and p-value for Kruskal-Wallis
      • Test statistic and number of iterations for bootstrapping, along with p-value
  • Multiple-comparison methods and results if needed
    • Make sure you note correction method used for post-hoc tests to control FWER!
    • Generally provide table with adjusted p-values for each comparison
  • Generally accompanied by graph showing error bars around groups, sometimes with letters or other symbols to denote significant differences and numerical summaries to indicate amplitude of differences
  • Including R2 value is also useful in explaining how “important” your variable is

114

115 of 124

R Aside: Working With Large Data

  • Data can be sent to you in multiple ways
    • We’ve covered inputting by hand or reading a csv from the web
    • You’ll see a .txt file used in assignment today

115

116 of 124

R Aside: Working With Large Data

  • Data can be sent to you in multiple formats
    • Long vs wide
  • Today you’ll see both, but it’s good to know how to convert between them

116

Long

Wide

117 of 124

reshape2 Package Is Useful

  • dcast the data into a wide format
    • dcast vs acast gives you data frame
  • melt into a long format
  • recast combines but can be tricky…
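A minimal sketch with hypothetical long data (the column names are illustrative, not from the assignment files):

```r
library(reshape2)
# One row per id x time measurement (long format)
long <- data.frame(id    = rep(1:3, each = 2),
                   time  = rep(c("t1", "t2"), 3),
                   value = c(5, 7, 6, 8, 4, 9))
wide <- dcast(long, id ~ time, value.var = "value")  # long -> wide
back <- melt(wide, id.vars = "id",
             variable.name = "time", value.name = "value")  # wide -> long
```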

117

Long to wide: get the data

118 of 124


118

Long to wide: dcast; formula is row ~ columns, value.var is what you fill dataframe with

Note this only works when you have a unique identifier

119 of 124


119

Long to wide: if the data isn’t unique, you have to use the fun.aggregate argument to tell it what to do

120 of 124


120

Long to wide: you can name the output column by putting it in quotes in the formula

121 of 124


121

Wide to long: get the data

122 of 124


122

Wide to long: melt it

123 of 124


123

Wide to long: name outcomes

124 of 124


124

Wide to long: more independent variables
