Comparing means among groups
Working with continuous data: One-way ANOVAs and multiple comparisons, introducing the linear model, t-test connections, and non-parametric options
J. Stephen Gosnell
Baruch College
Goals
Frequency Distributions Or Histograms
These graphs look simple because they are ones YOU can replicate in R. See class files!
From earlier lecture!
Iris virginica, Frank Mayfield, CC BY-SA 2.0 <https://creativecommons.org/licenses/by-sa/2.0>, via Wikimedia Commons
Flower morphology. Pearson Scott Foresman, Public domain, via Wikimedia Commons
Transition to Hypothesis Testing
What about multiple groups?
Wide vs long aside
Explaining the data
Null Hypothesis, Visualized
Testing the Null Hypothesis
Variance Estimated Within Each Group: Mean Square Error (MSE)
Variance Estimated Among Groups
We Can Approximate The Distribution Under The Null Hypothesis Using The F Distribution And Error Estimates
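The slides work in R; as a language-neutral illustration, here is a minimal Python sketch (standard library only) of how the within-group estimate (MSE) and the among-group estimate combine into F. The toy groups are hypothetical, chosen so the arithmetic is easy to check by hand: every group has a within-group variance of 1, so MSE = 1, and the group means (5, 7, 10) give a mean square among groups of 19.

```python
# One-way ANOVA F ratio from scratch (standard library only).
# The three groups are hypothetical toy data, not a data set from class.

def one_way_anova(groups):
    """Return (F, df_among, df_within, MS_among, MSE) for a list of samples."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    k = len(groups)

    # Among groups: how far each group mean sits from the grand mean (weighted by n_i)
    ss_among = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each observation sits from its own group mean
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_among, df_within = k - 1, n_total - k
    ms_among = ss_among / df_among
    mse = ss_within / df_within          # mean square error
    return ms_among / mse, df_among, df_within, ms_among, mse


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
F, df1, df2, ms_among, mse = one_way_anova(groups)
print(f"F({df1},{df2}) = {F:.2f}")   # F(2,6) = 19.00
```

In R, `pf(F, df1, df2, lower.tail = FALSE)` converts an F value like this into its upper-tail p-value.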
What are the ways we can find p-values?
P-value Via Simulation
But this requires sampling
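One way to get a p-value by simulation, sketched in Python under stated assumptions: generate many data sets in which the null hypothesis is true (all groups drawn from one normal distribution, using the observed group sizes), compute F for each, and report the share that is at least as large as the observed F. Because F does not change when the data are shifted or rescaled, drawing from a standard normal is enough. The data and the `simulated_p_value` helper are hypothetical, and the answer depends on the number of simulations and the random seed, which is the cost of relying on sampling.

```python
import random


def f_stat(groups):
    """One-way ANOVA F ratio: mean square among groups over mean square error."""
    means = [sum(g) / len(g) for g in groups]
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_among = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_among / (k - 1)) / (ss_within / (n - k))


def simulated_p_value(groups, n_sims=5000, seed=1):
    """Share of null-simulated F values at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_stat(groups)
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_sims):
        # Null model: every group drawn from the same normal distribution
        fake = [[rng.gauss(0.0, 1.0) for _ in range(s)] for s in sizes]
        if f_stat(fake) >= observed:
            exceed += 1
    return observed, exceed / n_sims


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]  # hypothetical data
observed_F, p_sim = simulated_p_value(groups)
print(observed_F, p_sim)
```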
P-value Via Distribution
Sums Of Squares Notes: Divide Variance Among Parts Of The Model
Is there more variance among or within groups?
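In symbols (notation chosen here for illustration: k groups, n_i observations in group i, N observations in total, group means \bar{Y}_i, grand mean \bar{Y}), the total sum of squares splits exactly into an among-group part and a within-group part, and F compares the two after dividing each by its degrees of freedom:

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}\right)^2}_{SS_{\text{total}}}
= \underbrace{\sum_{i=1}^{k} n_i\left(\bar{Y}_i-\bar{Y}\right)^2}_{SS_{\text{groups}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_i\right)^2}_{SS_{\text{error}}}

F = \frac{MS_{\text{groups}}}{MSE}
  = \frac{SS_{\text{groups}}/(k-1)}{SS_{\text{error}}/(N-k)}
```

A large F means there is more variance among the group means than the within-group scatter alone would produce.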
Welcome to the linear model
Remember q-q plots!
https://rpubs.com/mbh038/725314
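A Q-Q plot pairs the sorted residuals with the quantiles a normal distribution would predict; points close to a straight line suggest approximately normal residuals. A minimal Python sketch of the coordinates behind such a plot, with hypothetical residuals and the common (i - 0.5)/n plotting positions (an assumption; R's `qqnorm()` uses `ppoints()`, which matches this rule only for n > 10):

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with a standard-normal quantile for a Q-Q plot."""
    n = len(residuals)
    # Plotting positions (i - 0.5) / n; R's ppoints() uses this rule only when n > 10
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, sorted(residuals)))


residuals = [0.3, -1.2, 0.8, -0.1, 0.2]  # hypothetical residuals
for theo, obs in qq_points(residuals):
    print(f"{theo:6.3f}  {obs:5.2f}")
```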
Visually Checking Assumptions
Assumptions
Check Model Outcomes
Problems with LM approach
Other lm Benefits
Other benefits of LM: P-value Isn’t Everything!
Must use a model with an intercept! Otherwise R compares the fit against a null model whose mean is fixed at 0, which greatly inflates F and R².
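Why the intercept matters, sketched in Python with hypothetical data: for a single grouping factor, dropping the intercept (`y ~ group - 1` in R) still fits one mean per group, so the residuals do not change. What changes is the null model: R then measures R² against a mean of 0 (the uncentered sum of squares of y) instead of against the overall mean.

```python
def r_squared_both_ways(groups):
    """R^2 for a one-factor model against two different null models."""
    values = [y for g in groups for y in g]
    grand = sum(values) / len(values)
    # Fitted values are the group means with or without an intercept, so RSS matches
    rss = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    tss_centered = sum((y - grand) ** 2 for y in values)   # null: overall mean
    tss_uncentered = sum(y ** 2 for y in values)           # null: mean fixed at 0
    return 1 - rss / tss_centered, 1 - rss / tss_uncentered


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]  # hypothetical data
r2_with, r2_without = r_squared_both_ways(groups)
print(round(r2_with, 3), round(r2_without, 3))  # 0.864 0.989
```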
Can Be Used To Compare Models As Well…
Model without Intercept Issues
Problems with LM approach
How To Build ANOVA Tables
Reporting:
F(2,147) = 119.26, p < .001; reject null hypothesis
What Is Type “III”?
Post-hoc Tests
https://xkcd.com/882
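The comic's point in numbers: if every null hypothesis is true and the tests are independent, the chance of at least one false positive grows quickly with the number of tests. A one-function Python sketch, assuming α = 0.05 and the 20 jelly-bean colors tested in the comic:

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


print(round(familywise_error_rate(0.05, 20), 3))  # 0.642
```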
Options (all seen before!)
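Two familiar corrections of this kind are Bonferroni and Holm; both are available in R through `p.adjust()`. A minimal Python sketch of each, with hypothetical p-values: Bonferroni multiplies every p-value by the number of tests m, while Holm multiplies the smallest by m, the next by m - 1, and so on, then makes sure the adjusted values never decrease.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment, returned in the original order of pvals."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - step) * pvals[i])
        running_max = max(running_max, candidate)   # adjusted values never decrease
        adjusted[i] = running_max
    return adjusted


p = [0.01, 0.04, 0.03, 0.02]   # hypothetical raw p-values
print([round(x, 3) for x in bonferroni(p)])   # [0.04, 0.16, 0.12, 0.08]
print([round(x, 3) for x in holm(p)])         # [0.04, 0.06, 0.06, 0.06]
```

Holm controls the same familywise error rate as Bonferroni but is never more conservative, which is why it is often preferred.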
multcomp Lets Us Do Post Hoc Tests With lm And Its Extensions
Options
multcomp Lets Us Do Post Hoc Tests With lm And Its Extensions
A Little Linear Algebra Goes A Long Way
Model/design matrix
Coefficient/Beta matrix
Residuals
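The design matrix, coefficients, and residuals fit together as y = Xβ + e, with least-squares estimate β = (X'X)⁻¹X'y. The Python sketch below (standard library only, hypothetical data with three groups A, B, C) builds X with treatment coding, solves the normal equations (X'X)β = X'y, which yields the same β as multiplying by the inverse, and computes the residuals e = y - Xβ. With this coding the intercept is the mean of reference group A and the other coefficients are differences from it, matching what `lm(y ~ group)` reports under R's default treatment contrasts.

```python
# Least squares via the normal equations: beta = (X'X)^{-1} X'y.
# Toy data and group labels are hypothetical.

def design_matrix(labels, levels):
    """Treatment coding: an intercept column plus a 0/1 dummy for each non-reference level."""
    return [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
y = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

X = design_matrix(labels, ["A", "B", "C"])        # model/design matrix (9 x 3)
Xt = transpose(X)
beta = solve(matmul(Xt, X), [sum(x * v for x, v in zip(col, y)) for col in Xt])
fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
residuals = [v - f for v, f in zip(y, fitted)]    # e = y - X beta

print([round(b, 6) for b in beta])   # [5.0, 2.0, 5.0]
```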
Why Do This?
https://www.mathsisfun.com/algebra/matrix-inverse.html
Errors iid ~ N(0, σ²) is all you need!
Back To Linear Algebra: Orthogonal Contrasts Don’t Need Adjusting
Orthogonal Contrasts
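A contrast is a set of weights on the group means that sums to zero; two contrasts are orthogonal when Σ c₁ᵢc₂ᵢ/nᵢ = 0, which with equal group sizes reduces to a zero dot product. A small Python check, using hypothetical contrasts for three groups (A vs. B, and the average of A and B vs. C):

```python
from itertools import combinations


def is_contrast(c, tol=1e-12):
    """Contrast weights must sum to zero."""
    return abs(sum(c)) < tol


def orthogonal(c1, c2, sizes=None, tol=1e-12):
    """Orthogonal if sum(c1_i * c2_i / n_i) == 0; equal sizes reduce this to a dot product."""
    sizes = sizes or [1] * len(c1)
    return abs(sum(a * b / n for a, b, n in zip(c1, c2, sizes))) < tol


def mutually_orthogonal(contrasts, sizes=None):
    return all(is_contrast(c) for c in contrasts) and all(
        orthogonal(a, b, sizes) for a, b in combinations(contrasts, 2)
    )


a_vs_b = [1, -1, 0]          # group A vs group B
ab_vs_c = [0.5, 0.5, -1]     # average of A and B vs group C
a_vs_c = [1, 0, -1]          # group A vs group C

print(mutually_orthogonal([a_vs_b, ab_vs_c]))   # True
print(mutually_orthogonal([a_vs_b, a_vs_c]))    # False
```

With k groups there are at most k - 1 mutually orthogonal contrasts; the first pair above is one complete set for three groups.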
Graphical Representation Of Results
Other options
t-test connections
Deriving The Test Statistics For 2 Samples
Differences: Estimate Of Pooled Variance
This simplifies to
Pooled variance formula
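The pooled-variance t statistic, sketched in Python with hypothetical samples. With exactly two groups, t² from this test equals the one-way ANOVA F, which is one of the t-test connections: for these two samples t² = 6, the same F a one-way ANOVA gives.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(y1) - mean(y2)) / se, df


t, df = pooled_t([4.0, 5.0, 6.0], [6.0, 7.0, 8.0])   # hypothetical samples
print(round(t, 3), df, round(t ** 2, 3))   # -2.449 4 6.0
```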
Corrected (unpooled) variance formula – the default used by R's t.test()
Remember:
Degrees Of Freedom Odd For Unbalanced Design!
Cavity data
General formula (Welch modification for df)
Degrees of freedom can be non-integer (decimal) and less than n1 + n2 - 2
Welch/Behrens-Fisher t-test
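The Welch version drops the equal-variance assumption: each sample keeps its own variance in the standard error, and the Welch-Satterthwaite formula gives the degrees of freedom. A minimal Python sketch with hypothetical samples; note the non-integer df (about 4.08), which is smaller than n1 + n2 - 2 = 5.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(y1, y2):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = variance(y1) / n1, variance(y2) / n2   # squared standard error of each mean
    t = (mean(y1) - mean(y2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([4.0, 5.0, 6.0], [6.0, 8.0, 10.0, 12.0])   # hypothetical samples
print(round(t, 3), round(df, 3))   # -2.828 4.075
```

In R, `t.test(y1, y2)` runs this Welch version unless `var.equal = TRUE` is given.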
The Signal/Noise Ratio
What If We Violate The Normality Assumption?
Performing A Mann-Whitney U Test
U2 = n1n2 – U1
Assumptions Of Mann-Whitney U Test
Example: Garter Snake Resistance To Newt Toxin
Rough-skinned newt
By Don Loarie (Rough-skinned Newt Taricha granulosa) [CC BY 2.0 (https://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
Photo by Jessica Bolser/USFWS.
Comparing Snake Resistance To TTX (Tetrodotoxin)
Resistance is known not to be normally distributed within populations
Locality | Proportion of snakes resistant |
Benton | 0.29 |
Benton | 0.77 |
Benton | 0.96 |
Benton | 0.64 |
Benton | 0.70 |
Benton | 0.99 |
Benton | 0.34 |
Warrenton | 0.17 |
Warrenton | 0.28 |
Warrenton | 0.20 |
Warrenton | 0.20 |
Warrenton | 0.37 |
Geffeney, S., E.D. Brodie, Jr., P.C. Ruben, and E.D. Brodie III. 2002. Mechanisms of adaptation in a predator-prey arms race: TTX-resistant sodium channels. Science 297: 1336-1339.
Hypotheses
Calculating The Ranks
Locality | Proportion of snakes resistant | Rank |
Benton | 0.29 | 5 |
Benton | 0.77 | 10 |
Benton | 0.96 | 11 |
Benton | 0.64 | 8 |
Benton | 0.70 | 9 |
Benton | 0.99 | 12 |
Benton | 0.34 | 6 |
Warrenton | 0.17 | 1 |
Warrenton | 0.28 | 4 |
Warrenton | 0.20 | 2.5 |
Warrenton | 0.20 | 2.5 |
Warrenton | 0.37 | 7 |
Calculating U1 And U2
U2 = n1n2 – U1 = 5(7) – 33 = 2
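The rank arithmetic can be checked in a few lines of Python using the snake data from the table above. Ties get the average of the ranks they span, and U1 = n1n2 + n1(n1 + 1)/2 - R1, where R1 is the rank sum of sample 1. Taking Warrenton as sample 1 reproduces U1 = 33 and U2 = 2; the larger of the two is the value compared to the U table. The helper names are illustrative.

```python
def midranks(values):
    """Ranks from smallest to largest; tied values share the average of their ranks."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) with U1 = n1*n2 + n1*(n1 + 1)/2 - R1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))   # rank both samples together
    r1 = sum(ranks[:n1])                              # rank sum of sample 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return u1, n1 * n2 - u1


warrenton = [0.17, 0.28, 0.20, 0.20, 0.37]
benton = [0.29, 0.77, 0.96, 0.64, 0.70, 0.99, 0.34]
u1, u2 = mann_whitney_u(warrenton, benton)
print(u1, u2)   # 33.0 2.0
```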
Compare U To The U Table
How To Deal With Ties In Rankings
Group | Y | Rank |
2 | 12 | 1 |
2 | 14 | 2 |
1 | 17 | 3 |
1 | 19 | 4.5 |
2 | 19 | 4.5 |
1 | 24 | 6 |
2 | 27 | 7 |
1 | 28 | 8 |
In R
Reporting:
W = 526, p < .01; reject null hypothesis
Nonparametric Version Of ANOVA
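Assuming the nonparametric ANOVA meant here is the Kruskal-Wallis test (R's `kruskal.test()`), its H statistic ranks all observations together and compares the rank sums among groups; H is then compared to a χ² distribution with k - 1 degrees of freedom. A minimal Python sketch with hypothetical, tie-free data (R additionally divides by a tie correction, which equals 1 when there are no ties):

```python
def midranks(values):
    """Ranks from smallest to largest; tied values share the average of their ranks."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without the tie correction that R's kruskal.test() applies."""
    values = [y for g in groups for y in g]
    n = len(values)
    ranks = midranks(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical, no ties
print(round(kruskal_wallis_h(groups), 3))      # 7.2
```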
In R
Bootstrapping Option
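A minimal percentile-bootstrap sketch in Python, applied for illustration to the snake data in the table above: resample each group with replacement at its original size, recompute the difference in means many times, and read off the middle 95% of those differences. The function name, the 5,000 resamples, and the seed are arbitrary choices for the sketch.

```python
import random


def avg(xs):
    return sum(xs) / len(xs)


def bootstrap_diff_ci(y1, y2, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap CI for mean(y1) - mean(y2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement, at its original size
        b1 = rng.choices(y1, k=len(y1))
        b2 = rng.choices(y2, k=len(y2))
        diffs.append(avg(b1) - avg(b2))
    diffs.sort()
    lower = diffs[int((1 - level) / 2 * n_boot)]
    upper = diffs[int((1 + level) / 2 * n_boot) - 1]
    return avg(y1) - avg(y2), (lower, upper)


benton = [0.29, 0.77, 0.96, 0.64, 0.70, 0.99, 0.34]
warrenton = [0.17, 0.28, 0.20, 0.20, 0.37]
observed, (lo, hi) = bootstrap_diff_ci(benton, warrenton)
print(round(observed, 3), round(lo, 3), round(hi, 3))
```

An interval that excludes 0 points the same way as a test rejecting "no difference in means".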
In R
Permutation Is The Same Idea, But You Combine The Data And Sample Without Replacement!
Permutation Test in R
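A Python sketch of a permutation test on the snake data from the table above: pool both groups, reshuffle them (sampling without replacement) to reassign group labels at random, and count how often the absolute difference in means is at least as large as the observed one. The helper name, the number of shuffles, and the seed are illustrative assumptions.

```python
import random


def avg(xs):
    return sum(xs) / len(xs)


def permutation_test(y1, y2, n_perm=10000, seed=1):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(avg(y1) - avg(y2))
    combined = list(y1) + list(y2)   # pool: under H0 the group labels are arbitrary
    n1 = len(y1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(combined)        # a random reordering = sampling without replacement
        diff = abs(avg(combined[:n1]) - avg(combined[n1:]))
        if diff >= observed - 1e-12:   # tolerance so floating-point ties still count
            extreme += 1
    return observed, extreme / n_perm


benton = [0.29, 0.77, 0.96, 0.64, 0.70, 0.99, 0.34]
warrenton = [0.17, 0.28, 0.20, 0.20, 0.37]
observed, p_perm = permutation_test(benton, warrenton)
print(round(observed, 3), p_perm)
```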
Now That You’ve Seen These, Consider What They Require
Summary
Notes On Actually Doing These
What To Report
R Aside: Working With Large Data
Long
Wide
reshape2 Package Is Useful
Long to wide: get the data
Long to wide: dcast(); the formula is rows ~ columns, and value.var names the column used to fill the data frame
Note: this only works when each rows ~ columns combination identifies a single value
Long to wide: if the data aren't unique, you have to use the fun.aggregate argument to tell dcast() how to combine values
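What dcast() does with a rows ~ columns formula can be sketched in plain Python (a hypothetical helper, not part of reshape2): group the long records by their row/column pair, and require an aggregate function whenever a pair occurs more than once. reshape2 itself does not raise an error in that case; it falls back to counting values with `length` and prints a message saying so.

```python
from collections import defaultdict


def long_to_wide(rows, row_key, col_key, value_key, aggregate=None):
    """Pivot long records (dicts) into {row id: {column: value}}, like dcast(row ~ column)."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(r[value_key])
    wide = defaultdict(dict)   # missing row/column pairs are simply absent (dcast gives NA)
    for (row_id, col), values in cells.items():
        if aggregate is None:
            if len(values) > 1:
                raise ValueError(f"{row_id}/{col} is not unique; pass an aggregate function")
            wide[row_id][col] = values[0]
        else:
            wide[row_id][col] = aggregate(values)   # e.g. sum, len, or a mean
    return dict(wide)


long_data = [  # hypothetical counts of trees per site
    {"site": "A", "species": "oak", "count": 3},
    {"site": "A", "species": "maple", "count": 5},
    {"site": "B", "species": "oak", "count": 2},
    {"site": "B", "species": "oak", "count": 4},   # site B / oak appears twice
]
wide = long_to_wide(long_data, "site", "species", "count", aggregate=sum)
print(wide)   # {'A': {'oak': 3, 'maple': 5}, 'B': {'oak': 6}}
```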
Long to wide: you can name the output column by putting its name in quotes in the formula
Wide to long: get the data
Wide to long: melt it
Wide to long: name outcomes
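A plain-Python sketch of the reverse operation (a hypothetical helper, not reshape2's code): every column not listed as an id variable becomes a (variable, value) pair. The `variable_name` and `value_name` parameters play the role of melt()'s variable.name and value.name arguments, and listing more id columns keeps more independent variables attached to each row.

```python
def wide_to_long(rows, id_vars, variable_name="variable", value_name="value"):
    """Stack every non-id column into (variable, value) pairs, like melt(id.vars = ...)."""
    long_rows = []
    for r in rows:
        for col, val in r.items():
            if col in id_vars:
                continue
            record = {k: r[k] for k in id_vars}   # carry the id columns along
            record[variable_name] = col
            record[value_name] = val
            long_rows.append(record)
    return long_rows


wide_data = [  # hypothetical: one row per site, one column per species
    {"site": "A", "oak": 3, "maple": 5},
    {"site": "B", "oak": 6, "maple": 0},
]
for row in wide_to_long(wide_data, ["site"], variable_name="species", value_name="count"):
    print(row)
```

Unlike melt(), which stacks the data column by column, this sketch emits rows site by site; the records themselves are the same.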
Wide to long: more independent variables
to