Comparisons of Many Group Means: ANOVA
Tests to determine whether a mean differs across three or more sub-populations
Outline
Where We Are and Where We Are Going…
Inference On… | Covered? |
One Numerical Variable | ✔️ |
One Binary Categorical Variable | ✔️ |
Associations Between a Numerical Variable and a Binary Categorical Variable | ✔️ |
Associations Between Two Binary Categorical Variables | ✔️ |
One MultiClass Categorical Variable | We’ve Omitted |
Associations Between Two MultiClass Categorical Variables | We’ve Omitted |
Associations Between One Numerical Variable and One MultiClass Categorical Variable | Today |
Associations Between Two Numerical Variables | |
Some New Datasets for Today
Energy Drink Marketing and Sales
Analysis of Variance (ANOVA)
Intuition for ANOVA
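One way to make the intuition concrete (a standard textbook formulation, not taken from these slides): ANOVA compares how much the group means vary from one another with how much the observations vary inside each group. The symbols below are assumptions introduced for illustration: k groups, n_i observations in group i, N observations in total, group means \bar{x}_i, and overall mean \bar{x}.

```latex
F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{\displaystyle \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1)}
         {\displaystyle \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2 \Big/ (N - k)}
```

A large value of F suggests the group means are more spread out than the within-group variability alone would explain.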
Completed Example: Energy Drink Marketing Strategy Performance
Scenario: A major beverage company has launched a new energy drink, Purple Heifer. The company is investigating how best to spend its limited marketing budget, so it runs a small 10-week pilot that includes radio spots, television commercials, and sponsored social media posts (one strategy in each of three separate but very similar cities). It collects the weekly sales (in dollars) of Purple Heifer and now wants to determine whether average weekly sales differ by marketing strategy. Conduct a test at the 10% level of significance to determine whether there is evidence of a difference in average weekly revenue due to marketing strategy.
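The hypotheses for this scenario can be sketched as follows. The subscripts are labels chosen here for the three marketing strategies, not notation from the slides.

```latex
H_0:\ \mu_{\text{radio}} = \mu_{\text{TV}} = \mu_{\text{social}}
\qquad
H_a:\ \text{at least one of the three means differs from the others}
\qquad
\alpha = 0.10
```

Note that the alternative does not claim that all three means differ, only that they are not all equal.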
Completed Example: Energy Drink Marketing Strategy Performance
Reformatting the Data: One Column Per Group
We need to be a bit sneaky about doing this. Follow the steps below:
If working with raw data like our loans data from earlier this semester
Reformatting the Data: One Column Per Group (Continued)
Reformatting the Data: One Column Per Group (Continued)
We need to be a bit sneaky about doing this. Follow the continued steps below:
We’ve finally got all of our data formatted appropriately for ANOVA!
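For readers working outside Excel, the same "one column per group" reshaping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `rows` records and the `one_column_per_group` helper are hypothetical, and the values are made up rather than taken from any course dataset.

```python
from collections import defaultdict

# Hypothetical raw data: one record per observation, as (group label, measurement).
# These values are invented purely for illustration.
rows = [
    ("Radio", 1200.0),
    ("TV", 1500.0),
    ("Social", 1400.0),
    ("Radio", 1100.0),
    ("TV", 1650.0),
    ("Social", 1350.0),
]


def one_column_per_group(records):
    """Collect the measurements into one list ("column") per group label."""
    columns = defaultdict(list)
    for label, value in records:
        columns[label].append(value)
    return dict(columns)


columns = one_column_per_group(rows)
for label, values in columns.items():
    print(label, values)
```

Each key plays the role of a column header and each list plays the role of that group's column of values.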
Conducting the Analysis of Variance in Excel
This step is much easier than the process for reformatting the data!
Returning to the Energy Drink Marketing Strategy Performance Example
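As a cross-check on the Excel workflow, here is a minimal Python sketch of a one-way ANOVA computed by hand. Everything in it is an assumption for illustration: the sales figures are made up (three weeks per strategy, in thousands of dollars) and are not the Purple Heifer data, and the helper names are hypothetical. The p-value shortcut is exact only when there are exactly three groups (2 numerator degrees of freedom); for other group counts a general F-distribution routine would be needed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variability of the group means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_three_groups(f_stat, df_within):
    """Right-tail F probability; this closed form holds only if df_between == 2."""
    return (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)


# Made-up weekly sales (thousands of dollars), NOT the course data.
radio = [11.0, 12.0, 13.0]
tv = [12.0, 13.0, 14.0]
social = [15.0, 16.0, 17.0]

alpha = 0.10  # significance level from the scenario
f_stat, df_between, df_within = one_way_anova([radio, tv, social])
p_value = p_value_three_groups(f_stat, df_within)
reject_null = p_value < alpha

print(f"F = {f_stat:.3f} on ({df_between}, {df_within}) degrees of freedom")
print(f"p-value = {p_value:.4f}; reject H0 at alpha = {alpha}: {reject_null}")
```

The decision rule is the same as in any other test from this course: reject the null hypothesis of equal means when the p-value falls below the significance level.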
A Note on the Results of an ANOVA Test
In the previous example, we concluded that there is a statistically significant difference in means across the groups.
Limitations of the Conclusion: Our conclusion stated that at least one marketing strategy is associated with a different average weekly revenue from sales of Purple Heifer, but we don’t know which strategy is different, whether they all differ, which one is best, or which one is worst.
Post-Hoc Tests: If the result of an ANOVA test indicates that at least one group mean differs from the others, a follow-up (post-hoc) test is generally conducted to determine which pairs of groups differ, and how. One commonly used example is Tukey’s Honestly Significant Difference (HSD) test. These tests are beyond the scope of this course, but you should know that they exist and know how to find out how to carry one out if you need it.
A Hasty Solution: Although it is not a rigorous substitute for a post-hoc test, if an ANOVA test indicates that at least one mean differs, you can use a plot (like side-by-side boxplots) to investigate further.
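A numeric companion to the side-by-side boxplot idea is to tabulate, for each group, the values a boxplot would draw. The sketch below is illustrative only: the data are made up, the `summarize` helper is hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from Excel's quartile functions.

```python
from statistics import mean, median, quantiles

# Made-up measurements per group, purely for illustration.
groups = {
    "Radio": [11.0, 12.0, 13.0, 12.0, 14.0],
    "TV": [12.0, 13.0, 14.0, 15.0, 13.0],
    "Social": [15.0, 16.0, 17.0, 18.0, 16.0],
}


def summarize(data):
    """Return the boxplot-style summary (plus the mean) for each group."""
    summary = {}
    for label, values in data.items():
        q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
        summary[label] = {
            "min": min(values),
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": max(values),
            "mean": mean(values),
        }
    return summary


summary = summarize(groups)
for label, stats in summary.items():
    print(label, stats)
```

Comparing the medians and the overlap of the quartile ranges across groups gives a rough sense of which groups stand apart, but it does not replace a formal post-hoc test.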
Example: Screen Time
Example: Monthly Streaming
Scenario: Morgan & Mills Therapeutics is launching a marketing campaign for a new drug to compete with Ozempic and Wegovy. They have determined that advertising on streaming services is more impactful and efficient than advertising on traditional cable for this particular product. The Morgan & Mills marketing team is interested in whether average monthly streaming hours differ by streaming service or whether all services have the same average usage. Use the MonthlyStreaming data from the DataForANOVA.xlsx file, which includes data collected from 147 randomly selected subscribers, to conduct a test to determine whether average monthly streaming hours vary by service.
Example: Frisbee Distances
Scenario: Disc-ciples of the Basket, a disc golf club, is choosing a brand of discs to sell at their pro-shop. The club wants to sell the best, farthest-flying discs, so they test out five of the most well-known brands of disc golf discs in the country. They collect randomly sampled “drives” with each brand on their first hole and record the distances traveled (in feet). Analyze the DiscDistances data from the DataForANOVA.xlsx file to determine whether average drive distance is the same across all the brands or whether there is evidence that at least one brand has a different average drive distance.
Example: Influencer Likes
Scenario: Hashtag Hank is a hustler known for his outlandish two-day guarantee. Hank claims that he can turn anyone into a successful social media influencer in just two days. Of course, there’s a steep price to pay for Hank’s services. Hank claims that memes are all you need and that the average number of likes an influencer’s post receives differs depending on the type of content posted. Analyze the InfluencerLikes data from the DataForANOVA.xlsx file to determine whether there is evidence to support Hank’s claim that the average number of likes (in thousands) on a post depends on the type of content posted.
Example: Console Reaction Times
Scenario: A group of psychologists is interested in whether playing video games improves reaction times. Further, they are interested in whether an individual’s average reaction time to a visual stimulus depends on the primary platform they play games on. A group of 85 randomly selected gamers were asked what their primary gaming platform was. They then took part in an experiment in which they were surprised by a visual stimulus and their reaction time was measured in milliseconds. Analyze the ConsoleReactionTimes data from the DataForANOVA.xlsx file to determine whether average reaction time differs by preferred gaming console.
Inference: Where We’ve Been and Where We Are Headed
Inference On… | Covered? |
One Numerical Variable | ✔️ |
One Binary Categorical Variable | ✔️ |
Associations Between a Numerical Variable and a Binary Categorical Variable | ✔️ |
Associations Between Two Binary Categorical Variables | ✔️ |
One MultiClass Categorical Variable | We’ve Omitted |
Associations Between Two MultiClass Categorical Variables | We’ve Omitted |
Associations Between One Numerical Variable and One MultiClass Categorical Variable | ✔️ |
Associations Between Two Numerical Variables | |
Next Time…