1 of 22

Comparisons of Many Group Means: ANOVA

Tests to determine whether a mean differs across three or more sub-populations

2 of 22

Outline

  • What’s left in our inference journey?
  • New data for today
  • ANOVA – testing for a difference in means across multiple (more than two) groups
    • Some intuition for ANOVA
    • The mechanics of ANOVA
    • A completed example using energy drink marketing and sales
    • Limits to ANOVA Results, and Post-Hoc (Follow-Up) Tests
  • What’s still left in our inference journey?

3 of 22

Where We Are and Where We Are Going…

Inference On… | Covered?
One Numerical Variable | ✔️
One Binary Categorical Variable | ✔️
Associations Between a Numerical Variable and a Binary Categorical Variable | ✔️
Associations Between Two Binary Categorical Variables | ✔️
One MultiClass Categorical Variable | We’ve Omitted
Associations Between Two MultiClass Categorical Variables | We’ve Omitted
Associations Between One Numerical Variable and One MultiClass Categorical Variable | Today
Associations Between Two Numerical Variables |

4 of 22

Some New Datasets for Today

  • I’ve got several data sets for us to work from – the data is available here
    • Use the download icon to download the xlsx file and open it in Excel
    • The contexts available are…
      • Energy drink marketing and sales (completed example in the slides)
      • Mobile phones and screentime
      • Monthly streaming hours by platform
      • Frisbee throwing distance by brand
      • Influencer likes by post type
      • Video game reaction times by console

5 of 22

Energy Drink Marketing and Sales

  • Open the DataForANOVA.xlsx file that you just downloaded
  • Navigate to the EnergyDrinkMarketing tab
    • How many observations are in the data set?
    • How many marketing strategies (strategy) are being compared?
  • There are three (3) marketing strategies being compared: TV Commercials, Social Media Posts, and Radio Spots
  • Our previous methods could only be used to compare two groups, so we’ll need something new
    • The alternative would be to conduct three pairwise tests, but that would inflate the likelihood of a Type I error (which we want to avoid)
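To see why running several pairwise tests is risky, here is a rough sketch of how the chance of at least one Type I error grows. It assumes a 10% level for each test and, as a simplification, treats the three tests as independent (pairwise tests that share groups are not truly independent, so read the result as an approximation).

```python
from itertools import combinations

alpha = 0.10  # assumed significance level for each individual test
strategies = ["TV Commercials", "Social Media Posts", "Radio Spots"]

# Every pair of groups would need its own two-sample test.
pairs = list(combinations(strategies, 2))
num_tests = len(pairs)

# Simplifying assumption: the tests are independent and every null
# hypothesis is true. Then P(at least one false rejection) is
# 1 - (1 - alpha)^m, where m is the number of tests.
familywise_error = 1 - (1 - alpha) ** num_tests

print(num_tests)                   # 3
print(round(familywise_error, 3))  # 0.271
```

Under these assumptions, three tests at the 10% level carry roughly a 27% chance of at least one false alarm, which is why a single overall test is preferred.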

6 of 22

Analysis of Variance (ANOVA)

  •  

 

7 of 22

Intuition for ANOVA

  •  

8 of 22

Completed Example: Energy Drink Marketing Strategy Performance

Scenario: A major beverage company has launched a new line of energy drink, Purple Heifer. The company is investigating how best to spend its limited marketing budget, so it runs a small 10-week pilot that uses radio spots, television commercials, and sponsored social media posts (one strategy in each of three separate but very similar cities). The company collects the weekly sales (in dollars) of Purple Heifer and now wants to determine whether average weekly sales differ by marketing strategy. Conduct a test at the 10% level of significance to determine whether there is evidence to suggest a difference in revenue due to marketing strategy.

9 of 22

Completed Example: Energy Drink Marketing Strategy Performance

  •  

10 of 22

Reformatting the Data: One Column Per Group

We need to be a bit sneaky about doing this. Follow the steps below:

  1. Copy the column containing the group assignments and the column containing the numerical variable we are interested in to a new sheet.
    • Copy each column and then use paste special to paste values into the new sheet
  2. We’ll need to sort the values by group so that all observations from each group appear consecutively. Click the square between the A heading and the row 1 indicator to highlight all the cells in your new sheet, then use the sort feature to sort the rows by the column containing the group labels.

If working with raw data like our loans data from earlier this semester

11 of 22

Reformatting the Data: One Column Per Group (Continued)

  •  

12 of 22

Reformatting the Data: One Column Per Group (Continued)

We need to be a bit sneaky about doing this. Follow the continued steps below:

    • Use the Insert menu on the ribbon to insert a Pivot Table
      • The Table/Range should contain the new group_member column, the grouping column, and the column containing your numerical variable
      • Choose to insert the Pivot Table into the Existing Worksheet – any cell not on top of existing data will be fine
    • Drag the group_member column to the Rows field, the grouping column to the Columns field, and the column containing the numerical variable to the Values field

We’ve finally got all of our data formatted appropriately for ANOVA!
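For anyone curious how the same reshaping looks outside of Excel, here is a minimal Python sketch of going from one row per observation to one column per group. The rows are made-up illustrative numbers, not the values in DataForANOVA.xlsx, and the function name `to_wide` is hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format rows: (group label, numerical value).
# These numbers are invented for illustration only.
long_rows = [
    ("Radio", 410.0),
    ("TV", 530.0),
    ("Social", 480.0),
    ("TV", 560.0),
    ("Radio", 395.0),
    ("Social", 505.0),
]

def to_wide(rows):
    """Collect the values for each group into its own column (a list)."""
    columns = defaultdict(list)
    for group, value in rows:
        columns[group].append(value)
    # Sort by group label so the column order is predictable.
    return dict(sorted(columns.items()))

wide = to_wide(long_rows)
for group, values in wide.items():
    print(group, values)
```

The position of a value within its list plays the same role as the group_member column in the Pivot Table: it just numbers the observations inside each group.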

13 of 22

Conducting the Analysis of Variance in Excel

This step is much easier than the process for reformatting the data!

    • Click the Data menu on the ribbon
    • Choose Data Analysis (this is the Analysis ToolPak add-in) from the far right edge of the ribbon
    • Choose ANOVA: Single Factor from the list of options and hit OK
    • For the Input Range: use your pivot table, excluding the Grand Totals and the Row Labels – you should have just the group names and observed values of the numerical variable selected when you are done
    • Check off that you have labels in the first row and change Alpha if necessary
    • For the Output Range: just choose an open and convenient cell in the current sheet, like you did with the Pivot Table
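To make the Excel output less of a black box, here is a rough sketch of the arithmetic behind a single-factor ANOVA table (SS, df, MS, and F). The data are tiny made-up numbers chosen so the arithmetic is easy to check by hand, and the function name `one_way_anova` is hypothetical. The P-value and F crit that Excel reports come from the F distribution, which Python's standard library does not provide, so this sketch stops at the F statistic.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the pieces of a single-factor ANOVA table for a dict of lists."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group variation: how far each observation is from its own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }

# Made-up data (one list per group), for illustration only.
groups = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [6.0, 7.0, 8.0]}
table = one_way_anova(groups)
print(table["SS_between"], table["SS_within"])  # 42.0 6.0
print(table["F"])                               # 21.0
```

A large F means the group means are spread out much more than the observations within each group are. If SciPy happens to be installed, `scipy.stats.f.sf(F, df_between, df_within)` should give the matching p-value.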

14 of 22

Returning to the Energy Drink Marketing Strategy Performance Example

  •  

15 of 22

A Note on the Results of an ANOVA Test

In the previous example we were able to conclude a statistically significant difference in means across the groups.

Limitations of the Conclusion: Our conclusion stated that at least one marketing strategy is associated with a different average weekly revenue from sales of Purple Heifer, but we don’t know which strategy is different, whether they all differ, which one is best, or which one is worst.

Post-Hoc Tests: If the result of an ANOVA test indicates that at least one group mean differs from the others, a follow-up (post-hoc) test is generally conducted to determine which pairs of groups differ, and how. One commonly used example is Tukey’s Honestly Significant Difference (HSD) test. These tests are beyond the scope of this course, but you should know that they exist and know where to look up how to carry one out if you need it.

A Hasty Solution: While not robust, if an ANOVA test indicates a significant difference in at least one mean, you can use a plot (like a side-by-side boxplot) to investigate further.
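If a plot is not handy, the numbers a boxplot is built from can be compared directly. The sketch below computes a five-number summary per group using made-up data; the helper name `five_number_summary` is hypothetical, and quartile conventions vary between tools, so the quartiles may not match Excel's exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Made-up data (one list per group), for illustration only.
groups = {
    "A": [1, 2, 3, 4, 5],
    "B": [11, 12, 13, 14, 15],
}

summaries = {name: five_number_summary(vals) for name, vals in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Like the plot, this is only a rough guide to which groups look different; it is not a substitute for a proper post-hoc test.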

16 of 22

Example: Screen Time

  •  

17 of 22

Example: Monthly Streaming

Scenario: Morgan & Mills Therapeutics is launching a marketing campaign for a new drug to compete with Ozempic and Wegovy. They have identified that advertising on streaming services is more impactful and efficient than traditional cable for this particular product. The Morgan & Mills marketing team is interested in whether average monthly streaming hours differ by streaming service or whether all services have the same average usage. Use the MonthlyStreaming data from the DataForANOVA.xlsx file, which includes data collected from 147 randomly selected subscribers, to conduct a test to determine whether average monthly streaming hours vary by service.

18 of 22

Example: Frisbee Distances

Scenario: Disc-ciples of the Basket, a disc golf club, is choosing a brand of discs to sell at its pro shop. The club wants to sell the best, farthest-flying discs, so it tests five of the most well-known brands of disc golf discs in the country. The club collects randomly sampled “drives” with each brand on its first hole and records the distances traveled (in feet). Analyze the DiscDistances data from the DataForANOVA.xlsx file to determine whether average drive distance is the same across all the brands or whether there is evidence to suggest that at least one brand has a different average drive distance.

19 of 22

Example: Influencer Likes

Scenario: Hashtag Hank is a hustler known for his outlandish two-day guarantee. Hank claims that he can turn anyone into a successful social media influencer in just two days. Of course, there’s a steep price to pay for Hank’s services. Hank claims that memes are all you need and that there is a difference in the average number of likes that an influencer’s post receives depending on the type of content posted. Analyze the InfluencerLikes data from the DataForANOVA.xlsx file to determine whether there is evidence to support Hank’s claim that the average number of likes (in thousands) on a post depends on the type of content posted.

20 of 22

Example: Console Reaction Times

Scenario: A group of psychologists is interested in whether playing video games improves reaction times. Further, they are interested in whether the average reaction time of an individual to a visual stimulus is dependent on the primary platform that they play games on. A group of 85 randomly selected gamers was asked what their primary gaming platform was and then they took part in an experiment where they were surprised by a visual stimulus and their reaction time was measured in milliseconds. Analyze the ConsoleReactionTimes data from the DataForANOVA.xlsx file to determine whether average reaction time differs by preferred gaming console.

21 of 22

Inference: Where We’ve Been and Where We Are Headed

Inference On… | Covered?
One Numerical Variable | ✔️
One Binary Categorical Variable | ✔️
Associations Between a Numerical Variable and a Binary Categorical Variable | ✔️
Associations Between Two Binary Categorical Variables | ✔️
One MultiClass Categorical Variable | We’ve Omitted
Associations Between Two MultiClass Categorical Variables | We’ve Omitted
Associations Between One Numerical Variable and One MultiClass Categorical Variable | ✔️
Associations Between Two Numerical Variables |

22 of 22

Next Time…

  • What we’ll be doing…
    • Inference for Associations Between two Numerical Variables (Linear Regression)
  • How to prepare…
    • Read sections 12.1 – 12.7 in our textbook
  • Homework: Complete HW 10 (ANOVA) on MyOpenMath