Hypothesis Testing and Effect Size Analysis
Lecture #2: Statistical Analysis
http://chakkrit.com/teaching/quantitative-research-methods
Statistical Analysis
LAST UPDATED
27/06/2022
Examples of questions that require statistical methods
Examples of questions that require statistical methods (in SE)
Examples of questions that require statistical methods (in ML)
Q1) Does the dataset follow a normal distribution?
Choosing the Right Statistical Test + Effect Size | Cheat Sheet
| | Interval/Ratio (normality assumed): "parametric tests" | Interval/Ratio (normality not assumed), Ordinal: "non-parametric tests" | Binomial |
| Compare 2 paired groups | Paired t-test | Wilcoxon test | McNemar's test |
| Compare 2 unpaired groups | Unpaired t-test | Mann-Whitney test | Fisher's test |
| Compare >2 matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q test |
| Compare >2 unmatched groups | ANOVA | Kruskal-Wallis test | Chi-square test |
| Find relationship between 2 variables | Pearson correlation | Spearman correlation | Cramer's V |
An Example: Do smart children tend to come from rich families?
Hypothesis Testing | Using (inferential) statistical analysis
Hypothesis Testing | More Examples
Hypothesis Testing | An example of house price dataset
https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
H0 (Null hypothesis): Size of living area has no relationship with the sale price.
H1 (Alternative hypothesis): Size of living area has a positive relationship with the sale price.
A larger house (GrLivArea) should be more expensive (SalePrice).
Which statistical test should be used for this scenario?
df = df[order(df$GrLivArea),]
smaller_houses = df %>% slice(1:floor(nrow(df)/2))
larger_houses = df %>% slice((floor(nrow(df)/2) + 1):nrow(df))
Let's define the size of a house based on the living area
(lower half = smaller_houses, upper half = larger_houses).
Outlier Detection | Having outliers in the dataset may interfere with subsequent analyses
An outlier is a data point that lies an abnormal distance from others in a dataset, i.e., values lower than Q1 - 1.5*IQR or higher than Q3 + 1.5*IQR.
Let's visualize outliers using the house prices dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
It is a common practice to exclude outliers from a study. You can calculate Q1, Q3, and IQR to find outliers and remove them, or use this method: df = df[!df$SalePrice %in% boxplot.stats(df$SalePrice)$out,]
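The manual Q1/Q3/IQR route can be sketched in a few lines of base R (the vector `x` below is toy data for illustration, not the house prices dataset):

```r
# Flag outliers with the 1.5*IQR rule defined above
x   <- c(1, 2, 3, 4, 100)   # toy data with one extreme value
q1  <- quantile(x, 0.25)
q3  <- quantile(x, 0.75)
iqr <- q3 - q1              # same value as IQR(x)
outliers <- x[x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr]
clean    <- x[!x %in% outliers]   # 100 is dropped; 1, 2, 3, 4 remain
```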
library(ggplot2)
ggplot(df) +
  aes(x = "", y = SalePrice) +
  geom_boxplot(fill = "green") +
  theme_minimal() +
  coord_flip()  # rotate() requires ggpubr; coord_flip() is the plain ggplot2 equivalent
Visualizing outliers using box plot
Data Preparation
df <- read.csv("train.csv", header=TRUE)
# remove outliers
df = df[!df$SalePrice %in% boxplot.stats(df$SalePrice)$out,]
df = df[!df$GrLivArea %in% boxplot.stats(df$GrLivArea)$out,]
# define whether a house has a fireplace (FireplaceQu is NA when there is none)
df$has_fireplace = !is.na(df$FireplaceQu)
library(dplyr)
# define smaller house as the first half,
# and larger house is second half based on the living area
df = df[order(df$GrLivArea),]
smaller_houses = df %>% slice(1:floor(nrow(df)/2))
larger_houses = df %>% slice((floor(nrow(df)/2) + 1):nrow(df))
Source: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
1) Parametric vs non-parametric tests? Does the data follow a normal distribution (i.e., a Gaussian distribution)?
2) Comparing paired or unpaired samples?
3) Comparing interval/ratio, ordinal, or binomial data?
Choosing the Right Statistical Test | There are three aspects to consider:
paired
unpaired
interval/ratio
ordinal
binomial
1.1) Visual judgment based on density plot and quantile-quantile plot
1.2) Using the Shapiro-Wilk normality test (H0 = normal distribution)
Q1) Parametric vs Non-Parametric Tests?
library(ggpubr)
ggdensity(df$SalePrice)
ggdensity(df$GrLivArea)
ggqqplot(df$SalePrice)
ggqqplot(df$GrLivArea)
Q-Q plots (SalePrice, living area): if the distribution is normal, the dots should form a straight line.
Density plots (SalePrice, living area): if the distribution is normal, the distribution should be bell-shaped.
shapiro.test(df$SalePrice)
shapiro.test(df$GrLivArea)
Even though the density plots above look bell-shaped, the Shapiro-Wilk tests yield p-values < 0.05.
The p-value helps us determine the significance of the test results in relation to the hypothesis. A p-value less than 0.05 indicates strong evidence against the null hypothesis (H0).
Therefore, we reject H0 (the data is normally distributed) and accept H1 (the data is not normally distributed).
Q1) Parametric vs Non-Parametric Tests?
Parametric test
Non-parametric test
Suggestion!
Q2) Comparing paired or unpaired samples?
Paired samples (dependent) are samples in which natural or matched couplings occur. Each data point in one sample is uniquely paired with a data point in the second sample.
Unpaired samples (independent) are samples drawn from unrelated groups.
paired
unpaired
Student | Test1 | Test2 |
ID1 | 100 | 100 |
ID2 | 80 | 90 |
ID3 | 60 | 80 |
Price of small houses | Price of large houses |
500,000 | 1,200,000 |
600,000 | 1,500,000 |
700,000 | |
Example Research Questions:
Example Research Questions:
Q3) Comparing interval/ratio, ordinal, or binomial data?
Interval/ratio: numbers on an interval where the difference is meaningful,
e.g., interval -> temperature in Celsius or Fahrenheit;
ratio -> temperature in Kelvin (has a clear definition of zero:
0 Kelvin = the lowest temperature possible)
Ordinal: a scale where the order matters,
e.g., a Likert scale: extremely dislike, dislike, neutral, like, extremely like
Binomial: having only two possible values,
e.g., diagnosed as having COVID or not
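In R, the three scales map naturally onto different vector types; a minimal sketch (the variable names are illustrative):

```r
# Interval/ratio: plain numeric values
temperature <- c(36.5, 37.0, 39.2)
# Ordinal: an ordered factor, so comparisons respect the level order
rating <- factor(c("dislike", "neutral", "like"),
                 levels = c("extremely dislike", "dislike", "neutral",
                            "like", "extremely like"),
                 ordered = TRUE)
# Binomial: a logical (two-valued) vector
has_covid <- c(TRUE, FALSE, TRUE)
```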
Paired T-Test
Case study
A group of students took pre- and post-lecture exams. Did the students achieve a higher score on the post-lecture exam than on the pre-lecture exam?
H0 (Null hypothesis)
The pre- and post-lecture exam scores are not statistically different.
H1 (alternative hypothesis)
The post-lecture exam scores are statistically significantly higher than the pre-lecture exam scores.
Comparing the means of two paired groups (parametric test)
Requirements
- dependent variable is interval or ratio
- samples are drawn from a normally distributed population
- the two samples must have the same size (i.e., paired)
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means between the two groups
- H1 (accept if p < 0.05): There is a significant difference in the means between the two groups
Student ID | Pre-lecture exam scores | Post-lecture exam scores |
1 | 4 | 7 |
2 | 3 | 5 |
3 | 8 | 9 |
4 | 2 | 7 |
5 | 3 | 8 |
Paired T-Test
Comparing the means of two paired groups (parametric test)
R Code
preExam = c(4,3,8,2,3)
postExam = c(7,5,9,7,8)
t.test(x=postExam, y=preExam, alternative = "greater", paired = TRUE)
Results
Paired t-test
data: postExam and preExam
t = 4, df = 4, p-value = 0.008065
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
1.494523 Inf
sample estimates:
mean of the differences
3.2
Effect size
library(effectsize)
effect_size = cohens_d(x=postExam, y=preExam)
interpret_cohens_d(effect_size)
Cohen's d | 95% CI | Interpretation
-----------------------------------------
1.63 | [0.09, 3.10] | large
Interpretation
Rejecting H0, accepting H1. The post-lecture exam scores are statistically significantly higher than the pre-lecture exam scores (with a large effect size).
** “greater” = test whether x is greater than y
Conclusion: Students learn well during the lectures.
Caveat: Is this conclusion statistically sound? Answer: No, because we did not test whether the population from which the data was drawn is normally distributed.
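The missing check can be sketched as follows; for a paired t-test, the usual convention is to test normality of the paired differences:

```r
preExam  <- c(4, 3, 8, 2, 3)
postExam <- c(7, 5, 9, 7, 8)
# Shapiro-Wilk test on the paired differences (H0 = normal distribution);
# p >= 0.05 would support using the paired t-test
shapiro.test(postExam - preExam)
```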
Wilcoxon (Signed-Rank) Test
Case study
A group of students took pre- and post-lecture exams. Did the students achieve a higher score on the post-lecture exam than on the pre-lecture exam?
H0 (Null hypothesis)
The pre- and post-lecture exam scores are not statistically different.
H1 (alternative hypothesis)
The post-lecture exam scores are statistically significantly higher than the pre-lecture exam scores.
Comparing the means of two paired groups (non-parametric test)
Requirements
- dependent variable is ordinal, interval/ratio
- if interval or ratio, the population must not be normally distributed
- the data must have the same size (i.e., paired)
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means between the two groups
- H1 (accept if p < 0.05): There is a significant difference in the means between the two groups
Student ID | Pre-lecture exam scores | Post-lecture exam scores |
1 | 4 | 7 |
2 | 3 | 5 |
3 | 8 | 9 |
4 | 2 | 7 |
5 | 3 | 8 |
Wilcoxon (Signed-Rank) Test
Comparing the means of two paired groups (non-parametric test)
R Code
preExam = c(4,3,8,2,3)
postExam = c(7,5,9,7,8)
wilcox.test(postExam, preExam, paired=T, alternative = "greater")
Results
Wilcoxon signed rank test with continuity correction
data: postExam and preExam
V = 15, p-value = 0.02895
alternative hypothesis: true location shift is greater than 0
Effect size
scores = c(4,3,8,2,3,7,5,9,7,8) # re-organize the data to calculate effect size
type = c("pre","pre","pre","pre","pre","post","post","post","post","post")
df = data.frame(scores=scores, type=type)
library(rcompanion)
cliffDelta(scores~type, data = df)
Cliff.delta 0.72
Interpretation
Rejecting H0, accepting H1. The post-lecture exam scores are statistically significantly higher than the pre-lecture exam scores (with a large effect size).
Rule of thumb | Small | Medium | Large |
Effect size | 0.10 | 0.30 | 0.50 |
Conclusion: Students learn well during the lectures.
McNemar's Test
Case study
A group of students took pre- and post-lecture exams. Did the students who passed the pre-lecture exam also pass the post-lecture exam?
H0 (Null hypothesis)
The number of students that passed the pre- and post-lecture exams are not statistically different.
H1 (alternative hypothesis)
The number of students that passed the pre- and post-lecture exams are statistically different.
Requirements
- dependent variable is binomial
- the data must have the same size (i.e., paired)
- the data is in a "before & after" table format (see the tables on the right)
A test that measures an association between two (paired) categorical variables.
| Pre-lecture exam | Post-lecture exam |
Passed | 30 | 10 |
Not passed | 70 | 90 |
| Pre-lecture Passed | Pre-lecture Not passed |
Post-lecture Passed | 5 | 5 |
Post-lecture Not passed | 25 | 65 |
For this test, we have to re-format the table
Interpretation
- H0 (accept if p>=0.05): The occurrences of the outcomes for the two groups are equal.
- H1 (accept if p < 0.05): The occurrences of the outcomes for the two groups are not equal.
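The re-formatting step can be sketched with `table()` on hypothetical raw per-student records (the vectors below are illustrative and do not reproduce the counts in the tables above):

```r
# 0 = failed, 1 = passed; one entry per student on each exam
pre  <- c(1, 0, 1, 0, 0, 1, 0, 0)
post <- c(1, 1, 0, 0, 1, 1, 1, 0)
# Cross-tabulation yields the paired "before & after" 2x2 table,
# which can be passed straight to mcnemar.test()
tab <- table(Pre = pre, Post = post)
```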
McNemar's Test
A test that measures an association between two (paired) categorical variables.
R Code
data <- matrix(c(5,5,25,65), ncol=2, byrow=T)
mcnemar.test(data)
Results
McNemar's Chi-squared test with continuity correction
data: data
McNemar's chi-squared = 12.033, df = 1, p-value = 0.0005226
Interpretation
Rejecting H0, accepting H1. The number of students that passed the pre- and post-lecture exams are statistically different.
Implication: The number of students that passed the exam changed significantly after the lecture (in this case, it decreased).
Conclusion: Students did not learn well in the lecture.
| Pre-lecture exam | Post-lecture exam |
Passed | 30 | 10 |
Not passed | 70 | 90 |
| Pre-lecture Passed | Pre-lecture Not passed |
Post-lecture Passed | 5 | 5 |
Post-lecture Not passed | 25 | 65 |
For this test, we have to re-format the table
Unpaired T-Test
Case study
A group of students is randomly sampled to take a final exam. The students can choose to take the exam in the morning or the afternoon (i.e., the number of students in each exam can be different). Are the scores achieved in the morning and the afternoon exams different?
H0 (Null hypothesis)
The scores achieved in the morning exam and the afternoon exams are not statistically different.
H1 (alternative hypothesis)
The scores achieved in the morning exam and the afternoon exams are statistically different.
Comparing the means of two unpaired groups (parametric test)
Requirements
- dependent variable is interval or ratio
- samples are drawn from a normally distributed population
- the data is unpaired (i.e., the two samples may have different sizes)
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means between the two groups
- H1 (accept if p < 0.05): There is a significant difference in the means between the two groups
Students’ exam score | |
Morning exam | Afternoon exam |
8 | 7 |
4 | 5 |
2 | 3 |
9 | - |
5 | - |
Unpaired T-Test
Comparing the means of two unpaired groups (parametric test)
R Code
morning = c(8,4,2,9,5)
afternoon = c(7,5,3)
t.test(x=morning, y=afternoon, alternative = "two.sided",
var.equal = FALSE, paired = FALSE)
Results
Welch Two Sample t-test
data: morning and afternoon
t = 0.3468, df = 5.6789, p-value = 0.7412
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.692178 4.892178
sample estimates:
mean of x mean of y
5.6 5.0
Interpretation
Accepting H0. The scores achieved in the morning exam and the afternoon exams are not statistically different.
Effect size
effectsize::cohens_d(x=morning, y=afternoon)
“two.sided” = test whether the two samples are different
Students’ exam score | |
Morning exam | Afternoon exam |
8 | 7 |
4 | 5 |
2 | 3 |
9 | - |
5 | - |
Conclusion: The time of day does not affect the students' exam performance.
Mann-Whitney Test
Case study
A group of students is randomly sampled to take a final exam. The students can choose to take the exam in the morning or the afternoon (i.e., the number of students in each exam can be different). Are the scores achieved in the morning exam higher than those achieved in the afternoon exam?
H0 (Null hypothesis)
The scores achieved in the morning exam and the afternoon exams are not statistically different.
H1 (alternative hypothesis)
The scores achieved in the morning exam are higher than the scores achieved in the afternoon exams
Comparing the means of two unpaired groups (non-parametric test)
Requirements
- dependent variable is ordinal, interval/ratio
- if interval or ratio, the population must not be normally distributed
- the data is unpaired (i.e., the two samples may have different sizes)
Also known as the Mann-Whitney U test, the Wilcoxon rank-sum test, or the non-parametric t-test
Students’ exam score | |
Morning exam | Afternoon exam |
8 | 7 |
4 | 5 |
2 | 3 |
9 | - |
5 | - |
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means between the two groups
- H1 (accept if p < 0.05): There is a significant difference in the means between the two groups
Mann-Whitney Test
Comparing the means of two unpaired groups (non-parametric test)
R Code
morning = c(8,4,2,3,5)
afternoon = c(9,9,9)
examTime = factor(c(rep("morning", length(morning)), rep("afternoon", length(afternoon))))
scores = c(morning, afternoon)
wilcox.test(scores ~ examTime, alternative = "greater")
Results
Wilcoxon rank sum test with continuity correction
data: scores by examTime
W = 15, p-value = 0.01624
alternative hypothesis: true location shift is greater than 0
Interpretation
Rejecting H0, accepting H1. The scores achieved in the morning exam are statistically higher than the scores achieved in the afternoon exam.
Effect size
library(rcompanion)
df = data.frame(scores, examTime)
cliffDelta(scores ~ examTime, data = df)
Cliff.delta 1
Rule of thumb | Small | Medium | Large |
Effect size | 0.10 | 0.30 | 0.50 |
Students’ exam score | |
Morning exam | Afternoon exam |
8 | 7 |
4 | 5 |
2 | 3 |
9 | - |
5 | - |
Conclusion: Taking the exam in the morning leads to better test scores.
Fisher’s Test
Case study
A group of students is randomly sampled to take a final exam. The students can choose to take the exam in the morning or the afternoon (i.e., the number of students in each exam can be different). Are the students who took the morning exam less likely to pass the exam?
H0 (Null hypothesis)
The odds that the students passed the morning exam and the afternoon exam are not statistically different.
H1 (alternative hypothesis)
The odds that the students passed the morning exam are statistically less than those of the afternoon exam.
Requirements
- dependent variable is binomial (in form of a contingency table)
- the data is unpaired (i.e., the two samples may have different sizes)
- the data is in contingency table format
A test that measures an association between two (unpaired) categorical variables that define a contingency table.
| Morning exam | Afternoon exam |
Passed | 10 | 100 |
Not passed | 30 | 30 |
Interpretation
- H0 (accept if p>=0.05): The occurrences of the outcomes for the two groups are equal.
- H1 (accept if p < 0.05): The occurrences of the outcomes for the two groups are not equal.
Fisher’s Test
A test that measures an association between two (unpaired) categorical variables that define a contingency table.
R Code
m <- matrix(c(10,100,30,30), ncol=2, byrow=T)
fisher.test(m, alternative = "less")
Results
Fisher's Exact Test for Count Data
data: m
p-value = 4.452e-09
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
0.000000 0.214895
sample estimates:
odds ratio
0.101681
Interpretation
Rejecting H0, accepting H1. The odds that the students passed the morning exam are statistically less than those of the afternoon exam
(with an odds ratio of 0.1)
| Morning exam | Afternoon exam |
Passed | 10 | 100 |
Not passed | 30 | 30 |
Conclusion: Students that took the exam in the afternoon tend to perform better.
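The odds ratio can also be checked by hand from the contingency table; note that fisher.test() reports the conditional maximum-likelihood estimate, which is why it prints 0.101681 rather than exactly 0.1:

```r
m <- matrix(c(10, 100, 30, 30), ncol = 2, byrow = TRUE)
odds_morning   <- m[1, 1] / m[2, 1]   # 10 passed / 30 not passed
odds_afternoon <- m[1, 2] / m[2, 2]   # 100 passed / 30 not passed
odds_morning / odds_afternoon         # sample odds ratio = 0.1
```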
Analysis of Variance Test
Case study
Three groups of students took a final exam. Are the exam scores achieved by the students in the three groups different?
H0 (Null hypothesis)
The exam scores achieved by the students in the three groups are not statistically different.
H1 (alternative hypothesis)
The exam scores achieved by the students in the three groups are statistically different.
Comparing the means of more than two groups
Requirements
- dependent variable is interval or ratio
- samples are drawn from a normally distributed population
- the data can be either paired or unpaired
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means among all groups
- H1 (accept if p < 0.05): There is a significant difference in the means among all groups
Students’ exam score | ||
GroupA | GroupB | GroupC |
5 | 5 | 5 |
6 | 6 | 6 |
7 | 7 | 7 |
8 | 8 | - |
9 | - | - |
Analysis of Variance Test
Comparing the means of more than two groups
R Code
score = c(5,6,7,8,9,5,6,7,8,5,6,7)
group = c("A","A","A","A","A","B","B","B","B","C","C","C")
df <- data.frame(score, group)
# test for homogeneity of variances
bartlett.test(score ~ group, df)
# p-value = 0.5986 (> 0.05), variances are equal
# anova test
aov <- aov(score ~ group, df)
summary(aov)
Results
Df Sum Sq Mean Sq F value Pr(>F)
group 2 1.917 0.9583 0.507 0.618
Residuals 9 17.000 1.8889
Interpretation
Accepting H0. The exam scores achieved by the students in the three groups are not statistically different.
Effect size
Eta squared = SS_effect / SS_total = 1.917 / (1.917 + 17) ≈ 0.10
# indicating that only ~10% of the total variance is accounted for by the treatment effect
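The same eta squared can be recovered programmatically from the fitted model (base R only; the effectsize package also offers eta_squared() as a packaged alternative):

```r
score <- c(5,6,7,8,9,5,6,7,8,5,6,7)
group <- c("A","A","A","A","A","B","B","B","B","C","C","C")
fit <- aov(score ~ group)
ss  <- summary(fit)[[1]][["Sum Sq"]]   # c(SS_group, SS_residuals)
eta_sq <- ss[1] / sum(ss)              # ~0.10
```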
Students’ exam score | ||
GroupA | GroupB | GroupC |
5 | 5 | 5 |
6 | 6 | 6 |
7 | 7 | 7 |
8 | 8 | - |
9 | - | - |
Students’ exam score | |
Group | Score |
A | 5 |
A | 6 |
A | 7 |
A | 8 |
A | 9 |
B | 5 |
B | 6 |
B | 7 |
B | 8 |
C | 5 |
C | 6 |
C | 7 |
The aov() function requires the data in this long format (one row per observation).
Conclusion: The students from three groups performed similarly in the exam.
Friedman Test
Case study
A group of students takes exams in three subjects. Are the exam scores achieved in the three subjects different?
H0 (Null hypothesis)
The exam scores achieved in the three subjects are not statistically different.
H1 (alternative hypothesis)
The exam scores achieved in the three subjects are statistically different.
Comparing the means of more than two matched groups (non-parametric test)
Requirements
- dependent variable is interval/ratio, ordinal
- if interval or ratio, the population need not be normally distributed
- the data must have the same size (i.e., matched)
Student ID | Subject A | Subject B | Subject C |
1 | 4 | 7 | 9 |
2 | 3 | 5 | 8 |
3 | 8 | 9 | 9 |
4 | 2 | 7 | 8 |
5 | 3 | 8 | 9 |
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means among all groups
- H1 (accept if p < 0.05): There is a significant difference in the means among all groups
Friedman Test
Comparing the means of more than two matched groups (non-parametric test)
R Code
data <- cbind(c(4,3,8,2,3), c(7,5,9,7,8), c(9,8,9,8,9))
friedman.test(data)
Results
Friedman rank sum test
data: data
Friedman chi-squared = 9.5789, df = 2, p-value = 0.008317
Effect size
Unfortunately, there is no direct way to calculate the effect size for the Friedman test.
You need to perform pairwise Wilcoxon tests and calculate r = Z / sqrt(N), where Z is the Z statistic from the test and N is the total number of samples. See https://yatani.jp/teaching/doku.php?id=hcistats:kruskalwallis
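A sketch of that calculation for one pair of subjects (Subject A vs Subject B from the table), recovering Z from the two-sided p-value of a paired Wilcoxon test via the normal approximation; this is an illustration, not code from the original slides:

```r
a <- c(4, 3, 8, 2, 3)   # Subject A scores
b <- c(7, 5, 9, 7, 8)   # Subject B scores
# Two-sided signed-rank test (with ties, R warns and uses a normal approximation)
res <- wilcox.test(a, b, paired = TRUE)
z <- abs(qnorm(res$p.value / 2))     # Z recovered from the p-value
r <- z / sqrt(length(a) + length(b)) # r = Z / sqrt(N)
```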
Interpretation
Rejecting H0, accepting H1. The exam scores achieved in the three subjects are statistically different.
Student ID | Subject A | Subject B | Subject C |
1 | 4 | 7 | 9 |
2 | 3 | 5 | 8 |
3 | 8 | 9 | 9 |
4 | 2 | 7 | 8 |
5 | 3 | 8 | 9 |
Conclusion: The students performed differently in the three subjects.
Cochran’s Q Test
Case study
A group of students took exams in three subjects. Are the odds that the students passed the exams similar for all three subjects?
H0 (Null hypothesis)
The odds that the students passed the exams in three subjects are not statistically different.
H1 (alternative hypothesis)
The odds that the students passed the exams in three subjects are statistically different.
Requirements
- dependent variable is binomial
- the data must have the same size (i.e., matched)
A test that measures an association between two or more (matched) categorical variables.
Student passed the exam (0=no, 1=yes) | |||
Student ID | Subject A | Subject B | Subject C |
1 | 1 | 0 | 1 |
2 | 0 | 0 | 1 |
3 | 0 | 1 | 0 |
4 | 0 | 1 | 1 |
5 | 0 | 0 | 1 |
6 | 1 | 1 | 1 |
7 | 0 | 0 | 1 |
8 | 0 | 1 | 1 |
9 | 0 | 1 | 1 |
10 | 0 | 1 | 1 |
Interpretation
- H0 (accept if p>=0.05): The occurrences of the outcomes for all groups are equal.
- H1 (accept if p < 0.05): The occurrences of the outcomes for all groups are not equal.
Cochran’s Q Test
A test that measures an association between two or more (matched) categorical variables.
R Code
passed <- c(1,0,1,0,0,1,0,1,0,0,1,1,0,0,1,1,1,1,0,0,1,0,1,1,0,1,1,0,1,1)
student <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10))
subject <- factor(rep(1:3, 10))
data <- data.frame(student, subject, passed)
library(coin)
symmetry_test(passed ~ subject | student, data = data, teststat = "quad")
Results
Asymptotic General Symmetry Test
data: passed by subject (1, 2, 3)
stratified by student
chi-squared = 8.2222, df = 2, p-value = 0.01639
Effect size
We can use McNemar's test on each pair of groups to find the odds ratio.
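For example, a pairwise follow-up comparing Subject A with Subject C, using the pass/fail columns copied from the table:

```r
subjA <- c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0)   # Subject A column
subjC <- c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1)   # Subject C column
# McNemar's test on the paired 2x2 table for this pair of subjects
mcnemar.test(table(subjA, subjC))
```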
Interpretation
Rejecting H0, accepting H1. The odds that the students passed the exams in the three subjects are statistically different.
Student passed the exam (0=no, 1=yes) | |||
Student ID | Subject A | Subject B | Subject C |
1 | 1 | 0 | 1 |
2 | 0 | 0 | 1 |
3 | 0 | 1 | 0 |
4 | 0 | 1 | 1 |
5 | 0 | 0 | 1 |
6 | 1 | 1 | 1 |
7 | 0 | 0 | 1 |
8 | 0 | 1 | 1 |
9 | 0 | 1 | 1 |
10 | 0 | 1 | 1 |
Conclusion: The students performed differently in the three exams.
Kruskal-Wallis Test
Case study
A group of students takes a final exam. The students can choose to take the exam in the morning, afternoon, or evening (i.e., the number of students in each exam can be different). Are the exam scores achieved at different times of the day different?
H0 (Null hypothesis)
The exam scores achieved at different times of the day are not statistically different.
H1 (alternative hypothesis)
The exam scores achieved at different times of the day are statistically different.
Comparing the means of more than two unmatched groups (non-parametric test)
Requirements
- dependent variable is interval/ratio, ordinal
- the population is not required to be normally distributed
- the data is unpaired (i.e., the groups may have different sizes)
Exam scores of the students | ||
Morning | Afternoon | Evening |
4 | 7 | 9 |
3 | 5 | 8 |
8 | 9 | 9 |
2 | 7 | 8 |
3 | 8 | |
Interpretation
- H0 (accept if p>=0.05): There is no significant difference in the means among all groups
- H1 (accept if p < 0.05): There is a significant difference in the means among all groups
Kruskal-Wallis Test
R Code
data <- list(morning = c(4,3,8,2,3), afternoon = c(7,5,9,7,8), evening = c(9,8,9,8))
kruskal.test(data)
Results
Kruskal-Wallis rank sum test
data: data
Kruskal-Wallis chi-squared = 7.2759, df = 2, p-value = 0.02631
Effect size
Unfortunately, there is no direct way to calculate the effect size for the Kruskal-Wallis test.
You need to perform pairwise Mann-Whitney tests and calculate r = Z / sqrt(N), where Z is the Z statistic from the Mann-Whitney test and N is the total number of samples. See https://yatani.jp/teaching/doku.php?id=hcistats:kruskalwallis
Interpretation
Rejecting H0, accepting H1. The exam scores achieved at different times of the day are statistically different.
Comparing the means of more than two unmatched groups (non-parametric test)
Conclusion: The time chosen to take the exam can affect the exam scores.
Exam scores of the students | ||
Morning | Afternoon | Evening |
4 | 7 | 9 |
3 | 5 | 8 |
8 | 9 | 9 |
2 | 7 | 8 |
3 | 8 | |
Chi-square Test
Case study
A group of students is randomly sampled to take a final exam. The students can choose to take the exam in the morning, afternoon, or evening (i.e., the number of students in each exam can be different). Are the odds that the students passed the exam similar at different times of the day?
H0 (Null hypothesis)
The odds that the students passed the exams in different time of the day are not statistically different.
�H1 (alternative hypothesis)
The odds that the students passed the exams in different time of the day are statistically different.
Requirements
- dependent variable is binomial
- data is in contingency table format
A test that measures an association between two or more (unmatched) categorical variables that define a contingency table.
| Morning | Afternoon | Evening |
Passed | 16 | 11 | 3 |
Not passed | 21 | 8 | 15 |
Interpretation
- H0 (accept if p>=0.05): The occurrences of the outcomes for all groups are equal.
- H1 (accept if p < 0.05): The occurrences of the outcomes for all groups are not equal.
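Before running the test, raw per-student records often need to be cross-tabulated first. A minimal sketch, using hypothetical made-up records, of how R's table() builds the contingency table format that chisq.test() expects:

```r
# Hypothetical raw records, one entry per student (made-up values)
session <- c("Morning", "Morning", "Afternoon", "Evening",
             "Evening", "Afternoon", "Morning", "Evening")
result  <- c("Passed", "Not passed", "Passed", "Not passed",
             "Passed", "Passed", "Passed", "Not passed")

# Cross-tabulate the two categorical vectors into a contingency table
tab <- table(result, session)

# The table can be passed directly to the chi-square test
# (a warning about small expected counts is expected with this tiny sample)
chisq.test(tab)
```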
Chi-square Test
R Code
data <- matrix(c(16, 11, 3, 21, 8, 15), ncol=3, byrow=T)
chisq.test(data)
Results
Pearson's Chi-squared test
data: data
X-squared = 6.742, df = 2, p-value = 0.03435
Effect size
library(vcd)
assocstats(data)
X^2 df P(> X^2)
Likelihood Ratio 7.2218 2 0.027027
Pearson 6.7420 2 0.034355
Phi-Coefficient : NA
Contingency Coeff.: 0.289
Cramer's V : 0.302
Interpretation
Rejecting H0, accepting H1.
The odds that the students passed the exam at different times of the day are statistically different.
A test that measures an association between two or more (unmatched) categorical variables that define a contingency table.
| Morning | Afternoon | Evening |
Passed | 16 | 11 | 3 |
Not passed | 21 | 8 | 15 |
Rule of thumb | small size | medium size | large size |
Cramer's phi or V | 0.10 | 0.30 | 0.50 |
Conclusion: Taking the exam at different times of the day could affect the exam outcome.
Pearson Correlation
Case study
Students took different lengths of time (in minutes) to prepare for an exam. Are the length of the exam preparation time and the test scores correlated?

H0 (Null hypothesis)
There is no correlation between the length of the exam preparation time and the test scores.

H1 (alternative hypothesis)
There is a correlation between the length of the exam preparation time and the test scores.
A test for correlation (i.e., the strength of the relationship) between two interval/ratio variables
Requirements
- dependent variable is interval/ratio
- assumes a normal distribution
Interpretation
- H0 (accept if p >= 0.05): There is no correlation between the two variables (r = 0).
- H1 (accept if p < 0.05): There is a correlation between the two variables (r ≠ 0).
Pearson Correlation
A test for correlation (i.e., the strength of the relationship) between two interval/ratio variables
R Code
time <- c(10,14,12,20,15,13,18,11,10)
scores <- c(22,21,25,35,28,29,31,19,17)
cor.test(time,scores,method="pearson")
Results
Pearson's product-moment correlation
data: time and scores
t = 4.6855, df = 7, p-value = 0.002246
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4900229 0.9724978
sample estimates:
cor
0.8707668
Interpretation
Rejecting H0, accepting H1.
There is a large positive correlation between the length of the exam preparation time and the test scores.
Rule of thumb | small size | medium size | large size |
Pearson's r | 0.1 | 0.3 | 0.5 |
Conclusion: Spending more time preparing can lead to higher exam scores.
Spearman Correlation
Case study
Students took different lengths of time (in minutes) to prepare for an exam. Are the length of the exam preparation time and the test scores correlated?

H0 (Null hypothesis)
There is no correlation between the length of the exam preparation time and the test scores.

H1 (alternative hypothesis)
There is a correlation between the length of the exam preparation time and the test scores.
A test for correlation (i.e., the strength of the relationship) between two interval/ratio or ordinal variables
Requirements
- dependent variable is interval/ratio or ordinal
- does not assume a normal distribution
Interpretation
- H0 (accept if p >= 0.05): There is no correlation between the two variables (rho = 0).
- H1 (accept if p < 0.05): There is a correlation between the two variables (rho ≠ 0).
Spearman Correlation
A test for correlation (i.e., the strength of the relationship) between two interval/ratio or ordinal variables
R Code
time <- c(10,14,12,20,15,13,18,11,10)
scores <- c(22,21,25,35,28,29,31,19,17)
cor.test(time,scores,method="spearman")
Results
Spearman's rank correlation rho
data: time and scores
S = 22.593, p-value = 0.007889
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8117226
Interpretation
Rejecting H0, accepting H1.
There is a large positive correlation between the length of the exam preparation time and the test scores.
Rule of thumb | small size | medium size | large size |
Spearman’s rho | 0.1 | 0.3 | 0.5 |
Conclusion: Spending more time preparing can lead to higher exam scores.
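As a sanity check on the rho above, Spearman's rho can also be computed as Pearson's correlation over the ranks of the data (rank() assigns average ranks to ties, which is also what cor.test does); a minimal sketch with the same data:

```r
time   <- c(10, 14, 12, 20, 15, 13, 18, 11, 10)
scores <- c(22, 21, 25, 35, 28, 29, 31, 19, 17)

# Spearman's rho = Pearson's r computed on the ranks
rho <- cor(rank(time), rank(scores))
rho
```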
Cramer’s V
Case study
Students are separated into group A and group B for an exam. In this exam, the students can choose either a pen or a pencil as their writing tool. Do the students in the two groups choose the writing tools differently?

H0 (Null hypothesis)
There is no association between the student groups and the writing tools used.

H1 (alternative hypothesis)
There is an association between the student groups and the writing tools used.
Requirements
- dependent variable is binomial
- the data is in a crosstab table format
Interpretation
- H0 (accept if p >= 0.05): There is no association between the two variables.
- H1 (accept if p < 0.05): There is an association between the two variables.
| Writing tool | |
| Pen | Pencil |
Students group A | 20 | 10 |
Students group B | 3 | 27 |
A test that measures the coefficient of association (i.e., correlation for categorical data) between two binomial variables in a crosstab table. In other words, it represents how the distribution of the data changes depending on one variable.
Cramer’s V
R Code
data <- matrix(c(20, 10, 3, 27), ncol=2, byrow=T)
library(vcd)
assocstats(data)
Results
X^2 df P(> X^2)
Likelihood Ratio 22.185 1 2.4762e-06
Pearson 20.376 1 6.3622e-06
Phi-Coefficient : 0.583
Contingency Coeff.: 0.503
Cramer's V : 0.583
Interpretation
Rejecting H0, accepting H1.
There is a large association between the student groups and the writing tools used.
A test that measures the coefficient of association (i.e., correlation for categorical data) between two binomial variables in a crosstab table. In other words, it represents how the distribution of the data changes depending on one variable.
Rule of thumb | small size | medium size | large size |
Cramer's phi or V | 0.10 | 0.30 | 0.50 |
Conclusion: The students in group A and group B chose the writing tools differently.
| Writing tool | |
| Pen | Pencil |
Students group A | 20 | 10 |
Students group B | 3 | 27 |
Quizzes
Exercise: Which statistical methods would you use?