1 of 33

Hypothesis testing - 2

Saket Choudhary

saketc@iitb.ac.in

Introduction to Public Health Informatics

DH 302

Lecture 06 || Friday, 24th January 2025

2 of 33

From last lecture…

In a land of stats, so wild and vast,
Hypothesis testing had students aghast.
"Is it null? Is it not? Should we reject?"
Confusion spread, hard to correct.

Then came the voice, "Here's the key,
State your null as plain as can be.
Assume it's true, don't let it stray,
And let your data have its say."

But oh, the p-values, they played their tricks,
"Below 0.05? It's a statistical fix!"
"Above that line? We must comply,
The null survives, we let it fly."

Coauthored with ChatGPT

3 of 33

From last lecture…

https://xkcd.com/892/

4 of 33

From last lecture…

Review: Chi-squared test and G-test
Expectations, Variances, CLT, Normal approximation
Testing for difference of means
Dimensionality reduction primer

5 of 33

Why do we select the null as such?

The purpose of hypothesis testing is largely to impose self-skepticism (“You are innocent unless proven guilty”)
We usually take the Occam’s razor approach: assume the simplest thing that could be true
"We cannot conclusively affirm a hypothesis, but we can conclusively negate it" – Karl Popper
It is easy to specify the null hypothesis, but often we do not know explicitly what the alternative hypothesis is. For example, there may be a mean difference between the two populations, but how large? It is easy to say that it is zero (the difference is ‘null’).
Think about this argument: “All swans are white”. Which is easier: ‘rejecting it’ or ‘accepting it’?


6 of 33


Visualizing the p-values region

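[Figure: distribution of T under H₀. The region between the critical values T_α/2 and T_1-α/2 corresponds to null findings; each tail beyond a critical value has area α/2 and corresponds to significant findings. The p-value is shown as the area beyond the observed statistic T_obs.]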

P-value = Probability of sampling a test statistic at least as extreme as the observed test statistic if the null hypothesis is true

We “reject” the null hypothesis (H₀) if the p-value is below the threshold (α)
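As a minimal sketch of this rule, assume T follows a standard normal distribution under H₀ and take a hypothetical observed statistic of 2.1 (both are assumptions for illustration, not values from the lecture). The two-sided p-value is the area in both tails beyond the observed statistic:

```r
# Hypothetical setup: T ~ N(0, 1) under H0 (an assumption for illustration)
alpha <- 0.05   # significance threshold
t_obs <- 2.1    # hypothetical observed test statistic

# Critical values T_alpha/2 and T_1-alpha/2
crit_lower <- qnorm(alpha / 2)
crit_upper <- qnorm(1 - alpha / 2)

# Two-sided p-value: probability of a statistic at least as extreme as t_obs
p_value <- 2 * pnorm(-abs(t_obs))

# Rejecting when p_value < alpha is equivalent to t_obs falling beyond a critical value
reject_null <- p_value < alpha
cat("p-value:", round(p_value, 4), "| reject H0:", reject_null, "\n")
```

Comparing the p-value with α and comparing the statistic with the critical values are two views of the same decision.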

7 of 33


Type I,II errors and Power

Type I error:

Rejecting the null hypothesis (H₀) when H₀ is true
Its probability is often denoted by α

Type II error:

Failing to reject the null hypothesis (H₀) when H₀ is false
Its probability is often denoted by β

Power:

Probability that the test correctly rejects the null hypothesis (H₀) when the alternative hypothesis (H₁) is true
Commonly denoted by 1 - β, where β is the probability of a Type II error
As β increases, the power of a test decreases.
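These rates can be estimated by simulation. The sketch below is illustrative only: the one-sample t-test, the sample size, the effect size of 0.5, and the seed are all assumptions chosen for the example, not part of the lecture.

```r
set.seed(302)    # arbitrary seed so the simulation is reproducible
alpha  <- 0.05
n      <- 30     # hypothetical sample size per experiment
n_sims <- 2000   # number of simulated experiments

# Fraction of simulated experiments in which a one-sample t-test rejects H0: mean = 0
rejection_rate <- function(true_mean) {
  p_values <- replicate(
    n_sims,
    t.test(rnorm(n, mean = true_mean, sd = 1), mu = 0)$p.value
  )
  mean(p_values < alpha)
}

type1_rate <- rejection_rate(true_mean = 0)    # H0 is true: every rejection is a false positive
power      <- rejection_rate(true_mean = 0.5)  # H0 is false: every rejection is correct
type2_rate <- 1 - power                        # estimate of beta

cat("Estimated Type I error rate:", type1_rate, "\n")
cat("Estimated power:", power, "| estimated Type II error rate:", type2_rate, "\n")
```

The estimated Type I error rate should land near α, while the power depends on the effect size and the sample size.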

8 of 33


Type I,II errors and Power

[Figure: distributions of T under H₀ and under H_A, with critical values T_α/2 and T_1-α/2. Under H₀, the tail areas beyond the critical values are false positives. Under H_A, the area between the critical values is the false-negative region and the area beyond them is the power.]

The false-positive rate is the probability of incorrectly rejecting H₀.

The false-negative rate is the probability of incorrectly failing to reject H₀.

Power = 1 - false-negative rate = probability of correctly rejecting H₀.
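The areas in this picture can be computed directly. The sketch below assumes T is N(0, 1) under H₀ and N(3, 1) under H_A; the shift of 3 is a hypothetical value chosen for illustration.

```r
# Hypothetical setup: T ~ N(0, 1) under H0 and T ~ N(shift, 1) under H_A
alpha <- 0.05
shift <- 3   # hypothetical mean of T under H_A

crit_lower <- qnorm(alpha / 2)       # T_alpha/2
crit_upper <- qnorm(1 - alpha / 2)   # T_1-alpha/2

# False-positive rate: area beyond the critical values under H0
false_positive <- pnorm(crit_lower) + pnorm(crit_upper, lower.tail = FALSE)

# False-negative rate: area between the critical values under H_A
false_negative <- pnorm(crit_upper, mean = shift) - pnorm(crit_lower, mean = shift)

power <- 1 - false_negative
cat("False-positive rate:", false_positive, "\n")
cat("False-negative rate:", round(false_negative, 3), "| power:", round(power, 3), "\n")
```

Moving the H_A distribution further from the H₀ distribution (a larger shift) shrinks the false-negative area and raises the power.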

9 of 33


Types of error

Paul Ellis, 2010

Source

10 of 33

What is a p-value?

P-value is NOT the probability of the alternate hypothesis being correct.
P-value is NOT the probability of observing the result by chance.
P-value = Probability of observing a result at least as extreme if the null hypothesis holds true.

11 of 33

Goodness of fit - Chi-squared test

Problem: What distribution should I fit?

Use a pseudocount of +1 in frequencies

χ² = Σ_i (O_i - E_i)² / E_i = 5.744762

Is 5.7 high/low/medium?
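One way to answer this is to place the statistic on the chi-squared distribution. The sketch below assumes 5 degrees of freedom (six bins); that number is an assumption, chosen because it is consistent with the p-value of 0.33 reported for this example.

```r
chi_square_stat <- 5.744762   # value from the slide
dof   <- 5                    # assumption: six bins, so 6 - 1 degrees of freedom
alpha <- 0.05

# Critical value: statistics above this fall in the rejection region
critical_value <- qchisq(1 - alpha, df = dof)

# Upper-tail probability of a statistic at least this large under H0
p_value <- pchisq(chi_square_stat, df = dof, lower.tail = FALSE)

cat("Critical value:", round(critical_value, 2), "\n")
cat("p-value:", round(p_value, 2), "\n")
```

Under this assumption the statistic sits well below the critical value, so 5.7 is not unusually high.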

12 of 33

Example of Chi-square in R

chi_square_stat <- sum((observed - expected)^2 / expected)

dof <- length(observed) - 1

p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)

alpha <- 0.05 # Significance level

if (p_value < alpha) {

cat("Reject the null hypothesis")

} else {

cat("Fail to reject the null hypothesis")

}

P-value = 0.33 (>0.05)

Thus, we fail to reject the null hypothesis that the frequencies observed in Mar 2019 - Mar 2023 follow the same distribution as the Feb 2015 - Feb 2019 ones.
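The snippet above needs `observed` and `expected` to be defined. A self-contained sketch with hypothetical binned frequencies (not the lecture's data) is shown below, with the built-in `chisq.test()` as a cross-check.

```r
# Hypothetical binned frequencies (not the lecture's data); both sum to 120
observed <- c(18, 22, 30, 25, 15, 10)
expected <- c(20, 20, 30, 20, 20, 10)
stopifnot(sum(observed) == sum(expected))

# Manual calculation, as on the slide
chi_square_stat <- sum((observed - expected)^2 / expected)
dof     <- length(observed) - 1
p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)

# Cross-check with the built-in test, which takes expected proportions
builtin <- chisq.test(x = observed, p = expected / sum(expected))

cat("Manual:  ", chi_square_stat, p_value, "\n")
cat("Built-in:", unname(builtin$statistic), builtin$p.value, "\n")
```

Both routes should give the same statistic and p-value when the observed and expected totals match.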

13 of 33

Another goodness of fit test - Likelihood ratio test (or G-test)

O_i = an observed count for bin i

E_i = an expected count for bin i, asserted by the null hypothesis

G = 2 Σ_i O_i ln(O_i / E_i)

Under the null hypothesis, G approximately follows a chi-squared distribution with degrees of freedom = (number of bins - 1)

14 of 33

Example of G-test in R

G_stat <- 2 * sum(observed * log(observed / expected), na.rm = TRUE)

dof <- length(observed) - 1

p_value <- pchisq(G_stat, df = dof, lower.tail = FALSE) # upper tail: P(chi-squared >= G_stat)

alpha <- 0.05 # Significance level

if (p_value < alpha) {

cat("Reject the null hypothesis")

} else {

cat("Fail to reject the null hypothesis")

}

P-value = 0.59 (>0.05)

Thus, we fail to reject the null hypothesis that the frequencies observed in Mar 2019 - Mar 2023 follow the same distribution as the Feb 2015 - Feb 2019 ones.
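A self-contained sketch of the G-test with hypothetical binned frequencies (not the lecture's data) is shown below, next to the chi-squared statistic for the same counts. Both p-values use the upper tail of the chi-squared distribution.

```r
# Hypothetical binned frequencies (not the lecture's data); both sum to 120
observed <- c(18, 22, 30, 25, 15, 10)
expected <- c(20, 20, 30, 20, 20, 10)

# G statistic; na.rm = TRUE drops the NaN produced by 0 * log(0) for any empty bin
G_stat <- 2 * sum(observed * log(observed / expected), na.rm = TRUE)

# Chi-squared statistic for the same counts, for comparison
chi_square_stat <- sum((observed - expected)^2 / expected)

dof <- length(observed) - 1
p_value_G     <- pchisq(G_stat, df = dof, lower.tail = FALSE)
p_value_chisq <- pchisq(chi_square_stat, df = dof, lower.tail = FALSE)

cat("G:", G_stat, "| chi-squared:", chi_square_stat, "\n")
cat("p-values:", p_value_G, p_value_chisq, "\n")
```

When observed and expected counts are close, the two statistics are nearly equal and lead to the same decision.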

15 of 33

Was the rare event statistically different in 4 years?

What is the probability of observing something as extreme?

Null hypothesis?

16 of 33

Was the rare event statistically different in 4 years?

What is the probability of observing entries as small as the one in April 2020?

Assume a Poisson model

λ = (sum of observations)/(number of observations)

P(X ≤ 3524) = ppois(q = 3524, lambda) < 1e-16 → The rare event is statistically different

Is this event a “rare” event?
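The calculation can be sketched as follows. The monthly counts are hypothetical placeholders (only 3524 comes from the slide), so the printed numbers illustrate the method rather than reproduce the lecture's result.

```r
# Hypothetical monthly counts (not the lecture's data); 3524 is the value quoted on the slide
counts <- c(6100, 5900, 6050, 3524, 5800, 6200)

# Poisson rate estimated as the sample mean
lambda <- sum(counts) / length(counts)

# P(X <= 3524) under Poisson(lambda); ppois() takes the quantile as `q`
p_lower <- ppois(q = 3524, lambda = lambda)

cat("lambda:", lambda, "\n")
cat("P(X <= 3524):", p_lower, "\n")
```

Note that the extreme month is itself included in the estimate of λ here, which pulls λ towards it and makes the test slightly conservative.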

17 of 33

A simpler case: Are trauma related deaths in 2020 similarly distributed as 2019?

Sum (table totals): 88463, 72503

df_wide$diff <- df_wide$`2020`-df_wide$`2019`

df_wide$chisq <- df_wide$diff^2/(df_wide$`2019`)

chi_square_stat <- sum(df_wide$chisq)

dof <- 11

p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)

alpha <- 0.05 # Significance level

if (p_value < alpha) {

cat("Reject the null hypothesis")

} else {

cat("Fail to reject the null hypothesis")

}

Ideally, we should check if

(** this was automatically true for the 2015-2019 vs 2019 - 2023 example as we binned the observations)

18 of 33

A simpler case: Are trauma related deaths in 2020 similarly distributed as 2019?

Sum (table totals): 88463, 72503

chisq <- chisq.test(x = df_wide$`2020`, p = df_wide$`2019`, rescale.p = T)

> chisq$statistic

X-squared

2738.136

> chisq$p.value

[1] 0

# Method 1

[Table annotations: 72503 (O_i); 1 (Probability from 2019)]

Since the number of deaths in 2020 != the number of deaths in 2019, we first calculate the relative probability of deaths in each month of 2019 (p_i)
p_i is then rescaled by the total 2020 deaths to give E_i
Use chisq.test() to test the 2020 values against p_i, or explicitly calculate the chi-squared statistic

df_wide$p_i <- df_wide$`2019`/sum(df_wide$`2019`)

df_wide$E_i <- df_wide$p_i * sum(df_wide$`2020`)

chi_square_stat <- sum((df_wide$`2020`-df_wide$E_i)^2/df_wide$E_i)

dof <- 11

p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)

> chi_square_stat

[1] 2738.136

> p_value

[1] 0

# Method 2
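The two methods can be checked against each other on a self-contained example. The monthly counts below are hypothetical (not the lecture's trauma data), and `df_wide` is rebuilt here only so the sketch runs on its own.

```r
# Hypothetical monthly counts (not the lecture's data)
df_wide <- data.frame(
  month  = month.abb,
  `2019` = c(700, 680, 720, 710, 730, 700, 690, 705, 695, 715, 700, 720),
  `2020` = c(690, 670, 600, 300, 450, 560, 600, 620, 640, 660, 670, 680),
  check.names = FALSE   # keep the year column names as "2019" and "2020"
)

# Method 1: chisq.test() rescales the 2019 counts to probabilities
test <- chisq.test(x = df_wide$`2020`, p = df_wide$`2019`, rescale.p = TRUE)

# Method 2: explicit calculation
p_i <- df_wide$`2019` / sum(df_wide$`2019`)   # monthly probabilities from 2019
E_i <- p_i * sum(df_wide$`2020`)              # expected 2020 counts with the 2020 total
chi_square_stat <- sum((df_wide$`2020` - E_i)^2 / E_i)
dof     <- nrow(df_wide) - 1
p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)

cat("Method 1:", unname(test$statistic), test$p.value, "\n")
cat("Method 2:", chi_square_stat, p_value, "\n")
```

Rescaling guarantees that the expected counts add up to the 2020 total, which is what makes the two statistics agree.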

19 of 33

How is G-test related to chi-squared test?

20 of 33

How is G-test (Likelihood ratio test) related to Chi-squared?
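One standard way to see the connection (a sketch that may differ from the derivation shown in the lecture) is a second-order Taylor expansion. Write O_i = E_i + δ_i, assume the totals match so that the δ_i sum to zero, and assume each δ_i is small relative to E_i:

```latex
\begin{aligned}
G &= 2 \sum_i O_i \ln\frac{O_i}{E_i}
   = 2 \sum_i (E_i + \delta_i) \ln\left(1 + \frac{\delta_i}{E_i}\right) \\
  &\approx 2 \sum_i (E_i + \delta_i)
     \left(\frac{\delta_i}{E_i} - \frac{\delta_i^2}{2 E_i^2}\right) \\
  &\approx 2 \sum_i \left(\delta_i + \frac{\delta_i^2}{2 E_i}\right)
   = \sum_i \frac{(O_i - E_i)^2}{E_i} = \chi^2
\end{aligned}
```

The linear terms vanish because the δ_i sum to zero, and terms of order δ_i³ are dropped, so the two statistics agree when observed counts are close to expected counts.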

21 of 33

Central Limit Theorem
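A small simulation can illustrate the theorem. The exponential distribution, the sample size, and the seed below are assumptions chosen for the example.

```r
set.seed(302)    # arbitrary seed so the simulation is reproducible
n      <- 50     # hypothetical sample size per experiment
n_sims <- 5000   # number of simulated experiments

# Means of samples from a skewed distribution: Exponential(rate = 1), mean 1, sd 1
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

# CLT prediction: sample means are approximately Normal(mean = 1, sd = 1 / sqrt(n))
cat("Mean of sample means:", mean(sample_means), "(predicted 1)\n")
cat("SD of sample means:  ", sd(sample_means), "(predicted", 1 / sqrt(n), ")\n")
```

A histogram of `sample_means` (for example via `hist(sample_means)`) looks close to a bell curve even though the underlying data are strongly skewed.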

22 of 33

Binomial to Normal?

23 of 33

Binomial to Normal
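The quality of the approximation can be checked numerically. The values n = 100 and p = 0.3 below are hypothetical choices for illustration; the normal curve uses mean np and variance np(1 - p).

```r
n <- 100   # hypothetical number of trials
p <- 0.3   # hypothetical success probability
k <- 0:n

# Exact binomial probabilities and the normal density with matching mean and variance
binom_pmf     <- dbinom(k, size = n, prob = p)
normal_approx <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))
max_abs_diff  <- max(abs(binom_pmf - normal_approx))

# Cumulative probability P(X <= 35), with a continuity correction for the normal
exact_cdf  <- pbinom(35, size = n, prob = p)
approx_cdf <- pnorm(35.5, mean = n * p, sd = sqrt(n * p * (1 - p)))

cat("Largest pointwise difference:", max_abs_diff, "\n")
cat("P(X <= 35): exact", exact_cdf, "| normal approximation", approx_cdf, "\n")
```

The approximation improves as n grows and degrades when p is close to 0 or 1.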

24 of 33

Expectations and Variances

25 of 33

Expectations and Variances

26 of 33

Exercise - Calculate the mean of the binomial random variable
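A derived answer can be checked numerically by summing over the probability mass function. The values n = 10 and p = 0.3 are hypothetical choices for illustration.

```r
n <- 10    # hypothetical number of trials
p <- 0.3   # hypothetical success probability
k <- 0:n

# Mean from the definition: E[X] = sum over k of k * P(X = k)
expected_value <- sum(k * dbinom(k, size = n, prob = p))

# Variance from the definition: Var[X] = sum over k of (k - E[X])^2 * P(X = k)
variance <- sum((k - expected_value)^2 * dbinom(k, size = n, prob = p))

cat("E[X]:", expected_value, "| n * p:", n * p, "\n")
cat("Var[X]:", variance, "| n * p * (1 - p):", n * p * (1 - p), "\n")
```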

27 of 33

Some digression

28 of 33

Obesity and BMI - The old paradigm

Lancet

29 of 33

Obesity: requirement of the new definition

Lancet

30 of 33

Obesity: the new definition

Lancet

31 of 33


Testing for difference in mean (median) of two samples

Source

32 of 33

Next: Testing for difference of means

Question: Is there a statistically significant difference in mean BMI between men and women?

What is the null hypothesis?

Null Hypothesis: The mean BMI is the same for men and women
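As a preview, one common way to test this null hypothesis is Welch's two-sample t-test, which is what R's `t.test()` does by default. The sketch below uses simulated, hypothetical BMI values rather than the lecture's data source, and the lecture may use a different test.

```r
set.seed(302)   # arbitrary seed so the simulation is reproducible

# Simulated, hypothetical BMI values (not the lecture's data source)
bmi_men   <- rnorm(200, mean = 25.0, sd = 4)
bmi_women <- rnorm(200, mean = 25.5, sd = 4)

# Welch two-sample t-test (var.equal = FALSE is the default); H0: the two means are equal
result <- t.test(bmi_men, bmi_women)

alpha <- 0.05
cat("t statistic:", unname(result$statistic), "| p-value:", result$p.value, "\n")
if (result$p.value < alpha) {
  cat("Reject the null hypothesis\n")
} else {
  cat("Fail to reject the null hypothesis\n")
}
```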

Data source

33 of 33


Questions?