1 of 33

Hypothesis testing - 2

Saket Choudhary

saketc@iitb.ac.in

Introduction to Public Health Informatics

DH 302

Lecture 06 || Friday, 24th January 2025

2 of 33

From last lecture…

In a land of stats, so wild and vast,
Hypothesis testing had students aghast.
"Is it null? Is it not? Should we reject?"
Confusion spread, hard to correct.

Coauthored with ChatGPT

Then came the voice, "Here's the key,
State your null as plain as can be.
Assume it's true, don't let it stray,
And let your data have its say."

But oh, the p-values, they played their tricks,
"Below 0.05? It's a statistical fix!"
"Above that line? We must comply:
The null survives, we let it fly."

3 of 33

From last lecture…

4 of 33

From last lecture…

  • Review: Chi-squared test and G-test
  • Expectations, Variances, CLT, Normal approximation
  • Testing for difference of means
  • Dimensionality reduction primer

5 of 33

Why do we select the null as such?

  • The purpose of hypothesis testing is largely to impose self-skepticism (“You are innocent unless proven guilty”)
  • We usually take the Occam’s razor approach: assume the simplest thing that could be true
  • "We cannot conclusively affirm a hypothesis, but we can conclusively negate it" – Karl Popper
  • The null hypothesis is easy to specify; often we do not know the alternative hypothesis explicitly. For example, if there is a mean difference between two populations, how large is it? It is easy to say that it is zero (the difference is ‘null’).
  • Think about this claim: “All swans are white”. Which is easier: ‘rejecting’ it or ‘accepting’ it?

6 of 33


Visualizing the p-values region

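[Figure: distribution of the test statistic T under H0. The two tails beyond the critical values T_{α/2} and T_{1-α/2} each have area α/2 and correspond to significant findings; the central region corresponds to null findings. The p-value is the tail area beyond the observed statistic T_obs.]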

P-value = Probability of sampling a test statistic at least as extreme as the observed test statistic if the null hypothesis is true

We “reject” the null hypothesis (H0) if the p-value is below the threshold (α)
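The rejection rule above can be sketched in R as follows. This is a minimal illustration that assumes the test statistic is standard normal under H0; the observed value `t_obs` is hypothetical and does not come from the lecture data.

```r
# Hypothetical observed test statistic, ASSUMED to be standard normal under H0
t_obs <- 2.1
alpha <- 0.05

# Critical values that cut off an area of alpha/2 in each tail
crit_lower <- qnorm(alpha / 2)
crit_upper <- qnorm(1 - alpha / 2)

# Two-sided p-value: area in both tails at least as extreme as t_obs
p_value <- 2 * pnorm(-abs(t_obs))

# Reject H0 when the p-value falls below the threshold alpha
reject_h0 <- p_value < alpha

cat("Critical values:", round(crit_lower, 2), round(crit_upper, 2), "\n")
cat("p-value:", round(p_value, 4), "| reject H0:", reject_h0, "\n")
```

Comparing `p_value` with `alpha` is equivalent to checking whether `t_obs` lies beyond one of the two critical values.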

7 of 33


Type I,II errors and Power

  • Type I error:
    • The test incorrectly rejects the null hypothesis (H0) when H0 is true
    • Its probability is often denoted by α
  • Type II error:
    • The test incorrectly fails to reject the null hypothesis (H0) when H0 is false
    • Its probability is often denoted by β
  • Power:
    • Probability that the test correctly rejects the null hypothesis (H0) when the alternative hypothesis (H1) is true
    • Commonly denoted by 1 - β, where β is the probability of a Type II error
    • As β increases, the power of a test decreases.
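As a rough sketch of how α, β and power relate, the snippet below computes the power of a two-sided z-test analytically. The helper `z_test_power` and the effect size `delta` (the true mean shift in units of the standard error) are hypothetical names introduced only for this illustration.

```r
# Power of a two-sided z-test (illustrative sketch).
# delta: true shift of the test statistic in standard-error units (hypothetical)
z_test_power <- function(delta, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  # Under H1 the statistic is N(delta, 1); H0 is rejected when |T| > z_crit
  pnorm(-z_crit - delta) + pnorm(delta - z_crit)
}

alpha <- 0.05
power_null <- z_test_power(0, alpha)    # no true effect: rejection rate equals alpha
power_alt  <- z_test_power(2.8, alpha)  # hypothetical effect of 2.8 standard errors
beta <- 1 - power_alt                   # Type II error probability

cat("Rejection rate with no effect:", power_null, "\n")
cat("Power:", round(power_alt, 3), "| beta:", round(beta, 3), "\n")
```

With no true effect the rejection rate equals α (the Type I error rate); as the effect grows, β shrinks and the power 1 - β rises.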

8 of 33


Type I,II errors and Power

[Figure: distributions of T under H0 and under HA, with the critical values T_{α/2} and T_{1-α/2} marked. Under H0, the tail areas beyond the critical values are false positives. Under HA, the area between the critical values is the false-negative region and the area beyond them is the power.]

The false-positive rate is the probability of incorrectly rejecting H0.

The false-negative rate is the probability of incorrectly failing to reject H0.

Power = 1 - false-negative rate = probability of correctly rejecting H0.

9 of 33


Types of error

10 of 33

What is p-value?

  • P-value is NOT the probability of the alternative hypothesis being correct.
  • P-value is NOT the probability of observing the result by chance.
  • P-value = Probability of observing a result at least as extreme if the null hypothesis holds true.
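One way to build intuition for this definition is to simulate many studies in which the null hypothesis really is true and look at the resulting p-values. The sketch below is an illustration with simulated data (sample size, seed and number of simulations are arbitrary choices, not lecture values).

```r
set.seed(302)
n_sims <- 5000

# Each simulated study draws 30 values from N(0, 1), so H0: mean = 0 is TRUE
p_values <- replicate(n_sims, t.test(rnorm(30), mu = 0)$p.value)

# Fraction of studies that would (incorrectly) reject H0 at alpha = 0.05
false_positive_rate <- mean(p_values < 0.05)
cat("Share of p-values below 0.05 under H0:", false_positive_rate, "\n")
```

When H0 holds, p-values are roughly uniform on [0, 1], so about 5% fall below 0.05. A small p-value therefore says the data would be unusual under H0; it does not give the probability that H0 or H1 is true.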

11 of 33

Goodness of fit - Chi-squared test

Problem: What distribution should I fit?

Use a pseudocount of +1 in frequencies

$\chi^2 = \sum_i (O_i - E_i)^2 / E_i = 5.744762$

Is 5.7 high/low/medium?
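To judge whether 5.7 is large, the statistic can be compared against the chi-squared distribution. The sketch below assumes 5 degrees of freedom; the slide does not state the number of bins, and this value is chosen only because it is consistent with the p-value of 0.33 reported for this example.

```r
chi_square_stat <- 5.744762  # value shown on the slide
dof <- 5                     # ASSUMPTION: number of bins - 1 is not stated on the slide
alpha <- 0.05

# Value the statistic must exceed to be significant at level alpha
critical_value <- qchisq(1 - alpha, df = dof)

# Probability of a statistic at least this large if H0 is true
p_value <- pchisq(chi_square_stat, df = dof, lower.tail = FALSE)

cat("Critical value:", round(critical_value, 2), "\n")
cat("p-value:", round(p_value, 2), "\n")
```

Under this assumption the statistic is well below the critical value, so 5.7 is not unusually high.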

12 of 33

Example of Chi-square in R

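# observed, expected: numeric vectors of per-bin counts
# Pearson chi-squared statistic: sum over bins of (O - E)^2 / E
chi_square_stat <- sum((observed - expected)^2 / expected)
dof <- length(observed) - 1  # degrees of freedom = number of bins - 1
# Upper-tail probability of a statistic at least this large under H0
p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)
alpha <- 0.05  # Significance level
if (p_value < alpha) {
  cat("Reject the null hypothesis\n")
} else {
  cat("Fail to reject the null hypothesis\n")
}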

P-value = 0.33 (>0.05)

Thus, we fail to reject the null hypothesis that the frequencies observed in Mar 2019 - Mar 2023 follow the same distribution as those observed in Feb 2015 - Feb 2019.

13 of 33

Another goodness of fit test - Likelihood ratio test (or G-test)

$G = 2 \sum_i O_i \ln(O_i / E_i)$

O_i = an observed count for bin i

E_i = an expected count for bin i, asserted by the null hypothesis

G approximately follows a chi-squared distribution with degrees of freedom = (number of bins - 1)

14 of 33

Example of G-test in R

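# observed, expected: numeric vectors of per-bin counts
# G statistic; na.rm = TRUE drops the NaN produced by bins with observed = 0
G_stat <- 2 * sum(observed * log(observed / expected), na.rm = TRUE)
dof <- length(observed) - 1  # degrees of freedom = number of bins - 1
# Upper-tail probability: P(G >= G_stat) under H0
p_value <- pchisq(G_stat, df = dof, lower.tail = FALSE)
alpha <- 0.05  # Significance level
if (p_value < alpha) {
  cat("Reject the null hypothesis\n")
} else {
  cat("Fail to reject the null hypothesis\n")
}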

P-value = 0.59 (>0.05)

Thus, we fail to reject the null hypothesis that the frequencies observed in Mar 2019 - Mar 2023 follow the same distribution as those observed in Feb 2015 - Feb 2019.
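A self-contained sketch of the G-test, run side by side with the Pearson chi-squared statistic. The counts below are hypothetical and chosen only for illustration; they are not the lecture's data.

```r
# Hypothetical bin counts (illustration only, NOT the lecture's data)
observed <- c(18, 25, 30, 22, 15, 10)
expected_raw <- c(20, 24, 28, 20, 16, 12)

# Rescale expected counts so that their total matches the observed total
expected <- expected_raw / sum(expected_raw) * sum(observed)

# G statistic and Pearson chi-squared statistic on the same data
G_stat <- 2 * sum(observed * log(observed / expected))
chi_square_stat <- sum((observed - expected)^2 / expected)

dof <- length(observed) - 1
# Upper-tail p-values under the chi-squared reference distribution
p_value_G <- pchisq(G_stat, df = dof, lower.tail = FALSE)
p_value_chisq <- pchisq(chi_square_stat, df = dof, lower.tail = FALSE)

cat("G:", round(G_stat, 3), "p-value:", round(p_value_G, 3), "\n")
cat("Chi-squared:", round(chi_square_stat, 3), "p-value:", round(p_value_chisq, 3), "\n")
```

When observed and expected counts are close, the two statistics are nearly identical, so both tests usually lead to the same decision.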

15 of 33

Was the rare event statistically different in 4 years?

What is the probability of observing something as extreme?

Null hypothesis?

16 of 33

Was the rare event statistically different in 4 years?

What is the probability of observing entries as small as the one in April 2020?

Assume a Poisson model

λ = (sum of observations) / (number of observations)

P(X ≤ 3524) = ppois(q = 3524, lambda = λ) < 1e-16 → The rare event is statistically different

Is this event a “rare” event?
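The Poisson calculation can be sketched as below. The monthly counts are hypothetical stand-ins (only the value 3524 comes from the slide), so the exact probability will differ from the lecture's.

```r
# Hypothetical monthly counts (illustration only); 3524 is the unusually low month
monthly_counts <- c(7100, 6900, 7250, 6800, 7050, 3524, 7000, 6950)

# Poisson rate estimated as the mean of the observations
lambda_hat <- sum(monthly_counts) / length(monthly_counts)

# Lower-tail probability P(X <= 3524) under Poisson(lambda_hat)
p_lower <- ppois(q = 3524, lambda = lambda_hat)

# The same probability on the log scale, useful when p_lower underflows to 0
log_p_lower <- ppois(q = 3524, lambda = lambda_hat, log.p = TRUE)

cat("lambda_hat:", lambda_hat, "\n")
cat("P(X <= 3524):", p_lower, "| log probability:", log_p_lower, "\n")
```

Note that `ppois` takes the quantile as `q`. For probabilities this small, `log.p = TRUE` avoids reporting a bare 0.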

17 of 33

A simpler case: Are trauma related deaths in 2020 similarly distributed as 2019?

[Table: monthly trauma-related deaths in 2019 and 2020; column sums 88463 and 72503]

# df_wide: one row per month, with columns `2019` and `2020` holding death counts
# Here the raw 2019 counts are used directly as the expected values
df_wide$diff <- df_wide$`2020` - df_wide$`2019`
df_wide$chisq <- df_wide$diff^2 / df_wide$`2019`
chi_square_stat <- sum(df_wide$chisq)
dof <- 11  # 12 months - 1
p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)
alpha <- 0.05  # Significance level
if (p_value < alpha) {
  cat("Reject the null hypothesis\n")
} else {
  cat("Fail to reject the null hypothesis\n")
}

Ideally, we should check that the totals match: $\sum_i O_i = \sum_i E_i$

(** This was automatically true for the 2015 - 2019 vs 2019 - 2023 example, as we binned the observations.)

18 of 33

A simpler case: Are trauma related deaths in 2020 similarly distributed as 2019?


# Method 1
chisq <- chisq.test(x = df_wide$`2020`, p = df_wide$`2019`, rescale.p = TRUE)
> chisq$statistic
X-squared
 2738.136
> chisq$p.value
[1] 0

Column sums: O_i (2020 deaths) = 72503; probability from 2019 (p_i) = 1

  • Since the total number of deaths in 2020 differs from the total in 2019, we first calculate the relative probability of deaths in each month of 2019 (p_i)
  • p_i is then rescaled by the total number of 2020 deaths to give E_i
  • Use chisq.test() to test the 2020 values against p_i, or calculate the chi-squared statistic explicitly

# Method 2
df_wide$p_i <- df_wide$`2019` / sum(df_wide$`2019`)  # monthly share of 2019 deaths
df_wide$E_i <- df_wide$p_i * sum(df_wide$`2020`)     # expected 2020 deaths per month
chi_square_stat <- sum((df_wide$`2020` - df_wide$E_i)^2 / df_wide$E_i)
dof <- 11  # 12 months - 1
p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)
> chi_square_stat
[1] 2738.136
> p_value
[1] 0
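A runnable sketch showing that the two methods agree. The monthly counts are hypothetical (they are not the lecture's data), and the vectors `deaths_2019` and `deaths_2020` stand in for the `df_wide` columns.

```r
# Hypothetical monthly death counts (illustration only, NOT the lecture's data)
deaths_2019 <- c(7400, 7000, 7500, 7300, 7600, 7200, 7400, 7500, 7300, 7600, 7200, 7450)
deaths_2020 <- c(7300, 7100, 6500, 3524, 4800, 5600, 6000, 6300, 6500, 6700, 6600, 6550)

# Method 1: let chisq.test() rescale the 2019 counts into probabilities
fit <- chisq.test(x = deaths_2020, p = deaths_2019, rescale.p = TRUE)

# Method 2: rescale explicitly, then compute the statistic by hand
p_i <- deaths_2019 / sum(deaths_2019)  # monthly share of 2019 deaths
E_i <- p_i * sum(deaths_2020)          # expected 2020 deaths per month
chi_square_stat <- sum((deaths_2020 - E_i)^2 / E_i)
dof <- length(deaths_2020) - 1
p_value <- pchisq(chi_square_stat, dof, lower.tail = FALSE)

cat("chisq.test statistic:", unname(fit$statistic), "\n")
cat("Manual statistic:    ", chi_square_stat, "\n")
```

After rescaling, the expected counts sum to the 2020 total, which is the condition the chi-squared goodness-of-fit test relies on.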

19 of 33

How is G-test related to chi-squared test?

20 of 33

How is G-test (Likelihood ratio test) related to Chi-squared?
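One standard way to see the connection is a second-order Taylor expansion of the G statistic. The sketch below writes each observed count as $O_i = E_i + \delta_i$ (the symbol $\delta_i$ is introduced here for illustration) and assumes the totals match, so that $\sum_i \delta_i = 0$, and that each $\delta_i$ is small relative to $E_i$.

```latex
\begin{align*}
G &= 2 \sum_i O_i \ln\frac{O_i}{E_i}
   = 2 \sum_i (E_i + \delta_i) \ln\!\left(1 + \frac{\delta_i}{E_i}\right) \\
  &\approx 2 \sum_i (E_i + \delta_i)
     \left(\frac{\delta_i}{E_i} - \frac{\delta_i^2}{2 E_i^2}\right)
   && \text{using } \ln(1+x) \approx x - \tfrac{x^2}{2} \\
  &\approx 2 \sum_i \left(\delta_i + \frac{\delta_i^2}{E_i} - \frac{\delta_i^2}{2 E_i}\right)
   && \text{dropping terms of order } \delta_i^3 \\
  &= 2 \sum_i \delta_i + \sum_i \frac{\delta_i^2}{E_i}
   = \sum_i \frac{(O_i - E_i)^2}{E_i} = \chi^2
\end{align*}
```

So when observed counts are close to expected counts, G and the chi-squared statistic are approximately equal, which is why both are compared against the same chi-squared distribution.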

21 of 33

Central Limit Theorem
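A small simulation can illustrate the theorem: means of samples from a skewed distribution are approximately normal, with standard deviation σ/√n. The choice of an exponential distribution, the sample size and the seed are arbitrary assumptions made for this sketch.

```r
set.seed(302)
n <- 50          # observations per sample
n_sims <- 10000  # number of simulated samples

# Exponential(rate = 1) is strongly skewed, with mean 1 and standard deviation 1
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

# The CLT predicts sample means centred at 1 with standard deviation 1 / sqrt(n)
observed_mean <- mean(sample_means)
observed_sd <- sd(sample_means)
theory_sd <- 1 / sqrt(n)

cat("Mean of sample means:", round(observed_mean, 3), "\n")
cat("SD of sample means:", round(observed_sd, 3), "| theory:", round(theory_sd, 3), "\n")
```

Plotting `hist(sample_means)` shows a roughly bell-shaped histogram even though the underlying data are skewed.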

22 of 33

Binomial to Normal?

23 of 33

Binomial to Normal
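As an illustrative sketch of the approximation, a Binomial(n, p) variable can be compared with a normal distribution that has mean np and variance np(1 - p). The values of `n`, `p` and the cutoff 35 are hypothetical.

```r
n <- 100
p <- 0.3

# Matching normal distribution: mean np, standard deviation sqrt(np(1 - p))
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Exact binomial probability P(X <= 35)
exact <- pbinom(35, size = n, prob = p)

# Normal approximation with a continuity correction of 0.5
approx <- pnorm(35 + 0.5, mean = mu, sd = sigma)

cat("Exact:", round(exact, 4), "| normal approximation:", round(approx, 4), "\n")
```

The approximation improves as n grows and is weakest when p is close to 0 or 1.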

24 of 33

Expectations and Variances

25 of 33

Expectations and Variances

26 of 33

Exercise - Calculate the mean of the binomial random variable
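One possible route to the answer, sketched under the usual setup: write the binomial variable as a sum of $n$ independent Bernoulli($p$) indicators $X_1, \dots, X_n$ (notation introduced here for illustration) and use linearity of expectation.

```latex
\begin{align*}
X &= \sum_{i=1}^{n} X_i, \qquad X_i \sim \mathrm{Bernoulli}(p) \\
\mathbb{E}[X_i] &= 1 \cdot p + 0 \cdot (1 - p) = p \\
\mathbb{E}[X] &= \sum_{i=1}^{n} \mathbb{E}[X_i] = np
\end{align*}
```

The same result follows from evaluating $\sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$ directly, but the indicator argument is shorter.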

27 of 33

Some digression

28 of 33

Obesity and BMI - The old paradigm

29 of 33

Obesity: requirement of the new definition

30 of 33

Obesity the new definition

31 of 33


Testing for difference in mean (median) of two samples

32 of 33

Next: Testing for difference of means

Question: Is there a statistically significant difference in mean BMI between men and women?

What is the null hypothesis?

Null Hypothesis: The mean BMI is the same for men and women
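A preview of how this null hypothesis could be tested in R. The BMI values below are simulated and hypothetical, and using Welch's t-test (with a rank-based alternative) is an assumption about the approach, not necessarily the method the lecture goes on to use.

```r
set.seed(302)

# Hypothetical simulated BMI values (illustration only, NOT real survey data)
bmi_men <- rnorm(200, mean = 24.5, sd = 3.5)
bmi_women <- rnorm(200, mean = 24.0, sd = 4.0)

# Welch's two-sample t-test (the t.test default); H0: the two means are equal
t_result <- t.test(bmi_men, bmi_women)

# Rank-based alternative, often used when the data are far from normal
w_result <- wilcox.test(bmi_men, bmi_women)

cat("Difference in means:", round(mean(bmi_men) - mean(bmi_women), 3), "\n")
cat("t-test p-value:", round(t_result$p.value, 4), "\n")
cat("Wilcoxon p-value:", round(w_result$p.value, 4), "\n")
```

A p-value below the chosen α would lead us to reject the null hypothesis that the mean BMI is the same in both groups.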

33 of 33


Questions?