1 of 32

CMSC 320

2026

Fardina F Alam

Hypothesis Testing: Different Types of Statistical Tests

(PART03)

2 of 32

Topics we will cover

Parametric Tests for Means: These tests are used to compare means. They assume that your data follows a normal distribution and that you have a sufficient sample size.

Z-Test
T-Test
One vs Two Sample Test
Paired Sample T-Test
ANOVA

Some Nonparametric Tests

Chi-squared test

Post-Hoc Analysis

Different Types of Statistical Tests

3 of 32

Statistical Tests

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.

Statistical tests such as z-tests, one-sample t-tests, chi-square tests, etc., are commonly used to assess hypotheses about population parameters based on sample data.

NOTE: In a hypothesis test, the test statistic condenses all of the sample data into a single value. For example, a Z-test calculates a z statistic, a t-test calculates a t statistic, and an F-test calculates an F value. The test statistic is then compared to an appropriate critical value (cv), or converted to a p-value that is compared to the significance level α. A decision can then be made to reject or not reject the null hypothesis.

4 of 32

Does knowing more help us?

Yes! If we know the standard deviation of the underlying population (or have enough data to estimate it reliably), we can use a z-test instead, which gives more accurate results.

As a rule of thumb, we should not use z-tests for sample sizes under 30 unless the population itself is normally distributed.

5 of 32

Z Test - When to use

The Z-test compares a sample mean to a population mean. It is used when:

the population standard deviation (σ) is known (or we can make a reasonable assumption about it) AND
either you have a large sample size (typically n > 30) OR the data is normally distributed.

Assumption: It assumes that the sample data is normally distributed or that the sample size is large enough for the Central Limit Theorem to apply.

In summary, the choice between a Z-test and a t-test depends on the characteristics of your data and the known or assumed population standard deviation. If you have a large sample size and know the population standard deviation, you can use a Z-test. If you have a smaller sample size or don't know the population standard deviation, a t-test is more appropriate. Both tests are valuable tools for making statistical inferences in research and data analysis.

Don’t use a Z-test if:

σ is unknown → use t-test, especially for small samples

Z-Test – Key Assumptions

Known population standard deviation (σ)

Normal distribution of the population (or large n)

Random sampling

Independent observations

Interval or ratio scale (numeric data)

Summary: Z-test is valid when σ is known and either the sample is large or data is normal.
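
As a rough illustration of the summary above, here is a minimal sketch of a two-sided one-sample z-test. The helper name `one_sample_z_test`, the sample values, and the assumed population values (mean 100, σ = 15) are all made up for illustration; only `scipy.stats.norm.sf` is a library call.

```python
import math
from scipy import stats

def one_sample_z_test(sample, pop_mean, pop_sigma):
    """Two-sided one-sample z-test with a known population standard deviation."""
    n = len(sample)
    sample_mean = sum(sample) / n
    standard_error = pop_sigma / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value: probability of a |Z| at least this large under H0.
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value

# Hypothetical data: 36 measurements (n > 30), made up for illustration.
sample = [102, 98, 105, 110, 99, 101] * 6
z, p = one_sample_z_test(sample, pop_mean=100, pop_sigma=15)
print(f"z = {z:.3f}, p = {p:.3f}")  # prints: z = 1.000, p = 0.317
```

With p well above 0.05, this made-up sample gives no reason to reject the hypothesis that the population mean is 100.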

6 of 32

T-Test

The t-test is a statistical test used to determine whether a sample mean differs significantly from a hypothesized value, or whether the means of two groups differ significantly. It is used when:

population standard deviation (σ) is unknown AND
Either you have a smaller sample size (typically n < 30) OR even with large n, σ is still unknown

7 of 32

In general, the type of test statistic used depends on the number of samples being compared:

One Sample: when there is only one sample that needs to be compared with a given value.
Two Samples: when there are two or more samples to be compared. In this case, tests can include correlation tests and tests for differences between samples.

Additionally, samples can be paired or not paired.

Paired samples are also called dependent samples (observations that are related or matched in some way), while not paired samples are also called independent samples (not related or matched).

8 of 32

One sample T-Test

Determine if the mean of a single sample is significantly different from a known or hypothesized population mean.

Commonly used when you have collected data from a single group or sample and want to compare its mean to a specific value or a hypothesized population mean.

How to run a one-sample t test:

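import numpy as np
from scipy import stats

# your_data: a 1-D array of sample values (50 observations in this example, hence df=49)
result = stats.ttest_1samp(your_data, popmean=0.5)
print(result)

Output: TtestResult(statistic=2.456308468440, pvalue=0.017628209047638, df=49)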
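
To make the call above reproducible, here is a small self-contained sketch. The data is synthetic (50 draws from a normal distribution with made-up parameters), so it is not the data behind the output shown on the slide and will give different numbers. The by-hand calculation is included only to show what the statistic measures.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
# Hypothetical sample: 50 draws from a normal distribution (made-up parameters).
your_data = rng.normal(loc=0.6, scale=0.3, size=50)

result = stats.ttest_1samp(your_data, popmean=0.5)

# The same t statistic by hand: (sample mean - hypothesized mean) / standard error.
standard_error = your_data.std(ddof=1) / np.sqrt(len(your_data))
t_by_hand = (your_data.mean() - 0.5) / standard_error

alpha = 0.05
print(f"t = {result.statistic:.3f} (by hand: {t_by_hand:.3f}), p = {result.pvalue:.4f}")
print("reject H0" if result.pvalue < alpha else "fail to reject H0")
```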

9 of 32

Two Sample t-test

The two-sample t-test (also referred to as the unpaired t-test) is used to compare the means of two independent samples.

Example: We have noticed most humans fall into one of two distinct categories–male or female. We would like to know if our sample of males is taller than our sample of females.

Can we just take the average of the two samples?

10 of 32

Two-Sample T-test

NO! Simply taking the average of the two samples is not sufficient to determine if one group is taller than the other (does not account for variability within each group).
Instead, we would conduct a statistical test to determine whether there is a statistically significant difference between the two groups.

11 of 32

Two Sample T-Test

Null hypothesis: Men and women have the same mean height
Alternative hypothesis: Men and women have different mean heights
p-value: the probability that we would see observations at least as extreme as these if the null hypothesis is true/correct

Q: What sort of p value would we see if men and women had different heights?
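
The hypotheses above could be tested along these lines. This is only a sketch: the heights are synthetic, with made-up means and spreads, and `equal_var=False` selects Welch's variant of the t-test, which is a choice made here and not something the slides prescribe.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
# Hypothetical heights in cm; the means and spreads are made up for illustration.
group_a = rng.normal(loc=176, scale=7, size=40)
group_b = rng.normal(loc=163, scale=6, size=40)

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```

When the two groups really do differ by this much, the p-value comes out very small, which is evidence against the null hypothesis of equal means.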

12 of 32

When can we use the T-test?

We can use the test for continuous data obtained from a random sample that follows a normal distribution.

Our Assumptions: For the t-test we assume:

The data is normally distributed
We care about the mean → ex: whether there is a significant difference in the means of the two groups.

Q: What if my data isn’t even approximately normally distributed?

If your sample sizes are very small, you might not be able to test for normality. You might need to rely on your understanding of the data.
When you cannot safely assume normality, you can perform a nonparametric test that doesn’t assume normality.
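
One possible way to act on this advice is sketched below, under assumptions: the samples are synthetic and strongly skewed, the Shapiro-Wilk test is used as the normality check, and the Mann-Whitney U test is the nonparametric fallback. With very small samples a normality test has little power, so this check is a rough guide and not a substitute for understanding the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
# Hypothetical, strongly right-skewed samples (exponential), made up for illustration.
group_a = rng.exponential(scale=1.0, size=200)
group_b = rng.exponential(scale=1.5, size=200)

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution.
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

alpha = 0.05
if min(p_norm_a, p_norm_b) < alpha:
    # Normality is doubtful: fall back to the rank-based Mann-Whitney U test.
    test_name = "Mann-Whitney U"
    stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
else:
    test_name = "two-sample t-test"
    stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"{test_name}: statistic = {stat:.2f}, p = {p_value:.4f}")
```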

13 of 32

Paired Sample t test

The paired sample t test is used to compare the means of two related groups of samples.

It is used in a situation where you have two values (i.e., a pair of values) for the same group of samples.
Often these two values are measured from the same samples either at two different times, under two different conditions, or after a specific intervention.
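
A minimal sketch of a paired test, assuming synthetic before/after scores for the same 30 subjects (all numbers are made up). It also shows that the paired t-test is equivalent to a one-sample t-test on the per-subject differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
# Hypothetical scores for the same 30 subjects before and after an intervention.
before = rng.normal(loc=100, scale=15, size=30)
after = before + rng.normal(loc=5, scale=4, size=30)

# Paired test: each after[i] is matched with before[i].
paired = stats.ttest_rel(after, before)

# Equivalent formulation: one-sample t-test on the differences against 0.
on_differences = stats.ttest_1samp(after - before, popmean=0.0)

print(f"paired t = {paired.statistic:.3f}, p = {paired.pvalue:.4g}")
print(f"on differences t = {on_differences.statistic:.3f}")
```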

14 of 32

Paired Sample t test: Example

The aliens monitor a bunch of humans, test them for intelligence, and then run one half of them through a machine to make them smarter. Afterwards, they want to know if their machine worked.

This would be called a paired t-test.

Null Hypothesis: ?

Alternative Hypothesis: ?

15 of 32

Paired Sample t test: Example

Null Hypothesis: The machine did nothing

Alternative Hypothesis: The machine had an effect (the scores after the machine come from a different distribution than the scores before)

16 of 32

Paired Sample t test: Example

Ques: The aliens get a p-value of .05. What can they conclude?

17 of 32

Recap: Tests so Far

One sample t-test: Tells how likely a sample mean this far from a specific value would be if the normally distributed data really had that mean
Two sample t-test: Tells how likely it is that two samples this different would be generated by populations with the same mean
Chi-squared test: Estimates the chance that two sets of categorical data this different come from the same distribution

18 of 32

Multiple Groups

The Aliens decide to kidnap humans to study, but we don’t know what humans eat! We have five different food mixes we want to try. We split the humans up into five groups and feed each group a different mix, and then measure how much the humans grow over the next few years.

Ques: How do Aliens know if the mixes have different effects?

19 of 32

Anova (Analysis of Variance) Test

ANOVA is a powerful statistical test for comparing the means of multiple groups (three or more) to determine if there are significant differences among them.

We use an ANOVA test.

Null hypothesis: There is no difference between the means of any of the groups
Alternative hypothesis: At least one group mean differs from the others

Notes: In t-tests and z-tests, we typically compare means of two groups using individual datasets or assess the mean of a single group against a known value. ANOVA evaluates differences in means across three or more groups as a whole, considering both within-group and between-group variability.
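
For the five food mixes, a one-way ANOVA might look like the sketch below. The growth values are synthetic, and the group means and group size (20 humans per mix) are assumptions made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
# Hypothetical growth (cm) for five food mixes, 20 humans each; the means are made up.
true_means = [10.0, 10.0, 10.5, 12.0, 14.0]
groups = [rng.normal(loc=m, scale=2.0, size=20) for m in true_means]

# One-way ANOVA: compares between-group variability to within-group variability.
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.2g}")
```

A small p-value says that at least one mix differs, but not which one.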

20 of 32

Parametric Tests (Comparing Means)

21 of 32

Nonparametric Tests (Comparing Medians)

Nonparametric Hypothesis Tests: Used when data do not meet the assumptions of parametric tests (e.g., normality, equal variances).

Do not rely on population parameters like mean or variance.
Often based on ranking data instead of raw values.
Suitable for: Nominal or ordinal data, Skewed distributions, Small sample sizes

Example:

Rank-based tests → work on ranked data

Mann–Whitney U (2 independent groups)
Wilcoxon Signed-Rank (2 paired groups)
Kruskal–Wallis (3+ independent groups)
Spearman’s Rank Correlation

Frequency-based test → work on counts in categories

Chi-Square (Goodness of Fit, Independence)

22 of 32

Nonparametric Tests: Alternatives to Parametric Tests

Kruskal–Wallis Test

Extension of the one-way ANOVA.
Compares medians of 3 or more independent groups.
Use when data are not normally distributed or variances are unequal.

Mann–Whitney U Test

Alternative to independent-samples t-test.
Compares two independent groups on median ranks.
Useful for ordinal data or non-normal distributions.

Wilcoxon Signed-Rank Test

Alternative to paired-samples t-test.
Compares two related/paired groups (e.g., before vs. after treatment).
Assesses differences in median ranks when normality is violated.

Spearman’s Rank Correlation: Measures correlation between two variables based on ranks. Useful when data are ordinal or non-normal
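
The four rank-based tests listed above are all available in `scipy.stats`. The sketch below runs each one on synthetic skewed data (made up for illustration); for the Wilcoxon and Spearman calls, `a[i]` and `b[i]` are treated as a matched pair, which is an assumption of this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
# Hypothetical skewed samples of equal length, made up for illustration.
a = rng.exponential(scale=1.0, size=30)
b = rng.exponential(scale=1.5, size=30)
c = rng.exponential(scale=2.0, size=30)

# Mann-Whitney U: two independent groups.
_, p_mann_whitney = stats.mannwhitneyu(a, b, alternative="two-sided")
# Wilcoxon signed-rank: two paired groups (a[i] is matched with b[i]).
_, p_wilcoxon = stats.wilcoxon(a, b)
# Kruskal-Wallis: three or more independent groups.
_, p_kruskal = stats.kruskal(a, b, c)
# Spearman's rank correlation between two variables.
rho, p_spearman = stats.spearmanr(a, b)

print(p_mann_whitney, p_wilcoxon, p_kruskal, rho, p_spearman)
```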

*** Nonparametric tests are mostly based on ranked data instead of raw values, making them more robust when assumptions of parametric tests are not met.

Instead of using the actual numerical values (raw scores), nonparametric tests convert the data into ranks (positions).

Example: Raw data (exam scores) → 45, 80, 60, 90, 75
Sorted from smallest to largest, the scores receive ranks 1, 2, 3, 4, 5:

45 → Rank 1
60 → Rank 2
75 → Rank 3
80 → Rank 4
90 → Rank 5

So, the test looks at whether groups differ in their rank distributions, not the exact values.

Why? Because ranks are less sensitive to outliers and do not require the assumption that data are normally distributed.
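
The ranking step can be reproduced with `scipy.stats.rankdata`. The second call is an added illustration (the value 9000 is made up) showing why ranks are insensitive to outliers: making the top score extreme does not change any rank.

```python
from scipy import stats

scores = [45, 80, 60, 90, 75]
ranks = stats.rankdata(scores)
print(ranks)  # prints: [1. 4. 2. 5. 3.]  (ranks in the original order of the scores)

# Replace the top score with an extreme outlier: the ranks stay the same.
scores_with_outlier = [45, 80, 60, 9000, 75]
ranks_with_outlier = stats.rankdata(scores_with_outlier)
print(ranks_with_outlier)  # prints: [1. 4. 2. 5. 3.]
```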

23 of 32

The Chi-squared test (Frequency-based test for categorical data)

Analyze categorical data to check for an association or relationship between two or more categorical variables.

Type: Nonparametric test (compares frequencies, not raw values)

When to use: To determine if observed frequencies differ significantly from expected frequencies in a contingency table.

Example: Is there a relationship between gender and preference for a soda brand (Yes/No)?

Use Chi-Square to test if soda preference is independent of gender.

24 of 32

What about this?

We are monitoring birds from two different places on the planet, and get the following results:

Bird Type	Location A	Location B
Grackle	7	13
Pigeon	2	7
Sea pigeon	15	1
One of those big fish-beak things	13	0
Big long bird	22	0
Bat	3	4

Each bird type and location falls into distinct categories, making them categorical variables suitable for analysis using methods like the chi-square test.

We want to find out if two different places on Earth have the same types of birds.

25 of 32

Do these locations have the same underlying bird population?

Enter the Chi-Square Test! A test for checking whether two sets of categorical counts come from the same distribution.

Null hypothesis: The bird populations observed in Location A and Location B are the same.

Alternative hypothesis: The bird populations observed in Location A and Location B are different.
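
As a sketch, the bird counts from the table can be passed to `scipy.stats.chi2_contingency`, with rows as bird types and columns as locations. One caveat worth noting: several expected counts in this table are small, and the chi-square approximation is usually considered less reliable in that situation.

```python
import numpy as np
from scipy import stats

# Observed counts from the bird table: one row per bird type, columns = Location A, Location B.
observed = np.array([
    [7, 13],   # Grackle
    [2, 7],    # Pigeon
    [15, 1],   # Sea pigeon
    [13, 0],   # One of those big fish-beak things
    [22, 0],   # Big long bird
    [3, 4],    # Bat
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2g}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Here the degrees of freedom are (6 - 1) × (2 - 1) = 5, and the p-value falls below 0.05, so the null hypothesis of identical bird populations is rejected for these counts.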

26 of 32

Considerations: (How to decide an appropriate statistical test?)

What are you curious about?

Mean? Standard deviation? Frequency?

Is your data categorical or continuous?
Do you have one or two samples?
Is your data normally distributed?

If it is, you would use a parametric test. If it is very non-normal, you would use a nonparametric test.

Is your data paired? Is there a before and after?

27 of 32

Post Hoc Tests for ANOVA

28 of 32

When to use: ANOVA tells us that at least one group differs, but not which groups are different.

Post-hoc analysis identifies the specific group differences after ANOVA is significant when more than two groups are compared.

Purpose

Compare pairs of groups
Control error from multiple comparisons
Identify exactly where differences occur

Post Hoc Test/ Analysis

Common Post-Hoc Tests

Tukey’s Honest Significant Difference (HSD)
Bonferroni Correction
Duncan’s Multiple Range Test

Key Idea: Post-hoc tests show which group means differ significantly from each other.
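
A minimal sketch of one of the options listed above, the Bonferroni correction, applied to pairwise two-sample t-tests. The three groups and their values are synthetic and made up for illustration. The idea is to divide the significance level by the number of comparisons so the overall chance of a false positive stays near α.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
# Hypothetical scores for three groups, made up for illustration.
groups = {
    "A": rng.normal(loc=70, scale=8, size=30),
    "B": rng.normal(loc=80, scale=8, size=30),
    "C": rng.normal(loc=81, scale=8, size=30),
}

pairs = list(combinations(groups, 2))   # (A, B), (A, C), (B, C)
alpha = 0.05
adjusted_alpha = alpha / len(pairs)     # Bonferroni: split alpha across the comparisons

results = {}
for g1, g2 in pairs:
    _, p = stats.ttest_ind(groups[g1], groups[g2])
    results[(g1, g2)] = (p, p < adjusted_alpha)
    print(f"{g1} vs {g2}: p = {p:.4g}, significant = {p < adjusted_alpha}")
```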

29 of 32

Example

Scenario: We conducted a study to compare the test scores of students from three different schools: School A, School B, and School C.

More than 2 groups (A,B,C) → Apply ANOVA

ANOVA result: p-value < 0.05 (indicating significant overall differences among schools).

30 of 32

Example

Next Step: Applied Tukey's HSD post hoc test.

Interpretation from Tukey's HSD: School A and School B, as well as School A and School C, have significantly different test scores (p < 0.05). But there is no significant difference in test scores between School B and School C.
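
The two-step workflow in this example (ANOVA first, then Tukey's HSD) might be sketched as follows. The school scores are synthetic and made up for illustration, so the numbers will not match the study described above. The sketch assumes SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
# Hypothetical test scores for three schools, made up for illustration.
school_a = rng.normal(loc=70, scale=8, size=30)
school_b = rng.normal(loc=80, scale=8, size=30)
school_c = rng.normal(loc=81, scale=8, size=30)

# Step 1: ANOVA asks whether any difference exists among the three schools.
f_stat, p_anova = stats.f_oneway(school_a, school_b, school_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.2g}")

# Step 2: Tukey's HSD compares every pair of schools.
tukey = stats.tukey_hsd(school_a, school_b, school_c)
names = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        # tukey.pvalue[i, j] is the adjusted p-value for group i versus group j.
        print(f"School {names[i]} vs School {names[j]}: p = {tukey.pvalue[i, j]:.4g}")
```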

31 of 32

Summary:

Statistical tests let us reason about one or more samples and how they relate to each other and the population.

z-test: A test we can use if σ is known and either the data is normally distributed or we have a large sample
t-test: A test we can use if the data is normally distributed and σ is unknown, especially with a small sample
Paired tests: Used when we follow specific population members through time
One tailed vs two tailed tests: Two tailed tests look for any difference in population; one tailed tests require us to pick a direction
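
The difference between one-tailed and two-tailed tests can be seen with the `alternative` parameter of `scipy.stats.ttest_ind`. The data below is synthetic and made up for illustration. When the observed difference lies in the chosen direction, the one-tailed p-value is half the two-tailed one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=8)
# Hypothetical samples, made up for illustration.
group_a = rng.normal(loc=176, scale=7, size=40)
group_b = rng.normal(loc=170, scale=7, size=40)

# Two-tailed: is there any difference in means, in either direction?
two_tailed = stats.ttest_ind(group_a, group_b, alternative="two-sided")
# One-tailed: is the mean of group_a greater than the mean of group_b?
one_tailed = stats.ttest_ind(group_a, group_b, alternative="greater")

print(f"two-tailed p = {two_tailed.pvalue:.4g}")
print(f"one-tailed p = {one_tailed.pvalue:.4g}")
```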

32 of 32

Summary: Main Steps of Hypothesis Testing

State the Null Hypothesis: the default assumption you are trying to test
State the Alternative Hypothesis: what you believe might be true
Pick a Level of Significance 𝛂: the probability of rejecting the null hypothesis when it's actually true. Common values for α are 0.05 or 0.01.
Choose a Test: Select the right tool to check your guesses based on your data.
Collect Data: Get the information/data you need through observation or experimentation.
Calculate a test statistic: Using the collected data, calculate the appropriate test statistic.
Calculate P-Value and compare with 𝛂: Based on the comparison, decide whether you have enough evidence to reject the null hypothesis (p ≤ α) or not (p > α).
Draw a Conclusion: Based on your decision in the previous step, draw a conclusion regarding the null hypothesis. If you reject it, you accept the alternative hypothesis. If you fail to reject it, you do not have enough evidence to support the alternative hypothesis.