1 of 37

SAMPLE SIZE CALCULATION

Basic concepts and simplified guide

Dr. Mariam Alsayd Awad

Assistant Lecturer of Public Health and Community Medicine/ Damietta Faculty of Medicine/ Al-Azhar University

2 of 37

OUTLINE

Importance of sample size estimation
When not to calculate sample size?
Information needed for sample size calculation
Approaches and procedures for sample size calculation
Sample size calculation for different study designs

3 of 37

Sample size calculation is one of the first steps in designing a clinical study.
The sample size is the number of patients or other units of investigation that must be included in a study to answer the research question.

4 of 37

Availability of resources sets the upper limit of the sample size.

Required accuracy sets the lower limit of the sample size.

5 of 37

IMPORTANCE OF SAMPLE SIZE ESTIMATION

When a representative sample is taken from a population, the findings can be generalized to the population.
Optimum sample size determination is required for the following reasons:
To allow appropriate analysis.
To provide desired level of accuracy.
To ensure the validity of the significance test.

6 of 37

If the sample is too small:
It will not have enough power to answer the research question
It might have a negative result and a true difference may be missed (due to the low power).
If the sample size is too large:
It will be a waste of time and resources.
It may be unethical as more people are exposed to dangers and side effects.
Hence, optimum sample size is an essential component of any research.

7 of 37

WHEN NOT TO CALCULATE SAMPLE SIZE?

Limited resources: human, funds, technology, and time
Fixed number of subjects available to researchers
Novelty of study: unknown population variability
Pilot (feasibility) studies
Case report/case series

8 of 37

BASIC INFORMATION NEEDED FOR SAMPLE SIZE CALCULATION

The needed information differs according to the study design but generally we need to answer the following questions:

1. What is the type of study?

Single sample (prevalence survey)
Comparison of two groups (cross-sectional, case-control, cohort study)
Clinical trial

2. What is the main (primary) outcome?

Proportion
Association
Mean of a measurement

9 of 37

BASIC INFORMATION NEEDED FOR SAMPLE SIZE CALCULATION

3. What is the expected variability between the subjects?

This can be obtained from a pilot study or from the variation reported in previous studies.

10 of 37

FROM THE ABOVE QUESTIONS WE GET THE FOLLOWING BASIC CONCEPTS IN SAMPLE SIZE CALCULATION

Confidence interval
Power
Significance level = alpha error
Effect size

11 of 37

CONFIDENCE INTERVAL

True Population value: The actual value of a population parameter (e.g. prevalence). This is what investigators wish to capture by conducting studies.
Confidence Interval: A range of values likely to contain the true value of the population parameter of interest.
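
As a rough illustration of the idea, the sketch below computes a 95% confidence interval for a prevalence using the normal-approximation (Wald) method. The function name and the survey figures (20% prevalence in 400 participants) are hypothetical, and other interval methods exist; this is only one simple way to show how the range is obtained.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Wald (normal-approximation) confidence interval for a proportion.

    p_hat: observed proportion, n: number of participants.
    """
    # z is about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # Keep the interval inside the valid range for a proportion
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical survey: 20% prevalence observed in 400 participants
low, high = proportion_ci(0.20, 400)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

A larger sample gives a narrower interval around the same observed prevalence.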

12 of 37

POWER

This is the probability that the test will correctly detect a significant difference, effect or association when one truly exists.
Power increases with sample size: the larger the sample, the greater the power of the study to detect a significant difference, effect or association.
The power of the study is usually set at 80% or 90%.
The power of the study is the probability of not committing a type II error (not having a false negative result).

Power = 1-β
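
The link between sample size and power can be sketched numerically. The example below assumes a comparison of two group means with equal group sizes, a two-sided alpha of 0.05, and a normal approximation that ignores the negligible opposite tail; the function name and the chosen effect size (0.5) are illustrative assumptions, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for comparing two means (normal approximation).

    effect_size: difference between means divided by the standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the effect is real
    shift = effect_size * sqrt(n_per_group / 2)
    return NormalDist().cdf(shift - z_alpha)


# Power rises as the number of participants per group rises
for n in (20, 40, 64, 100):
    print(f"n per group = {n:3d}  power = {approx_power(0.5, n):.2f}")
```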

13 of 37

SIGNIFICANCE LEVEL (ALPHA ERROR = TYPE I ERROR)

Probability of finding significance where there is none
False positive
Probability of Type I error
Usually set to 0.05

14 of 37

TYPICAL VALUES FOR SIGNIFICANCE LEVEL AND POWER

15 of 37

EFFECT SIZE

Effect size is a measure of the strength of the relationship between two variables in a population.
The bigger the size of the effect in the population, the easier it will be to detect.
Examples:

The correlation between two variables (r): r = 0.1 weak, r = 0.5 moderate, r = 0.7 strong, r = 0.9 very strong
The mean differences in t tests (use Cohen's d): d = 0.2 is small, d = 0.5 is medium, d = 0.8 is large
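
A minimal sketch of how Cohen's d could be computed from two group summaries, using the pooled standard deviation. The function names, the "negligible" label for values below 0.2, and the blood pressure figures are hypothetical and chosen only for illustration.

```python
from math import sqrt


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d: difference between means divided by the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)


def describe(d):
    """Label an effect using the thresholds 0.2, 0.5 and 0.8."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here for values below 0.2


# Hypothetical groups: mean systolic pressure 140 vs 130 mmHg, SD 20 in both
d = cohens_d(140, 130, 20, 20, 50, 50)
print(f"d = {d:.2f} ({describe(d)})")
```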

16 of 37

APPROACHES AND PROCEDURES FOR SAMPLE SIZE CALCULATION

Depends primarily on:

The study design
The main outcome measure of the study

There are 3 procedures that could be used for calculating sample size:

Formulae
Ready made tables
Computer software (free and paid) or websites

17 of 37

SAMPLE SIZE CALCULATION FOR DIFFERENT STUDY DESIGNS

Prevalence study (Cross sectional survey)
Comparative cross-sectional study and Cohort
Case-control
Clinical trial

18 of 37

PREVALENCE STUDY (CROSS SECTIONAL SURVEY)

Needed information:

Size of the population
Expected prevalence (from previous studies, pilot study or expert opinion)
Allowed margin of error (usually between 3% and 5%)

Calculated using:

Formula (Kish L. 1965)
Reference table
StatCalc from EpiInfo7
openepi.com/SampleSize

19 of 37

FORMULA (KISH L. 1965)

n = Z² × P × (1 − P) / d²

Where:
n= the required sample size
Z= 1.96 which is the standard value for confidence interval of 95% (from normal distribution table).
P= The expected prevalence of the outcome
d= maximum random sampling error taken at 0.05 (absolute precision= 5 percentage points)
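
The calculation can be sketched in a few lines, assuming the usual single-proportion form n = Z² × P × (1 − P) / d² with Z, P and d as defined above. The function name is hypothetical, and the optional finite population correction is an added assumption (a common adjustment when the population size is known), not something stated on the slide.

```python
from math import ceil
from statistics import NormalDist


def prevalence_sample_size(p, d=0.05, confidence=0.95, population=None):
    """Sample size for estimating a prevalence with absolute precision d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z**2 * p * (1 - p) / d**2
    if population is not None:
        # Assumed finite population correction for a known population size
        n = n / (1 + (n - 1) / population)
    return ceil(n)  # always round up to a whole participant


# Expected prevalence 50%, margin of error 5 percentage points
print(prevalence_sample_size(0.5))  # 385
# Same settings for a hypothetical population of 2000 people
print(prevalence_sample_size(0.5, population=2000))
```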

21 of 37

Refer to the table in S.K. Lwanga and S. Lemeshow (1991), Sample Size Determination in Health Studies, p. 25.

22 of 37

StatCalc from EpiInfo7.

23 of 37

openepi.com/SampleSize

24 of 37

NOTES

What If There Is No Prior Information?

Instead of saying "Sample sizes are not provided because there is no prior information", do one of the following:
Conduct a small pilot study.
Assume that the prevalence is 50%, since that gives the largest required sample size.
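
Why 50% is the safest assumption can be checked numerically. The sketch below assumes the single-proportion formula n = Z² × P × (1 − P) / d² with Z = 1.96 and d = 0.05; the function name and the list of prevalences are illustrative only.

```python
def required_n(p, z=1.96, d=0.05):
    """Unrounded sample size from n = Z^2 * P * (1 - P) / d^2."""
    return z**2 * p * (1 - p) / d**2


# The term P * (1 - P) peaks at P = 0.5, so the required n peaks there too
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P = {p:.1f}  n = {required_n(p):.1f}")

grid = [i / 100 for i in range(1, 100)]
worst_case = max(grid, key=required_n)
print("Prevalence needing the largest sample:", worst_case)
```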

25 of 37

COMPARATIVE CROSS-SECTIONAL AND COHORT STUDIES

What is the main objective of the study?

Association between the major risk factor being studied and the outcome.

Needed information:

The proportion of exposed vs non-exposed.
Expected percentage of the outcome in each group (or the expected RR)

Calculated using:

StatCalc from EpiInfo7.
openepi.com/SampleSize

26 of 37

Example:

I want to prove that Indians (ethnicity = risk) are at higher risk of having diabetes mellitus (outcome) compared to other races in Malaysia, using a cross-sectional study.

From literature review, identify the rate of disease and proportion of those with the risk factor.

Proportion of sample from unexposed (Others) = 85%
Proportion of sample from exposed (Indians) = 15%
P1=true proportion of DM in unexposed (Others) = 8%
P2=true proportion of DM in exposed (Indians) =14%
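
The figures above can be fed into a two-proportion sample size formula. The sketch below uses a Kelsey-style normal approximation and assumes a two-sided alpha of 0.05 and 80% power, neither of which is stated in the example; the function name is hypothetical. Tools such as OpenEpi report several methods, so their results may differ slightly from this one.

```python
from math import ceil
from statistics import NormalDist


def cohort_sample_size(p_exposed, p_unexposed, ratio, alpha=0.05, power=0.80):
    """Kelsey-style sample size for comparing two proportions.

    ratio: number of unexposed participants per exposed participant.
    Returns (n_exposed, n_unexposed).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Weighted average proportion of the outcome across both groups
    p_bar = (p_exposed + ratio * p_unexposed) / (1 + ratio)
    n_exposed = ((z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (ratio + 1)
                 / (ratio * (p_exposed - p_unexposed) ** 2))
    return ceil(n_exposed), ceil(ratio * n_exposed)


# 15% exposed (Indians), 85% unexposed (Others); DM in 14% vs 8%
n_exp, n_unexp = cohort_sample_size(0.14, 0.08, ratio=0.85 / 0.15)
print("Exposed:", n_exp, "Unexposed:", n_unexp, "Total:", n_exp + n_unexp)
```

Because only 15% of the sample is expected to be exposed, most of the total sample falls in the unexposed group.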

28 of 37

CASE-CONTROL STUDIES

In a case-control study, you identify the cases and controls. Then you compare the rate of exposure/risk factor between the case and control group.
Needed information:

The ratio of controls to cases (In most studies, it is 1:1, which means an equal size of the case and control groups).
Expected percentage of exposed persons in each group (cases and control groups), or the expected OR.

Calculated using:

StatCalc from EpiInfo7.
openepi.com/SampleSize

29 of 37

For Example,

You want to prove that cataract patients (cases) have a higher rate of diabetes mellitus (risk factor) compared to patients with normal vision (controls).

From literature review, identify the rate of exposure among the cases (e.g. 50%) and among the controls (e.g. 8%). Decide on the ratio, e.g. 1:1.

Proportion of sample from controls (Normal) population = 50%
Proportion of sample from cases (Cataract) population = 50%
P1=true proportion of DM in controls (Normal) population = 8%
P2=true proportion of DM in cases (Cataract) population =50%
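
A short sketch of the same example in code: it derives the expected odds ratio from the two exposure rates and then estimates the number of cases needed. It assumes a two-sided alpha of 0.05, 80% power and a Kelsey-style normal approximation, none of which is stated in the example, and the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def odds_ratio(p_cases, p_controls):
    """Odds of exposure among cases divided by odds among controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


def case_control_sample_size(p_cases, p_controls, controls_per_case=1,
                             alpha=0.05, power=0.80):
    """Kelsey-style sample size; returns (n_cases, n_controls)."""
    r = controls_per_case
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Weighted average exposure across cases and controls
    p_bar = (p_cases + r * p_controls) / (1 + r)
    n_cases = (z_sum**2 * p_bar * (1 - p_bar) * (r + 1)
               / (r * (p_cases - p_controls) ** 2))
    return ceil(n_cases), ceil(r * n_cases)


# DM in 50% of cataract cases and 8% of controls, ratio 1:1
print("Expected OR:", round(odds_ratio(0.50, 0.08), 1))
print("Cases, controls:", case_control_sample_size(0.50, 0.08))
```

The very large difference in exposure (50% vs 8%) is why so few participants are needed here; a smaller expected difference would require a much larger sample.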

31 of 37

CLINICAL TRIAL

Needed information:

Power of the study: usually set at 80% or 90%.
Confidence level: set at 95% in most trials (equivalent to a significance level of 0.05).
Enrollment ratio: it means the ratio of participants in the control group to the treatment group. In most studies, it is 1:1
Expected effect size (the minimum clinically important difference): this is the most important item that we need to define. The expected effect size is the smallest difference between the two groups that is of clinical importance (the amount of difference we want our study to be able to detect).

32 of 37

Calculated using:
Formula & Reference table
Software or websites

1. The standardized difference is calculated as:

Standardized difference = δ / s.d.

Where:
(s.d) is the standard deviation of the variable
(δ) is the clinically relevant difference (expected effect size)
Then refer to the following table.

Example:
If difference between means = 10 mmHg
pop. standard deviation = 20 mm Hg
Then
Standardized difference= 10 mm Hg/20 mm Hg = 0.5
Total sample size = 64
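
The table lookup can be cross-checked with a formula. The sketch below assumes a two-sided alpha of 0.05, 80% power, equal group sizes and a normal approximation, since the slide does not state the settings behind the table; the function name is hypothetical. Under these assumptions the result is about 63 participants per group, which suggests the 64 in the example may correspond to a per-group figure rather than a total.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants per group for comparing two means (normal approximation)."""
    standardized_difference = delta / sd
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / standardized_difference**2)


# Difference between means 10 mmHg, standard deviation 20 mmHg
n = n_per_group(delta=10, sd=20)
print("Per group:", n, "Total:", 2 * n)
```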

33 of 37

Software or websites:

Results from OpenEpi, Version 3, open source calculator--SSMean

34 of 37

Software or websites:

Free software like:

G*Power software https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.htm
PS: Power and Sample Size Calculations https://ps-power-and-sample-size-
Epi Info™

https://www.cdc.gov/epiinfo/index.html

35 of 37

ADJUSTMENT FOR LOSS TO FOLLOW-UP

The calculated sample size should be adjusted for the expected loss to follow-up.
For example, if the calculated sample size is 80 in total and it is expected that 20% of those recruited will not complete the study, then 100 patients should be recruited to ensure that 80 will complete it.
It is easily calculated by dividing the calculated sample size by (1- proportion expected to be lost).
If the calculated sample size is 150 per group and the expected loss to follow-up is 15%, we can calculate the sample size after adjustment as: 150/0.85 = 176.5. This is rounded up to 177 individuals per group.
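
Both worked examples can be reproduced with a small helper. This is a minimal sketch: the function name is hypothetical, and the loss is passed as a whole-number percentage so that the rounding up is done with exact integer arithmetic.

```python
def adjust_for_attrition(calculated_n, loss_percent):
    """Inflate a sample size for the expected loss to follow-up.

    Equivalent to calculated_n / (1 - loss proportion), rounded up.
    """
    if not 0 <= loss_percent < 100:
        raise ValueError("loss_percent must be from 0 up to (not including) 100")
    retained_percent = 100 - loss_percent
    # Ceiling division using integers only
    return -(-calculated_n * 100 // retained_percent)


print(adjust_for_attrition(80, 20))   # 100 in total
print(adjust_for_attrition(150, 15))  # 177 per group
```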

36 of 37

To Sum Up

Mean Difference

Power of the study: set at 80% or 90%.

Confidence interval: set at 95%

Enrollment ratio: 1:1

Expected effect size or Mean±SD

Association
The ratio of controls to cases
OR / RR

Proportion
Population size
Expected prevalence
Margin of error

The crucial first step is to determine the study design

Then, the main objectives (Study outcomes)

Proportion
Association
Mean Difference