SAMPLE SIZE CALCULATION

Basic concepts and simplified guide

By

Dr. Mariam Alsayd Awad

Assistant Lecturer of Public Health and Community Medicine/ Damietta Faculty of Medicine/ Al-Azhar University

OUTLINES

  • Importance of sample size estimation
  • When not to calculate sample size?
  • Information needed for sample size calculation
  • Approaches and procedures for sample size calculation
  • Sample size calculation for different study designs

  • Sample size calculation is one of the first steps in designing a clinical study.
  • The sample size is the number of patients (or other study units) that must be included in a study to answer its research hypothesis.

Availability of resources sets the upper limit of the sample size.

Required accuracy sets the lower limit of the sample size.

IMPORTANCE OF SAMPLE SIZE ESTIMATION

  • When a representative sample is taken from a population, the findings can be generalized to that population.
  • Determining the optimum sample size is required for the following reasons:
    • To allow appropriate analysis.
    • To provide the desired level of accuracy.
    • To ensure the validity of significance tests.

  • If the sample is too small:
    • It will not have enough power to answer the research question.
    • It may give a negative result, so that a true difference is missed (a type II error caused by low power).
  • If the sample is too large:
    • It wastes time and resources.
    • It may be unethical, because more people than necessary are exposed to risks and side effects.
  • Hence, an optimum sample size is an essential component of any research.

WHEN NOT TO CALCULATE SAMPLE SIZE?

  1. Limited resources: human, funds, technology, and time
  2. Fixed number of subjects available to researchers
  3. Novelty of study: unknown population variability
  4. Pilot (feasibility) studies
  5. Case report/case series

BASIC INFORMATION NEEDED FOR SAMPLE SIZE CALCULATION

  • The information needed differs by study design, but generally we need to answer the following questions:

1. What is the type of study?

      • Single sample (prevalence survey)
      • Comparison of two groups (cross-sectional, case-control, cohort study)
      • Clinical trial

2. What is the main (primary) outcome?

      • Proportion
      • Association
      • Mean of a measurement

3. What is the expected variability between the subjects?

      • This can be obtained from a pilot study or from the variation reported in previous studies.

FROM THE ABOVE QUESTIONS WE GET THE FOLLOWING BASIC CONCEPTS IN SAMPLE SIZE CALCULATION

  1. Confidence interval
  2. Power
  3. Significance level = alpha error
  4. Effect size

CONFIDENCE INTERVAL

  • True population value: the actual value of a population parameter (e.g., prevalence). This is what investigators wish to capture by conducting studies.
  • Confidence interval: a range of values likely to contain the true value of the population parameter of interest.
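As a minimal illustration, here is a 95% confidence interval for a sample prevalence, assuming the common normal-approximation (Wald) interval p ± Z·√(p(1 − p)/n); the slides do not specify which interval method is used, and the function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    p_hat: observed sample proportion (e.g. prevalence)
    n:     sample size
    z:     standard normal value (1.96 for a 95% confidence level)
    """
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# A survey of 385 people that finds a prevalence of 50%
low, high = wald_ci(0.5, 385)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The half-width shrinks in proportion to 1/√n, so quadrupling the sample only halves the width of the interval.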

POWER

  • Power is the probability that the test will correctly detect a true difference, effect, or association.
  • Power increases with sample size: the larger the sample, the greater the power to detect a true difference, effect, or association.
  • The power of the study is usually set at 80% or 90%.
  • Power is the probability of not committing a type II error (i.e., of not obtaining a false-negative result).

Power = 1-β
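To make the link between sample size and power concrete, the sketch below assumes a two-sided comparison of two means with equal group sizes and uses the normal approximation, power ≈ Φ(d·√(n/2) − Z₁₋α/₂), where d is the standardized difference (Cohen's d). Exact t-test power is slightly lower for small samples. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation: power ~= Phi(d * sqrt(n/2) - z_(1-alpha/2)),
    ignoring the negligible chance of rejecting in the wrong direction.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * (n_per_group / 2) ** 0.5 - z_alpha)


# Power rises with sample size, but not in direct proportion to it
for n in (20, 40, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

With d = 0.5, about 64 participants per group gives roughly 80% power; beyond that, each extra participant adds less and less power.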

SIGNIFICANCE LEVEL (ALPHA ERROR = TYPE I ERROR)

  • Probability of finding significance where there is none
  • False positive
  • Probability of Type I error
  • Usually set to 0.05

TYPICAL VALUES FOR SIGNIFICANCE LEVEL AND POWER
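Sample size formulas use the standard normal (Z) values that correspond to the chosen significance level and power. A short Python sketch computing them for the usual settings (two-sided α = 0.05, with 0.01 shown for comparison; power 80% or 90%):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Two-sided significance level alpha -> Z_(1 - alpha/2)
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: Z = {std_normal.inv_cdf(1 - alpha / 2):.3f}")

# Power (1 - beta) -> Z_(1 - beta)
for power in (0.80, 0.90):
    print(f"power = {power:.0%}: Z = {std_normal.inv_cdf(power):.3f}")
```

This gives Z ≈ 1.96 and 2.58 for α = 0.05 and 0.01, and Z ≈ 0.84 and 1.28 for 80% and 90% power.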

EFFECT SIZE

  • Effect size is a measure of the strength of the relationship between two variables in a population.
  • The bigger the effect in the population, the easier it is to detect, and the smaller the sample needed to detect it.
  • Examples:
    • The correlation between two variables (r): r = 0.1 weak, r = 0.5 moderate, r = 0.7 strong, r = 0.9 very strong
    • The mean difference in t-tests (Cohen's d): d = 0.2 small, d = 0.5 medium, d = 0.8 large
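Cohen's d is the difference between two group means divided by their pooled standard deviation. A minimal Python sketch, assuming the usual pooled-SD definition; the blood pressure values are made up purely for illustration and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical systolic blood pressures (mmHg) in two small groups
control = [140, 150, 130, 160, 145]
treated = [135, 140, 125, 150, 130]
print(round(cohens_d(control, treated), 2))  # 0.86
```

A value of about 0.86 would count as a "large" effect by the conventions listed above.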

APPROACHES AND PROCEDURES FOR SAMPLE SIZE CALCULATION

  • Depends primarily on:
    1. The study design
    2. The main outcome measure of the study
  • There are 3 procedures that could be used for calculating sample size:
    • Formulae
    • Ready made tables
    • Computer software (free and paid) or websites

SAMPLE SIZE CALCULATION FOR DIFFERENT STUDY DESIGNS

  1. Prevalence study (Cross sectional survey)
  2. Comparative cross-sectional study and Cohort
  3. Case-control
  4. Clinical trial

PREVALENCE STUDY (CROSS SECTIONAL SURVEY)

  • Needed information:
    1. Size of the population
    2. Expected prevalence (from previous studies, pilot study or expert opinion)
    3. Allowed margin of error (usually 3% to 5%)
  • Calculated using:
    • Formula (Kish L. 1965)
    • Reference table
    • StatCalc from EpiInfo7
    • openepi.com/SampleSize

FORMULA (KISH L. 1965)

  • $$n = \frac{Z^2 \, P(1-P)}{d^2}$$
  • Where:
  • Z = 1.96, the standard normal value for a 95% confidence level (from the normal distribution table).
  • P = the expected prevalence of the outcome
  • d = the maximum acceptable random sampling error, usually taken as 0.05 (absolute precision = 5 percentage points)
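A minimal Python sketch of the Kish formula n = Z²P(1 − P)/d². The optional finite population correction, n_adj = n / (1 + (n − 1)/N), is an assumption added here: the slides list population size as an input but do not show how it enters the calculation. The function name is hypothetical.

```python
import math
from typing import Optional


def kish_sample_size(p: float, d: float = 0.05, z: float = 1.96,
                     population: Optional[int] = None) -> int:
    """Sample size to estimate a prevalence with absolute precision d.

    p: expected prevalence as a proportion (e.g. 0.30 for 30%)
    d: absolute precision / margin of error (0.05 = 5 percentage points)
    z: standard normal value for the confidence level (1.96 for 95%)
    population: optional finite population size N (ASSUMPTION: this
        correction is not shown on the slides)
    """
    n = z ** 2 * p * (1 - p) / d ** 2  # Kish: n = Z^2 P(1 - P) / d^2
    if population is not None:
        # Finite population correction: n_adj = n / (1 + (n - 1) / N)
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)  # a sample size is always rounded up


print(kish_sample_size(0.5))                   # 385
print(kish_sample_size(0.5, population=1000))  # 278
```

Rounding up rather than to the nearest integer ensures the precision target is met.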

Refer to the table in S.K. Lwanga and S. Lemeshow (1991), Sample Size Determination in Health Studies, p. 25.

  • StatCalc from EpiInfo7.

  • openepi.com/SampleSize

NOTES

  • What if there is no prior information?
    • Instead of saying “Sample sizes are not provided because there is no prior information,” do one of the following:
      • Conduct a small pilot study.
      • Assume that the prevalence is 50%, since that gives the largest required sample size.
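Why 50% is the safe assumption: in the Kish formula the prevalence enters only through the term P(1 − P), which is largest at P = 0.5. Setting its derivative to zero shows this, and plugging P = 0.5 into the formula with Z = 1.96 and d = 0.05 gives the maximum sample size:

```latex
\frac{\mathrm{d}}{\mathrm{d}P}\bigl[P(1-P)\bigr] = 1 - 2P = 0 \;\Rightarrow\; P = 0.5,
\qquad \frac{\mathrm{d}^2}{\mathrm{d}P^2}\bigl[P(1-P)\bigr] = -2 < 0 \;(\text{a maximum})
\\
n_{\max} = \frac{1.96^2 \times 0.5 \times (1 - 0.5)}{0.05^2} = 384.16 \;\Rightarrow\; 385
```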

COMPARATIVE CROSS-SECTIONAL AND COHORT STUDIES

  • What is the main objective of the study?
    • Association between the major risk factor being studied and the outcome.
  • Needed information:
    1. The ratio of exposed to non-exposed subjects in the sample.
    2. Expected percentage of the outcome in each group (or the expected RR)
  • Calculated using:
    • StatCalc from EpiInfo7.
    • openepi.com/SampleSize

  • Example:

I want to show that Indians (ethnicity = risk factor) are at higher risk of diabetes mellitus (outcome) than other races in Malaysia, using a cross-sectional study.

From a literature review, identify the rate of the disease and the proportion of people with the risk factor.

      • Proportion of sample from unexposed (Others) = 85%
      • Proportion of sample from exposed (Indians) = 15%
      • P1=true proportion of DM in unexposed (Others) = 8%
      • P2=true proportion of DM in exposed (Indians) =14%
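A minimal Python sketch of how a calculator could turn these inputs into group sizes. It assumes the standard normal-approximation formula for comparing two independent proportions (Fleiss's formula without continuity correction), a two-sided α of 0.05 and 80% power. StatCalc and OpenEpi may apply a continuity correction or other methods, so their results can differ somewhat. The function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_sample_size(p1: float, p2: float, ratio: float = 1.0,
                               alpha: float = 0.05,
                               power: float = 0.80) -> tuple[int, int]:
    """Group sizes (n1, n2) to compare two independent proportions.

    p1:    expected outcome proportion in group 1 (e.g. exposed)
    p2:    expected outcome proportion in group 2 (e.g. unexposed)
    ratio: n2 / n1, people in group 2 per person in group 1
    Normal approximation without continuity correction (Fleiss).
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    p_bar = (p1 + ratio * p2) / (1 + ratio)  # pooled proportion
    q1, q2, q_bar = 1 - p1, 1 - p2, 1 - p_bar
    numerator = (z_alpha * math.sqrt((1 + 1 / ratio) * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2 / ratio)) ** 2
    n1 = numerator / (p1 - p2) ** 2
    return math.ceil(n1), math.ceil(ratio * n1)


# Indians (exposed, 15% of sample, DM 14%) vs others (unexposed, 85%, DM 8%)
n_exposed, n_unexposed = two_proportion_sample_size(0.14, 0.08, ratio=85 / 15)
print(n_exposed, n_unexposed, n_exposed + n_unexposed)
```

With these inputs the sketch gives roughly 232 exposed and 1,312 unexposed participants. The unequal allocation (85:15) is what makes the total large: the small exposed group limits the precision of the comparison.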

CASE-CONTROL STUDIES

  • In a case-control study, you first identify the cases and controls, then compare the rate of exposure to the risk factor between the case and control groups.
  • Needed information:
    1. The ratio of controls to cases (in most studies it is 1:1, i.e., case and control groups of equal size).
    2. Expected percentage of exposed persons in each group (cases and controls), or the expected OR.
  • Calculated using:
    • StatCalc from EpiInfo7.
    • openepi.com/SampleSize

  • Example:

You want to show that cataract patients (cases) have a higher rate of diabetes mellitus (risk factor) than patients with normal vision (controls).

From a literature review, identify the rate of exposure among the cases (e.g., 50%) and among the controls (e.g., 8%). Decide on the ratio, e.g., 1:1.

    • Proportion of sample from controls (Normal) population = 50%
    • Proportion of sample from cases (Cataract) population = 50%
    • P1=true proportion of DM in controls (Normal) population = 8%
    • P2=true proportion of DM in cases (Cataract) population =50%
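Calculators accept either the two exposure percentages or the expected OR; given the exposure rate among controls, one can be converted into the other. A short Python sketch of the conversion using the standard odds-ratio definitions (the function names are hypothetical):

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio from the proportion exposed among cases and among controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


def exposure_in_cases(or_value: float, p_controls: float) -> float:
    """Expected proportion exposed among cases, given the OR and p_controls."""
    return or_value * p_controls / (1 + p_controls * (or_value - 1))


# Cataract example: 50% of cases and 8% of controls have diabetes
or_value = odds_ratio(0.50, 0.08)
print(round(or_value, 2))                           # 11.5
print(round(exposure_in_cases(or_value, 0.08), 2))  # 0.5
```

An expected OR this far from 1 is a large effect, so a relatively small sample is needed to detect it.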

CLINICAL TRIAL

  • Needed information:
    1. Power of the study: usually set at 80% or 90%.
    2. Confidence level: set at 95% in most trials (i.e., a significance level of 0.05).
    3. Enrollment ratio: the ratio of participants in the control group to those in the treatment group. In most studies it is 1:1.
    4. Expected effect size (the minimum clinically important difference): this is the most important item to define. It is the smallest difference between the two groups that is of clinical importance (the difference we want our study to be able to detect).

  • Calculated using:
    • Formula & reference table
    • Software or websites

1. The standardized difference is calculated as:

$$\text{Standardized difference} = \frac{\delta}{\text{s.d.}}$$

    • Where:
    • s.d. is the standard deviation of the variable
    • δ is the clinically relevant difference (expected effect size)
    • Then refer to the following table.
  • Example:
    • Difference between means (δ) = 10 mmHg
    • Population standard deviation (s.d.) = 20 mmHg
    • Standardized difference = 10 mmHg / 20 mmHg = 0.5
    • Required sample size ≈ 64 per group (about 128 in total), for 80% power and a two-sided significance level of 0.05
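A minimal Python sketch of the same calculation, assuming equal groups, a two-sided test and the normal approximation n per group = 2(Z₁₋α/₂ + Z₁₋β)² / (δ/s.d.)². The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta: float, sd: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Participants per group to detect a mean difference delta (equal groups).

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_(1-beta))^2 / (delta/sd)^2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = std_normal.inv_cdf(power)           # power = 1 - beta
    standardized = delta / sd                    # standardized difference
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / standardized ** 2)


print(sample_size_two_means(10, 20))              # 63 per group
print(sample_size_two_means(10, 20, power=0.90))  # 85 per group
```

The normal approximation gives 63 per group for the 10 mmHg / 20 mmHg example. Tables based on the t distribution, such as Cohen's, give 64 because they account for estimating the standard deviation from the data.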

  2. Software or websites:

Results from OpenEpi, Version 3, open source calculator--SSMean

Software or websites:

https://www.cdc.gov/epiinfo/index.html

ADJUSTMENT FOR LOSS TO FOLLOW-UP

  • The calculated sample size should be adjusted for the expected loss to follow-up.
  • For example, if the calculated sample size is 80 in total and 20% of those recruited are expected not to complete the study, then 100 patients should be recruited to ensure that 80 complete it.
  • The adjustment is made by dividing the calculated sample size by (1 − proportion expected to be lost).
  • If the calculated sample size is 150 per group and the expected loss to follow-up is 15%, the adjusted sample size is 150/0.85 ≈ 176.5, rounded up to 177 individuals per group.
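A small Python sketch of this adjustment (the function name is hypothetical). A common mistake is to multiply by (1 + loss) instead: 150 × 1.15 = 172.5, rounded up to 173, and 173 × 0.85 ≈ 147 completers, which falls short of the required 150.

```python
import math


def adjust_for_dropout(n_required: int, expected_loss: float) -> int:
    """Inflate a sample size so that n_required remain after expected losses."""
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    inflated = n_required / (1 - expected_loss)
    # round() removes floating-point noise (e.g. 99.99999999999999)
    # before rounding up to a whole participant
    return math.ceil(round(inflated, 9))


print(adjust_for_dropout(80, 0.20))   # 100 in total
print(adjust_for_dropout(150, 0.15))  # 177 per group
```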

To Sum Up

The crucial first step is to determine the study design. Then determine the main objective (study outcome):

  • Proportion
    • Population size
    • Expected prevalence
    • Margin of error
  • Association
    • The ratio of controls to cases
    • OR / RR
  • Mean difference
    • Power of the study: set at 80% or 90%
    • Confidence level: set at 95%
    • Enrollment ratio: 1:1
    • Expected effect size, or mean ± SD

THANK YOU