1 of 32

M.S.S. in Health Economics

HE 604: Public Health and Epidemiology (PHE)

Sampling Techniques

Dr. Aninda Nishat Moitry

MBBS, MPH, MSc, FRSPH

05 March 2023

2 of 32

Overview of presentation

  • Basic concepts of sampling
  • Non-probability sampling
  • Probability sampling
  • Sampling in randomized trials

3 of 32

Basic concepts of statistical inference

  • To answer a research question about a specific population, we should ideally get information on the entire population
  • Using a sample to make inferences about a population is the most vital aspect of epidemiologic research
  • Inferences are based on the process of taking a single random sample of a specific size from a population and using this sample to make judgments about the population as a whole
  • These judgments are made in terms of means, variances or other summarizing statistics
  • Summarizing numbers for the population are called parameters and are represented by Greek letters such as:

μ = mean,

σ = standard deviation and

β = regression coefficient

4 of 32

Population

  • All members of a group for which information is desired or sought
  • Defined unambiguously in relation to other characteristics such as time, geographic or other boundaries or characteristics
  • May consist of living beings such as humans, animals, plants and microorganisms, or non-living units such as documents, times, events and places

5 of 32

Census

  • Information collected from every member of a population
  • Gives us complete results that are free of sampling error
  • Costly, time consuming and often hard to maintain the quality
  • May not be feasible at times

6 of 32

Sample

  • Part of a population from which information is obtained in order to derive an estimate of the population parameter
  • One or more sampling units are selected to form a sample

7 of 32

Some definitions

  • Target Population:
  • Population to which the results will be generalized
  • Sampling Unit:
  • Smallest unit from which sample can be selected
  • Sampling frame (SF):
  • List of sampling units from which sample is drawn
  • Sampling scheme:
  • Method of selecting sampling units from SF

8 of 32

Population, Sampling frame, and Sample

9 of 32

The rationale

  • Why we sample:
  • Not enough resources to study everyone
  • Not feasible to study all of a population
  • To ensure optimum quality
  • What we do:
  • We estimate parameters from statistics
  • Statistics:
  • Numerical characteristics of a sample
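
The relationship between a parameter and a statistic can be illustrated with a small simulation. This is a minimal sketch only: the population (systolic blood pressure values), its size, and the sample size of 200 are made-up assumptions, not data from the course.

```python
import random
import statistics

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameter: a numerical characteristic of the whole population.
mu = statistics.mean(population)

# Statistic: the same characteristic computed from a sample of 200 people.
sample = rng.sample(population, k=200)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is the sampling error.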

10 of 32

Types of Sampling

  • Non-probability samples
  • Probability samples

11 of 32

Non-Probability Sampling

  • Any sampling method where some elements of population have no chance of selection
  • Non‐probability sampling saves time, money and other resources but lacks external validity and information credibility
  • Characteristics:
  • Probability of being chosen is unknown
  • Cheaper, but findings cannot be generalised
  • Potential for bias

13 of 32

Non Probability Sampling

  • Purposive sampling:
  • Respondents who would serve the purpose are selected as sample
  • Selection of the sample is heavily dependent on the judgment of the interviewer
  • There may be no fixed sample size or sampling design; the researcher selects the sample by considering whether it would serve the study
  • Used when developing questionnaires, in pilot studies and in qualitative research such as case studies, key informant interviews etc.

14 of 32

Non Probability Sampling (Contd.)

  • Convenience sample:
  • Select people who are easy to find
  • Snowball sample:
  • Find a few people relevant to your topic
  • Ask them to refer you to more
  • Snowball sampling works well to study ‘hard to find’ populations and to explore sensitive issues such as
  • Impact of male homosexuality on AIDS, impact of gambling or drug addiction in a conservative society
  • Quota sampling:
  • Divide the population into groups by some characteristics
  • Create a quota for each group based on those characteristics
  • Select people to fill each quota
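
The quota sampling steps can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `quota_sample`, the arrival list and the quotas are all hypothetical. Note that people are taken in the order they turn up, with no random selection, which is what makes this a non-probability method.

```python
def quota_sample(arrivals, group_of, quotas):
    """Fill each quota with the first eligible people encountered."""
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of(person)
        # Accept the person only if their group's quota is still open.
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        # Stop as soon as every quota is full.
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break
    return filled

# Hypothetical stream of respondents as (sex, id), in order of arrival.
arrivals = [("F", 1), ("M", 2), ("F", 3), ("F", 4), ("M", 5), ("M", 6), ("F", 7)]
quotas = {"F": 2, "M": 2}
sample = quota_sample(arrivals, group_of=lambda p: p[0], quotas=quotas)
print(sample)
```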

15 of 32

What is Probability Sampling?

  • Sampling:
  • The act of taking a sample: a process of selecting what/who will participate in a study
  • You want this sample to be representative of the population as a whole so that your results are generalizable
  • Every unit in the population has a known, non-zero probability of being selected

16 of 32

What is Probability Sampling? (Contd.)

  • Random sampling: Each subject has a known, non-zero probability of being selected
  • Simple random sampling
  • Systematic sampling
  • Two-stage/multistage sampling (cluster sampling)
  • Stratified sampling
  • Allows application of statistical sampling theory to results to:
  • Generalize
  • Test hypotheses

18 of 32

Simple Random Sampling

  • The most basic sampling design
  • Randomization based on a single sequence of assignments
  • Basic methods are:
  • Flipping a coin
  • Throwing a die
  • Random number table or
  • Computer-generated random numbers
  • Applicable when population is small, homogeneous & readily available
  • Needs a complete list of the target population
  • Assign a number to each of the units in a population and then use a random number generator
  • Each element thus has an equal probability of selection
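
As an illustration of the last two points, simple random sampling with computer-generated random numbers might look like the sketch below. The frame of 500 numbered households and the function name `simple_random_sample` are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has probability n/N."""
    rng = random.Random(seed)  # seed only to make the example repeatable
    return rng.sample(frame, k=n)

# Hypothetical sampling frame: households numbered 1 to 500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, n=50, seed=1)
print(sorted(sample))
```

Here every household has the same selection probability, 50/500 = 0.1.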

19 of 32

Challenges of SRS

  • Can be problematic in relatively small sample size clinical trials resulting in an unequal number of participants among groups
  • Minority subgroups of interest in population may not be present in sample in sufficient numbers for study

20 of 32

Stratified Sampling

  • This method addresses the need to control and balance the influence of covariates
  • First, specific covariates are identified by the researcher
  • Second, a separate block for each combination of covariates is generated
  • Then, participants are assigned to the appropriate block of covariates
  • Then, simple randomization is used within each block to assign participants to one of the groups
  • When the population is heterogeneous, break it up into homogeneous, non-overlapping groups (strata) before sampling
  • Two primary reasons for using a stratified sampling design:
  • To potentially improve representativeness, in terms of the stratifying variables, by gaining greater control over the composition of the sample
  • To ensure that particular groups within a population are adequately represented in the sample
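
A minimal sketch of stratified sampling, assuming proportional allocation (the same sampling fraction in every stratum) and simple random sampling within each stratum. The urban/rural population and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Split units into strata, then take an SRS of the same fraction in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        n = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample[name] = rng.sample(members, k=n)
    return sample

# Hypothetical population: 800 urban and 200 rural residents.
population = [("urban", i) for i in range(800)] + [("rural", i) for i in range(200)]
sample = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1, seed=7)
print({name: len(members) for name, members in sample.items()})
```

With a 10% fraction the sample keeps the 80:20 urban to rural split of the population. To guarantee adequate numbers from a small group, a larger fraction could be used in that stratum instead.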

21 of 32

Example of Stratified Sampling

22 of 32

Challenges of Stratified Sampling

  • Sampling frame of entire population has to be prepared separately for each stratum
  • When examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata
  • In designs with a large number of strata, or those with a specified minimum sample size per group, stratified sampling can potentially require a larger sample than other methods would

23 of 32

Systematic Sampling

  • Sampling fraction: Ratio between sample size and population size
  • Every kth unit is selected for inclusion in the sample
  • In practice, gives results similar to simple random sampling when the order of the list is unrelated to the characteristic being studied
  • Used when other probability sampling techniques are not applicable (no precise sampling frame)
  • Involves a random start and then proceeds with the selection of every kth element onwards, where k = population size / sample size
  • The starting point is not automatically the first in the list, but instead randomly chosen from within the first to the kth element in the list
  • A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10')
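
The random start and fixed skip can be sketched as follows. The list of 1,000 directory entries and the function name `systematic_sample` are assumptions for illustration; the interval k is taken as the whole-number part of population size / sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first k units, then every kth unit."""
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval (skip)
    start = rng.randrange(k)   # random position among the first k units
    return frame[start::k][:n]

# Hypothetical list: directory entries numbered 1 to 1000, sample of 100.
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=100, seed=5)
print(sample[:5])
```

With 1,000 entries and a sample of 100 the skip is 10, matching the 'every 10th' example.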

24 of 32

Cluster sampling

  • When clusters are randomised to interventions, the design is also known as a group randomised trial or community randomised trial
  • Most appropriate design to assess the community-level impact of an intervention
  • Clusters of people or intact social units, rather than individuals, are randomised
  • Primary Sampling Units: the clusters
  • Secondary Sampling Units: individuals/households
  • Outcomes can be measured at either the individual or the community level within those clusters
  • Creating a “buffer zone” between clusters is important to limit contamination

25 of 32

Cluster sampling: A visual example

26 of 32

Types of cluster sampling

  • One-stage cluster sample:
  • List all the clusters in the population
  • Select the clusters (usually with simple random sampling)
  • All elements in the sampled clusters are selected for the survey
  • Two-stage cluster sample:
  • List all the clusters in the population
  • First, select the clusters, usually by simple random sampling
  • A subset of the elements in the selected clusters is then sampled in the second stage (with simple random sampling or systematic random sampling)
  • Multi-stage cluster sample:
  • A more complicated form of cluster sampling in which larger clusters are further subdivided into smaller clusters
  • A total population of interest is first divided into “clusters”
  • Then, these first-stage clusters are further divided into second-stage clusters using a second element
  • Thus, it creates a more representative sample of the population
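
A two-stage cluster sample might be sketched as below, assuming simple random sampling at both stages. The villages, households and the function name `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of elements within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {
        name: rng.sample(clusters[name], k=min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: 20 villages (clusters) of 50 households each.
clusters = {
    f"village_{v:02d}": [f"v{v:02d}_hh{h:02d}" for h in range(50)]
    for v in range(20)
}
sample = two_stage_cluster_sample(clusters, n_clusters=5, n_per_cluster=10, seed=3)
for village, households in sample.items():
    print(village, households[:3])
```

Setting `n_per_cluster` to the full cluster size turns this into a one-stage cluster sample, where every element in the selected clusters is surveyed.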

27 of 32

Pros and Cons of Cluster Sampling

  • Advantages:
  • Generating sampling frame for clusters is economical
  • Less time for listing and implementation
  • Also suitable for survey of institutions
  • Disadvantages:
  • May not reflect the diversity of the community
  • Provides less information per observation than an SRS of the same size
  • Standard errors of the estimates are higher than for an SRS of the same size
  • Standard approaches to sample size estimation and analysis no longer apply without adjusting for clustering
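
One common way to quantify the loss of information from clustering is the design effect. The sketch below uses the widely taught approximation DEFF = 1 + (m − 1) × ICC, which assumes equal cluster sizes m and a known intracluster correlation coefficient (ICC). Neither the formula nor the numbers come from these slides; they are included only as an illustration.

```python
def design_effect(cluster_size, icc):
    """Approximate variance inflation relative to an SRS of the same size."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of the SRS that would give roughly the same precision."""
    return n / design_effect(cluster_size, icc)

# Hypothetical survey: 1,000 people in clusters of 20, with ICC = 0.05.
deff = design_effect(cluster_size=20, icc=0.05)
n_eff = effective_sample_size(n=1000, cluster_size=20, icc=0.05)
print(f"design effect: {deff:.2f}, effective sample size: {n_eff:.0f}")
```

Under these assumed values the cluster sample of 1,000 carries about as much information as an SRS of roughly half that size.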

28 of 32

Randomization in Experimental Studies

29 of 32

Block Randomization

  • The researcher divides the sample into relatively homogeneous subgroups or blocks
  • Then, the experimental design is implemented within each block
  • The key idea is that the variability within each block is less than the variability of the entire sample
  • Although balance in sample size may be achieved with this method, groups may be generated that are rarely comparable in terms of certain covariates
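
The balance in sample size mentioned above is usually obtained with permuted blocks. The sketch below assumes two arms, "A" and "B", and a block size of 4; these choices and the function name `block_randomize` are illustrative assumptions, not details given in the slides.

```python
import random

def block_randomize(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Assign arms in shuffled blocks, each holding equal numbers of every arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_participants:
        block = list(arms) * per_arm  # e.g. ["A", "B", "A", "B"]
        rng.shuffle(block)            # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]

allocation = block_randomize(20, block_size=4, seed=11)
print(allocation)
```

After every complete block the two arms have equal numbers, so group sizes can never differ by more than half a block.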

30 of 32

Covariate Adaptive Randomization

  • A new participant is sequentially assigned to a particular treatment group by taking into account
  • The specific covariates
  • Previous assignments of participants
  • Here the method of “minimization” is used by assessing the imbalance of sample size among several covariates
  • This style allows the researcher to make a case-by-case decision on group assignment
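
A simplified sketch of minimization is given below. It assumes two arms and equal weight for every covariate, and it always assigns the new participant to the arm with the smaller imbalance score, breaking ties at random. Practical schemes often add a random element to that choice. The covariates, data and the function name `minimization_assign` are hypothetical.

```python
import random

def minimization_assign(new_covariates, previous, arms=("A", "B"), rng=None):
    """previous is a list of (covariates_dict, arm) for earlier participants."""
    rng = rng or random.Random()
    score = {}
    for arm in arms:
        # Count earlier participants in this arm who share each covariate
        # level with the new participant (summed over all covariates).
        score[arm] = sum(
            1
            for covs, assigned in previous
            if assigned == arm
            for factor, level in new_covariates.items()
            if covs.get(factor) == level
        )
    lowest = min(score.values())
    candidates = [arm for arm in arms if score[arm] == lowest]
    return rng.choice(candidates)  # random choice only when scores are tied

# Hypothetical trial with three participants already assigned.
previous = [
    ({"sex": "F", "age": "<50"}, "A"),
    ({"sex": "F", "age": ">=50"}, "A"),
    ({"sex": "M", "age": "<50"}, "B"),
]
arm = minimization_assign({"sex": "F", "age": "<50"}, previous)
print(arm)
```

In this example arm A already holds two women and one person under 50 (score 3), while arm B holds one person under 50 (score 1), so the new participant goes to arm B.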

32 of 32

Thank You