Definitions

  • Population: A population is an entire group, collection or space of objects which we want to characterize.

  • Sample: A sample is a collection of observations on which we measure one or more characteristics. Frequently, we use (small) samples of (large) populations to characterize the properties and affinities within the space of objects in the population of interest. For example, if we want to characterize the US population, we can take a sample (poll or survey), and the summaries obtained from the sample (e.g., mean age, race, income, body weight) may be used to study the properties of the population in general. The sample should be representative of the population and unbiased. A biased sample is one in which the method used to create the sample produces samples that are systematically different from the population. A sample that does not represent the population is called unrepresentative. This is avoided when every subject in the population has an equal chance of being selected for the sample. The bias that arises from an unrepresentative sample is called selection bias; it results in overestimation or underestimation of the population parameter.

  • A statistical inference is the procedure by which we reach a conclusion about a population on the basis of information contained in a sample that has been drawn from that population.

  • There are many kinds of samples that may be drawn from a population. The simplest type of scientific sample that may be used to draw inferences is the simple random sample.

  • Let N denote the size of a finite population and n the size of a sample. If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample.

Choosing samples

  • Simple Random Sample (SRS):
    • Here a random sample is one in which each individual (object) in the population to be sampled has an equal chance of being selected.
    • Units can be numbered and numbers are selected randomly (see the figure).
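The numbering-and-drawing idea can be sketched in a few lines of Python. This is only an illustration: the population size of 500, the sample size of 20, the seed, and the function name `simple_random_sample` are all assumptions made for the example, not values from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded only to make the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: units numbered 1..N
N = 500
units = list(range(1, N + 1))
sample = simple_random_sample(units, n=20, seed=42)
print(sorted(sample))
```

Because `random.sample` draws without replacement, no unit can appear twice in the sample.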

  • Systematic sampling
    • It is a process by which every nth object is selected. Consider a mailing list for a survey. The list is too large for us to mail to everyone in this population. Therefore, we select every 6th or 10th name from the list to reduce the size of the mailing while still sampling across the entire list (A-Z).
    • In the pharmaceutical industry this might be done during a production run of a certain tablet where at selected time periods (every 30 or 60 minutes) tablets are randomly selected as they come off the tablet press and weighed to ensure the process is within control specifications.
    • In this production example, the time selected during the hour can be randomly chosen in an attempt to detect any periodicity (regular pattern) in the production run.
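A minimal Python sketch of systematic selection follows. The step is called `k` here to avoid confusion with the sample size n; the 120-name list, the seed, and the function name are hypothetical. Choosing the starting position at random is an assumption that mirrors the randomly chosen time within the hour in the production example.

```python
import random

def systematic_sample(items, k, seed=None):
    """Select every k-th item, starting at a random position among the first k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start index in 0..k-1
    return items[start::k]     # then every k-th item across the whole list

# Hypothetical mailing list of 120 names
mailing_list = [f"name_{i:03d}" for i in range(1, 121)]
selected = systematic_sample(mailing_list, k=10, seed=1)
print(selected)
```

With 120 names and a step of 10, the selection always contains 12 names spread across the full list, whatever the starting position.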

  • Stratified sampling
    • The population is divided into groups (strata) with similar characteristics, and then individuals or objects are randomly selected from each group in proportion to that group's share of the whole population.

  • For example, in a study we may wish to ensure a certain percentage of smokers (25%) are represented in both the control and experimental groups in a clinical trial (n=100 per group).
  • First the volunteers are stratified into smokers and non-smokers.
  • Then, 25 smokers are randomly selected for the experimental group and an additional 25 smokers are randomly selected as controls.
  • Similarly, two groups of 75 non-smoking volunteers are randomly selected to complete the study design.
  • Stratified sampling is recommended when the strata are very different from each other and all of the objects or individuals within each stratum are similar.
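The smoker example can be sketched as below. The pool of 400 volunteers (120 of them smokers), the seed, and the function name are hypothetical; only the group sizes (25 smokers and 75 non-smokers per group) come from the example above. Drawing both groups from one shuffled selection per stratum is a design choice made here so that no volunteer lands in both groups.

```python
import random

def stratified_groups(volunteers, is_smoker, n_smokers, n_nonsmokers, seed=None):
    """Split volunteers into strata, then randomly fill two groups from each stratum."""
    rng = random.Random(seed)
    smokers = [v for v in volunteers if is_smoker[v]]
    nonsmokers = [v for v in volunteers if not is_smoker[v]]
    # Draw enough people for both groups at once so the groups cannot overlap
    s = rng.sample(smokers, 2 * n_smokers)
    ns = rng.sample(nonsmokers, 2 * n_nonsmokers)
    experimental = s[:n_smokers] + ns[:n_nonsmokers]
    control = s[n_smokers:] + ns[n_nonsmokers:]
    return experimental, control

# Hypothetical pool: volunteers 0..399, the first 120 are smokers
volunteers = list(range(400))
is_smoker = {v: v < 120 for v in volunteers}
experimental, control = stratified_groups(volunteers, is_smoker, 25, 75, seed=7)
print(len(experimental), len(control))
```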

  • Cluster ("multistage") sampling
    • It is employed when there are many individual "primary" units that are clustered together in larger "secondary" units that can be subsampled.
    • For example, individual tablets (primary) are contained in bottles (secondary) sampled at the end of a production run.
    • As another example, assume that 150 containers of a bulk powder chemical arrive at a pharmaceutical manufacturer and the quality control laboratory needs to sample these for the accuracy of the chemical or lack of contaminants.
    • Rather than sampling each container, we randomly select ten containers. Then, within each of the ten containers, we further extract random samples (from the top, middle, and bottom) to be assayed.
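The container example can be sketched as a two-stage plan. This is a simplified illustration: it takes exactly one sub-sample from each of the three positions in every chosen container, which is an assumption, and the function name and seed are hypothetical.

```python
import random

def two_stage_plan(n_containers=150, n_selected=10,
                   positions=("top", "middle", "bottom"), seed=None):
    """Stage 1: pick containers at random. Stage 2: sub-sample each chosen container."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(1, n_containers + 1), n_selected))
    # One sub-sample per position per chosen container (an assumption of this sketch)
    return [(container, pos) for container in chosen for pos in positions]

plan = two_stage_plan(seed=3)
for container, pos in plan[:6]:
    print(f"container {container}: assay sample from the {pos}")
```

The plan contains 30 assays (10 containers times 3 positions) instead of one or more assays for each of the 150 containers.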

Data Organization

  • Measurements that have not been organized, summarized or otherwise manipulated are called raw data.

  • Unless the number of observations is extremely small, it is unlikely that these raw data will impart much information until they have been put into some kind of order.

  • It is always easier to analyze organized data.

Table 1: Raw data on the cholesterol-lowering effect of a drug given to 156 subjects

Ordered Array

  • The preparation of the ordered array is the first step in organizing data.

  • An ordered array is a listing of the values of a collection (either population or sample) from the smallest value to the largest value.

  • The ordered array enables one to determine quickly the value of the smallest measurement, the value of the largest measurement and the general trends in the data.

Table 2. Ordered array of data reported in Table 1.
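As a small illustration of what an ordered array offers, the sketch below sorts a handful of values and reads off the extremes. The eight values are hypothetical and are not the Table 1 data.

```python
# Hypothetical raw measurements, NOT the Table 1 data
raw = [12, -35, 4, -97, 55, -8, 0, -35]

ordered = sorted(raw)                 # smallest to largest
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest       # spread between the extremes

print(ordered)
print(smallest, largest, data_range)
```

Once the values are sorted, the smallest and largest measurements sit at the two ends of the list, and repeated values appear next to each other.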

Construction of Frequency Distribution Table

Following the ordered array on the previous slide, the data in Table 1 can be summarized in a frequency distribution table.
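A frequency distribution table can be tallied as in the rough sketch below. The observations are hypothetical (not the Table 1 data), the function name is made up for the example, and the choice of eight intervals of width 20 starting at -100 is an assumption. The sketch also assumes integer-valued data, so that each interval ends one unit below the start of the next.

```python
def frequency_table(values, start, width, k):
    """Tally integer values into k consecutive intervals of equal width.

    Interval i runs from start + i*width to start + (i+1)*width - 1 inclusive.
    """
    counts = [0] * k
    for v in values:
        idx = (v - start) // width
        if not 0 <= idx < k:
            raise ValueError(f"value {v} falls outside the intervals")
        counts[idx] += 1
    return [
        ((start + i * width, start + (i + 1) * width - 1), counts[i])
        for i in range(k)
    ]

# Hypothetical observations, NOT the Table 1 data
values = [-97, -35, -35, -8, 0, 4, 12, 55]
table = frequency_table(values, start=-100, width=20, k=8)
for (lo, hi), freq in table:
    print(f"{lo:>5} to {hi:>4}: {freq}")
```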

Frequency Distribution Curve

A frequency distribution curve is a plot of frequency versus the mid-interval value, taken as the average of the upper and lower limits of each interval. The data in Table 1 are tabulated and plotted in this way.
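The mid-interval calculation can be sketched as follows. The three intervals and their frequencies are invented purely for illustration and do not come from Table 1.

```python
def class_midpoints(intervals):
    """Midpoint of each interval: the average of its lower and upper limits."""
    return [(lower + upper) / 2 for lower, upper in intervals]

# Hypothetical intervals and frequencies, NOT taken from Table 1
intervals = [(-100, -81), (-80, -61), (-60, -41)]
frequencies = [2, 5, 9]

midpoints = class_midpoints(intervals)
points = list(zip(midpoints, frequencies))   # (x, y) pairs for the curve
print(points)
```

Each pair in `points` gives one point of the curve: the midpoint on the horizontal axis and the frequency on the vertical axis.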

Construction of Frequency Distribution Table

  • How many class intervals to employ?
  • Sturges's rule:

k = 1 + 3.322 × log10(n)

k = number of intervals

n = number of observations

Interval width = Range / k

  • The rule serves only as guidance and should not be applied strictly
  • For the data in Table 1 (n = 156):
  • k = 1 + 3.322 × log10(156) ≈ 8.29 ≈ 8
  • Width = (55 - (-97)) / 8 = 152 / 8 = 19
  • A width of 20 is a fair choice
  • Since the lowest and highest values are -97 and 55, the lower limits of the first and last intervals could be set to -100 and 40, respectively, so that eight intervals of width 20 cover all the values.
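The arithmetic of Sturges's rule for the Table 1 figures (n = 156, lowest value -97, highest value 55) can be checked with a short Python sketch. The function name is hypothetical, and rounding k to the nearest whole number is an assumption consistent with the rule being guidance only.

```python
import math

def sturges_intervals(n):
    """Suggested number of class intervals by Sturges's rule."""
    return 1 + 3.322 * math.log10(n)

n = 156
k_exact = sturges_intervals(n)   # about 8.29
k = round(k_exact)               # 8 intervals
data_range = 55 - (-97)          # highest value minus lowest value = 152
width = data_range / k           # 19.0, rounded up to 20 for convenience

print(f"k = {k_exact:.2f} -> {k}, width = {width}")
```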
