DEFINITION OF STATISTICS

Statistics is the science of collecting, organizing, presenting, analyzing and interpreting numerical data for the purpose of assisting in making effective decisions.

MAIN COMPONENTS INVOLVED IN STATISTICS

Collecting Data
Organizing Data
Presenting Data
Analyzing Data
Interpreting Data

PROCEDURES FOR DATA COLLECTION

Experiments: processes that each produce a single outcome whose result cannot be predicted with certainty.
Telephone Survey: involves closed-ended questions and demographic questions.
Written Questionnaire: similar to a telephone survey, but it can also include open-ended questions.
Mail questionnaire: similar to a written questionnaire, but it is sent to respondents by mail or email.
Direct observation and personal interview: the data are observed directly (or gathered face to face from respondents) and recorded based on what takes place in the process.

ADVANTAGES AND DISADVANTAGES OF DATA COLLECTION METHODS

Experiments
Advantages
Provide controls
Pre-planned objectives
Disadvantages
Costly
Time consuming
Requires planning

Cont’n

Telephone surveys
Advantages
Timely
Relatively inexpensive
Disadvantages
Poor reputation
Limited scope

Cont’n

Mail questionnaire and written surveys

Advantages

Inexpensive
Length can be expanded
Can use open-ended questions

Disadvantages

Low response rate
Requires exceptional clarity

Cont’n

Direct Observation and Personal Interview
Advantages
Expands analysis opportunities
No respondent biases

Disadvantages
Potential observer bias
Costly

Data Collection Issues

Data accuracy
Interviewer bias
Nonresponse bias
Selection bias
Measurement error
Internal validity
External validity

Population and Samples

Population: the set of all objects or individuals of interest, or the measurements obtained from all objects or individuals of interest.
Sample: a subset of the population.
Parameters: numbers that summarize data for an entire population.
Statistics: numbers that summarize data from a sample.
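As a small illustration, the mean of a whole population is a parameter, while the mean of a random sample drawn from it is a statistic. The population values below are hypothetical, made up only for this sketch:

```python
import random
import statistics

# Hypothetical population: weights (kg) of every patient in a small ward.
population = [62, 70, 55, 81, 68, 74, 59, 66, 77, 63]

# Parameter: a number summarizing the entire population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample (a subset of the population).
random.seed(1)  # fixed seed so the sample is reproducible
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)

print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from one sample to the next.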

Sampling Techniques

Sampling techniques are the procedures and methods for selecting a sample from a population.
Types or forms of Sampling Techniques
Probability or Statistical Sampling Techniques
Non Probability or Non-Statistical Sampling Techniques

PROBABILITY OR STATISTICAL SAMPLING TECHNIQUE

Simple Random Sampling Technique: every possible sample of a given size has an equal chance of selection.
Stratified Random Sampling Technique: this method divides the population into subgroups called strata according to some common characteristic related to the variable of interest (e.g. gender, income level, marital status), randomly selects a sample from each subgroup, and then combines the samples from the subgroups to form the required sample.
Systematic Random Sampling Technique: one decides on the sample size n, computes the sampling interval k = N/n (where N is the population size), randomly selects a starting item among the first k items, and then selects every kth item after it.
Cluster Sampling Technique: the population is divided into several clusters, each representative of the population; a random sample of clusters is selected, and the items in the chosen clusters make up the sample.
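The four probability techniques above can be sketched as follows. This is a minimal illustration, assuming a hypothetical population of 100 patients tagged with a gender (used as the strata) and a ward (used as the clusters); the stratified version uses equal allocation (the same number from each stratum), although proportional allocation is also common.

```python
import random

random.seed(42)  # fixed seed so results are reproducible

# Hypothetical population: 100 patient IDs, each with a gender and a ward (0-4).
population = [
    {"id": i, "gender": "F" if i % 2 == 0 else "M", "ward": i // 20}
    for i in range(100)
]

def simple_random_sample(pop, n):
    # Every subset of size n is equally likely to be chosen.
    return random.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    # Divide the population into strata by a shared characteristic,
    # draw a simple random sample from each stratum, then combine them.
    strata = {}
    for item in pop:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def systematic_sample(pop, n):
    # Interval k = N // n; random start among the first k items, then every kth item.
    k = len(pop) // n
    start = random.randrange(k)
    return [pop[start + j * k] for j in range(n)]

def cluster_sample(pop, key, n_clusters):
    # Treat each ward as a cluster, randomly choose whole clusters,
    # and keep every member of the chosen clusters.
    clusters = sorted({item[key] for item in pop})
    chosen = random.sample(clusters, n_clusters)
    return [item for item in pop if item[key] in chosen]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, "gender", 5)
syst = systematic_sample(population, 10)
clus = cluster_sample(population, "ward", 2)
print([p["id"] for p in syst])
```

Note that the cluster sample keeps whole wards, so its size depends on how large the chosen clusters are.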

NON-PROBABILITY OR NON-STATISTICAL SAMPLING TECHNIQUE

Judgmental sampling: selection is based on the researcher's judgement.
Convenience sampling: selection is based on the convenience of the interviewer.
Quota sampling: selection continues until preset quotas for each subgroup are filled.
Snowball sampling: existing respondents lead you to other respondents.
Purposive sampling: selection is based on the purpose of the survey.

VARIABLES

A variable is defined as an attribute of an object of study.

TYPES OF VARIABLES
Quantitative variable: consists of two main types:
Discrete variable: takes countable values, e.g. the number of patients at the hospital.
Continuous variable: can take any value within a range, e.g. the weight or the temperature of a patient.

Cont’n

Qualitative or categorical variable: consists of three main types:
Binary variable: a variable with exactly two outcomes, e.g. a pregnant mother is expecting either a baby boy or a baby girl.
Nominal variable: multiple outcomes that cannot be ranked, e.g. religion, race, gender, marital status, ethnicity.
Ordinal variable: multiple outcomes that can be ranked, e.g. educational background.

EXPERIMENTS

Experiments are usually designed to find out what effect one variable has on the other.
Independent variable: the variable you manipulate in order to affect the outcome of an experiment, e.g. the dose of a drug in a study of how different doses affect the severity of symptoms.
Dependent variable: the variable affected by the manipulated variable, e.g. the severity of symptoms.

LEVELS OF MEASUREMENT

A variable can be measured at one of four levels:

Nominal
Ordinal
Interval
Ratio

TYPES OF DATA

QUANTITATIVE DATA: data expressing a certain quantity, amount or range, e.g. the height of a patient, the number of students in a class, the weight of newborn babies. It is numerical in nature and can be collected through questionnaires, surveys and experiments.
QUALITATIVE DATA: a categorical measurement expressed not in numbers but by means of natural language, e.g. political affiliation. It is non-numerical in nature and can be collected through observations, one-on-one interviews, focus groups, etc.

OTHER TYPES OF DATA

TIME SERIES DATA: a set of consecutive data values observed at successive points in time, e.g. taking the blood pressure of a particular patient weekly for a year.

CROSS SECTIONAL DATA: a set of data values observed at a fixed point in time, e.g. the temperatures of patients in the hospital at a given time.

PANEL DATA: a combination of time series data and cross sectional data, e.g. taking the blood pressure of each patient in a group weekly for a year.

DESCRIPTIVE STATISTICS

FREQUENCY DISTRIBUTION: a summary of a data set that displays the number of observations in each of the distribution's distinct categories.
FREQUENCY DISTRIBUTION FOR RAW DATA
DEVELOPING A FREQUENCY DISTRIBUTION TABLE
List the possible values.
Count the number of occurrences of each value.
Add the counts; the total should equal the number of observations.

Example 1: construct a frequency distribution table from the raw data below as marks for a quiz taken by level 300 students in UCC.

12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11.
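The three steps for raw data can be checked with a short sketch using Python's `collections.Counter`, which counts how often each distinct mark occurs in the Example 1 data:

```python
from collections import Counter

# Example 1: quiz marks for level 300 students.
marks = [12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11]

# Steps 1 and 2: list each distinct value and count its occurrences.
freq = Counter(marks)

print("Mark  Frequency")
for mark in sorted(freq):
    print(f"{mark:>4}  {freq[mark]:>9}")

# Step 3: add the counts; the total must equal the number of observations.
total = sum(freq.values())
print("Total", total)
```

Only marks that actually occur are listed; a value such as 16 could be added to the table with a frequency of zero.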

FREQUENCY DISTRIBUTION: GROUPED DATA

STEPS TO BUILD A FREQUENCY DISTRIBUTION FOR GROUPED DATA
Sort the raw data from low to high.
Find the range: range = maximum value - minimum value.
Select the number of classes (as a rule of thumb, use between 5 and 20 classes).
Compute the class width: $\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$, rounded up to a convenient value.
Determine the class boundaries or midpoints.
Count the number of values in each class.

Example 2

Construct a frequency distribution table for the grouped data below.

10, 8, 12, 3, 4, 15, 14, 15, 24, 26, 29, 28, 27, 21, 22, 22, 23, 34, 43, 48, 46, 49, 56, 57, 62, 67, 1, 11, 53.
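The grouped-data steps can be sketched as below for the Example 2 data. Two choices here are assumptions, not given in the slides: 7 classes, and a first class that starts at the minimum value. Because the data are whole numbers, each class is written with inclusive integer limits.

```python
# Example 2 data.
data = [10, 8, 12, 3, 4, 15, 14, 15, 24, 26, 29, 28, 27, 21, 22, 22, 23,
        34, 43, 48, 46, 49, 56, 57, 62, 67, 1, 11, 53]

def grouped_frequency(values, num_classes):
    values = sorted(values)                   # step 1: sort low to high
    low, high = values[0], values[-1]
    data_range = high - low                   # step 2: range = max - min
    # Step 4: class width = range / number of classes, rounded up to a whole
    # number (floor + 1 also guarantees the last class reaches the maximum).
    width = data_range // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1             # step 5: class limits (integer data)
        midpoint = (lower + upper) / 2
        count = sum(1 for v in values if lower <= v <= upper)  # step 6
        table.append((lower, upper, midpoint, count))
    return width, table

width, table = grouped_frequency(data, 7)     # step 3: 7 classes (assumed)
print("Class width:", width)
for lower, upper, midpoint, count in table:
    print(f"{lower:>3} - {upper:<3} midpoint {midpoint:>5}  {count}")
```

With a range of 66 and 7 classes, the width comes out to 10, giving classes 1-10, 11-20, and so on up to 61-70.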

FREQUENCY DISTRIBUTION

Presenting Data
Frequency Histogram

A histogram is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.

CONSTRUCTING FREQUENCY HISTOGRAMS

Construct a frequency distribution
Construct the axes for the histogram
Construct bars with heights corresponding to the frequency of each class
Label the histogram appropriately.

Construct histograms for Examples 1 and 2.
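As a text-only stand-in for a drawn histogram, the sketch below treats each whole mark in Example 1 as a class of width 1 and prints one bar per class. A plotting library such as matplotlib (for example `matplotlib.pyplot.hist`) would draw the same bars graphically.

```python
from collections import Counter

# Example 1 marks (each whole mark is treated as a class of width 1).
marks = [12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11]

# Step 1: frequency distribution, including classes with zero frequency so
# that the bars sit side by side over successive, equal-width intervals.
freq = Counter(marks)
classes = range(min(marks), max(marks) + 1)

# Steps 2-4: the classes form one axis (printed down the left here), and
# each bar's length equals that class's frequency; the header is the label.
print("Mark | Frequency")
rows = []
for mark in classes:
    row = f"{mark:>4} | {'#' * freq[mark]}"
    rows.append(row)
    print(row)
```

The empty bar at 16 shows a class with zero frequency, which a histogram keeps so the intervals stay successive.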

RELATIVE FREQUENCY
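Relative frequency is the proportion of observations that fall in each class: the class frequency divided by the total number of observations. A minimal sketch applying this to the Example 1 marks:

```python
from collections import Counter

# Example 1 marks, reused to illustrate relative frequency.
marks = [12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11]

freq = Counter(marks)
n = len(marks)

# Relative frequency = class frequency / total number of observations.
rel_freq = {mark: freq[mark] / n for mark in sorted(freq)}

for mark, rf in rel_freq.items():
    print(f"{mark:>4}  {freq[mark]:>2}  {rf:.3f}")

# The relative frequencies of all classes add up to 1.
print("Sum of relative frequencies:", round(sum(rel_freq.values()), 10))
```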

PRESENTING DATA BY CHARTS

BAR CHARTS

A bar chart is a graphical representation of a categorical data set in which a rectangle (bar) is drawn over each category.

Example: drugs sold at ATM pharmacy in January 2012

Drug	January sales
Paracetamol	2,000
Vitamin C	3,600
Cold relief	4,500
Metrolex	7,000
Amocyclin	8,600

Constructing Bar Chart

Define the categories for the variable of interest
For each category, determine the appropriate measure
For a column bar chart, locate the categories on the horizontal axis and draw a bar for each category whose height equals its measure
Interpret the results
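The bar chart steps can be sketched in plain text with the ATM Pharmacy sales table. The scale of one `#` per 500 sales is an arbitrary assumption for this sketch, and the bars are printed horizontally for simplicity; a column chart would stand them upright over the categories.

```python
# January 2012 sales figures from the ATM Pharmacy table.
sales = {
    "Paracetamol": 2000,
    "Vitamin C": 3600,
    "Cold relief": 4500,
    "Metrolex": 7000,
    "Amocyclin": 8600,
}

# Steps 1-2: the categories are the drugs; the measure is January sales.
# Step 3: draw one bar per category, scaled so 1 '#' = 500 sales (assumed scale).
SCALE = 500
bars = {}
for drug, value in sales.items():
    bars[drug] = "#" * round(value / SCALE)
    print(f"{drug:<12} {bars[drug]} {value:,}")
```

For step 4, the longest bar shows that Amocyclin had the highest January sales and Paracetamol the lowest.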

Pie Chart

A pie chart is a graph in the shape of a circle divided into slices corresponding to the categories to be displayed.
Example: diseases recorded at Kwesimintsim Poly Clinic in 2018.

Disease	Number recorded
Malaria	5,000
Typhoid	1,000
Fever	2,000
Respiratory infection	3,500
Anemia	4,200

Constructing a Pie Chart

Define the categories for the variable of interest
For each category determine the appropriate measure or value.
Construct the pie chart by displaying one slice for each category.
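To draw one slice per category, each slice's angle is usually computed as the category's share of the total multiplied by 360 degrees. A minimal sketch using the 2018 disease counts from the table:

```python
# Diseases recorded at Kwesimintsim Poly Clinic in 2018 (from the table).
cases = {
    "Malaria": 5000,
    "Typhoid": 1000,
    "Fever": 2000,
    "Respiratory infection": 3500,
    "Anemia": 4200,
}

total = sum(cases.values())

# Each slice's angle is its share of the total times 360 degrees.
angles = {disease: count / total * 360 for disease, count in cases.items()}

for disease, angle in angles.items():
    share = cases[disease] / total * 100
    print(f"{disease:<22} {share:5.1f}%  {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that no category was left out.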

Measures of Central Tendency

A measure of central tendency is a single value that represents an entire distribution.

Importance of Measures of Central Tendency
To make comparisons
To condense data
Helpful in further statistical analysis

The three measures of central tendency are

Mean
Median
Mode

Comparing Mode, Median and Mean

Mode : most frequently occurring value

The mode is appropriate to use when:

The most frequently observed value is of interest
A quick estimate of central tendency is needed
The data are categorical

Do not use when:

The data are multi-modal, highly skewed or uniform
The mean and median are available

Median: the middle or center value of the data set

The median is appropriate when:

The middle value is desired
The data are skewed
Outliers exist that would affect the mean
One needs to determine whether individual data points fall above or below the median

Do not use when:

The distribution of the data is symmetrical, because the mean is then preferred.

The mean is appropriate to use when:

The data are symmetrical, or at least not highly skewed.

Do not use when:

The distribution is highly skewed
Outliers exist that would affect the mean more than an acceptable amount.
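To see why outliers favour the median, here is a small sketch with hypothetical waiting times (made up for illustration, not from the source) that include one extreme value:

```python
import statistics

# Hypothetical waiting times in minutes; 90 is an outlier.
times = [10, 12, 12, 13, 14, 90]

mean = statistics.mean(times)
median = statistics.median(times)
print(f"with outlier:    mean = {mean:.2f}, median = {median}")

# The same data with the outlier removed.
without_outlier = times[:-1]
mean_wo = statistics.mean(without_outlier)
median_wo = statistics.median(without_outlier)
print(f"without outlier: mean = {mean_wo}, median = {median_wo}")
```

Dropping the single outlier moves the mean from about 25.17 to 12.2, while the median only moves from 12.5 to 12.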

Computing the Central Tendencies and Location
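For reference, the mean is the sum of the values divided by the number of values. Using the parameter and statistic terminology, with $x_i$ the individual values, $n$ the sample size and $N$ the population size, the sample mean $\bar{x}$ and population mean $\mu$ are:

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{(sample mean, a statistic)}

\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \text{(population mean, a parameter)}
```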

The Median

The median is the center value that divides a sorted data array into two halves.

Computing the Median

Collect the sample data
Sort data from smallest to largest
Calculate the median index
Find the median

Example: find the median of the following data set: 11, 12, 19, 9, 4, 24, 4, 23, 16, 5, 22, 4, 14, 11, 12
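The four steps can be sketched as follows. The exact median index rule is an assumption here, using one common textbook convention: i = n/2; if i is not a whole number, round it up and take the value in that position; if it is a whole number, average the values in positions i and i + 1. The helper name `median_by_index` is hypothetical.

```python
import math

def median_by_index(values):
    data = sorted(values)               # step 2: sort smallest to largest
    n = len(data)
    i = n / 2                           # step 3: median index i = n/2 (assumed convention)
    if i != int(i):
        # Non-integer index: round up; the median is the value in that position.
        return data[math.ceil(i) - 1]   # -1 converts the 1-based position to a list index
    # Integer index: average the values in positions i and i + 1.
    i = int(i)
    return (data[i - 1] + data[i]) / 2

values = [11, 12, 19, 9, 4, 24, 4, 23, 16, 5, 22, 4, 14, 11, 12]
print(median_by_index(values))
```

For these 15 values the index is 7.5, which rounds up to position 8 of the sorted data. Python's `statistics.median` returns the same result and can be used as a check.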

Mode

The mode is the value that occurs more often than any other.

Example: given the data set 5, 5.5, 4.9, 4.85, 5.25, 5.05, 4.9, find the mode.
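Python's standard-library `statistics` module can confirm the answer. `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, so it also reveals when a data set is multi-modal:

```python
import statistics

values = [5, 5.5, 4.9, 4.85, 5.25, 5.05, 4.9]

# multimode lists all values that share the highest frequency.
modes = statistics.multimode(values)
print(modes)
```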

Try:

Find the mean, median and mode for the following list of values: 13, 18, 13, 14, 13, 16, 14, 21, 13.
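After working the exercise by hand, you can check your answers with the standard-library `statistics` module. This is only a check, not a substitute for the manual steps:

```python
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```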