1 of 35

INTRODUCTION TO STATISTICS

2 of 35

DEFINITION OF STATISTICS

  • Statistics is the science of collecting, organizing, presenting, analyzing and interpreting numerical data for the purpose of assisting in making effective decisions.

3 of 35

MAIN COMPONENTS INVOLVED IN STATISTICS

  • Collecting Data
  • Organizing Data
  • Presenting Data
  • Analyzing Data
  • Interpreting Data

4 of 35

PROCEDURES FOR DATA COLLECTION

  • Experiments: an experiment is a process that produces a single outcome whose result cannot be predicted with certainty.
  • Telephone Survey: this involves closed-ended questions and demographic questions.
  • Written Questionnaire: this is like a telephone survey, except that it can also include open-ended questions.
  • Mail Questionnaire: this is similar to a written questionnaire, except that it is sent to respondents by email.
  • Direct Observation and Personal Interview: the data are observed in person and recorded based on what takes place during the process.

5 of 35

ADVANTAGES AND DISADVANTAGES OF DATA COLLECTION METHODS

  • Experiments
    • Advantages
      • Provide controls
      • Pre-planned objectives
    • Disadvantages
      • Costly
      • Time consuming
      • Requires planning

6 of 35

Cont’n

  • Telephone surveys
    • Advantages
      • Timely
      • Relatively inexpensive
    • Disadvantages
      • Poor reputation
      • Limited scope

7 of 35

Cont’n

  • Mail questionnaire and written surveys

Advantages

  1. Inexpensive
  2. Can expand length
  3. Can use open-ended questions

Disadvantages

  1. Low response rate
  2. Requires exceptional clarity

8 of 35

Cont’n

  • Direct Observation and Personal Interview
    • Advantages
      • Expand analysis opportunities
      • No respondent biases
    • Disadvantages
      • Potential observer bias
      • Costly

9 of 35

Data Collection Issues

  • Data accuracy
  • Interviewer bias
  • Nonresponse bias
  • Selection bias
  • Measurement error
  • Internal validity
  • External validity

10 of 35

Population and Samples

  • Population: the set of all objects or individuals of interest, or the measurements obtained from all objects or individuals of interest.
  • Sample: a subset of the population.
  • Parameters: numbers that summarize data for an entire population.
  • Statistics: numbers that summarize data from a sample.

11 of 35

Sampling Techniques

  • Sampling techniques are the procedures and methods for selecting a sample from a population.
  • Types or forms of Sampling Techniques
    • Probability or Statistical Sampling Techniques
    • Non-Probability or Non-Statistical Sampling Techniques

12 of 35

PROBABILITY OR STATISTICAL SAMPLING TECHNIQUE

  1. Simple Random Sampling Technique: every possible sample of a given size has an equal chance of selection.
  2. Stratified Random Sampling Technique: this method divides the population into subgroups called strata according to some common characteristic that reflects the variable of interest (e.g. gender, income level, marital status), then randomly selects a sample from each subgroup and combines these into the sample needed.
  3. Systematic Random Sampling Technique: under this method, one decides on the sample size n, computes the sampling interval k = N/n (where N is the population size), picks a random starting point among the first k items, and then selects every k-th item.
  4. Cluster Sampling Technique: the population is divided into several clusters, each of which is representative of the population; a random sample of clusters is then selected.
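
The four probability sampling techniques can be sketched in Python as follows. This is only a minimal illustration: the function names, the population of 100 numbered patients, the odd/even strata and the five "ward" clusters are all assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n items so that every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group items into strata, then draw a random sample from each stratum."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th item (assumes n <= len(population))."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start among the first k items
    return [population[start + i * k] for i in range(n)]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every item in the chosen ones."""
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

# Hypothetical population: patient IDs 1..100 (assumed for illustration only).
rng = random.Random(42)
patients = list(range(1, 101))

srs = simple_random_sample(patients, 10, rng)
strat = stratified_sample(patients, lambda p: "odd" if p % 2 else "even", 5, rng)
syst = systematic_sample(patients, 10, rng)
wards = [patients[i:i + 20] for i in range(0, 100, 20)]  # five clusters of 20
clus = cluster_sample(wards, 2, rng)

print(srs, strat, syst, clus, sep="\n")
```

A seeded random.Random instance is passed in so that the draws are repeatable; in practice the seed would be omitted.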

13 of 35

NON-PROBABILITY OR NON-STATISTICAL SAMPLING TECHNIQUE

  • Judgmental sampling: selection is based on the researcher's judgement.
  • Convenience sampling: selection is based on the convenience of the interviewer.
  • Quota sampling: selection is based on quotas.
  • Snowball sampling: selection is done by respondents leading you to other respondents.
  • Purposive sampling: selection is based on the purpose of the survey.

14 of 35

VARIABLES

  • A variable is defined as an attribute of an object of study.

  • TYPES OF VARIABLES
  • Quantitative variable: consists of two main types
    • Discrete variable: takes countable values, e.g. the number of patients at the hospital.
    • Continuous variable: can take any value within a range, e.g. the weight or the temperature of a patient.

15 of 35

Cont’n

  • Qualitative or categorical variable: consists of three main types
    • Binary variable: a variable with two possible outcomes, e.g. a pregnant mother is expecting either a baby boy or a baby girl.
    • Nominal variable: a variable with multiple outcomes that cannot be ranked, e.g. religion, race, gender, marital status, ethnicity.
    • Ordinal variable: a variable with multiple outcomes that can be ranked, e.g. educational background.

16 of 35

EXPERIMENTS

  • Experiments are usually designed to find out what effect one variable has on another.
  • Independent variable: the variable you manipulate in order to affect the outcome of an experiment, e.g. the dose of a drug when studying how different doses affect the severity of symptoms.
  • Dependent variable: the variable affected by the manipulated variable, e.g. the severity of symptoms.

17 of 35

LEVELS OF MEASUREMENT

  • A variable has four different levels of measurement:

  • Nominal
  • Ordinal
  • Interval
  • Ratio

18 of 35

TYPES OF DATA

  • QUANTITATIVE DATA: data expressing a certain quantity, amount or range, e.g. the height of a patient, the number of students in a class, the weight of newborn babies. It is numerical in nature and can be collected through questionnaires, surveys and experiments.
  • QUALITATIVE DATA: categorical measurements expressed not in numbers but by means of natural language, e.g. political affiliation. It is non-numerical in nature and can be collected through observations, one-on-one interviews, focus groups, etc.

19 of 35

OTHER TYPES OF DATA

  • TIME SERIES DATA: a set of consecutive data values observed at successive points in time, e.g. taking the blood pressure of a particular group of patients weekly for a year.

  • CROSS-SECTIONAL DATA: a set of data values observed at a fixed point in time, e.g. the temperature of patients in the hospital.

  • PANEL DATA: a combination of both time series data and cross-sectional data, e.g. taking the blood pressure of people in the hospital for a year.

20 of 35

DESCRIPTIVE STATISTICS

  • FREQUENCY DISTRIBUTION: a summary of a set of data that displays the number of observations in each of the distribution's distinct categories.
  • FREQUENCY DISTRIBUTION FOR RAW DATA
  • DEVELOPING A FREQUENCY DISTRIBUTION TABLE
    • List the possible values.
    • Count the number of occurrences of each value.
    • Add the counts; the total should equal the number of observations.

Example 1: Construct a frequency distribution table from the raw data below, which are the marks for a quiz taken by Level 300 students at UCC.

12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11.
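
One way to work Example 1 is sketched below in Python. It follows the three steps for raw data (list the values, count the occurrences, add them); the variable names are assumptions chosen for the example.

```python
from collections import Counter

# Quiz marks from Example 1.
marks = [12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11]

# Steps 1 and 2: list the distinct values and count the occurrences of each.
frequency = Counter(marks)

# Step 3: add the frequencies; the total should match the number of marks.
total = sum(frequency.values())

print("Mark  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>4}  {frequency[mark]:>9}")
print(f"Total {total:>9}")
```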

21 of 35

FREQUENCY DISTRIBUTION: GROUPED DATA

  • STEPS TO BUILD A FREQUENCY DISTRIBUTION FOR GROUPED DATA

  1. Sort the raw data from low to high.
  2. Find the range: range = maximum value – minimum value.
  3. Select the number of classes (as a rule of thumb, between 5 and 20).
  4. Compute the class width: class width = range ÷ number of classes.
  5. Determine the class boundaries or midpoints.
  6. Count the number of values in each class.

22 of 35

FREQUENCY DISTRIBUTION: GROUPED DATA

  • Example 2

Construct a frequency distribution table for the grouped data below.

10, 8, 12, 3, 4, 15, 14, 15, 24, 26, 29, 28, 27, 21, 22, 22, 23, 34, 43, 48, 46, 49, 56, 57, 62, 67, 1, 11, 53.
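
A possible way to work Example 2 is sketched below in Python. The choice of 7 classes and the rounding of the class width up to the next whole number are assumptions made for this illustration; other choices within the rule of thumb give different but equally valid tables. The function name grouped_frequency is hypothetical.

```python
# Data from Example 2.
data = [10, 8, 12, 3, 4, 15, 14, 15, 24, 26, 29, 28, 27, 21, 22, 22, 23,
        34, 43, 48, 46, 49, 56, 57, 62, 67, 1, 11, 53]

def grouped_frequency(values, num_classes):
    """Return a list of (lower, upper, count) classes for integer data."""
    ordered = sorted(values)                      # sort from low to high
    low, high = ordered[0], ordered[-1]
    value_range = high - low                      # range = maximum - minimum
    # Class width = range / number of classes, rounded up to the next whole
    # number so that the maximum value always falls inside the last class.
    width = value_range // num_classes + 1
    counts = [0] * num_classes
    for x in ordered:
        counts[(x - low) // width] += 1           # class that contains x
    return [(low + i * width, low + (i + 1) * width - 1, counts[i])
            for i in range(num_classes)]

table = grouped_frequency(data, 7)                # 7 classes is an assumed choice
print("Class     Frequency")
for lower, upper, count in table:
    print(f"{lower:>2} - {upper:>2}   {count:>9}")
```

With these assumptions the range is 66, the class width is 10, and the classes run from 1 - 10 up to 61 - 70.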

23 of 35

FREQUENCY DISTRIBUTION

  • Presenting Data
  • Frequency Histogram

A histogram is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.

CONSTRUCTING FREQUENCY HISTOGRAMS

  1. Construct a frequency distribution
  2. Construct the axes for the histogram
  3. Construct bars with heights corresponding to the frequency of each class
  4. Label the histogram appropriately.

Construct a histogram for each of Examples 1 and 2.
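
As a rough sketch, a histogram for the Example 1 quiz marks can be drawn as text in Python, with one row per mark and one "#" per observation. The bars run horizontally here only because that is easy to print; the function name text_histogram is an assumption.

```python
from collections import Counter

# Quiz marks from Example 1.
marks = [12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11]

def text_histogram(values):
    """Return one line per value, with a bar of '#' as long as its frequency."""
    counts = Counter(values)
    # Include every value between the minimum and maximum, even if its
    # frequency is zero, so that the intervals stay equally spaced.
    return [f"{value:>3} | {'#' * counts[value]}"
            for value in range(min(values), max(values) + 1)]

for line in text_histogram(marks):
    print(line)
```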

24 of 35

RELATIVE FREQUENCY

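  • Relative frequency is the proportion of observations that fall in a class: relative frequency = class frequency ÷ total number of observations.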
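
As an illustration, the relative frequencies for the Example 1 quiz marks can be computed in Python by dividing each frequency by the total number of observations. The variable names are assumptions made for the example.

```python
from collections import Counter

# Quiz marks from Example 1.
marks = [12, 13, 17, 12, 12, 12, 12, 13, 14, 14, 18, 15, 17, 15, 15, 10, 11, 11]

counts = Counter(marks)
n = len(marks)  # total number of observations

# Relative frequency = frequency of the value / total number of observations.
relative_frequency = {mark: counts[mark] / n for mark in sorted(counts)}

print("Mark  Relative frequency")
for mark, share in relative_frequency.items():
    print(f"{mark:>4}  {share:.3f}")
```

The relative frequencies always add up to 1, which is a useful check on the table.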

25 of 35

PRESENTING DATA BY CHARTS

  • BAR CHARTS

A bar chart is a graphical representation of a categorical data set in which a rectangle is drawn over each category.

Example: drugs sold at ATM Pharmacy in January 2012

ATM PHARMACY     JANUARY SALES
Paracetamol      2,000
Vitamin C        3,600
Cold relief      4,500
Metrolex         7,000
Amocyclin        8,600

26 of 35

Constructing Bar Chart

  • Define the categories for the variable of interest.
  • For each category, determine the appropriate measure.
  • For a column bar chart, locate the categories on the horizontal axis and draw a bar over each one with a height equal to its measure.
  • Interpret the results.
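
The steps can be tried out on the ATM Pharmacy January sales with a small text bar chart in Python. This is only a sketch: the scale of one "#" per 500 units and the function name text_bar_chart are assumptions, and the bars are drawn horizontally for ease of printing.

```python
# Categories and measures from the ATM Pharmacy example.
sales = {
    "Paracetamol": 2000,
    "Vitamin C": 3600,
    "Cold relief": 4500,
    "Metrolex": 7000,
    "Amocyclin": 8600,
}

UNITS_PER_MARK = 500  # assumed scale: one '#' stands for 500 units sold

def text_bar_chart(data, units_per_mark):
    """Return one labelled bar per category, scaled to units_per_mark."""
    label_width = max(len(name) for name in data)
    return [f"{name:<{label_width}} | {'#' * round(value / units_per_mark)} {value:,}"
            for name, value in data.items()]

for line in text_bar_chart(sales, UNITS_PER_MARK):
    print(line)
```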

27 of 35

Pie Chart

  • A pie chart is a graph in the shape of a circle divided into slices corresponding to the categories to be displayed.
  • Example: diseases recorded at Kwesimintsim Poly Clinic in 2018.

Disease                  Number recorded
Malaria                  5,000
Typhoid                  1,000
Fever                    2,000
Respiratory infection    3,500
Anemia                   4,200

28 of 35

Constructing a Pie Chart

  • Define the categories for the variable of interest
  • For each category determine the appropriate measure or value.
  • Construct the pie chart by displaying one slice for each category.
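
To size the slices, each category's value is converted to a share of the total and then to an angle out of 360 degrees. The Python sketch below does this for the diseases recorded in the example; the variable names are assumptions.

```python
# Categories and values from the Kwesimintsim Poly Clinic example.
cases = {
    "Malaria": 5000,
    "Typhoid": 1000,
    "Fever": 2000,
    "Respiratory infection": 3500,
    "Anemia": 4200,
}

total = sum(cases.values())

# For each category: (percentage of the whole, angle of the slice in degrees).
slices = {disease: (count / total * 100, count / total * 360)
          for disease, count in cases.items()}

print(f"{'Disease':<22} {'Percent':>8} {'Angle':>8}")
for disease, (percent, angle) in slices.items():
    print(f"{disease:<22} {percent:>7.1f}% {angle:>7.1f}°")
```

The percentages should add up to 100 and the angles to 360, which gives a quick check before drawing the chart.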

29 of 35

Measures of Central Tendency

  • Central tendency is the statistical measure that identifies a single value as representative of an entire distribution.

  • Importance of Central Tendency Measures
    • To make comparisons
    • To condense data
    • To help in further statistical analysis

The three measures of central tendency are:

  1. Mean
  2. Median
  3. Mode

30 of 35

Comparing Mode, Median and Mean

  • Mode: the most frequently occurring value

The mode is appropriate to use when:

  1. The most frequently occurring observation is of interest
  2. A quick estimate of central tendency is needed
  3. The data are categorical

Do not use when:

  1. The data are multi-modal, highly skewed or uniform
  2. The mean and median are available

31 of 35

Comparing Mode, Median and Mean

  • Median: the middle or center value of the data set

The median is appropriate when:

  1. The middle value is desired
  2. The data are skewed
  3. Outliers exist that will affect the mean
  4. One needs to determine whether additional data points fall above or below the midpoint

Do not use when:

The distribution of the data is symmetrical, because the mean is then preferred.

32 of 35

Comparing Mode, Median and Mean

  • The Mean is appropriate to use when:

The data are symmetrical or at least not strongly skewed.

Do not use when:

  1. The distribution is highly skewed
  2. Outliers exist that would affect the mean by more than an acceptable amount.

33 of 35

Computing the Central Tendencies and Location

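  • The Mean

Is the sum of all the values in a data set divided by the number of values: mean = (sum of all values) ÷ (number of values).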

34 of 35

Computing the Central Tendencies and Location

  • The Median

The median is the center value that divides a sorted data array into two halves.

Computing the Median

  1. Collect the sample data
  2. Sort data from smallest to largest
  3. Calculate the median index
  4. Find the median

Example: find the median for the following data set: 11, 12, 19, 9, 4, 24, 4, 23, 16, 5, 22, 4, 14, 11, 12.
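
The steps for computing the median can be sketched in Python as follows. The function name and the zero-based position used here are assumptions of this sketch; a textbook may define the median index slightly differently but arrive at the same value.

```python
# Step 1: the sample data from the example.
data = [11, 12, 19, 9, 4, 24, 4, 23, 16, 5, 22, 4, 14, 11, 12]

def median(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)        # step 2: sort from smallest to largest
    n = len(ordered)
    middle = n // 2                 # step 3: zero-based position of the middle
    if n % 2 == 1:
        return ordered[middle]      # step 4: odd n, take the single middle value
    # Even n: average the two values on either side of the midpoint.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(sorted(data))
print(median(data))  # 12
```

Here n = 15, so the median is the 8th value of the sorted data.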

35 of 35

Computing the Central Tendencies and Location

  • Mode

The mode is the value that occurs more often than any other.

Example: given the data set 5, 5.5, 4.9, 4.85, 5.25, 5.05, 4.9, find the mode.
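
A minimal Python sketch for the mode example is shown below. It returns a list so that data sets with more than one mode are also handled; the function name modes is an assumption.

```python
from collections import Counter

# Data set from the example.
data = [5, 5.5, 4.9, 4.85, 5.25, 5.05, 4.9]

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes(data))  # [4.9]
```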

Try:

Find the mean, median and mode for the following list of values: 13, 18, 13, 14, 13, 16, 14, 21, 13.
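
After working the exercise by hand, the answers can be checked with Python's standard statistics module. This is just one convenient way to verify the results; the variable names are assumptions.

```python
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```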