
Chapter 15: Statistical Analysis of Quantitative Data
Dr. Sanaa Abujilban, PhD


Statistical Analysis

  • Descriptive statistics
    • Used to describe and synthesize data
  • Inferential statistics
    • Used to make inferences about the population based on sample data


Descriptive Indexes

  • Parameter
    • A descriptor for a population (e.g., the average age of menses for Canadian females)
  • Statistic
    • A descriptor for a sample (e.g., the average age of menses for female students at McGill University)


Phases of Data Analysis

  1. Pre-analysis phase: coding and data entry.
  2. Preliminary assessments and actions:
    • Assessing and handling missing data
    • Assessing quantitative data quality
    • Assessing bias
    • Testing assumptions
    • Performing additional analyses, such as reliability checks and counts


Phases of Data Analysis (cont.)

  3. Data and substantive analysis: the researcher should develop a substantive analysis plan in advance, to reduce the temptation of going astray during analysis of the data.
  4. Interpretation of results: assessing the accuracy of the data and its meaning.
  5. Representation of data through graphs and tables
    • This is now done with statistical packages such as SPSS.


Statistics

  • Descriptive statistics include:

  1. Univariate frequency distributions.
  2. Measures of central tendency (mode, median, mean) and measures of variability (SD, variance [the degree to which observations are dispersed around the central value], and range).
  3. Bivariate analysis (contingency tables; correlations such as Pearson's product-moment correlation coefficient and Spearman's rank correlation coefficient).


Descriptive Statistics: Frequency Distributions

  • A systematic arrangement of numeric values on a variable from lowest to highest, and a count of the number of times (and/or percentage) each value was obtained
  • Frequency distributions can be described in terms of:
    • Shape
    • Central tendency
    • Variability
  • Can be presented in a table (Ns and percentages) or graphically (e.g., frequency polygons)


Frequency Polygon


Shapes of Distributions

  • Symmetry is one aspect of a distribution’s shape.
    • Symmetric: when folded over, the two halves are superimposed on one another. With real data sets, distributions are rarely perfectly symmetric.
    • Skewed (asymmetric): the peak is off center and one tail is longer than the other.
      • Positive skew: long tail points to the right (e.g., personal income)
      • Negative skew: long tail points to the left (e.g., age at death)


All the distributions in the figure are symmetric.


Shapes of Distributions (cont.): aspects of a distribution’s shape

  1. Peakedness (how sharp the peak is)
  2. Modality (number of peaks)
    • Unimodal (1 peak)
    • Bimodal (2 peaks)
    • Multimodal (more than 2 peaks)


Normal Distribution (sometimes called a bell-shaped curve)

  • Characteristics:
    • Symmetric
    • Unimodal
    • Not too peaked, not too flat
  • Important distribution in inferential statistics


Question

Is the following statement True or False?

  • A bell-shaped curve is also called a normal distribution.


Answer

  • True
    • A normal distribution is commonly referred to as a bell-shaped curve.


Central Tendency

  • An index of the “typicalness” of a set of scores, coming from the center of the distribution
  • Mode—the most frequently occurring score in a distribution
    • Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mode = 3
  • Median—the point in a distribution above which and below which 50% of cases fall
    • Ex: 2, 3, 3, 3, 4 | 5, 6, 7, 8, 9 Median = 4.5
  • Mean—equals the sum of all scores divided by the total number of scores
    • Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mean = 5.0
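
For illustration, the three indexes for the example scores could be computed with Python's standard statistics module (a minimal sketch, not part of the original examples):

    # Central tendency for the example scores on this slide
    import statistics

    scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]
    print(statistics.mode(scores))    # 3   (most frequently occurring score)
    print(statistics.median(scores))  # 4.5 (midpoint between 4 and 5)
    print(statistics.mean(scores))    # 5.0 (sum of 50 divided by 10 scores)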


Comparison of Measures of Central Tendency

  • Mode: useful mainly as a gross descriptor, especially of nominal measures
  • Median: useful mainly as a descriptor of the typical value when a distribution is skewed (e.g., household income)
  • Mean: the most stable and most widely used indicator of central tendency


Distributions (cont.)

A Frequency Distribution Table

  Category    Percent
  Under 35       9%
  36-45         21%
  46-55         45%
  56-65         19%
  66+            6%


Relationships of central tendency indexes in skewed distributions.



Variability

The degree to which scores in a distribution are spread out or dispersed

  • Homogeneity: little variability (e.g., School B)
  • Heterogeneity: great variability (e.g., School A)


Two distributions of different variability

School A has a wide range of scores; there are few students at either extreme.


Indexes of Variability

That express the extent to which scores deviate from one another

  • Range: highest value minus lowest value
    • The range for School A is approximately 500 (750 - 250); the range for School B is approximately 300 (650 - 350).
  • Standard deviation (SD): indicates the average amount of deviation of values from the mean
    • The first step in calculating a standard deviation is to compute a deviation score for each subject. A deviation score (symbolized as x) is the difference between an individual score and the mean. If a person weighed 150 pounds and the sample mean were 140, the person's deviation score would be 10.
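
As a minimal sketch of the deviation-score logic just described (the weights below are hypothetical, chosen so the heaviest person reproduces the slide's example):

    # Deviation scores and the sample SD for hypothetical weights
    import statistics

    weights = [130, 135, 140, 145, 150]
    mean = statistics.mean(weights)            # 140
    deviations = [x - mean for x in weights]   # 150 - 140 = 10 for the last person
    sd = statistics.stdev(weights)             # sample SD (n - 1 denominator), ~7.9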


Standard deviations in a normal distribution

  • There are about 3 SDs above and 3 SDs below the mean in a normal distribution.
  • Example: normally distributed scores with a mean of 50 and an SD of 10
  • In a normal distribution, a fixed percentage of cases falls within certain distances from the mean. Sixty-eight percent of all cases fall within 1 SD of the mean (34% above and 34% below the mean). In this example, nearly 7 of every 10 scores fall between 40 and 60. Ninety-five percent of the scores in a normal distribution fall within 2 SDs from the mean.
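
These percentages can be checked for the slide's example (mean of 50, SD of 10) with Python's statistics.NormalDist; a sketch:

    # Proportion of a normal distribution within 1 and 2 SDs of the mean
    from statistics import NormalDist

    dist = NormalDist(mu=50, sigma=10)
    within_1_sd = dist.cdf(60) - dist.cdf(40)   # ~0.683: scores between 40 and 60
    within_2_sd = dist.cdf(70) - dist.cdf(30)   # ~0.954: scores between 30 and 70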


Bivariate Descriptive Statistics

  • Used for describing the relationship between two variables
  • Two common approaches:
    1. Contingency tables (Crosstabs)
    2. Correlation coefficients


Contingency Table

  • A two-dimensional frequency distribution; frequencies of two variables are cross-tabulated
  • “Cells” at intersection of rows and columns display counts and percentages
  • Variables usually nominal or ordinal


Correlation Coefficients

  • Indicate direction and magnitude of relationship between two variables
  • The most widely used correlation coefficient is Pearson’s r.
  • Pearson’s r is used when both variables are interval- or ratio-level measures.


Question

The researcher subtracts the lowest value of data from the highest value of data to obtain which of the following?

  1. Mode
  2. Median
  3. Mean
  4. Range


Answer

d. Range

  • The range is calculated by subtracting the lowest value in the data from the highest value. The mode is the most frequently occurring score. The median is the point in a distribution above which and below which 50% of the cases fall. The mean is the sum of all the scores divided by the total number of scores.


Correlation Coefficients (cont.)

  • Correlation coefficients can range from -1.00 to +1.00.
    • Negative relationship (0.00 to -1.00): one variable increases in value as the other decreases (e.g., amount of exercise and weight)
    • Positive relationship (0.00 to +1.00): the variables increase or decrease together (e.g., calorie consumption and weight)
    • A correlation coefficient of 0 indicates no relationship.


Correlation Coefficients (cont.)

  • The greater the absolute value of the coefficient, the stronger the relationship:

Ex: r = -.45 is stronger than r = +.40

  • With multiple variables, a correlation matrix can be displayed to show all pairs of correlations.


Bivariate Analysis: Correlation

  • A bivariate descriptive statistic

  • Used to answer the question, “To what extent are the variables related?”


Bivariate Analysis: Correlation (cont.)

  • Spearman’s rank correlation: a method for calculating the correlation between variables when the data do not follow the normal distribution. It is therefore a nonparametric test.


Bivariate Analysis: Correlation (cont.)

  • Pearson’s r: a method for calculating the correlation between variables when the data are believed to follow the normal distribution. It is therefore a parametric test.
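
A minimal sketch of both coefficients, assuming SciPy is available (the exercise and weight values are made up to echo the earlier negative-relationship example):

    # Pearson's r (parametric) and Spearman's rho (nonparametric, rank-based)
    from scipy import stats

    exercise = [1, 2, 3, 4, 5, 6]
    weight = [190, 185, 180, 178, 172, 170]
    r, p_r = stats.pearsonr(exercise, weight)        # r is negative here
    rho, p_rho = stats.spearmanr(exercise, weight)   # rank-based alternative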


Index of Correlation

  • Below .3: low correlation

  • .3 to .7: medium correlation

  • Above .7: high correlation


Scatter Plot

  • Graphically, the relationship between two variables is displayed on a scatter plot or scatter diagram.


Figure 1 (top left) is an example of a positive correlation.


Figure 2 is an example of a zero correlation.


Figure 3 is an example of a negative correlation.


Describing Risk

  • Clinical decision-making for EBP may involve the calculation of risk indexes, so that decisions can be made about relative risks for alternative treatments or exposures.


Describing Risk (cont.)

  • Some frequently used indexes:
    • Absolute Risk (AR): the proportion of people who experienced an undesirable outcome in each group.
    • Absolute Risk Reduction (ARR): represents a comparison of the two risks (exposed and unexposed groups)
      • ARR = AR for the unexposed group – AR for the exposed group
    • Odds Ratio (OR)
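
A sketch of the AR and ARR computations just defined, using hypothetical counts (10 of 100 exposed people and 20 of 100 unexposed people experienced the outcome); the OR is taken up on the next slide:

    # Absolute risk in each group, then the absolute risk reduction
    ar_exposed = 10 / 100                 # 0.10
    ar_unexposed = 20 / 100               # 0.20
    arr = ar_unexposed - ar_exposed       # 0.10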


The Odds Ratio (OR)

  • The odds = the proportion of people with an adverse outcome relative to those without it
    • e.g., the odds of ….
  • The odds ratio is computed to compare the odds of an adverse outcome for two groups being compared (e.g., men vs. women, experimentals vs. controls).
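
Continuing the hypothetical counts from the previous sketch, the odds and the odds ratio could be computed as follows:

    # Odds = proportion with the outcome relative to those without it
    odds_exposed = (10 / 100) / (90 / 100)        # ~0.11
    odds_unexposed = (20 / 100) / (80 / 100)      # 0.25
    odds_ratio = odds_exposed / odds_unexposed    # ~0.44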


Inferential Statistics

  • Used to make objective decisions about population parameters using sample data
  • Based on laws of probability
  • Uses the concept of theoretical distributions
    • e.g., the sampling distribution of the mean


Sampling Distribution of the Mean

  • A theoretical distribution of the means of an infinite (endless) number of samples drawn from the same population
  • Is normally distributed (or very nearly so, by the central limit theorem)
  • Its mean equals the population mean.
  • Its standard deviation is called the standard error of the mean (SEM).
  • SEM is estimated from a sample SD and the sample size: SEM = SD / √n.
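
A sketch of that estimate with hypothetical sample scores:

    # Standard error of the mean from a sample SD and sample size
    import math
    import statistics

    sample = [47, 52, 49, 55, 50, 48]
    sem = statistics.stdev(sample) / math.sqrt(len(sample))  # SD / sqrt(n)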


Statistical Inference—Two Forms

  • Parameter estimation
  • Hypothesis testing (more common among nurse researchers than among medical researchers)


Question

Is the following statement True or False?

  • A correlation coefficient of -.38 is stronger than a correlation coefficient of +.32.


Answer

  • True
    • For a correlation coefficient, the greater the absolute value of the coefficient, the stronger the relationship. So the absolute value of -.38 is greater than the absolute value of +.32 and thus is stronger.


Estimation of Parameters

  • Point estimation—A single descriptive statistic that estimates the population value (e.g., a mean, percentage, or OR)
  • Interval estimation—A range of values within which a population value probably lies
    • Involves computing a confidence interval (CI)


Confidence Intervals

  • CIs indicate the upper and lower confidence limits and the probability that the population value is between those limits.
    • For example, a 95% CI of 40–50 for a sample mean of 45 indicates there is a 95% probability that the population mean is between 40 and 50.
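
A sketch of how such an interval could be computed, assuming a normal sampling distribution (the mean of 45 echoes the slide's example; the SEM of 2.55 is hypothetical):

    # 95% CI = mean +/- z * SEM
    from statistics import NormalDist

    mean, sem = 45, 2.55
    z = NormalDist().inv_cdf(0.975)           # ~1.96 for a 95% CI
    ci = (mean - z * sem, mean + z * sem)     # ~ (40, 50)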


Inferential Statistics: Hypothesis Testing

  • The most common use of inferential statistics is hypothesis testing.
  • Scientific hypothesis: the outcome that the researcher believes the study will show.
  • Null hypothesis: the hypothesis that can actually be tested by statistical methods; it states that there is no difference between the groups or no actual relationship between the variables.


Hypothesis Testing

  • Based on rules of negative inference: research hypotheses are supported if null hypotheses can be rejected.
  • Involves statistical decision-making to either:
    1. accept the null hypothesis or
    2. reject the null hypothesis


Hypothesis Testing

  • Researchers compute a test statistic from their data (using the appropriate formula) and then determine whether the statistic falls at or beyond the critical region of the relevant theoretical distribution.
    • Values in the critical region indicate that the null hypothesis is improbable, at a specified probability level.


Critical Region


Hypothesis Testing (cont.)

  • If the value of the test statistic indicates that the null hypothesis is improbable, then the result is statistically significant.
  • A nonsignificant result means that any observed difference or relationship could have happened by chance.
  • Statistical decisions are either correct or incorrect.


Critical Region and Statistical Tests

  • critical region: The decision rule is to reject the null hypothesis if the test statistic falls at or beyond a critical region on the applicable theoretical distribution, and to accept the null hypothesis otherwise.
  • For every test statistic, there is a related theoretical distribution. Researchers compare the value of the computed test statistic to values in a table that specify critical limits for the applicable distribution.


Errors in Statistical Decisions

  • Type I error: rejection of a null hypothesis when it should not be rejected; a false-positive result
    • Risk of error is controlled by the level of significance (alpha), e.g., α = .05 or .01.
  • Type II error: failure to reject a null hypothesis when it should be rejected; a false-negative result
    • The risk of this error is beta (β).
    • Power is the ability of a test to detect true relationships; power = 1 – β.
    • By convention, power should be at least .80.
    • Larger samples = greater power
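
These quantities connect in an a priori power analysis. A minimal sketch, assuming the statsmodels package is available (the effect size d is defined later in these slides):

    # Sample size per group for d = .50, alpha = .05, power = .80
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.5, alpha=0.05, power=0.80
    )  # ~64 subjects per group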


Parametric and Nonparametric Tests

  • Parametric Statistics:
    • Use involves estimation of a parameter; assumes variables are normally distributed in the population; measurements are on interval/ratio scale
  • Nonparametric Statistics:
    • Use does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn’t assume normal distribution in the population


Overview of Hypothesis-Testing Procedures

  • Select an appropriate test statistic.
  • Establish significance criterion (e.g., α = .05).
  • Compute test statistic with actual data.
  • Calculate degrees of freedom (df) (the number of observations free to vary about a parameter) for the test statistic.
  • Obtain a critical value for the statistical test (e.g., from a table).
  • Compare the computed test statistic to the tabled value.
  • Make decision to accept or reject null hypothesis.
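
As a sketch of these steps for an independent-groups t-test (hypothetical data, SciPy assumed available):

    # Compute the statistic, the df, and the critical value, then decide
    from scipy import stats

    group_a = [5.1, 4.8, 5.5, 5.0, 4.9]
    group_b = [4.2, 4.5, 4.0, 4.4, 4.6]
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    df = len(group_a) + len(group_b) - 2            # degrees of freedom
    critical = stats.t.ppf(1 - 0.05 / 2, df)        # two-tailed, alpha = .05
    reject_null = abs(t_stat) > critical            # same decision as p < .05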


Inferential Statistics

  • Purpose: Enable the researcher to generalize about the population based on data obtained from the sample.
  • Level of significance: typically .01 or .05. An alpha of .01 means that only 1 case in 100 may yield erroneous findings.
  • The minimum accepted level in nursing research is p = .05; this means that if the study were done 100 times, the decision to reject the null hypothesis would be wrong 5 times out of those 100 trials.


Inferential Statistics

The most commonly used inferential statistics are:

  1. t-test,
  2. ANOVA,
  3. Chi-square,
  4. Mann-Whitney U,
  5. correlation coefficient,
  6. simple linear regression,
  7. multiple regression,
  8. ANCOVA,
  9. MANOVA, and
  10. path analysis.


Commonly Used Bivariate Statistical Tests

  • t-Test
  • Analysis of variance (ANOVA)
  • Pearson’s r
  • Chi-squared test


Question

Is the following statement True or False?

  • Parametric statistical testing usually involves measurements on the nominal scale.


Answer

  • False
    • Parametric statistics usually involve measurements that are on the interval or ratio scale; nonparametric tests are used for nominal or ordinal data.


t-Test

  • Tests the difference between two means
  • t-test for independent groups: between-subjects test
    • e.g., means for men vs. women
  • t-test for dependent (paired) groups: within-subjects test
    • e.g., means for patients before and after surgery
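
Minimal sketches of both forms with SciPy (all scores hypothetical):

    # Independent groups vs. dependent (paired) groups
    from scipy import stats

    men = [4.1, 4.5, 3.9, 4.8, 4.2]
    women = [4.6, 4.9, 4.4, 5.1, 4.7]
    t_ind, p_ind = stats.ttest_ind(men, women)       # between-subjects

    before = [120, 130, 125, 140, 135]
    after = [115, 126, 124, 132, 130]
    t_dep, p_dep = stats.ttest_rel(before, after)    # within-subjects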


Analysis of Variance (ANOVA)

Tests the difference between more than 2 means (3 or more groups)

    • One-way ANOVA (e.g., 3 groups): is used to test the effect of one independent variable (e.g., different interventions) on a dependent variable.
    • Multifactor (e.g., two-way) ANOVA: multiple independent variables
    • Repeated measures ANOVA (RM-ANOVA): within subjects: means at different points of time; is used when there are three or more measures of the same dependent variable for each subject.
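
A sketch of a one-way ANOVA with three hypothetical intervention groups, assuming SciPy:

    # F test for the difference among three group means
    from scipy import stats

    group_1 = [5, 6, 7, 6, 5]
    group_2 = [6, 7, 8, 7, 6]
    group_3 = [8, 9, 10, 9, 8]
    f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)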


Chi-Squared Test

  • Tests the difference in proportions in categories within a contingency table
  • Compares observed frequencies in each cell with expected frequencies—the frequencies expected if there was no relationship
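
A sketch of the test on a hypothetical 2x2 contingency table, assuming SciPy; the function also returns the expected frequencies it compares against:

    # Chi-squared test of independence for a contingency table
    from scipy.stats import chi2_contingency

    table = [[30, 20],    # group 1: outcome yes / no
             [10, 40]]    # group 2: outcome yes / no
    chi2, p, df, expected = chi2_contingency(table)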


Correlation

  • Pearson’s r is both a descriptive and an inferential statistic.
  • Tests that the relationship between two variables is not zero.


Effect Size

  • Effect size is an important concept in power analysis.
  • Effect size indexes summarize the magnitude (size) of the effect of the independent variable on the dependent variable.
  • In a comparison of two group means (i.e., in a t-test situation), the effect size index is d.
  • By convention:

d ≤ .20, small effect

d = .50, moderate effect

d ≥ .80, large effect
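
A sketch of the d index for two hypothetical groups, using the pooled SD (equal group sizes assumed):

    # Cohen's d = difference in means / pooled SD
    import math
    import statistics

    group_a = [5, 6, 7, 8, 9]
    group_b = [3, 4, 5, 6, 7]
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd  # ~1.26, large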


Multivariate Statistics

  • Statistical procedures for analyzing relationships among 3 or more variables
  • Two commonly used procedures in nursing research:
    • Multiple regression: is used to make predictions about phenomena.
    • Analysis of covariance (ANCOVA)


Multiple Linear Regression

  • Used to predict a dependent variable based on two or more independent (predictor) variables
  • Dependent variable is continuous (interval or ratio-level data).
  • Predictor variables are continuous (interval or ratio) or dichotomous.
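
A minimal sketch with statsmodels (assumed available): one continuous and one dichotomous predictor of a continuous outcome, all values hypothetical:

    # Multiple linear regression: coefficients and R-squared
    import numpy as np
    import statsmodels.api as sm

    age = [30, 45, 50, 35, 60, 40]          # continuous predictor
    smoker = [0, 1, 1, 0, 1, 0]             # dichotomous predictor
    bp = [118, 135, 140, 122, 150, 125]     # continuous dependent variable

    X = sm.add_constant(np.column_stack([age, smoker]))
    model = sm.OLS(bp, X).fit()
    print(model.params)       # intercept and regression coefficients
    print(model.rsquared)     # proportion of variability accounted for (R-squared)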


Multiple Correlation Coefficient (R)

  • The correlation index for one dependent variable and 2 or more independent (predictor) variables: R
  • Has no negative values (ranges from 0 to 1): shows the strength of relationships, not direction
  • R² is an estimate of the proportion of variability in the dependent variable accounted for by all predictors.


Analysis of Covariance (ANCOVA)

  • Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are statistically significant
  • Levels of measurement of variables:
    • Dependent variable is continuous—ratio or interval level
    • Independent variable is nominal (group status)
    • Covariates are continuous or dichotomous


Visual representation of analysis of covariance


Question

Which test would be used to compare the observed frequencies with expected frequencies within a contingency table?

  1. Pearson’s r
  2. Chi-squared test
  3. t-Test
  4. ANOVA


Answer

b. Chi-squared test

  • The chi-squared test evaluates the difference in proportions in categories within a contingency table, comparing the observed frequencies with the expected frequencies. Pearson’s r tests that the relationship between two variables is not zero. The t-test evaluates the difference between two means. The ANOVA tests the difference between more than 2 means.


Logistic Regression

  • Analyzes relationships between a nominal-level dependent variable and 2 or more independent variables
  • Yields an odds ratio—the risk of an outcome occurring given one condition, versus the risk of it occurring given a different condition
  • The OR is calculated after first removing (statistically controlling) the effects of confounding variables.
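
A minimal sketch with statsmodels (assumed available), using hypothetical exposure, age, and outcome data; exponentiating the coefficients yields the ORs:

    # Logistic regression: odds ratios adjusted for a covariate
    import numpy as np
    import statsmodels.api as sm

    exposed = [1, 0, 1, 0, 1, 0, 1, 0]
    age = [60, 45, 50, 40, 55, 35, 65, 52]
    outcome = [1, 0, 1, 0, 0, 0, 1, 1]      # nominal (0/1) dependent variable

    X = sm.add_constant(np.column_stack([exposed, age]))
    fit = sm.Logit(outcome, X).fit(disp=0)
    odds_ratios = np.exp(fit.params)        # OR for each predictor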


Factor Analysis

  • Used to reduce a large set of variables into a smaller set of underlying dimensions (factors)
  • Used primarily in developing scales and complex instruments
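
A sketch with scikit-learn (assumed available); the 6-item responses are randomly generated stand-ins for real scale data:

    # Exploratory factor analysis: reduce 6 items to 2 factors
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    responses = rng.normal(size=(100, 6))     # 100 respondents, 6 scale items
    fa = FactorAnalysis(n_components=2).fit(responses)
    loadings = fa.components_                 # 2 factors x 6 item loadings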


Multivariate Analysis of Variance

  • The extension of ANOVA to more than one dependent variable
  • Abbreviated as MANOVA
  • Can be used with covariates: Multivariate analysis of covariance (MANCOVA)


Causal Modeling

  • Tests a hypothesized multivariable causal explanation of a phenomenon
  • Includes:
    • Path analysis
    • Structural equation modeling


Univariate Tests

  Level of Measurement    Test
  Categorical             Chi-square test
  Non-categorical         One-sample t-test


Bivariate Tests

  VARIABLE 1 (IDV)                     VARIABLE 2 (DV)    STATISTICAL TEST
  Categorical                          Categorical        Chi-square test for crosstabs
  Categorical (2 groups or values)     Non-categorical    Independent-samples t-test
  Categorical (> 2 groups or values)   Non-categorical    ANOVA
  Non-categorical                      Non-categorical    Correlation / Regression


  PREDICTOR VARIABLE(S)                    OUTCOME: Categorical                OUTCOME: Continuous
  Categorical                              Chi-square, Log-linear, Logistic    t-test, ANOVA, Linear regression
  Continuous                               Logistic regression                 Linear regression, Pearson correlation
  Mixture of categorical and continuous    Logistic regression                 Linear regression, Analysis of covariance


Summary of Tests

  • Chi-square: used to test the statistical significance of an association between a categorical outcome and a categorical determining variable.
  • t-test: tests the difference in the means of a continuous variable between two groups when the determining variable is categorical.
  • ANOVA (analysis of variance): used to test an association between a continuous outcome variable and a categorical determining variable; tests the difference in the means of a continuous variable among more than two groups.
    • One-way ANOVA: used for a quantitative dependent variable and a single factor (independent) variable.
    • Two-way ANOVA: used for one dependent variable and two or more factors and/or variables.


Summary of Tests

  • ANCOVA (combines features of multiple regression and ANOVA): used to compare the means of two or more groups while statistically controlling extraneous variables (covariates).
  • MANOVA: an extension of ANOVA; compares 2 or more groups on more than one dependent variable simultaneously.
  • Linear regression: estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable.
  • Correlations: correlation testing usually runs a continuous outcome against a continuous determining variable to see if they have a linear relationship (positive, negative, or none). The significance test alone does not convey the strength of the relationship, but it does indicate whether a relationship exists. The bivariate correlations procedure computes Pearson's correlation coefficient, Spearman's rho, and Kendall's tau-b with their significance levels.


Guidelines for Critiquing Quantitative Analyses

1. Does the report include any descriptive statistics? Do these statistics sufficiently describe the major characteristics of the researcher’s data set?

2. Were indices of both central tendency and variability provided in the report? If not, how does the absence of this information affect the reader’s understanding of the research variables?

3. Were the correct descriptive statistics used (e.g., was a median used when a mean would have been more appropriate)?


Guidelines for Critiquing Quantitative Analyses

4. Does the report include any inferential statistics? Was a statistical test performed for each of the hypotheses or research questions? If inferential statistics were not used, should they have been?

5. Was the selected statistical test appropriate, given the level of measurement of the variables?

6. Was a parametric test used? Does it appear that the assumptions for the use of parametric tests were met? If a nonparametric test was used, should a more powerful parametric procedure have been used instead?


Guidelines for Critiquing Quantitative Analyses

7. Were any multivariate procedures used? If so, does it appear that the researcher chose the appropriate test? If multivariate procedures were not used, should they have been? Would the use of a multivariate procedure have improved the researcher’s ability to draw conclusions about the relationship between the dependent and independent variables?

8. In general, does the report provide a rationale for the use of the selected statistical tests? Does the report contain sufficient information for you to judge whether appropriate statistics were used?


Guidelines for Critiquing Quantitative Analyses

9. Was there an appropriate amount of statistical information reported? Are the findings clearly and logically organized?

10. Were the results of any statistical tests significant? What do the tests tell you about the plausibility of the research hypotheses?

11. Were tables used judiciously to summarize large masses of statistical information? Are the tables clearly presented, with good titles and carefully labeled column headings? Is the information presented in the text consistent with the information presented in the tables? Is the information totally redundant?


End of Presentation

Copyright © 2010 Wolters Kluwer Health | Lippincott Williams & Wilkins