Chapter Fifteen
Frequency Distribution,
Cross-Tabulation,
and Hypothesis Testing
15-1
Chapter Outline
1) Overview
2) Frequency Distribution
3) Statistics Associated with Frequency Distribution
4) Introduction to Hypothesis Testing
5) A General Procedure for Hypothesis Testing
6) Cross-Tabulations
7) Statistics Associated with Cross-Tabulation
8) Cross-Tabulation in Practice
9) Hypothesis Testing Related to Differences
10) Parametric Tests
11) Non-parametric Tests
12) Internet and Computer Applications
13) Focus on Burke
14) Summary
15) Key Terms and Concepts
15-5
Internet Usage Data
Columns: Respondent Number; Sex; Familiarity; Internet Usage; Attitude Toward Internet; Attitude Toward Technology; Usage of Internet for Shopping; Usage of Internet for Banking
1 1.00 7.00 14.00 7.00 6.00 1.00 1.00
2 2.00 2.00 2.00 3.00 3.00 2.00 2.00
3 2.00 3.00 3.00 4.00 3.00 1.00 2.00
4 2.00 3.00 3.00 7.00 5.00 1.00 2.00
5 1.00 7.00 13.00 7.00 7.00 1.00 1.00
6 2.00 4.00 6.00 5.00 4.00 1.00 2.00
7 2.00 2.00 2.00 4.00 5.00 2.00 2.00
8 2.00 3.00 6.00 5.00 4.00 2.00 2.00
9 2.00 3.00 6.00 6.00 4.00 1.00 2.00
10 1.00 9.00 15.00 7.00 6.00 1.00 2.00
11 2.00 4.00 3.00 4.00 3.00 2.00 2.00
12 2.00 5.00 4.00 6.00 4.00 2.00 2.00
13 1.00 6.00 9.00 6.00 5.00 2.00 1.00
14 1.00 6.00 8.00 3.00 2.00 2.00 2.00
15 1.00 6.00 5.00 5.00 4.00 1.00 2.00
16 2.00 4.00 3.00 4.00 3.00 2.00 2.00
17 1.00 6.00 9.00 5.00 3.00 1.00 1.00
18 1.00 4.00 4.00 5.00 4.00 1.00 2.00
19 1.00 7.00 14.00 6.00 6.00 1.00 1.00
20 2.00 6.00 6.00 6.00 4.00 2.00 2.00
21 1.00 6.00 9.00 4.00 2.00 2.00 2.00
22 1.00 5.00 5.00 5.00 4.00 2.00 1.00
23 2.00 3.00 2.00 4.00 2.00 2.00 2.00
24 1.00 7.00 15.00 6.00 6.00 1.00 1.00
25 2.00 6.00 6.00 5.00 3.00 1.00 2.00
26 1.00 6.00 13.00 6.00 6.00 1.00 1.00
27 2.00 5.00 4.00 5.00 5.00 1.00 1.00
28 2.00 4.00 2.00 3.00 2.00 2.00 2.00
29 1.00 4.00 4.00 5.00 3.00 1.00 2.00
30 1.00 3.00 3.00 7.00 5.00 1.00 2.00
Table 15.1
15-6
Frequency Distribution
15-7
Frequency Distribution of Familiarity with the Internet
Table 15.2
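The frequency distribution can be rebuilt from the Familiarity column of Table 15.1. The sketch below is a minimal illustration, assuming the scale runs from 1 to 7 so that the rating of 9 for respondent 10 is a missing-value code; the variable names are illustrative only.

```python
from collections import Counter

# Familiarity ratings for the 30 respondents in Table 15.1.
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

# Assumption: valid ratings lie between 1 and 7, so 9 is treated as missing.
valid = [x for x in familiarity if 1 <= x <= 7]
counts = Counter(valid)
n_missing = len(familiarity) - len(valid)

# Print value, frequency, percentage, valid percentage, cumulative percentage.
cumulative = 0.0
for value in sorted(counts):
    pct = 100 * counts[value] / len(familiarity)
    valid_pct = 100 * counts[value] / len(valid)
    cumulative += valid_pct
    print(f"{value}  {counts[value]:2d}  {pct:5.1f}  {valid_pct:5.1f}  {cumulative:5.1f}")
print(f"missing: {n_missing}")
```

The valid percentage uses the 29 non-missing responses as its base, while the plain percentage uses all 30.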
15-8
Frequency Histogram
Figure 15.1: histogram of Frequency (vertical axis, 0 to 8) against Familiarity (horizontal axis, 2 to 7).
15-9
Statistics Associated with Frequency Distribution: Measures of Location
$\bar{X} = \sum_{i=1}^{n} X_i / n$
Where,
Xi = Observed values of the variable X
n = Number of observations (sample size)
15-10
Statistics Associated with Frequency Distribution: Measures of Location
15-11
Statistics Associated with Frequency Distribution: Measures of Variability
15-12
Statistics Associated with Frequency Distribution: Measures of Variability
$s_x = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)}$
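As a check on the formulas for the mean and the standard deviation, the sketch below applies them to the familiarity ratings of Table 15.1. It assumes that the rating of 9 is a missing-value code, which leaves 29 valid observations; the variable names are illustrative.

```python
import math

# Familiarity ratings from Table 15.1 with the assumed missing value (9) removed.
valid = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6,
         4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

n = len(valid)
mean = sum(valid) / n                                   # X-bar = sum(X_i) / n
sum_sq_dev = sum((x - mean) ** 2 for x in valid)        # sum of squared deviations
std_dev = math.sqrt(sum_sq_dev / (n - 1))               # s_x with n - 1 in the denominator

print(round(mean, 3), round(std_dev, 3))  # prints 4.724 1.579
```

These are the values of 4.724 and 1.579 used in the one-sample t test later in the chapter.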
15-13
Statistics Associated with Frequency Distribution: Measures of Shape
15-14
Skewness of a Distribution
Figure 15.2: two panels, (a) and (b), showing a Symmetric Distribution and a Skewed Distribution, each marked with the Mean, Median, and Mode.
15-15
Steps Involved in Hypothesis Testing
Fig. 15.3
1) Formulate H0 and H1
2) Select Appropriate Test
3) Choose Level of Significance, α
4) Collect Data and Calculate Test Statistic
5) Either (a) Determine Probability Associated with Test Statistic, or (b) Determine Critical Value of Test Statistic, TSCR
6) Either (a) Compare Probability with Level of Significance, α, or (b) Determine if the Calculated Test Statistic Falls into the (Non) Rejection Region
7) Reject or Do Not Reject H0
8) Draw Marketing Research Conclusion
15-16
A General Procedure for Hypothesis Testing: Step 1: Formulate the Hypothesis
15-17
A General Procedure for Hypothesis Testing: Step 1: Formulate the Hypothesis
$H_1: \pi > 0.40$
15-18
A General Procedure for Hypothesis Testing: Step 1: Formulate the Hypothesis
$H_0: \pi = 0.40$
$H_1: \pi \ne 0.40$
15-19
A General Procedure for Hypothesis Testing: Step 2: Select an Appropriate Test
where
15-20
A General Procedure for Hypothesis Testing: Step 3: Choose a Level of Significance
Type I Error
Type II Error
15-21
A General Procedure for Hypothesis Testing: Step 3: Choose a Level of Significance
Power of a Test
15-22
Probabilities of Type I & Type II Error
Figure 15.4: two sampling distributions marked at the critical value of Z. Under Π0 = 0.40: 95% of total area, α = 0.05, Zα = 1.645. Under Π = 0.45: 99% of total area, β = 0.01, Zβ = -2.33.
15-23
Probability of z with a One-Tailed Test
Fig. 15.5: standard normal curve with z = 1.88 marked to the right of 0. Shaded area = 0.9699; unshaded area = 0.0301.
15-24
A General Procedure for Hypothesis Testing: Step 4: Collect Data and Calculate Test Statistic
$\sigma_p = \sqrt{\pi (1 - \pi) / n} = \sqrt{(0.40)(0.60) / 30} = 0.089$
15-25
A General Procedure for Hypothesis Testing: Step 4: Collect Data and Calculate Test Statistic
The test statistic z can be calculated as follows:
$z = (p - \pi) / \sigma_p = (0.567 - 0.40) / 0.089 = 1.88$
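The calculation of z can be sketched in a few lines. This is an illustration only: it assumes the sample proportion 0.567 corresponds to 17 of the 30 respondents in Table 15.1 (those coded 1 on Internet shopping), and it uses a one-tailed test against a hypothesized proportion of 0.40. Carrying full precision gives a z slightly below 1.88, because the slide value is computed from rounded intermediate figures.

```python
import math
from statistics import NormalDist

n = 30           # respondents in Table 15.1
successes = 17   # assumption: respondents coded 1 on Internet shopping
pi_0 = 0.40      # hypothesized population proportion

p = successes / n                              # sample proportion, about 0.567
sigma_p = math.sqrt(pi_0 * (1 - pi_0) / n)     # standard error under H0, about 0.089
z = (p - pi_0) / sigma_p                       # test statistic

# One-tailed probability of a z at least this large under H0.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Comparing the probability with α and comparing z with the critical value 1.645 lead to the same decision.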
15-26
A General Procedure for Hypothesis Testing: Step 5: Determine the Probability (Critical Value)
15-27
A General Procedure for Hypothesis Testing: Steps 6 & 7: Compare the Probability (Critical Value) and Make the Decision
15-28
A General Procedure for Hypothesis Testing: Steps 6 & 7: Compare the Probability (Critical Value) and Make the Decision
15-29
A General Procedure for Hypothesis Testing: Step 8: Marketing Research Conclusion
15-30
A Broad Classification of Hypothesis Tests
Figure 15.6
Hypothesis Tests
* Tests of Association
* Tests of Differences: Distributions, Means, Proportions, Median/Rankings
15-31
Cross-Tabulation
15-32
Gender and Internet Usage
Table 15.3
15-33
Two Variables Cross-Tabulation
15-34
Internet Usage by Gender
Table 15.4
15-35
Gender by Internet Usage
Table 15.5
15-36
Introduction of a Third Variable in Cross-Tabulation
Fig. 15.7
Original Two Variables
* Some Association between the Two Variables -> Introduce a Third Variable -> Refined Association between the Two Variables, No Association between the Two Variables, or No Change in the Initial Pattern
* No Association between the Two Variables -> Introduce a Third Variable -> Some Association between the Two Variables
15-37
Three Variables Cross-Tabulation: Refine an Initial Relationship
As shown in Figure 15.7, the introduction of a third variable can result in four possibilities:
15-38
Purchase of Fashion Clothing by Marital Status
Table 15.6
15-39
Purchase of Fashion Clothing by Marital Status
Table 15.7
15-40
Three Variables Cross-Tabulation: Initial Relationship was Spurious
15-41
Ownership of Expensive Automobiles by Education Level
Table 15.8
15-42
Ownership of Expensive Automobiles by Education Level and Income Levels
Table 15.9
15-43
Three Variables Cross-Tabulation: Reveal Suppressed Association
15-44
Desire to Travel Abroad by Age
Table 15.10
15-45
Desire to Travel Abroad by Age and Gender
Table 15.11
15-46
Three Variables Cross-Tabulation: No Change in Initial Relationship
15-47
Eating Frequently in Fast-Food Restaurants by Family Size
Table 15.12
15-48
Eating Frequently in Fast-Food Restaurants by Family Size & Income
Table 15.13
15-49
Statistics Associated with Cross-Tabulation: Chi-Square
15-50
Chi-square Distribution
Figure 15.8: chi-square distribution with the Do Not Reject H0 region below the critical value of χ2 and the Reject H0 region above it.
15-51
Statistics Associated with Cross-Tabulation: Chi-Square
The expected frequency for each cell is
$f_e = n_r n_c / n$
where nr = total number in the row
nc = total number in the column
n = total sample size
15-52
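Statistics Associated with Cross-Tabulation: Chi-Square
For the data in Table 15.3, the expected frequencies for
the cells going from left to right and from top to
bottom, are:
$(15 \times 15) / 30 = 7.5$ for each of the four cells.
Then the value of $\chi^2$ is calculated as follows:
$\chi^2 = \sum_{\text{all cells}} (f_o - f_e)^2 / f_e$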
15-53
Statistics Associated with Cross-Tabulation: Chi-Square
For the data in Table 15.3, the value of $\chi^2$ is
calculated as:
$\chi^2 = (5 - 7.5)^2 / 7.5 + (10 - 7.5)^2 / 7.5 + (10 - 7.5)^2 / 7.5 + (5 - 7.5)^2 / 7.5$
$= 0.833 + 0.833 + 0.833 + 0.833$
$= 3.333$
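The chi-square arithmetic can be reproduced from the observed cell counts alone. The sketch below is a minimal illustration that takes the four observed frequencies (5, 10, 10, 5) from the calculation above and derives the expected frequencies from the row and column totals; the layout of rows and columns is an assumption and does not affect the result.

```python
# Observed cell frequencies, read left to right and top to bottom.
observed = [[5, 10],
            [10, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_square, 3), df)
```

With one degree of freedom, the computed value is then compared with the critical value of the chi-square distribution at the chosen significance level.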
15-54
Statistics Associated with Cross-Tabulation: Chi-Square
15-55
Statistics Associated with Cross-Tabulation: Phi Coefficient
15-56
Statistics Associated with Cross-Tabulation: Contingency Coefficient
15-57
Statistics Associated with Cross-Tabulation: Cramer’s V
or
15-58
Statistics Associated with Cross-Tabulation: Lambda Coefficient
15-59
Statistics Associated with Cross-Tabulation: Other Statistics
15-60
Cross-Tabulation in Practice
While conducting cross-tabulation analysis in practice, it is useful to
proceed along the following steps.
15-61
Hypothesis Testing Related to Differences
15-62
A Classification of Hypothesis Testing Procedures for Examining Differences
Fig. 15.9
Hypothesis Tests
* Parametric Tests (Metric Tests)
  * One Sample: t test, Z test
  * Two or More Samples
    * Independent Samples: Two-Group t test, Z test
    * Paired Samples: Paired t test
* Non-parametric Tests (Nonmetric Tests)
  * One Sample: Chi-Square, K-S, Runs, Binomial
  * Two or More Samples
    * Independent Samples: Chi-Square, Mann-Whitney, Median, K-S
    * Paired Samples: Sign, Wilcoxon, McNemar
15-63
Parametric Tests
15-64
Hypothesis Testing Using the t Statistic
15-65
Hypothesis Testing Using the t Statistic
15-66
One Sample: t Test
For the data in Table 15.2, suppose we wanted to test
the hypothesis that the mean familiarity rating exceeds
4.0, the neutral value on a 7-point scale. A significance
level of α = 0.05 is selected. The hypotheses may be
formulated as:
$H_0: \mu \le 4.0$
$H_1: \mu > 4.0$
$s_{\bar{X}} = s_x / \sqrt{n} = 1.579 / \sqrt{29} = 1.579 / 5.385 = 0.293$
$t = (\bar{X} - \mu) / s_{\bar{X}} = (4.724 - 4.0) / 0.293 = 0.724 / 0.293 = 2.471$
15-67
One Sample: t Test
The degrees of freedom for the t statistic to test the hypothesis about one mean are n - 1. In this case, n - 1 = 29 - 1 = 28. From Table 4 in the Statistical Appendix, the probability of getting a more extreme value than 2.471 is less than 0.05. (Alternatively, the critical t value for 28 degrees of freedom and a significance level of 0.05 is 1.7011, which is less than the calculated value.) Hence, the null hypothesis is rejected: the mean familiarity rating does exceed 4.0.
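The one-sample t test above can be sketched directly from the familiarity ratings. This is an illustration under the assumption that the rating of 9 in Table 15.1 is a missing value, leaving n = 29; the critical value 1.7011 is the one quoted in the text rather than computed here. Full-precision arithmetic gives a t of about 2.47, which differs from 2.471 only through rounding of the intermediate values.

```python
import math
from statistics import mean, stdev

# Familiarity ratings from Table 15.1 with the assumed missing value (9) removed.
valid = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6,
         4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

mu_0 = 4.0                      # hypothesized mean (neutral point of the scale)
n = len(valid)
x_bar = mean(valid)
s_x = stdev(valid)              # sample standard deviation (n - 1 denominator)
std_error = s_x / math.sqrt(n)  # standard error of the mean
t = (x_bar - mu_0) / std_error
df = n - 1

t_critical = 1.7011             # one-tailed, alpha = 0.05, 28 df (value quoted in the text)
print(f"t = {t:.3f}, df = {df}, reject H0: {t > t_critical}")
```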
15-68
One Sample: z Test
Note that if the population standard deviation were assumed to be known as 1.5, rather than estimated from the sample, a z test would be appropriate. In this case, the value of the z statistic would be:
$z = (\bar{X} - \mu) / \sigma_{\bar{X}}$
where
$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 1.5 / \sqrt{29} = 1.5 / 5.385 = 0.279$
and
$z = (4.724 - 4.0) / 0.279 = 0.724 / 0.279 = 2.595$
15-69
One Sample: z Test
15-70
Two Independent Samples: Means
$s^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2}{n_1 + n_2 - 2}$
or
$s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
15-71
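Two Independent Samples: Means
The standard deviation of the test statistic can be
estimated as:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 (1/n_1 + 1/n_2)}$
The appropriate value of t can be calculated as:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$
The degrees of freedom in this case are (n1 + n2 - 2).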
15-72
Two Independent Samples: F Test
An F test of sample variance may be performed if it is
not known whether the two populations have equal
variance. In this case, the hypotheses are:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
15-73
Two Independent Samples: F Statistic
The F statistic is computed from the sample variances
as follows:
$F_{(n_1 - 1), (n_2 - 1)} = s_1^2 / s_2^2$
where
n1 = size of sample 1
n2 = size of sample 2
n1 - 1 = degrees of freedom for sample 1
n2 - 1 = degrees of freedom for sample 2
$s_1^2$ = sample variance for sample 1
$s_2^2$ = sample variance for sample 2
Using the data of Table 15.1, suppose we wanted to determine
whether Internet usage was different for males as compared to
females. A two-independent-samples t test was conducted. The
results are presented in Table 15.14.
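A pooled-variance t test of this kind can be sketched from the Sex and Internet Usage columns of Table 15.1. The sketch assumes that Sex is coded 1 = male and 2 = female and that the two population variances are equal; it also prints the ratio of the sample variances used in the F test. It is an illustration, not a reproduction of the statistical package output.

```python
import math
from statistics import mean, variance

# Sex (assumed coding: 1 = male, 2 = female) and Internet usage from Table 15.1.
sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

males = [u for s, u in zip(sex, usage) if s == 1]
females = [u for s, u in zip(sex, usage) if s == 2]
n1, n2 = len(males), len(females)

s1_sq, s2_sq = variance(males), variance(females)   # sample variances

# Pooled variance: ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2).
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t = (mean(males) - mean(females)) / std_error
df = n1 + n2 - 2
f_ratio = s1_sq / s2_sq                              # ratio of sample variances

print(f"means: {mean(males):.3f} vs {mean(females):.3f}")
print(f"t = {t:.3f} with {df} df, variance ratio F = {f_ratio:.3f}")
```

If the variance ratio suggests unequal variances, a separate-variance form of the t test would be the usual alternative; that variant is not shown here.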
15-74
Two Independent-Samples t Tests
Table 15.14
15-75
Two Independent Samples: Proportions
The case involving proportions for two independent samples is also
illustrated using the data of Table 15.1, which gives the number of
males and females who use the Internet for shopping. Is the
proportion of respondents using the Internet for shopping the
same for males and females? The null and alternative hypotheses
are:
$H_0: \pi_1 = \pi_2$
$H_1: \pi_1 \ne \pi_2$
A Z test is used as in testing the proportion for one sample.
However, in this case the test statistic is given by:
$Z = (P_1 - P_2) / s_{P_1 - P_2}$
15-76
Two Independent Samples: Proportions
In the test statistic, the numerator is the difference between the
proportions in the two samples, P1 and P2. The denominator is
the standard error of the difference in the two proportions and is
given by
$s_{P_1 - P_2} = \sqrt{P (1 - P) (1/n_1 + 1/n_2)}$
where
$P = (n_1 P_1 + n_2 P_2) / (n_1 + n_2)$
15-77
Two Independent Samples: Proportions
A significance level of α = 0.05 is selected. Given the data of
Table 15.1, the test statistic can be calculated as:
$P_1 - P_2 = (11/15) - (6/15) = 0.733 - 0.400 = 0.333$
$P = (15 \times 0.733 + 15 \times 0.4) / (15 + 15) = 0.567$
$s_{P_1 - P_2} = \sqrt{0.567 \times 0.433 \times (1/15 + 1/15)} = 0.181$
$Z = 0.333 / 0.181 = 1.84$
15-78
Two Independent Samples: Proportions
Given a two-tail test, the area to the right of the critical value is 0.025. Hence, the critical value of the test statistic is 1.96. Since the calculated value is less than the critical value, the null hypothesis cannot be rejected. Thus, the proportion of users (0.733 for males and 0.400 for females) is not significantly different for the two samples. Note that while the difference is substantial, it is not statistically significant because of the small sample sizes (15 in each group).
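The two-proportion Z test can be sketched from the Sex and Shopping columns of Table 15.1. The coding is an assumption here: Sex 1 = male and 2 = female, and Shopping 1 = uses the Internet for shopping. The critical value 1.96 is taken from the text.

```python
import math

# Sex (assumed: 1 = male, 2 = female) and Internet shopping (assumed: 1 = yes, 2 = no).
sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
shopping = [1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 1,
            2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1]

n1 = sum(1 for s in sex if s == 1)
n2 = sum(1 for s in sex if s == 2)
x1 = sum(1 for s, b in zip(sex, shopping) if s == 1 and b == 1)  # male shoppers
x2 = sum(1 for s, b in zip(sex, shopping) if s == 2 and b == 1)  # female shoppers

p1, p2 = x1 / n1, x2 / n2
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)

# Standard error of the difference between the two proportions.
std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / std_error

z_critical = 1.96   # two-tailed, alpha = 0.05 (value quoted in the text)
print(f"P1 = {p1:.3f}, P2 = {p2:.3f}, Z = {z:.2f}, reject H0: {abs(z) > z_critical}")
```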
15-79
Paired Samples
The difference in these cases is examined by a paired samples t
test. To compute t for paired samples, the paired difference
variable, denoted by D, is formed and its mean and variance
calculated. Then the t statistic is computed. The degrees of
freedom are n - 1, where n is the number of pairs. The relevant
formulas are:
$H_0: \mu_D = 0$
$H_1: \mu_D \ne 0$
$t_{n-1} = (\bar{D} - \mu_D) / (s_D / \sqrt{n})$
15-80
Paired Samples
where,
$\bar{D} = \sum_{i=1}^{n} D_i / n$
$s_D = \sqrt{\sum_{i=1}^{n} (D_i - \bar{D})^2 / (n - 1)}$
In the Internet usage example (Table 15.1), a paired t test could
be used to determine if the respondents differed in their attitude
toward the Internet and attitude toward technology. The resulting
output is shown in Table 15.15.
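The paired t test can be reproduced by forming the difference D for each respondent in Table 15.1. The sketch below is a minimal illustration using the two attitude columns; the variable names are illustrative, and the null hypothesis is assumed to be a mean difference of zero.

```python
import math
from statistics import mean, stdev

# Attitude toward the Internet and toward technology, respondents 1 to 30 (Table 15.1).
internet = [7, 3, 4, 7, 7, 5, 4, 5, 6, 7, 4, 6, 6, 3, 5,
            4, 5, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 3, 5, 7]
technology = [6, 3, 3, 5, 7, 4, 5, 4, 4, 6, 3, 4, 5, 2, 4,
              3, 3, 4, 6, 4, 2, 4, 2, 6, 3, 6, 5, 2, 3, 5]

# Paired difference variable D = Internet - Technology.
diffs = [a - b for a, b in zip(internet, technology)]
n = len(diffs)

d_bar = mean(diffs)                 # mean difference
s_d = stdev(diffs)                  # standard deviation of the differences
std_error = s_d / math.sqrt(n)      # standard error of the mean difference
t = d_bar / std_error               # t statistic under H0: mean difference = 0
df = n - 1

print(f"mean D = {d_bar:.3f}, s_D = {s_d:.3f}, SE = {std_error:.4f}, t = {t:.3f}, df = {df}")
```

The mean difference, standard deviation, standard error, t value, and degrees of freedom agree with the figures reported in Table 15.15.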
15-81
Paired-Samples t Test
Variable              Number of Cases   Mean    Standard Deviation   Standard Error
Internet Attitude     30                5.167   1.234                0.225
Technology Attitude   30                4.100   1.398                0.255
Difference = Internet - Technology
Difference Mean   Standard Deviation   Standard Error   Correlation   2-tail prob.   t value   Degrees of Freedom   2-tail probability
1.067             0.828                0.1511           0.809         0.000          7.059     29                   0.000
Table 15.15
15-82
Non-Parametric Tests
Nonparametric tests are used when the independent variables are nonmetric. Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.
15-83
Non-Parametric Tests: One Sample
Sometimes the researcher wants to test whether the
observations for a particular variable could reasonably
have come from a particular distribution, such as the
normal, uniform, or Poisson distribution.
The Kolmogorov-Smirnov (K-S) one-sample test
is one such goodness-of-fit test. The K-S compares the
cumulative distribution function for a variable with a
specified distribution. Ai denotes the cumulative
relative frequency for each category of the theoretical
(assumed) distribution, and Oi the comparable value of
the sample frequency. The K-S test is based on the
maximum value of the absolute difference between Ai
and Oi. The test statistic is
$K = \max |A_i - O_i|$
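The K statistic can be sketched for the Internet usage values of Table 15.1. The sketch assumes the theoretical distribution is a normal distribution with the sample mean and sample standard deviation, and it evaluates the difference only at each observed value, exactly as in the definition above. Library implementations of the K-S test may also examine the point just below each step, so their results can differ slightly.

```python
from statistics import NormalDist, mean, stdev

# Internet usage for respondents 1 to 30 (Table 15.1).
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

# Assumed theoretical distribution: normal with the sample mean and standard deviation.
theoretical = NormalDist(mean(usage), stdev(usage))
n = len(usage)

K = 0.0
for value in sorted(set(usage)):
    O_i = sum(1 for x in usage if x <= value) / n   # observed cumulative relative frequency
    A_i = theoretical.cdf(value)                    # theoretical cumulative relative frequency
    K = max(K, abs(A_i - O_i))

print(f"K = {K:.3f}")
```

A larger K indicates a poorer fit between the sample and the assumed distribution.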
15-84
Non-Parametric Tests: One Sample
15-85
K-S One-Sample Test for Normality of Internet Usage
Table 15.16
15-86
Non-Parametric Tests: One Sample
15-87
Non-Parametric Tests: Two Independent Samples
15-88
Non-Parametric Tests: Two Independent Samples
15-89
Mann-Whitney U - Wilcoxon Rank Sum W Test Internet Usage by Gender
Table 15.17
Sex      Mean Rank   Cases
Male     20.93       15
Female   10.07       15
Total                30
U        W         z (corrected for ties)   2-tailed p
31.000   151.000   -3.406                   0.001
Note: U = Mann-Whitney test statistic; W = Wilcoxon W statistic; z = U transformed into a normally distributed z statistic.
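The U, W, and z values can be sketched from the Sex and Internet Usage columns of Table 15.1. The sketch assumes Sex is coded 1 = male and 2 = female, assigns tied observations the average of the ranks they occupy, takes W as the rank sum of the female group, and uses the usual normal approximation with a correction for ties. It is an illustration, not the statistical package's own routine.

```python
import math
from collections import Counter

# Sex (assumed: 1 = male, 2 = female) and Internet usage from Table 15.1.
sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

# Mid-ranks: tied observations share the average of the ranks they occupy.
ordered = sorted(usage)
rank_of = {}
start = 0
while start < len(ordered):
    end = start
    while end < len(ordered) and ordered[end] == ordered[start]:
        end += 1
    rank_of[ordered[start]] = (start + 1 + end) / 2   # average of ranks start+1 .. end
    start = end

male_ranks = [rank_of[u] for s, u in zip(sex, usage) if s == 1]
female_ranks = [rank_of[u] for s, u in zip(sex, usage) if s == 2]
n1, n2 = len(male_ranks), len(female_ranks)
N = n1 + n2

W = sum(female_ranks)              # rank sum of the female group
U = W - n2 * (n2 + 1) / 2          # Mann-Whitney U for that group

# Normal approximation with a tie correction to the variance of U.
tie_term = sum(t ** 3 - t for t in Counter(usage).values()) / (N * (N - 1))
variance_U = n1 * n2 / 12 * ((N + 1) - tie_term)
z = (U - n1 * n2 / 2) / math.sqrt(variance_U)

print(f"mean ranks: {sum(male_ranks) / n1:.2f} (male), {W / n2:.2f} (female)")
print(f"U = {U:.3f}, W = {W:.3f}, z = {z:.3f}")
```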
15-90
Non-Parametric Tests: Paired Samples
15-91
Non-Parametric Tests: Paired Samples
15-92
Wilcoxon Matched-Pairs Signed-Rank Test: Internet with Technology
Table 15.18
15-93
A Summary of Hypothesis Tests Related to Differences
Table 15.19
Contd.
15-94
A Summary of Hypothesis Tests Related to Differences
Table 15.19 cont.
15-95
SPSS Windows
15-96
SPSS Windows
To select these procedures click:
Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore
The major cross-tabulation program is CROSSTABS.
This program will display the cross-classification tables
and provide cell counts, row and column percentages,
the chi-square test for significance, and all the
measures of the strength of the association that have
been discussed.
To select these procedures click:
Analyze>Descriptive Statistics>Crosstabs
15-97
SPSS Windows
The major program for conducting parametric
tests in SPSS is COMPARE MEANS. This program can
be used to conduct t tests on one sample or
independent or paired samples. To select these
procedures using SPSS for Windows click:
Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
15-98
SPSS Windows
The nonparametric tests discussed in this chapter can
be conducted using NONPARAMETRIC TESTS.
To select these procedures using SPSS for Windows
click:
Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
15-99