1 of 21

Descriptive Statistics

Measures of Association Between Two Variables

Business Analytics

Lecture # 06

2 of 21

TOPICS to be COVERED

  01  Measures of Association Between Two Variables
  02  Scatter Chart
  03  Covariance
  04  Correlation Coefficient
  05  Data Cleansing

Data Cleansing

© 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use.

3 of 21

Measures of Association Between Two Variables

  • Often a manager or decision maker is interested in the relationship between two variables. In this section, we present covariance and correlation as descriptive measures of the relationship between two variables.
  • Scatter chart
  • Covariance
  • Correlation Coefficient


4 of 21

To illustrate these concepts:

  • Consider the sales manager of Queensland Amusement Park, who is in charge of ordering bottled water to be purchased by park customers.
  • The sales manager believes that daily bottled water sales in the summer are related to the outdoor temperature.
  • Table 2.14 shows data for high temperatures and bottled water sales for 14 summer days. The data have been sorted by high temperature from lowest value to highest value.


5 of 21


6 of 21

Scatter Charts

  • A scatter chart is a useful graph for analyzing the relationship between two variables. Figure 2.26 shows a scatter chart for sales of bottled water versus the high temperature experienced on 14 consecutive days.

  • The scatter chart in the figure suggests that higher daily high temperatures are associated with higher bottled water sales. This is an example of a positive relationship, because when one variable (high temperature) increases, the other variable (sales of bottled water) generally also increases. The scatter chart also suggests that a straight line could be used as an approximation for the relationship between high temperature and sales of bottled water.


7 of 21


8 of 21

Covariance

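  • Covariance is a descriptive measure of the linear association between two variables. For a sample of size n with the observations (x1, y1), (x2, y2), and so on, the sample covariance is defined as follows:

      $$ s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$

    where \bar{x} and \bar{y} are the sample means of x and y.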
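
The sample covariance can be sketched in a few lines of Python: sum the products of each observation's deviations from the two sample means, then divide by n − 1. This is a minimal illustration, not the Excel approach used in the lecture. The function name `sample_covariance` and the five data points are assumptions made up for the example; they are not the Table 2.14 values.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar) * (y - y_bar), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical values for illustration only (NOT the Table 2.14 data).
high_temps = [78, 80, 82, 84, 86]    # daily high temperature
water_sales = [20, 22, 24, 25, 29]   # cases of bottled water sold

s_xy = sample_covariance(high_temps, water_sales)
print(s_xy)  # 10.5
```

A positive result, as here, means that above-average temperatures tend to occur together with above-average sales in the sample.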


9 of 21

Figure 2.25: Calculating Covariance and Correlation Coefficient for Bottled Water Sales Using Excel


10 of 21


11 of 21

The calculated covariance is s_xy = 12.8.

As the covariance is greater than 0, it indicates a positive relationship between the high temperature and sales of bottled water.

This verifies the relationship we saw in the scatter chart that as the high temperature for a day increases, sales of bottled water generally increase.


12 of 21

Correlation Coefficient

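  • The correlation coefficient measures the strength of the linear relationship between two variables. Unlike covariance, it is not affected by the units of measurement for x and y.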


13 of 21


14 of 21

Interpretation of Correlation Coefficient

-1 ≤ r ≤ +1

  r value    Relationship between the x and y variables
  < 0        Negative linear
  Near 0     No linear relationship
  > 0        Positive linear


15 of 21

Figure 2.26: Scatter Diagrams and Associated Covariance Values for Different Variable Relationships


16 of 21

Computation of Correlation Coefficient

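  • The sample correlation coefficient is the sample covariance divided by the product of the sample standard deviations of x and y:

      $$ r_{xy} = \frac{s_{xy}}{s_x \, s_y} $$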
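
As a rough sketch of this computation, the Python below divides the sample covariance by the product of the two sample standard deviations. It also recomputes the coefficient after converting the temperatures from Fahrenheit to Celsius, to show that a change of units leaves the result unchanged. The function name `sample_correlation` and the data are hypothetical and chosen only for illustration; they are not the lecture's bottled water data.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: s_xy / (s_x * s_y), using n - 1 denominators."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)


# Hypothetical values for illustration only (NOT the lecture's data set).
temps_f = [78, 80, 82, 84, 86]
sales = [20, 22, 24, 25, 29]

# Same temperatures expressed in degrees Celsius.
temps_c = [(t - 32) * 5 / 9 for t in temps_f]

r_f = sample_correlation(temps_f, sales)
r_c = sample_correlation(temps_c, sales)
print(round(r_f, 4), round(r_c, 4))
```

Both calls give approximately 0.98 for this made-up sample, whereas the covariance itself would shrink by a factor of 5/9 under the same unit change.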


17 of 21

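  • The correlation coefficient measures only linear association, so a value of r near 0 does not rule out a strong nonlinear relationship. For example, the scatter diagram in Figure 2.29 shows the relationship between the amount spent by a small retail store for environmental control (heating and cooling) and the daily high outside temperature for 100 consecutive days.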

  • Figure 2.29 provides strong visual evidence of a nonlinear relationship. That is, we can see that as the daily high outside temperature increases, the money spent on environmental control first decreases as less heating is required and then increases as greater cooling is required.
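
The effect of a nonlinear pattern on the correlation coefficient can be sketched with a small made-up example. The seven data points below are hypothetical (they are not the 100 observations behind Figure 2.29) and are deliberately symmetric, so that cost falls and then rises as temperature increases. The helper `sample_correlation` is likewise an illustrative name.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical U-shaped data (NOT the values plotted in Figure 2.29).
daily_high_temp = [40, 50, 60, 70, 80, 90, 100]
control_cost = [110, 60, 30, 20, 30, 60, 110]

# Over the full range the negative and positive halves cancel out.
r_all = sample_correlation(daily_high_temp, control_cost)

# Within each half of the range the relationship is strongly monotonic.
r_cool_days = sample_correlation(daily_high_temp[:4], control_cost[:4])
r_warm_days = sample_correlation(daily_high_temp[3:], control_cost[3:])

print(r_all, r_cool_days, r_warm_days)
```

Here `r_all` is 0 even though cost is clearly related to temperature, while the cool-day and warm-day halves show strong negative and positive correlations respectively. This is why a scatter chart should be inspected alongside the coefficient.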


18 of 21


19 of 21

Data Cleansing

  • The data in a data set are often said to be “dirty” and “raw” before they have been put into a form that is best suited for investigation, analysis, and modeling.

  • Data preparation makes heavy use of descriptive statistics and data-visualization methods to gain an understanding of the data.

  • Common tasks in data preparation include treating missing data, identifying erroneous data and outliers, and defining the appropriate way to represent variables.


20 of 21

  • Missing Data
  • Identification of Erroneous Outliers and Other Erroneous Values
  • Variable Representation
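
The first two tasks can be sketched with Python's standard `statistics` module. This is one possible approach under stated assumptions, not a prescribed procedure: missing entries are assumed to be recorded as `None`, a value is flagged as a possible outlier when it lies more than 2 sample standard deviations from the mean (the cutoff is an arbitrary choice for this tiny sample), and missing entries are filled with the median of the observed values. The data are hypothetical.

```python
import statistics

# Hypothetical daily sales with two missing entries and one suspicious value.
raw_sales = [21, 23, None, 22, 24, 250, 23, None, 22]

# Task 1: missing data. Count the gaps and keep the observed values.
observed = [v for v in raw_sales if v is not None]
missing_count = len(raw_sales) - len(observed)

# Task 2: flag possible outliers by z-score (threshold is an assumption).
Z_THRESHOLD = 2.0
mean = statistics.mean(observed)
stdev = statistics.stdev(observed)  # sample standard deviation (n - 1)
outliers = [v for v in observed if abs(v - mean) / stdev > Z_THRESHOLD]

# One simple treatment for missing data: fill gaps with the observed median,
# which is less sensitive to the flagged value than the mean.
median = statistics.median(observed)
filled_sales = [median if v is None else v for v in raw_sales]

print(missing_count, outliers, filled_sales)
```

Flagged values still need human review: a reading of 250 might be a data-entry error or a genuine spike, and the code cannot tell which. Variable representation is not covered by this sketch.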


21 of 21

Thank You!
