1 of 50

Copyright © Cengage Learning. All rights reserved.

4

Statistics


2 of 50


4.4

The Normal Distribution


3 of 50

Objectives

  • Find probabilities of the standard normal distribution

  • Find probabilities of a nonstandard normal distribution

  • Find the value of a normally distributed variable that will produce a specific probability


4 of 50

The Normal Distribution

Sets of data may exhibit various trends or patterns. Figure 4.100 shows a histogram of the weights of bags of corn chips.

Weights of bags of corn chips.

Figure 4.100


5 of 50

The Normal Distribution

Notice that most of the data are near the “center” and that the data taper off at either end. Furthermore, the histogram is nearly symmetric; it is almost the same on both sides.

This type of distribution (nearly symmetric, with most of the data in the middle) occurs quite often in many different situations.


6 of 50

The Normal Distribution

To study the composition of such distributions, statisticians have created an ideal bell-shaped curve describing a normal distribution, as shown in Figure 4.101.

Before we can study the characteristics and applications of a normal distribution, we must make a distinction between different types of variables.

Normal distribution.

Figure 4.101


7 of 50

Discrete versus Continuous Variables


8 of 50

Discrete versus Continuous Variables

The number of children in a family is variable, because it varies from family to family. In listing the number of children, only whole numbers (0, 1, 2, and so on) can be used. In this respect, we are limited to a collection of discrete, or separate, values.

A variable is discrete if there are “gaps” between the possible variable values. Consequently, any variable that involves counting is discrete.

On the other hand, a person’s height or weight does not have such a restriction.


9 of 50

Discrete versus Continuous Variables

When someone grows, he or she does not instantly go from 67 inches to 68 inches; a person grows continuously from 67 inches to 68 inches, attaining all possible values in between. For this reason, height is called a continuous variable.

A variable is continuous if it can assume any value in an interval of real numbers (see Figure 4.102).

Discrete versus continuous variables.

Figure 4.102


10 of 50

Discrete versus Continuous Variables

Consequently, any variable that involves measurement is continuous; someone might claim to be 67 inches tall and to weigh 152 pounds, but the true values might be 67.13157 inches and 151.87352 pounds.

Heights and weights are expressed (discretely) as whole numbers solely for convenience; most people do not have rulers or bathroom scales that allow them to obtain measurements that are accurate to ten or more decimal places!


11 of 50

Normal Distributions


12 of 50

Normal Distributions

The collection of all possible values that a discrete variable can assume forms a countable set. For instance, we can list all the possible numbers of children in a family.

In contrast, a continuous variable will have an uncountable number of possibilities because it can assume any value in an interval.

For instance, the weights (a continuous variable) of bags of corn chips could be any value x such that 15.3 ≤ x ≤ 16.7.


13 of 50

Normal Distributions

When we sample a continuous variable, some values may occur more often than others.

As we can see in Figure 4.100, the weights are “clustered” near the center of the histogram, with relatively few located at either end.

Weights of bags of corn chips.

Figure 4.100


14 of 50

Normal Distributions

If a continuous variable has a symmetric distribution such that the highest concentration of values is at the center and the lowest is at both extremes, the variable is said to have a normal distribution and is represented by a smooth, continuous, bell-shaped curve like that in Figure 4.101.

Normal distribution.

Figure 4.101


15 of 50

Normal Distributions

The normal distribution, which is found in a wide variety of situations, has two main qualities:

(1) the frequencies of the data points nearer the center or “average” are increasingly higher than the frequencies of data points far from the center, and

(2) the distribution is symmetric (one side is a mirror image of the other).

Because of these two qualities, the mean, median, and mode of a normal distribution all coincide at the center of the distribution.


16 of 50

Normal Distributions

Just like any other collection of numbers, the spread of a normal distribution is measured by its standard deviation.

It can be shown that for any normal distribution, slightly more than two-thirds of the data (68.26%) will lie within one standard deviation of the mean, 95.44% will lie within two standard deviations, and virtually all the data (99.74%) will lie within three standard deviations of the mean.
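These percentages can be checked numerically. The sketch below is an illustration that is not part of the original slides. It assumes the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2), and the helper name `within` is made up for this example. The computed values (about 68.27%, 95.45%, and 99.73%) differ from the quoted figures in the last digit, presumably because the quoted figures come from doubling four-decimal table entries.

```python
import math

def within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.2%}")
```

The result does not depend on μ or σ, which is why the same three percentages apply to every normal distribution.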

We know that μ (the Greek letter “mu”) represents the mean of a population and σ (the Greek letter “sigma”) represents the standard deviation of the population.


17 of 50

Normal Distributions

The spread of a normal distribution, with μ and σ used to represent the mean and standard deviation, is shown in Figure 4.103.

The spread of a normal distribution.

Figure 4.103


18 of 50

Example 1 – Analyzing the Dispersion of a Normal Distribution

The heights of a large group of people are assumed to be normally distributed. Their mean height is 66.5 inches, and the standard deviation is 2.4 inches. Find and interpret the intervals representing one, two, and three standard deviations of the mean.

Solution:

The mean is μ = 66.5, and the standard deviation is σ = 2.4.

1. One standard deviation of the mean:

μ ± 1σ = 66.5 ± 1(2.4)

= 66.5 ± 2.4


19 of 50

Example 1 – Solution

= [64.1, 68.9]

Therefore, approximately 68% of the people are between 64.1 and 68.9 inches tall.

2. Two standard deviations of the mean:

μ ± 2σ = 66.5 ± 2(2.4)

= 66.5 ± 4.8

= [61.7, 71.3]

Therefore, approximately 95% of the people are between 61.7 and 71.3 inches tall.


20 of 50

Example 1 – Solution

3. Three standard deviations of the mean:

μ ± 3σ = 66.5 ± 3(2.4)

= 66.5 ± 7.2

= [59.3, 73.7]

Nearly all of the people (99.74%) are between 59.3 and 73.7 inches tall.
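The three intervals in Example 1 follow one pattern, μ ± kσ for k = 1, 2, 3. A minimal sketch of that arithmetic, assuming a hypothetical helper named `interval` that rounds the endpoints to one decimal place:

```python
def interval(mu, sigma, k):
    """Endpoints of mu ± k*sigma, rounded to one decimal place."""
    return (round(mu - k * sigma, 1), round(mu + k * sigma, 1))

mu, sigma = 66.5, 2.4  # mean and standard deviation from Example 1
for k in (1, 2, 3):
    low, high = interval(mu, sigma, k)
    print(f"{k} standard deviation(s) of the mean: [{low}, {high}]")
```

The rounding only guards against floating-point noise; the data in the example are given to one decimal place.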


21 of 50

Probability, Area, and Normal Distributions


22 of 50

Probability, Area, and Normal Distributions

The relative frequency is really a type of probability. If 3 out of every 100 people have red hair, you could say that the relative frequency of red hair is 3/100 (or 3%), or you could say that the probability of red hair, p(x = red hair), is 0.03.

Therefore, to find out what percent of the people in a population are taller than 73 inches, we need to find p(x > 73), the probability that x is greater than 73, where x represents the height of a randomly selected person.


23 of 50

Probability, Area, and Normal Distributions

A sample space is the set S of all possible outcomes of a random experiment. Consequently, the probability of a sample space must always equal 1; that is, p(S) = 1 (or 100%).

If the sample space S has a normal distribution, its outcomes and their respective probabilities can be represented by a bell curve.

When we constructed histograms, relative frequency density (rfd) was used to measure the heights of the rectangles.


24 of 50

Probability, Area, and Normal Distributions

Consequently, the area of a rectangle gave the relative frequency (percent) of data contained in an interval. In a similar manner, we can imagine a bell curve being a histogram composed of infinitely many “skinny” rectangles, as in Figure 4.104.

Symmetric, bell-shaped histogram.

Figure 4.104


25 of 50

Probability, Area, and Normal Distributions

For a normal distribution, the outcomes nearer the center of the distribution occur more frequently than those at either end; the distribution is denser in the middle and sparser at the extremes.

This difference in density is taken into account by consideration of the area under the bell curve; the center of the distribution is denser, contains more area, and has a higher probability of occurrence than the extremes.

Consequently, we use the area under the bell curve to represent the probability of an outcome. Because p (S) = 1, we define the entire area under the bell curve to equal 1.


26 of 50

Probability, Area, and Normal Distributions

Because a normal distribution is symmetric, 50% of the data will be greater than the mean, and 50% will be less. (The mean and the median coincide in a symmetric distribution.)

Therefore, the probability of randomly selecting a number x greater than the mean is p(x > μ) = 0.5, and that of selecting a number x less than the mean is p(x < μ) = 0.5, as shown in Figure 4.105.

Total probability equals 1 (or 100%).

Figure 4.105


27 of 50

Probability, Area, and Normal Distributions

To find the probability that a randomly selected number x is between two values (say a and b), we must determine the area under the curve from a to b; that is,

p(a < x < b) = area under the bell curve from x = a to x = b, as shown in Figure 4.106(a).

Regions under a bell curve.

Figure 4.106(a)


28 of 50

Probability, Area, and Normal Distributions

Likewise, the probability that x is greater than or less than any specific number is given by the area of the tail, as shown in Figure 4.106(b).

To find probabilities involving data that are normally distributed, we must find the area of the appropriate region under the bell curve.

Regions under a bell curve.

Figure 4.106(b)
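The "skinny rectangles" picture can be turned into a rough calculation. The sketch below is an illustration only. It assumes the usual formula for the height of a normal curve, which these slides do not state, and the function names `bell_curve` and `area` are hypothetical. It adds up many thin rectangles (a midpoint sum) to approximate the whole area, the half above the mean, a central strip, and a tail of the standard normal curve.

```python
import math

def bell_curve(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area(a, b, mu=0.0, sigma=1.0, n=20_000):
    """Approximate area under the curve from a to b using n thin rectangles."""
    width = (b - a) / n
    heights = (bell_curve(a + (i + 0.5) * width, mu, sigma) for i in range(n))
    return sum(heights) * width

# The curve is negligible beyond 8 standard deviations, so +/-8 stands in for infinity.
total = area(-8, 8)       # whole curve
upper_half = area(0, 8)   # p(x > mean)
strip = area(-1, 1)       # p(a < x < b) with a = -1, b = 1
tail = area(1, 8)         # p(x > 1)

print(f"total = {total:.4f}, upper half = {upper_half:.4f}")
print(f"strip = {strip:.4f}, tail = {tail:.4f}")
```

In practice the areas are read from a table or a calculator; the sum is only meant to show that "probability" and "area under the curve" are the same quantity.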


29 of 50

The Standard Normal Distribution


30 of 50

The Standard Normal Distribution

All normal distributions share the following features: they are symmetric, bell-shaped curves, and virtually all the data (99.74%) lie within three standard deviations of the mean.

Depending on whether the standard deviation is large or small, the bell curve will be either flat and spread out or peaked and narrow, as shown in Figure 4.107.

Large versus small standard deviation

Figure 4.107
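One way to see why a large standard deviation flattens the curve: the total area must stay equal to 1, so a wider curve has to be lower. The sketch below assumes the standard result that a normal curve's height at its mean is 1/(σ√(2π)); that formula and the name `peak_height` are not from the slides.

```python
import math

def peak_height(sigma):
    """Height of a normal curve at its mean for standard deviation sigma."""
    return 1 / (sigma * math.sqrt(2 * math.pi))

for sigma in (0.5, 1, 2):
    print(f"sigma = {sigma}: peak height = {peak_height(sigma):.4f}")
```

Doubling σ halves the peak height, which matches the flat versus peaked shapes in the figure.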


31 of 50

The Standard Normal Distribution

To find the area under any portion of any bell curve, mathematicians have devised a means of comparing the proportions of any curve with the proportions of a special curve defined as “standard.”

To find probabilities involving normally distributed data, we utilize the bell curve associated with the standard normal distribution.


32 of 50

The Standard Normal Distribution

The standard normal distribution is the normal distribution whose mean is 0 and standard deviation is 1, as shown in Figure 4.108.

The standard normal distribution is also called the z-distribution; we will always use the letter z to refer to the standard normal. By convention, we will use the letter x to refer to any other normal distribution.

The standard normal distribution (mean = 0, standard deviation = 1).

Figure 4.108


33 of 50

The Standard Normal Distribution

Tables have been developed for finding areas under the standard normal curve using the techniques of calculus. Graphing calculators will also give these areas.

We will use the Standard Normal Distribution table to find p(0 < z < z*), the probability that z is between 0 and a positive number z*, as shown in Figure 4.109(a).

Area found by using the body table

Figure 4.109(a)


34 of 50

The Standard Normal Distribution

The Standard Normal Distribution table is also known as the body table because it gives the probability of an interval located in the middle, or body, of the bell curve.

The tapered end of a bell curve is known as a tail.


35 of 50

The Standard Normal Distribution

To find the probability of a tail, that is, p(z > z*) or p(z < –z*) where z* is a positive real number, subtract the probability of the corresponding body from 0.5, as shown in Figure 4.109(b).

Area of a tail, found by subtracting the corresponding body area from 0.5

Figure 4.109(b)


36 of 50

Example 2 – Finding Probabilities of the Standard Normal Distribution

Find the following probabilities (that is, the areas), where z represents the standard normal distribution.

a. p(0 < z < 1.25)

b. p (z > 1.87)

Solution:

a. As a first step, it is always advisable to draw a picture of the z-curve and shade in the desired area.


37 of 50

Example 2 – Solution

We will use the body table directly, because we are working with a central area (see Figure 4.110).

A central region, or body.


Figure 4.110


38 of 50

Example 2 – Solution

The z-numbers are located along the left edge and the top of the table.

Locate the whole number and the first-decimal-place part of the number (1.2) along the left edge; then locate the second-decimal-place part of the number (0.05) along the top.

The desired probability (area) is found at the intersection of the row and column of the two parts of the z-number.


39 of 50

Example 2 – Solution

Thus, p(0 < z < 1.25) = 0.3944, as shown in Figure 4.111.

A portion of the body table.

Figure 4.111
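For readers without the printed table, one row of a body table can be regenerated. This is a sketch under the assumption that each entry equals 0.5·erf(z/√2), the area under the standard normal curve between 0 and z, rounded to four decimal places; the function name `body` and the dictionary layout are illustrative.

```python
import math

def body(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2))

row_label = 1.2  # whole number and first decimal place (left edge of the table)
row = {}
for hundredths in range(10):  # second decimal place (top of the table)
    z = round(row_label + hundredths / 100, 2)
    row[z] = round(body(z), 4)

for z, table_entry in row.items():
    print(f"z = {z:.2f}: {table_entry:.4f}")
```

The entry for z = 1.25 sits where the 1.2 row meets the 0.05 column, as described above.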


40 of 50

Example 2 – Solution

Hence, we could say that about 39% of the z-distribution lies between z = 0 and z = 1.25.

b. To find the area of a tail, we subtract the corresponding body area from 0.5, as shown in Figure 4.112.

Finding the area of a tail.

Figure 4.112


41 of 50

Example 2 – Solution

Therefore, p (z > 1.87) = 0.5 – p(0 < z < 1.87)

= 0.5 – 0.4693

= 0.0307
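Both parts of Example 2 can be cross-checked without a table. The sketch below assumes the body area is 0.5·erf(z/√2); the names `body` and `tail` are illustrative. Computed directly, the body for z = 1.25 is about 0.3944 and the tail beyond z = 1.87 is about 0.0307, so the last digit can differ by one from a value read off a rounded table.

```python
import math

def body(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2))

def tail(z_star):
    """Area beyond a positive z*: 0.5 minus the corresponding body area."""
    return 0.5 - body(z_star)

part_a = body(1.25)  # p(0 < z < 1.25)
part_b = tail(1.87)  # p(z > 1.87)
print(f"a. {part_a:.4f}")
print(f"b. {part_b:.4f}")
```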


42 of 50

Converting to the Standard Normal


43 of 50

Converting to the Standard Normal

Weather forecasters in the United States usually report temperatures in degrees Fahrenheit.

Consequently, if a temperature is given in degrees Celsius, most people convert it to Fahrenheit in order to judge how hot or cold it is. A similar situation arises when we are working with a normal distribution.

Suppose we know that a large set of data is normally distributed with a mean value of 68 and a standard deviation of 4. What percent of the data will lie between 65 and 73? We are asked to find p(65 < x < 73).


44 of 50

Converting to the Standard Normal

To find this probability, we must first convert the given normal distribution to the standard normal distribution and then look up the approximate z-numbers.

The body table applies to the standard normal z-distribution. When we are working with any other normal distribution (denoted by x), we must first convert the x-distribution into the standard normal z-distribution. This conversion is done with the help of the following rule.

Given a number x, its corresponding z-number counts the number of standard deviations the number lies from the mean: z = (x – μ)/σ, where μ is the mean and σ is the standard deviation.


45 of 50

Converting to the Standard Normal

For example, suppose the mean and standard deviation of a normal distribution are μ = 68 and σ = 4. The z-number corresponding to x = 78 is

z = (x – μ)/σ = (78 – 68)/4 = 2.5

This implies that x = 78 lies two and one-half standard deviations above the mean, 68. Similarly, for x = 65,

z = (65 – 68)/4 = –0.75

Therefore, x = 65 lies three-quarters of a standard deviation below the mean, 68.
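The conversion is a single subtraction and division. A minimal sketch, with a hypothetical function name `z_number`, applied to the two values discussed above:

```python
def z_number(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 68, 4
for x in (78, 65):
    print(f"x = {x}: z = {z_number(x, mu, sigma)}")
```

A positive z-number places x above the mean, and a negative one places it below.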


46 of 50

Converting to the Standard Normal


47 of 50

Example 5 – Finding a Probability of a Nonstandard Normal Distribution

Suppose a population is normally distributed with a mean of 24.6 and a standard deviation of 1.3. What percent of the data will lie between 25.3 and 26.8?

Solution:

We are asked to find p(25.3 < x < 26.8), the area of the region shown in Figure 4.119.

A strip.

Figure 4.119


48 of 50

Example 5 – Solution

Because we need to find the area of the strip between 25.3 and 26.8, we must find the body of each and subtract.

Using the Conversion Formula z = (x – μ)/σ with μ = 24.6 and σ = 1.3, we first convert x = 25.3 and x = 26.8 into their corresponding z-numbers.

Converting x = 25.3: z = (25.3 – 24.6)/1.3 = 0.7/1.3 ≈ 0.54

Converting x = 26.8: z = (26.8 – 24.6)/1.3 = 2.2/1.3 ≈ 1.69


49 of 50

Example 5 – Solution

Therefore,

p(25.3 < x < 26.8) = p(0.54 < z < 1.69) (rounding off z-numbers to two decimal places)

= p(0 < z < 1.69) – p(0 < z < 0.54)

= 0.4545 – 0.2054 (using the body table)


50 of 50

Example 5 – Solution

= 0.2491

Assuming a normal distribution, approximately 24.9% of the data will lie between 25.3 and 26.8.
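The whole of Example 5 can be retraced in a few lines. The sketch below is illustrative rather than the textbook's method: it assumes the body area is 0.5·erf(z/√2) and uses hypothetical helper names. It imitates the table procedure (z-numbers rounded to two decimal places, areas to four) and also computes the area without rounding. The unrounded result is about 0.2498, so both routes agree on roughly 25%.

```python
import math

def body(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2))

def z_number(x, mu, sigma):
    """Convert x to its z-number."""
    return (x - mu) / sigma

mu, sigma = 24.6, 1.3
z_low = z_number(25.3, mu, sigma)
z_high = z_number(26.8, mu, sigma)

# Table procedure: round z-numbers to two places and body areas to four places.
table_answer = round(body(round(z_high, 2)), 4) - round(body(round(z_low, 2)), 4)

# Direct computation with unrounded z-numbers.
direct_answer = body(z_high) - body(z_low)

print(f"z-numbers: {z_low:.2f} and {z_high:.2f}")
print(f"table procedure: {table_answer:.4f}")
print(f"direct computation: {direct_answer:.4f}")
```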
