1 of 60

Chapter 7 The Central Limit Theorem

OPENSTAX STATISTICS

2 of 60

Objectives

By the end of this chapter, the student should be able to:

  • Recognize central limit theorem problems.
  • Classify continuous word problems by their distributions.
  • Apply and interpret the central limit theorem for means.
  • Apply and interpret the central limit theorem for sums.

3 of 60

The Central Limit Theorem is one of the most powerful and useful ideas in all of statistics.

4 of 60

The Central Limit Theorem

  • There are two alternative forms of the theorem, and both are concerned with drawing finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ.
  • The first alternative says that if we collect samples of size n with a "large enough n," calculate each sample's mean, and create a histogram of those means, then the resulting histogram will tend to have an approximately normal bell shape.
  • The second alternative says that if we again collect samples of size n that are "large enough," calculate the sum of each sample, and create a histogram of those sums, then the resulting histogram will again tend to have an approximately normal bell shape.

5 of 60

The Central Limit Theorem (cont.)

  • In either case, it does not matter what the distribution of the original population is; you do not even need to know it.
  • The important fact is that the distributions of the sample means and the sample sums tend to follow the normal distribution.
  • The size of the sample, n, that is required in order to be "large enough" depends on the original population from which the samples are drawn (the sample size should be at least 30 or the data should come from a normal distribution).
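
The claims on this slide can be checked with a small simulation. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample size of 30, and the function name `sample_means` are assumptions chosen for the demonstration, not part of the slides.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable


def sample_means(n, num_samples=5000):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


means = sample_means(n=30)
center = statistics.fmean(means)  # expected to sit near the population mean, 1
spread = statistics.stdev(means)  # expected to sit near 1 / sqrt(30), about 0.18

# Share of sample means within one standard deviation of the center.
# For a bell-shaped histogram this share is roughly 68%.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```

Even though the population is strongly skewed, the sample means cluster symmetrically around the population mean.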

6 of 60

The astounding result is that it does not matter what the distribution of the original population is; you do not even need to know it. The important fact is that the distribution of sample means tends to follow the normal distribution.

7 of 60

Section 7.1

THE CENTRAL LIMIT THEOREM FOR SAMPLE MEANS (AVERAGES)

8 of 60

The Central Limit Theorem for Sample Means (Averages)

  • Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution).
  • Using a subscript that matches the random variable, suppose:
    • μX = the mean of X
    • σX = the standard deviation of X

9 of 60

The Central Limit Theorem for Sample Means (Averages), Cont.

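  • If you draw random samples of size n, then as n increases, the random variable X̄ (the sample mean) tends to be normally distributed: X̄ ~ N(μX, σX/√n).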

10 of 60

The Central Limit Theorem focuses on the sampling distribution of means.

11 of 60

It is crucial that you understand the difference between a population distribution and a sampling distribution of means.

12 of 60

The Difference Graphically

13 of 60

On the Population Distribution

  • Notice that the horizontal axis in the top panel is labeled X. These are the individual observations of the population.
  • This is the unknown distribution of the population values.
  • The graph is purposely drawn all squiggly to show that it does not matter how oddball the distribution really is.
  • Remember, we will never know what this distribution looks like, or its mean or standard deviation for that matter.

14 of 60

On the Sampling Distribution

  •  

15 of 60

Formula Comparison

  • The Central Limit Theorem goes even further and tells us the mean and standard deviation of this theoretical distribution.

16 of 60

The Significance

  •  

 

17 of 60

Example

  • An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size n = 25 are drawn randomly from the population. See the Excel spreadsheet.
  • a. Find the probability that the sample mean is between 85 and 92.
  • b. Find the value that is two standard deviations above the expected value, 90, of the sample mean.
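
One way to work this example outside Excel is sketched below. It assumes Python's standard-library `statistics.NormalDist` as a stand-in for the spreadsheet; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 25

# Sampling distribution of the sample mean: normal with mean mu
# and standard deviation sigma / sqrt(n) = 15 / 5 = 3.
x_bar = NormalDist(mu, sigma / sqrt(n))

# a. P(85 < sample mean < 92)
p_between = x_bar.cdf(92) - x_bar.cdf(85)

# b. The value two standard deviations above the expected value of the sample mean.
two_sd_above = mu + 2 * x_bar.stdev

print(round(p_between, 4), two_sd_above)
```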

18 of 60

Example - Answers

  •  

19 of 60

Example - Answers

  •  

20 of 60

Example

  • The length of time, in hours, it takes an "over 40" group of people to play one soccer match is normally distributed with a mean of two hours and a standard deviation of 0.5 hours. A sample of size n = 50 is drawn randomly from the population. Find the probability that the sample mean is between 1.8 hours and 2.3 hours. See the Excel spreadsheet.
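
A minimal sketch of this calculation, assuming Python's standard-library `statistics.NormalDist` in place of the Excel spreadsheet (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.0, 0.5, 50  # hours, hours, matches sampled

# Sampling distribution of the sample mean playing time.
x_bar = NormalDist(mu, sigma / sqrt(n))

# P(1.8 < sample mean < 2.3)
p_between = x_bar.cdf(2.3) - x_bar.cdf(1.8)
print(round(p_between, 4))
```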

21 of 60

Example - Answers

  •  

22 of 60

Example

In a recent study, it was reported that the mean age of iPad users is 34 years. Suppose the standard deviation is 15 years. Take a sample of size n = 100. See the Excel spreadsheet.

  1. What are the mean and standard deviation for the sample mean ages of iPad users?
  2. What does the distribution look like?
  3. Find the probability that the sample mean age is more than 30 years (the reported mean age of iPad users in this particular study).
  4. Find the 95th percentile for the sample mean age (to one decimal place).
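
The four parts can be sketched as follows, assuming Python's standard-library `statistics.NormalDist` in place of the Excel spreadsheet. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 34, 15, 100

# Parts 1 and 2: the sample mean is approximately normal with
# mean 34 and standard deviation 15 / sqrt(100) = 1.5.
x_bar = NormalDist(mu, sigma / sqrt(n))

# Part 3: P(sample mean > 30)
p_above_30 = 1 - x_bar.cdf(30)

# Part 4: 95th percentile of the sample mean
pct95 = x_bar.inv_cdf(0.95)

print(x_bar.mean, x_bar.stdev, round(p_above_30, 4), round(pct95, 1))
```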

23 of 60

Example - Answers

  •  

24 of 60

Section 7.2

THE CENTRAL LIMIT THEOREM FOR SUMS

25 of 60

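The Central Limit Theorem for Sums

  • If you draw random samples of size n from X, then as n increases, the random variable ΣX (the sum of the sample values) tends to be normally distributed: ΣX ~ N(n·μX, √n·σX).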

26 of 60

Example

An unknown distribution has a mean of 90 and a standard deviation of 15. A sample of size 80 is drawn randomly from the population.

Problem

  1. Find the probability that the sum of the 80 values (or the total of the 80 values) is more than 7,500.
  2. Find the sum that is 1.5 standard deviations above the mean of the sums.
  3. See the Excel spreadsheet.
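
A minimal sketch of the two calculations, assuming Python's standard-library `statistics.NormalDist` in place of the Excel spreadsheet (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 80

# Sampling distribution of the sum: mean n * mu = 7,200 and
# standard deviation sqrt(n) * sigma, about 134.16.
total = NormalDist(n * mu, sqrt(n) * sigma)

# 1. P(sum > 7,500)
p_above_7500 = 1 - total.cdf(7500)

# 2. The sum that is 1.5 standard deviations above the mean of the sums.
value_1_5_sd = total.mean + 1.5 * total.stdev

print(round(p_above_7500, 4), round(value_1_5_sd, 1))
```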

27 of 60

Example - Answers

  •  

28 of 60

Example - Answers

  •  

29 of 60

Example

In a recent study, it was reported that the mean age of iPad users is 34 years. Suppose the standard deviation is 15 years. The sample size is 50. See the Excel spreadsheet.

  1. What are the mean and standard deviation for the sum of the ages of iPad users? What is the distribution?
  2. Find the probability that the sum of the ages is between 1,500 and 1,800 years.
  3. Find the 80th percentile for the sum of the 50 ages.
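
These parts can be sketched as follows, assuming Python's standard-library `statistics.NormalDist` in place of the Excel spreadsheet. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 34, 15, 50

# Part 1: the sum of the 50 ages is approximately normal with
# mean n * mu = 1,700 and standard deviation sqrt(n) * sigma, about 106.07.
total = NormalDist(n * mu, sqrt(n) * sigma)

# Part 2: P(1,500 < sum < 1,800)
p_between = total.cdf(1800) - total.cdf(1500)

# Part 3: 80th percentile of the sum
pct80 = total.inv_cdf(0.80)

print(total.mean, round(total.stdev, 2), round(p_between, 4), round(pct80, 1))
```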

30 of 60

Example - Answers

  •  

31 of 60

Section 7.3

USING THE CENTRAL LIMIT THEOREM

32 of 60

The Law of Large Numbers

  • The Law of Large Numbers, along with the Central Limit Theorem, provides another critical piece of information to allow us to engage in inferential statistics. In short, the Law of Large Numbers tells us that as the sample size grows, the sample mean settles on the population mean, μ, which is also the expected value of the sampling distribution of the sample mean.

  • Suppose you were to take a sample and calculate a sample mean. Then you take another sample, combine it with the previous sample, and calculate the sample mean of the combined sample. Then you repeat this process over and over, creating bigger and bigger samples and calculating a sample mean each time along the way. We can use Excel to observe this (we will look at an experiment shortly).
  • The sample means from larger and larger samples will get closer and closer to the population mean, μ. Jacob Bernoulli refined a mathematical proof of the Law of Large Numbers over a period of 20 years; it was published in 1713.
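
The combine-and-recompute experiment described above can be sketched in a few lines. This is an illustration under assumptions that are not in the slides: the population is a fair six-sided die (population mean 3.5), and the sample sizes are arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

population_mean = 3.5  # mean of a fair six-sided die (assumed population)
rolls = []
running_means = {}

for target_size in (10, 100, 1_000, 10_000, 100_000):
    # Combine new observations with the previous sample, then recompute the mean.
    while len(rolls) < target_size:
        rolls.append(random.randint(1, 6))
    running_means[target_size] = statistics.fmean(rolls)
    print(target_size, round(running_means[target_size], 4))

final_gap = abs(running_means[100_000] - population_mean)
```

The printed means wander for small samples and settle near 3.5 as the combined sample grows.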

33 of 60

The Law of Large Numbers and The Central Limit Theorem

  •  

34 of 60

We can use Excel histograms and our simulation to see how the Central Limit Theorem works

35 of 60

The Central Limit Theorem works with any type of data distribution

HERE ARE SOME OTHER EXAMPLES

36 of 60

Examples with a Uniform Distribution

37 of 60

Examples with a Skewed Distribution

38 of 60

On the Sampling Distribution

  •  

39 of 60

Different Sample Sizes

  • To the right are three sampling distributions. The only change that was made is the sample size that was used to get the sample means for each distribution.
  • As the sample size n increases from 10 to 30 to 50, the standard deviations of the respective sampling distributions decrease, because the square root of the sample size is in the denominator of the standard deviation of each sampling distribution.
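
A short numeric illustration of the shrinking standard deviation follows. The population standard deviation of 15 is an assumption made only for this sketch; the slide does not state one.

```python
from math import sqrt

sigma = 15  # assumed population standard deviation, for illustration only

# Standard deviation of the sampling distribution of the mean: sigma / sqrt(n).
std_errors = [sigma / sqrt(n) for n in (10, 30, 50)]

for n, se in zip((10, 30, 50), std_errors):
    print(n, round(se, 3))
```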

40 of 60

Example

  • A manufacturer produces 25-pound lifting weights. The lowest actual weight is 24 pounds, and the highest is 26 pounds. Each weight is equally likely so the distribution of weights is uniform. A sample of 100 weights is taken.
  • a. What is the distribution for the weight of one 25-pound lifting weight? What are the mean and standard deviation?
  • b. What is the distribution for the mean weight of 100 25-pound lifting weights?
  • c. Find the probability that the mean actual weight for the 100 weights is less than 24.9.

41 of 60

Example - Answers

  • a. What is the distribution for the weight of one 25-pound lifting weight? What are the mean and standard deviation?
  • U(24, 26), 25, 0.5774
  • b. What is the distribution for the mean weight of 100 25-pound lifting weights?
  • N(25, 0.0577)
  • c. Find the probability that the mean actual weight for the 100 weights is less than 24.9.
  • 0.0416
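
The answers above can be reproduced with a short sketch, assuming Python's standard-library `statistics.NormalDist`. It uses the standard formulas for a uniform distribution on [a, b]: mean (a + b)/2 and standard deviation (b − a)/√12.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 24, 26, 100

# a. One weight: X ~ U(24, 26)
mean_x = (a + b) / 2
sd_x = (b - a) / sqrt(12)

# b. Mean of 100 weights: approximately N(25, sd_x / sqrt(100))
x_bar = NormalDist(mean_x, sd_x / sqrt(n))

# c. P(sample mean < 24.9)
p_below = x_bar.cdf(24.9)

print(mean_x, round(sd_x, 4), round(x_bar.stdev, 4), round(p_below, 4))
```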

42 of 60

Example

  • A manufacturer produces 25-pound lifting weights. The lowest actual weight is 24 pounds, and the highest is 26 pounds. Each weight is equally likely so the distribution of weights is uniform. A sample of 100 weights is taken.
  • Find the probability that the mean actual weight for the 100 weights is greater than 25.2.

43 of 60

Example - Answers

  • Find the probability that the mean actual weight for the 100 weights is greater than 25.2.
  • 0.0003

44 of 60

Example

  • A manufacturer produces 25-pound lifting weights. The lowest actual weight is 24 pounds, and the highest is 26 pounds. Each weight is equally likely so the distribution of weights is uniform. A sample of 100 weights is taken.
  • Find the 90th percentile for the total weight of the 100 weights.

45 of 60

Example - Answers

  • Find the 90th percentile for the total weight of the 100 weights.
  • 2,507.40

46 of 60

Example

  • A uniform distribution has a minimum of six and a maximum of ten. A sample of 50 is taken.
  • Find the 80th percentile for the sums.

47 of 60

Example - Answers

  • A uniform distribution has a minimum of six and a maximum of ten. A sample of 50 is taken.
  • Find the 80th percentile for the sums.
  • 406.87
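
A minimal sketch of the percentile-of-sums calculation, assuming Python's standard-library `statistics.NormalDist`. The helper name `uniform_sum_percentile` is hypothetical. It is also applied to the total weight of the 100 lifting weights from the earlier example.

```python
from math import sqrt
from statistics import NormalDist


def uniform_sum_percentile(low, high, n, p):
    """Approximate the quantile p (between 0 and 1) of the sum of n draws
    from U(low, high), using the central limit theorem for sums."""
    mean = (low + high) / 2
    sd = (high - low) / sqrt(12)
    sums = NormalDist(n * mean, sqrt(n) * sd)
    return sums.inv_cdf(p)


pct80_sum = uniform_sum_percentile(6, 10, 50, 0.80)
pct90_weights = uniform_sum_percentile(24, 26, 100, 0.90)
print(round(pct80_sum, 2), round(pct90_weights, 2))
```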

48 of 60

On Using Proportions

  • If the random variable is discrete, such as for categorical data, then the parameter we wish to estimate is the population proportion.
  • This is the probability of drawing a success in any one random draw.
  • Unlike the case just discussed for a continuous random variable, where we did not know the population distribution of the X's, here we actually know the underlying probability distribution for these data: it is the binomial distribution. The random variable is X = the number of successes, and the parameter we wish to know is p, the probability of drawing a success, which is the proportion of successes in the population.

49 of 60

From the Normal Distribution to the Binomial Distribution

  •  

50 of 60

Once again, we can look at our simulation to see how this works with proportions

51 of 60

With Graphs

  •  

52 of 60

Formula Comparison for Proportions

  •  

53 of 60

Example

  • A company inspects products coming through its production process and rejects defective products. One-tenth of the items are rejected. If samples of 50 items are taken, what is the standard deviation of the sampling distribution of sample proportions?
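
A minimal sketch of this calculation, using the standard formula for the standard deviation of a sample proportion, √(p(1 − p)/n). The variable names are illustrative.

```python
from math import sqrt

p, n = 0.10, 50  # proportion rejected, items per sample

# Standard deviation of the sampling distribution of the sample proportion.
sd_p_hat = sqrt(p * (1 - p) / n)
print(round(sd_p_hat, 4))
```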

54 of 60

Example - Answers

  •  

55 of 60

Example

In the United States, a robbery occurs every two minutes, on average, according to a number of studies. Suppose the standard deviation is 0.5 minutes and the sample size is 100.

Problem

  1. Find the median, the first quartile, and the third quartile for the sample mean time of robberies in the United States.
  2. Find the median, the first quartile, and the third quartile for the sum of sample times of robberies in the United States.
  3. Find the probability that the sample mean time between robberies is between 1.75 and 1.85 minutes.
  4. Find the value that is two standard deviations above the sample mean.
  5. Find the IQR for the sum of the sample times.
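
The five parts can be sketched as follows, assuming Python's standard-library `statistics.NormalDist`. The variable names are illustrative. Because both sampling distributions are treated as normal, each median equals the corresponding mean.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.0, 0.5, 100  # minutes, minutes, sample size

x_bar = NormalDist(mu, sigma / sqrt(n))      # sample mean: N(2, 0.05)
total = NormalDist(n * mu, sqrt(n) * sigma)  # sample sum:  N(200, 5)

# 1. Median and quartiles of the sample mean
mean_q1, mean_median, mean_q3 = (x_bar.inv_cdf(q) for q in (0.25, 0.50, 0.75))

# 2. Median and quartiles of the sum
sum_q1, sum_median, sum_q3 = (total.inv_cdf(q) for q in (0.25, 0.50, 0.75))

# 3. P(1.75 < sample mean < 1.85)
p_between = x_bar.cdf(1.85) - x_bar.cdf(1.75)

# 4. Two standard deviations above the mean of the sample means
two_sd_above = x_bar.mean + 2 * x_bar.stdev

# 5. Interquartile range of the sum
sum_iqr = sum_q3 - sum_q1

print(round(mean_q1, 4), round(mean_median, 4), round(mean_q3, 4))
print(round(sum_q1, 2), round(sum_median, 2), round(sum_q3, 2))
print(round(p_between, 5), round(two_sd_above, 2), round(sum_iqr, 2))
```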

56 of 60

Example - Answers

  •  

57 of 60

Example - Answers

  •  

58 of 60

Example - Answers

  •  

59 of 60

Example

A study was done regarding attendance at Broadway shows in New York City. The age range of the attendees was 14 to 61. The mean age was 30.9 years with a standard deviation of nine years.

  1. In a sample of 25 attendees, what is the probability that the mean age is less than 35?
  2. Is it likely that the mean age of the sample group could be more than 50 years? Interpret the results.
  3. In a sample of 49 attendees, what is the probability that the sum of the ages is no less than 1,600?
  4. Is it likely that the sum of the ages of the 49 attendees is at most 1,595? Interpret the results.
  5. Find the 95th percentile for the sample mean age of 65 attendees. Interpret the results.
  6. Find the 90th percentile for the sum of the ages of 65 attendees. Interpret the results.
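
The six parts can be sketched as follows, assuming Python's standard-library `statistics.NormalDist`. The helper names `mean_dist` and `sum_dist` are hypothetical, and part 2 is assumed to refer to the same sample of 25 attendees as part 1.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 30.9, 9  # population mean and standard deviation of age, in years


def mean_dist(n):
    """Sampling distribution of the sample mean for samples of size n."""
    return NormalDist(mu, sigma / sqrt(n))


def sum_dist(n):
    """Sampling distribution of the sample sum for samples of size n."""
    return NormalDist(n * mu, sqrt(n) * sigma)


p_mean_below_35 = mean_dist(25).cdf(35)           # part 1
p_mean_above_50 = 1 - mean_dist(25).cdf(50)       # part 2 (n = 25 assumed)
p_sum_at_least_1600 = 1 - sum_dist(49).cdf(1600)  # part 3
p_sum_at_most_1595 = sum_dist(49).cdf(1595)       # part 4
pct95_mean = mean_dist(65).inv_cdf(0.95)          # part 5
pct90_sum = sum_dist(65).inv_cdf(0.90)            # part 6

print(round(p_mean_below_35, 4), p_mean_above_50)
print(round(p_sum_at_least_1600, 4), round(p_sum_at_most_1595, 4))
print(round(pct95_mean, 1), round(pct90_sum, 1))
```

A sample mean above 50 lies more than ten standard deviations above 30.9, so its probability is effectively zero.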

60 of 60

Example - Answers

  •  
