1 of 23

Binomial distribution


Slides developed by Mine Çetinkaya-Rundel of OpenIntro, modified by Leah Dorazio for use with AHSS.

The slides may be copied, edited, and/or shared via the CC BY-SA license

Some images may be included under fair use guidelines (educational purposes)

2 of 23

Recall the Binomial formula

If p represents the probability of success, (1-p) the probability of failure, n the number of independent trials, and k the number of successes, then the probability of exactly k successes is

$$P(K = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
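The binomial formula can be evaluated directly in code. This is a minimal Python sketch, assuming Python 3.8+ for `math.comb`; the function name `binomial_pmf` is our own illustrative choice, not something from the slides.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(K = k): probability of exactly k successes in n independent
    trials, each with success probability p."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    # number of orderings * P(k successes) * P(n - k failures)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: exactly 2 successes in 4 trials with p = 0.3
print(round(binomial_pmf(2, 4, 0.3), 4))  # 0.2646
```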

3 of 23

The Binomial distribution

Example: If the probability that a smoker has a severe lung condition is 0.3, what is the distribution of the number of cases of severe lung condition among 4 randomly chosen friends who smoke?

Find the probabilities for k = 0, 1, 2, 3, 4 by applying the binomial formula to each value of k. Note that n = 4 and p = 0.3 stay fixed.

4 of 23

The Binomial distribution (cont.)


The entire distribution consists of these five probabilities. Note that, up to rounding error, the probabilities must add to 1.
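A short Python sketch that computes the whole distribution for the smokers example (n = 4, p = 0.3) and checks that the probabilities sum to 1:

```python
from math import comb

n, p = 4, 0.3  # 4 friends who smoke; P(severe lung condition) = 0.3

# P(K = k) for every possible number of cases k = 0, 1, ..., n
distribution = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"P(K = {k}) = {prob:.4f}")

# The probabilities must add to 1 (up to floating-point rounding)
print(f"Total = {sum(distribution.values()):.4f}")
```

Running it prints 0.2401, 0.4116, 0.2646, 0.0756, and 0.0081 for k = 0 through 4, and a total of 1.0000.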

5 of 23

The Binomial distribution (cont.)

Once the probability of each value has been calculated with the binomial formula, we can draw a probability histogram to visualize the distribution. Like any distribution, the binomial distribution has a mean and a standard deviation.
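Without a plotting library, a rough text version of the probability histogram for the smokers example (n = 4, p = 0.3) can be sketched as follows. The scale of one `#` per 0.02 of probability is an arbitrary display choice; a plotting library such as matplotlib would normally be used for a real histogram.

```python
from math import comb

n, p = 4, 0.3  # smokers example

rows = []
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p) ** (n - k)
    bar = "#" * round(prob * 50)  # one '#' per 0.02 of probability
    rows.append(f"{k} | {bar} {prob:.4f}")

print("\n".join(rows))
```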

6 of 23

The Binomial distribution (cont.)

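Recall the formulas from the previous chapter for calculating the mean and standard deviation of a discrete probability distribution:

$$\mu = \sum_{x} x \, P(X = x) \qquad \sigma = \sqrt{\sum_{x} (x - \mu)^2 \, P(X = x)}$$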

Fortunately, for a binomial distribution with parameters n and p, there are shortcut formulas for the mean and standard deviation.
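As a check, this Python sketch computes the mean and standard deviation of the smokers example (n = 4, p = 0.3) two ways: with the general formulas (probability-weighted sum of values, and square root of the probability-weighted squared deviations) and with the binomial shortcuts µ = np and σ = √(np(1-p)).

```python
from math import comb, sqrt

n, p = 4, 0.3
dist = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# General formulas for any discrete probability distribution
mean_general = sum(k * prob for k, prob in dist.items())
sd_general = sqrt(sum((k - mean_general) ** 2 * prob for k, prob in dist.items()))

# Binomial shortcut formulas
mean_shortcut = n * p
sd_shortcut = sqrt(n * p * (1 - p))

print(f"general:  mean = {mean_general:.4f}, sd = {sd_general:.4f}")
print(f"shortcut: mean = {mean_shortcut:.4f}, sd = {sd_shortcut:.4f}")
```

Both lines report a mean of 1.2000 and a standard deviation of 0.9165.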

9 of 23

Mean or Expected value

A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 100 Americans, how many would you expect to be obese?

Easy enough, 100 x 0.262 = 26.2.
Or more formally, µ = np = 100 x 0.262 = 26.2.
But this doesn't mean that exactly 26.2 people will be obese in every random sample of 100. In fact, that's not even possible, since the count must be a whole number. In some samples this value will be less, and in others more. How much would we expect this value to vary?

10 of 23

Mean and Standard deviation of a binomial distribution

Going back to the obesity rate, with n = 100 and p = 0.262:

µ = np = 100 x 0.262 = 26.2
σ = √(np(1-p)) = √(100 x 0.262 x 0.738) ≈ 4.4

Note: The mean and standard deviation of a binomial distribution need not be whole numbers, and that is fine: these values describe what we would expect to see on average.

We would expect 26.2 out of 100 randomly sampled Americans to be obese, with a standard deviation of 4.4.
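The same shortcut calculation for the obesity example, as a short Python check:

```python
from math import sqrt

n, p = 100, 0.262  # sample size; 2012 Gallup obesity rate

mu = n * p                     # mean: np
sigma = sqrt(n * p * (1 - p))  # standard deviation: sqrt(np(1 - p))

print(f"mean = {mu:.1f}, sd = {sigma:.1f}")  # mean = 26.2, sd = 4.4
```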

11 of 23

Unusual observations

Observations more than 2 standard deviations away from the mean are considered unusual. Using this rule together with the mean and standard deviation we just computed, we can calculate a range for the plausible number of obese Americans in random samples of 100:

26.2 ± (2 x 4.4) → (17.4, 35.0)
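A minimal sketch of the 2-standard-deviation rule in Python. The helper `is_unusual` is a hypothetical name introduced here for illustration, and the sample counts 15 and 30 are made-up inputs, not data from the slides.

```python
# Mean and standard deviation from the obesity example (n = 100, p = 0.262)
mu, sigma = 26.2, 4.4

# Values within 2 standard deviations of the mean are considered plausible
lower, upper = mu - 2 * sigma, mu + 2 * sigma
print(f"({lower:.1f}, {upper:.1f})")  # (17.4, 35.0)


def is_unusual(x: float, mean: float, sd: float) -> bool:
    """True if x lies more than 2 standard deviations from the mean."""
    return abs(x - mean) / sd > 2


print(is_unusual(15, mu, sigma))  # True: about 2.5 SDs below the mean
print(is_unusual(30, mu, sigma))  # False: within 1 SD of the mean
```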

13 of 23

Practice

An August 2012 Gallup poll suggests that 13% of Americans think home schooling provides an excellent education for children. Would a random sample of 1,000 Americans where only 100 share this opinion be considered unusual?

(a) Yes (b) No

http://www.gallup.com/poll/156974/private-schools-top-marks-educating-children.aspx
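One way to answer, applying the same 2-standard-deviation rule used for the obesity example, sketched in Python:

```python
from math import sqrt

n, p = 1000, 0.13   # sample size; proportion from the August 2012 Gallup poll
observed = 100      # number in the sample who share the opinion

mu = n * p                     # expected count: np = 130
sigma = sqrt(n * p * (1 - p))  # sqrt(1000 * 0.13 * 0.87), about 10.63
z = (observed - mu) / sigma    # distance from the mean in standard deviations

print(f"mean = {mu:.0f}, sd = {sigma:.2f}, z = {z:.2f}")
print("unusual" if abs(z) > 2 else "not unusual")
```

A count of 100 is about 2.82 standard deviations below the expected 130, so under this rule the answer is (a) Yes.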

14 of 23

Distributions of number of successes

Hollow histograms of samples from the binomial model with p = 0.10 and n = 10, 30, 100, and 300. What happens as n increases?

Note: the scales on the histograms are different!

Use this applet, which has sliders for n and p, to see how the shape of the binomial distribution changes as n and p change:

http://www.stat.berkeley.edu/~stark/Java/Html/BinHist.htm

15 of 23

How large is large enough to use normal approximation?

The sample size is considered large enough if the expected numbers of successes and failures are both at least 10, that is, if

np ≥ 10 and n(1-p) ≥ 10

Observe that when n = 30 and p = 0.10:

np = 30 x 0.10 = 3 < 10 (fail!)
n(1-p) = 30 x (1 - 0.10) = 27 ≥ 10

But when n = 100 and p = 0.10:

np = 100 x 0.10 = 10 ≥ 10
n(1-p) = 100 x (1 - 0.10) = 90 ≥ 10

This is consistent with our visual judgement of normality.
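The rule of thumb can be written as a small predicate; `normal_approx_ok` is a hypothetical helper name used only for illustration.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Large-sample rule of thumb: expected successes np and expected
    failures n(1 - p) must both be at least `threshold` (10 by default)."""
    return n * p >= threshold and n * (1 - p) >= threshold


p = 0.10
for n in (30, 100):
    print(f"n = {n}: np = {n * p:.1f}, n(1-p) = {n * (1 - p):.1f}, "
          f"ok = {normal_approx_ok(n, p)}")
```

With p = 0.10, n = 30 fails (np = 3.0) and n = 100 passes (np = 10.0, n(1-p) = 90.0).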

17 of 23

Practice

Below are four pairs of Binomial distribution parameters. Which distribution(s) can be approximated by the normal distribution?

n = 100, p = 0.95
n = 25, p = 0.45 → 25 x 0.45 = 11.25, 25 x 0.55 = 13.75
n = 150, p = 0.05
n = 500, p = 0.015
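Checking all four pairs at once with the same rule of thumb (the helper name `normal_approx_ok` is again hypothetical):

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Both expected successes (np) and expected failures (n(1 - p)) >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


pairs = [(100, 0.95), (25, 0.45), (150, 0.05), (500, 0.015)]
for n, p in pairs:
    print(f"n = {n}, p = {p}: np = {n * p:.2f}, n(1-p) = {n * (1 - p):.2f}, "
          f"ok = {normal_approx_ok(n, p)}")
```

Only n = 25, p = 0.45 passes; each of the other three pairs has an expected count (5, 7.5, or 7.5) below 10.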

19 of 23

An analysis of Facebook users

http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx

A 2012 Pew study found that "Facebook users get more than they give." For example:

40% of Facebook users in our sample made a friend request, but 63% received at least one request
Users in our sample pressed the like button next to friends' content an average of 14 times, but had their content "liked" an average of 20 times
Users sent 9 personal messages, but received 12
12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo

Any guesses for how this pattern can be explained?

Power users contribute much more content than the typical user.

20 of 23

Practice

This study also found that approximately 25% of Facebook users are considered power users. The same study found that the average Facebook user has 245 friends. What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users? Note any assumptions you must make.

We are given that n = 245 and p = 0.25, and we are asked for the probability P(K ≥ 70). To proceed, we need independence, which we'll assume but could check if we had access to more Facebook data.

P(K ≥ 70) = P(K = 70 or K = 71 or K = 72 or … or K = 245)
= P(K = 70) + P(K = 71) + P(K = 72) + … + P(K = 245)

This seems like an awful lot of work...
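By hand this sum is tedious, but a computer can add all 176 terms directly. A minimal Python sketch, assuming independence as stated:

```python
from math import comb

n, p = 245, 0.25  # friends; proportion of Facebook users who are power users

# P(K >= 70) = P(K = 70) + P(K = 71) + ... + P(K = 245)
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(70, n + 1))

print(f"P(K >= 70) = {prob:.4f}")
```

The exact result can be compared with the normal-approximation answer of 0.0985 worked out in this section.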

21 of 23

Normal approximation to the binomial
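In symbols, combining the large-sample condition with the binomial mean and standard deviation from earlier in this section:

$$
\text{If } np \ge 10 \text{ and } n(1-p) \ge 10, \text{ then } K \sim \text{Binomial}(n, p) \text{ is approximately } N\!\left(\mu = np,\ \sigma = \sqrt{np(1-p)}\right).
$$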

22 of 23

Practice

What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?

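µ = np = 245 x 0.25 = 61.25
σ = √(np(1-p)) = √(245 x 0.25 x 0.75) ≈ 6.78
Z = (70 - 61.25) / 6.78 ≈ 1.29

P(Z > 1.29) = 0.0985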
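A sketch of the same normal-approximation calculation in Python, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

n, p = 245, 0.25  # friends; proportion of power users

mu = n * p                     # 61.25
sigma = sqrt(n * p * (1 - p))  # about 6.78

z = (70 - mu) / sigma                 # about 1.29
upper_tail = 1 - NormalDist().cdf(z)  # P(Z > z) for a standard normal Z

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, z = {z:.2f}, P(Z > z) = {upper_tail:.4f}")
```

Because z is not rounded to 1.29 before the lookup, the printed tail probability may differ from 0.0985 in the last decimal place.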

23 of 23

Explore more free resources at openintro.org/ahss, including:

AHSS Textbook
Videos - content videos, worked examples, TI-84 and Casio tutorials
Slides
Data Sets
Desmos Activities
Interactive Tableau graphs
Statistical Software Labs
Discussion Forums (free support for students and teachers)

Teacher-only content is also available for Verified Teachers, including:

Exercise solutions
Sample exams
Ability to request a free desk copy for a course
Statistics Teachers email group

Questions? Contact us.