1 of 20

Lecture 15

Sampling

DATA 8

Summer 2017

Slides created by John DeNero (denero@berkeley.edu), Ani Adhikari (adhikari@berkeley.edu), and Sam Lau (samlau95@berkeley.edu)

2 of 20

Announcements

3 of 20

Monty Hall Problem

4 of 20

Probability

5 of 20

Probability

  • Lowest value: 0
    • Chance of event that is impossible
  • Highest value: 1 (or 100%)
    • Chance of event that is certain

  • If an event has chance 70%, then the chance that it doesn’t happen is
    • 100% - 70% = 30%
    • 1 - 0.7 = 0.3

6 of 20

Equally Likely Outcomes

Assuming all outcomes are equally likely, the chance of an event A is:

$$P(A) = \frac{\text{number of outcomes that make } A \text{ happen}}{\text{total number of outcomes}}$$

(Demo)
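A minimal sketch of the equally-likely formula, assuming a fair six-sided die and the event "the roll is even" (both chosen only for illustration). It counts the outcomes that make the event happen and divides by the total number of outcomes. The helper name `chance_equally_likely` is hypothetical.

```python
from fractions import Fraction

def chance_equally_likely(outcomes, event):
    """P(event) when every outcome in `outcomes` is equally likely.

    `event` is a function returning True for outcomes that make the event happen.
    """
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]
p_even = chance_equally_likely(die, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```

Using `Fraction` keeps the answer exact (3/6 reduces to 1/2) instead of a rounded decimal.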

7 of 20

Fraction of a Fraction

(Demo)

8 of 20

Multiplication Rule

Chance that two events A and B both happen = P(A happens) x P(B happens given that A has happened)

  • The answer is less than or equal to each of the two chances being multiplied
  • The more conditions you have to satisfy, the less likely you are to satisfy them all

(Demo)
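As a sketch, assume a hypothetical bag with 3 red and 2 blue marbles, with two marbles drawn at random without replacement. The multiplication rule gives P(both red) = (3/5) x (2/4) = 3/10. The code checks this against a direct count of all equally likely ordered pairs of draws. The bag and its labels are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bag: 3 red and 2 blue marbles, labeled so each one is distinct.
bag = ["R1", "R2", "R3", "B1", "B2"]

# Multiplication rule: P(first is red) x P(second is red given first was red)
p_first_red = Fraction(3, 5)
p_second_red_given_first_red = Fraction(2, 4)  # 2 reds left among 4 marbles
p_rule = p_first_red * p_second_red_given_first_red

# Direct count: every ordered pair of distinct marbles is equally likely.
pairs = list(permutations(bag, 2))
both_red = [p for p in pairs if p[0].startswith("R") and p[1].startswith("R")]
p_count = Fraction(len(both_red), len(pairs))

print(p_rule, p_count)  # 3/10 3/10
```

Note that 3/10 is smaller than both 3/5 and 2/4, as the first bullet above says it must be.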

9 of 20

Addition Rule

If event A can happen in exactly one of two ways, then

P(A) = P(first way) + P(second way)

  • The answer is greater than or equal to the chance of each individual way
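For example (chosen for illustration), in two tosses of a fair coin, "exactly one head" happens in exactly one of two ways: HT or TH. This sketch adds the two chances and confirms the result by listing all outcomes.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Addition rule: the two ways (HT and TH) cannot both happen, so their chances add.
p_ht = half * half
p_th = half * half
p_rule = p_ht + p_th

# Check by listing all 4 equally likely outcomes of two tosses.
outcomes = list(product("HT", repeat=2))
exactly_one_head = [o for o in outcomes if o.count("H") == 1]
p_count = Fraction(len(exactly_one_head), len(outcomes))

print(p_rule, p_count)  # 1/2 1/2
```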

10 of 20

Example: At Least One Head

  • In 3 tosses:
    • Any outcome except TTT
    • P(TTT) = (½) x (½) x (½) = ⅛
    • P(at least one head) = 1 - P(TTT) = ⅞ = 87.5%

  • In 10 tosses:
    • P(at least one head) = 1 - (½)**10
    • ≈ 99.9%

(Demo)
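The complement calculation above can be written as a small function of the number of tosses, assuming a fair coin and independent tosses. The function name `p_at_least_one_head` is hypothetical.

```python
from fractions import Fraction

def p_at_least_one_head(n_tosses):
    """Complement rule: 1 - P(all tails) for n independent tosses of a fair coin."""
    p_all_tails = Fraction(1, 2) ** n_tosses
    return 1 - p_all_tails

print(p_at_least_one_head(3))          # 7/8
print(float(p_at_least_one_head(10)))  # 0.9990234375
```

The exact answer for 10 tosses is 1023/1024, which rounds to 99.9%.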

11 of 20

Attendance

12 of 20

Sampling

13 of 20

Sampling

  • Deterministic sample:
    • Sampling scheme doesn’t involve chance

  • Probability sample:
    • Before the sample is drawn, you have to know the selection probability of every group of people in the population
    • Individuals do not all need to have an equal chance of being selected

(Demo)
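One kind of probability sample is a systematic sample: pick a random starting position among the first k rows, then take every k-th row after it. This sketch assumes a hypothetical population of 100 numbered individuals and k = 10; the function name `systematic_sample` is illustrative. Before drawing, we know each individual has chance 1/10 of being selected and each of the 10 possible samples has chance 1/10, while some groups (for example, two neighbors) have chance 0 of being selected together.

```python
import random

def systematic_sample(population, k, rng=random):
    """Random start among the first k positions, then every k-th item.

    Each of the k possible samples (one per starting position) has chance 1/k.
    """
    start = rng.randrange(k)  # each start 0..k-1 is equally likely
    return population[start::k]

population = list(range(1, 101))  # hypothetical population of 100 people
rng = random.Random(8)            # seeded only so the demo is repeatable
sample = systematic_sample(population, 10, rng)
print(sample)
```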

14 of 20

Sample of Convenience

  • Example: sample consists of whoever walks by
  • Just because you think you’re sampling “at random” doesn’t mean you are.
  • If you can’t figure out ahead of time
    • what the population is, and
    • what the chance of selection is for each group in the population,

then you don’t have a random sample

15 of 20

Distributions

16 of 20

Probability Distribution

  • Random quantity with various possible values

  • “Probability distribution”:
    • All the possible values of the quantity
    • The probability of each of those values

  • In some cases, the probability distribution can be worked out mathematically without ever generating (or simulating) the random quantity

(Demo)
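As an illustration (the example is chosen here, not taken from the slide), the distribution of the total of two fair dice can be worked out mathematically: list every pair of faces, each with chance 1/36, and add up the chances for each total. No random numbers are generated.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
faces = range(1, 7)
counts = Counter(a + b for a, b in product(faces, repeat=2))

# Probability distribution: each possible total and its exact chance.
distribution = {total: Fraction(c, 36) for total, c in sorted(counts.items())}

for total, p in distribution.items():
    print(total, p)
```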

17 of 20

Empirical Distribution

  • Based on observations

  • Observations can be from repetitions of an experiment

  • “Empirical Distribution”
    • All observed values
    • The proportion of observations equal to each value

(Demo)
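A minimal sketch of an empirical distribution, assuming the observations come from 1,000 simulated rolls of a fair die (the number of rolls and the seed are arbitrary choices). The helper name `empirical_distribution` is hypothetical.

```python
import random
from collections import Counter

def empirical_distribution(observations):
    """Map each observed value to the proportion of observations equal to it."""
    counts = Counter(observations)
    n = len(observations)
    return {value: counts[value] / n for value in sorted(counts)}

rng = random.Random(15)  # seeded only so the demo is repeatable
rolls = [rng.randint(1, 6) for _ in range(1000)]  # 1000 simulated die rolls
print(empirical_distribution(rolls))
```

Unlike the probability distribution, these proportions change every time the experiment is repeated with a different seed.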

18 of 20

Large Random Samples

19 of 20

Law of Averages

If a chance experiment is repeated many times,

independently and under the same conditions,

then the proportion of times that an event occurs

gets closer to the theoretical probability of the event

As you increase the number of rolls of a die, the proportion of times you see the face with five spots gets closer to 1/6

(Demo)
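A sketch of the die example, assuming a fair die simulated with Python's `random` module. It prints the proportion of fives for increasing numbers of independent rolls; these proportions typically get closer to 1/6 ≈ 0.1667, though not necessarily at every step.

```python
import random

def proportion_of_fives(n_rolls, rng):
    """Roll a fair die n_rolls times independently; return the fraction of 5s."""
    fives = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 5)
    return fives / n_rolls

rng = random.Random(2017)  # seeded only so the demo is repeatable
for n in [10, 100, 1000, 10000, 100000]:
    print(n, proportion_of_fives(n, rng))
print("theory:", 1 / 6)
```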

20 of 20

Large Random Samples

If the sample size is large,

then the empirical distribution of a uniform random sample

resembles the distribution of the population,

with high probability
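A sketch of this claim under stated assumptions: a hypothetical population of 1,000 people split 50% / 30% / 20% among groups "A", "B", "C", and a uniform random sample drawn with replacement using `random.choices`. The total variation distance (half the sum of absolute differences in proportions) is used here as one way to measure how far the sample's empirical distribution is from the population's; for large samples it is usually small.

```python
import random
from collections import Counter

def proportions(values):
    """Empirical distribution: each value's share of the list."""
    counts = Counter(values)
    return {k: counts[k] / len(values) for k in counts}

def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

# Hypothetical population: 500 in group A, 300 in group B, 200 in group C.
population = ["A"] * 500 + ["B"] * 300 + ["C"] * 200
pop_dist = proportions(population)

rng = random.Random(191)  # seeded only so the demo is repeatable
for n in [10, 100, 1000, 10000]:
    sample = rng.choices(population, k=n)  # uniform, with replacement
    print(n, round(total_variation_distance(proportions(sample), pop_dist), 4))
```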