1 of 29

Randomness and Simulation (II)

Data 6 Summer 2025

LECTURE 29

A quick dive into some statistics (with datascience)

Developed by students and faculty at UC Berkeley and Tuskegee University

data6.org/su25/syllabus/#acknowledgements-

2 of 29

Week 6

Announcements!

  • Week 5 Survey has been released and will be due 8/14 @ 11 PM
  • Final Project will be due 8/16 @ 11 PM
  • Lectures this week will only be from 10-11 AM
  • Utilize Office Hours in the run-up to the final exam on 8/15
    • Jedi’s OH changes to Tues-Fri, 8-10 PM on Zoom
  • Final Exam is on Friday 10-12 PM @ SOCS 170
    • Two pages for cheat sheet (8.5 x 11 inches, front and back)
  • Fill out course evaluations!
  • No tutoring this week

3 of 29

Today’s Roadmap

Lecture 29, Data 6 Summer 2025

  1. Randomness
  2. Simulation
  3. Repeated Simulations

4 of 29

Randomness

1. Randomness Recapped

2. tbl.sample

3. Repeated Simulations

5 of 29

What Is Randomness?

6 of 29

np.random

np.random is a submodule of numpy that contains functions involving random numbers and random selection. Here are some useful functions.

Function: np.random.randint(start, stop, size)
Behavior: Generates random integers from start to stop - 1, inclusive. Each integer in the range is equally likely to be selected. If no size is given, returns a single integer. If size (int) is provided, returns an array with size elements.

Function: np.random.choice(arr, size)
Behavior: Randomly selects elements from the array arr. Each element is equally likely to be selected. If no size is given, returns a single element. If size (int) is provided, returns an array with size elements.

Function: np.random.seed(n)
Behavior: Sets the seed of NumPy's random number generator to n, so the random results that follow are the same every time the code is run.
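The three functions above can be tried together in a few lines. This is a minimal sketch: the seed value 6 and the colors array are made up for illustration and are not from the lecture.

```python
import numpy as np

np.random.seed(6)  # arbitrary seed, chosen only so reruns give the same results

# One roll of a six-sided die: stop is exclusive, so use 7 to include 6.
roll = np.random.randint(1, 7)

# Ten rolls at once: passing a size returns an array of that length.
rolls = np.random.randint(1, 7, 10)

# Pick three elements from an array; the same element can be picked twice.
colors = np.array(["red", "green", "blue"])  # hypothetical example array
picks = np.random.choice(colors, 3)

# Resetting the same seed replays the same sequence of random values.
np.random.seed(6)
roll_again = np.random.randint(1, 7)

print(roll, rolls, picks, roll_again)
```

Because the seed is reset before roll_again, it should match roll.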

7 of 29

tbl.sample

1. Randomness

2. tbl.sample

3. Repeated Simulations

8 of 29

tbl.sample Overview

tbl.sample is the method in the datascience library for sampling rows from a table. The table must exist before we can sample from it.

Function: tbl.sample(k=tbl.num_rows, with_replacement=True, weights=None)
Behavior: Returns a new table where k rows are randomly sampled from the original table; by default, k=tbl.num_rows. Default is with replacement. For sampling without replacement, use the argument with_replacement=False. For a non-uniform sample, provide a third argument weights=distribution, where distribution is an array or list containing the probability of each row.

9 of 29

tbl.sample() vs. np.random

  • tbl.sample is built on top of the np.random library!
    • This means that a lot of the np.random functions we learned in Lecture 28 will interact nicely with tbl.sample
    • For instance, np.random.seed makes tbl.sample reproducible in the same way it does for np.random functions
  • np.random primarily works with numbers and arrays.
  • tbl.sample only works with datascience Table objects.

10 of 29

Choosing Our Sample Size

  • An important aspect of tbl.sample is how it lets us choose how many rows to sample
    • The first argument, k, can be set to an int, which will be the number of rows sampled
    • By default, that number is the total number of rows in the table
  • Example: tbl.sample(10) samples 10 rows

11 of 29

With or Without Replacement

  • In some instances we may want to sample without replacement (i.e., a simple random sample)
    • The second argument, with_replacement, can be set to a boolean that determines whether or not to sample with replacement
    • By default, with_replacement=True
  • Example: tbl.sample(10, with_replacement=False) samples 10 rows without replacement
  • If your sample size is larger than the number of rows in your table and with_replacement is False, your code will raise an error because you run out of rows to sample
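The "running out of rows" error can be seen with np.random.choice, which has the same with/without replacement idea through its replace argument. This sketch uses plain NumPy as a stand-in and assumes tbl.sample fails in a similar way, since the slides describe it as built on np.random.

```python
import numpy as np

faces = np.arange(1, 7)  # six items to draw from

# Drawing 6 of 6 without replacement works: every face appears exactly once.
ok = np.random.choice(faces, 6, replace=False)

# Drawing 7 of 6 without replacement cannot work, so NumPy raises a ValueError.
try:
    np.random.choice(faces, 7, replace=False)
    raised = False
except ValueError as err:
    raised = True
    print("Error:", err)
```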

12 of 29

Sampling Without Replacement Example

  • Let’s call our table below die. We have just run the code die.sample(6, False)

Remaining rows
Face: 1, 2, 3, 4, 5, 6

Sample so far
Face: (empty)

13-18 of 29

Sampling Without Replacement Example

  • Let’s call our table below die. We have just run the code die.sample(6, False)
  • Each draw moves one row from the remaining rows into the sample:
    • Draw 1 picks 2. Remaining: 1, 3, 4, 5, 6. Sample: 2
    • Draw 2 picks 5. Remaining: 1, 3, 4, 6. Sample: 2, 5
    • Draw 3 picks 3. Remaining: 1, 4, 6. Sample: 2, 5, 3
    • Draw 4 picks 1. Remaining: 4, 6. Sample: 2, 5, 3, 1
    • Draw 5 picks 4. Remaining: 6. Sample: 2, 5, 3, 1, 4
    • Draw 6 picks 6. Remaining: (empty). Sample: 2, 5, 3, 1, 4, 6

19 of 29

Sampling Without Replacement Example

  • Let’s call our table below die. We have just run the code die.sample(6, False)
  • The original table `die` was never modified

Remaining rows
Face: (empty)

Sample
Face: 2, 5, 3, 1, 4, 6

die (original table, unchanged)
Face: 1, 2, 3, 4, 5, 6

20 of 29

Sampling With Weights

  • The third and final argument to .sample is weights
  • weights should be an array or list containing the probability of each row, so it needs one entry per row and the entries must sum to 1
  • For instance, for a coin table with one row for heads and one row for tails, weights = [0.2, 0.8] makes heads 20% likely and tails 80% likely, instead of 50% each
  • Weighted sampling is useful for more complex probabilistic sampling techniques, including some stratified designs
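The biased coin can be simulated with np.random.choice, whose p argument plays the same role as weights. This sketch uses plain NumPy as a stand-in for a coin table; the seed and the number of flips are arbitrary choices.

```python
import numpy as np

np.random.seed(20)  # arbitrary seed for reproducibility

sides = np.array(["heads", "tails"])
weights = [0.2, 0.8]   # one probability per element; they must sum to 1

flips = np.random.choice(sides, 10_000, p=weights)
share_tails = np.mean(flips == "tails")

print(share_tails)  # should land close to 0.8
```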

21 of 29

Sampling With Weights Example

  • Let’s imagine we want to simulate a die that is weighted towards 5s and 6s
  • We could call die.sample(6, True, weights=[0, 0, 0.1, 0.1, 0.4, 0.4])

Sample (one possible result)
Face: 5, 5, 6, 3, 6, 5

Faces that can be drawn (nonzero weight)
Face: 3, 4, 5, 6

22 of 29

Questions?

23 of 29

Quick Check 1

Assume the table cards contains a standard deck of cards where each row is a card. Fill in the blanks to shuffle the deck of cards.


24 of 29

Repetition

1. Randomness Recapped

2. tbl.sample

3. Repeated Simulations (with datascience)

25 of 29

Coin Flips

Suppose you flip a fair coin 100 times. How many heads would you expect to see?

  • 50.
  • But 45 or 57 wouldn’t be that crazy.
  • 20? 95? That would be shocking 😱.

[Figure: repeated trials of 100 flips each, for example 54 heads, 43 heads, 51 heads, ..., repeated 10,000 times]

Idea:

  1. Flip a coin 100 times. Write down the number of heads. (now use .sample)
  2. Repeat step 1 many times – say, 10,000 times. (How?)
  3. Draw a histogram of the number of heads in each iteration.
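The three steps above can be sketched with a for loop. This version uses np.random.choice on an array as a stand-in for step 1; with a two-row coin table, step 1 would use .sample instead, as the slide hints. The seed, variable names, and the use of np.bincount in place of a plotted histogram are choices made for this sketch.

```python
import numpy as np

np.random.seed(25)  # arbitrary seed for reproducibility

coin = np.array(["heads", "tails"])
repetitions = 10_000

heads_counts = np.zeros(repetitions, dtype=int)
for i in range(repetitions):
    flips = np.random.choice(coin, 100)          # step 1: 100 fair flips
    heads_counts[i] = np.sum(flips == "heads")   # write down the number of heads

# Step 3 without a plot: how often each head count from 0 to 100 occurred.
frequency = np.bincount(heads_counts, minlength=101)

print(heads_counts.mean())   # should be close to 50
```

Most of the counts in frequency pile up near 50, which matches the intuition that 45 or 57 heads is unremarkable while 20 or 95 would be shocking.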

26 of 29

In Conclusion…

27 of 29

Summary

  • tbl.sample randomly samples rows from a table
    • We can choose the number of rows with k
    • We can decide whether it samples with replacement using with_replacement=True or False. By default it is True
    • We can weight our sampling procedure using weights=distribution
  • tbl.sample vs. np.random
    • Both are useful for random processes! `tbl.sample` is built on `np.random`
    • `np.random` is better for arrays

Randomness underpins the field of statistics, which is an integral component of data science.

28 of 29

Recap

  • .sample

Next Time

  • Fun with Plotly
