1 of 29

Randomness

and Simulation (II)

Data 6 Summer 2025

LECTURE 29

A quick dive into some statistics (with datascience)

Developed by students and faculty at UC Berkeley and Tuskegee University

data6.org/su25/syllabus/#acknowledgements-

2 of 29

Week 6

Announcements!

Week 5 Survey has been released and will be due 8/14 @ 11 PM
Final Project will be due 8/16 @ 11 PM
Lectures this week will only be from 10-11 AM
Utilize Office Hours in the run-up to the final exam on 8/15

Jedi’s OH changes to Tues-Fri, 8-10 PM on Zoom

Final Exam is on Friday 10-12 PM @ SOCS 170

Two pages for cheat sheet (8.5 x 11 inches, front and back)

Fill out course evaluations!
No tutoring this week

3 of 29

Today’s Roadmap

Lecture 29, Data 6 Summer 2025

Randomness
Simulation
Repeated Simulations

4 of 29

Randomness

1. Randomness Recapped

2. tbl.sample

3. Repeated Simulations


5 of 29

What Is Randomness?

6 of 29

np.random

np.random is a submodule of numpy that contains functions involving random numbers and random selection. Here are some useful functions.

Function	Behavior
np.random.randint(start, stop, size)	Generates random integers from start to stop - 1, inclusive. Each integer in the range is equally likely to be selected. If size is omitted, returns a single integer. If size (int) is provided, returns an array with size elements.
np.random.choice(arr, size)	Randomly selects elements from the array arr. Each element is equally likely to be selected. If size is omitted, returns a single element. If size (int) is provided, returns an array with size elements.
np.random.seed(n)	Sets the seed of NumPy's random number generator, so the random calls that follow give the same results on every run.
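
As a rough illustration of the three functions in the table, here is a minimal sketch. The seed value, the die-roll range, and the colors array are made up for this example and are not from the lecture.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, chosen only for this illustration

# One integer from 1 to 6 (the stop value 7 is excluded).
single_roll = np.random.randint(1, 7)

# An array of 10 integers, each from 1 to 6.
ten_rolls = np.random.randint(1, 7, 10)

# Random selection from an array: one element, then an array of 5 elements.
colors = np.array(["red", "green", "blue"])  # hypothetical data
one_color = np.random.choice(colors)
five_colors = np.random.choice(colors, 5)

# Setting the same seed again replays the same sequence of random values.
np.random.seed(42)
replayed_roll = np.random.randint(1, 7)
print(single_roll == replayed_roll)  # True
```

Without the calls to np.random.seed, each run would generally produce different rolls and colors.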

7 of 29

tbl.sample

1. Randomness

2. tbl.sample

3. Repeated Simulations


8 of 29

tbl.sample Overview

tbl.sample is the method in the datascience library for sampling rows from a table. The table must exist before we can sample from it.

Function	Behavior
tbl.sample(k=tbl.num_rows, with_replacement=True, weights=None)	A new table where k rows are randomly sampled from the original table; by default, k=tbl.num_rows. Sampling is with replacement by default; for sampling without replacement, pass with_replacement=False. For a non-uniform sample, pass weights=distribution, where distribution is an array or list containing the probability of each row.

9 of 29

tbl.sample() vs. np.random

tbl.sample is built on top of the np.random submodule!

This means that many of the np.random functions we learned in Lecture 28 also work with tbl.sample
For instance, np.random.seed makes the results of tbl.sample reproducible, just as it does for the np.random functions

np.random primarily works with numbers and arrays.
tbl.sample only works with datascience Table objects

10 of 29

Choosing Our Sample Size

An important aspect of tbl.sample is that it lets us choose how many rows to sample

The first argument, k, can be assigned an int, which will be the number of rows sampled
By default that number is the total number of rows in the table

Example: tbl.sample(10) samples 10 rows

11 of 29

With or Without Replacement

In some instances we may want to sample without replacement (i.e. a simple random sample)

The second argument, with_replacement, can be assigned a boolean, which determines whether or not to sample with replacement
By default with_replacement=True

Example: tbl.sample(10, with_replacement=False) samples 10 rows without replacement
If your sample size is larger than your table and with_replacement is False, your code will raise an error because there are not enough rows to sample

12 of 29