Review Section

Data 6 Fall 2024

Pre-Quiz 2

Developed by UC Berkeley Data 6 Course Staff

Announcements

Control

Let’s simulate a game of plinko! Plinko is a game where you drop a ball down a series of pegs, and your score depends on where the ball ends up. Here is an image:

Suppose that a ball can end up in any one of five positions (1, 2, 3, 4, 5). Since it’s harder for a ball to end up in position 3, that position should be worth the most points.

To start off, let’s calculate the score a ball should receive given its ending position. If the ball ends in position 1 or 5, the score should be zero. If the ball ends in position 2 or 4, the score should be one. If the ball ends in position 3, the score should be two.
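
One way to express these rules is a chain of conditionals. This is only a minimal sketch: the function name score and its parameter name are assumptions, since the slides do not name them, and it assumes the position is always between 1 and 5.

```python
def score(position):
    """Return the plinko score for an ending position from 1 to 5."""
    if position == 1 or position == 5:
        # The two edge positions are worth nothing.
        return 0
    elif position == 2 or position == 4:
        return 1
    else:
        # Assumption: the only remaining valid position is 3.
        return 2
```

For example, score(3) evaluates to 2 and score(5) evaluates to 0.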

Iteration

Playing the game

Now that we have a way of computing a score, let’s simulate one round of plinko. Implement plinko, which takes in a starting position and the number of layers. At each layer, the ball moves one position left or right from its current position (smaller positions are to the left, larger positions are to the right). The function returns the final position of the ball.

One caveat: If the current position of the ball is at 1 (the leftmost position) and we try moving left, or the current position of the ball is at 5 (the rightmost position) and we try moving right, the updated position should stay the same.

Hint: The second line of code in the while loop should restrict current_position to stay between 1 and 5. Use max/min. random.choice([-1, 1]) returns either -1 or 1 with equal probability.
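
Following the hint, one possible shape for plinko is sketched below. The parameter names and the layers_left counter are assumptions (the original skeleton is not reproduced in this text); the two key lines are the random step and the max/min clamp.

```python
import random

def plinko(starting_position, layers):
    """Simulate one ball and return its final position (1 through 5)."""
    current_position = starting_position
    layers_left = layers  # hypothetical counter name
    while layers_left > 0:
        # Move one step left (-1) or right (+1) with equal probability.
        current_position = current_position + random.choice([-1, 1])
        # Clamp to the board: never below 1, never above 5.
        current_position = max(1, min(5, current_position))
        layers_left = layers_left - 1
    return current_position
```

With zero layers the ball never moves, so plinko(3, 0) returns 3. A ball dropped at position 1 for one layer ends at either 1 or 2.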

Running the simulation!

Let’s simulate the expected outcome of playing a game of plinko. Implement run_simulation, which takes in a number n representing the number of simulations we should run, a depth d, and a starting position s, and returns the average of the scores across all n games of plinko with a depth of d and a starting position of s.
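
A rough sketch of run_simulation follows, assuming the starting position s is passed as a third argument. Compact versions of the score and plinko helpers are repeated here only so the block runs on its own; their exact form is an assumption.

```python
import random

def score(position):
    # Scores by ending position: 1 and 5 give 0, 2 and 4 give 1, 3 gives 2.
    return {1: 0, 2: 1, 3: 2, 4: 1, 5: 0}[position]

def plinko(starting_position, layers):
    current_position = starting_position
    for layer in range(layers):
        step = random.choice([-1, 1])
        # Clamp so the ball stays on the board (positions 1 through 5).
        current_position = max(1, min(5, current_position + step))
    return current_position

def run_simulation(n, d, s):
    """Average score over n games of depth d, all starting at position s."""
    total = 0
    for game in range(n):
        total = total + score(plinko(s, d))
    return total / n
```

Because the result is an average of scores, it always lies between 0 and 2. Larger values of n should give more stable averages.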

Visualization, Group, Pivot

Creating the dataset

Now, create a dataset called tbl with the following information:

tbl should have four columns: Depth, Simulations, Average Score, and Starting Position.

The table should satisfy the following:

  • It has 10,000 rows (entries).
  • Every entry has a depth of 3 and 100 simulations.
  • The starting positions are evenly distributed across all starting positions (from 1 to 5).
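
The requirements above can be sketched with plain Python lists, one list per column. The helper name build_columns and the choice to cycle through positions 1 to 5 are assumptions, and run_simulation is repeated in compact form so the block runs by itself. Building the actual table from these lists with the datascience package is shown only as a comment.

```python
import random

def run_simulation(n, d, s):
    """Average score over n games of depth d starting at position s."""
    points = {1: 0, 2: 1, 3: 2, 4: 1, 5: 0}
    total = 0
    for game in range(n):
        position = s
        for layer in range(d):
            position = max(1, min(5, position + random.choice([-1, 1])))
        total += points[position]
    return total / n

def build_columns(rows=10_000, depth=3, simulations=100):
    """Return the four columns as a dict of equal-length lists."""
    # Cycle 1, 2, 3, 4, 5, 1, 2, ... so each start appears equally often.
    starts = [(i % 5) + 1 for i in range(rows)]
    return {
        "Depth": [depth] * rows,
        "Simulations": [simulations] * rows,
        "Average Score": [run_simulation(simulations, depth, s) for s in starts],
        "Starting Position": starts,
    }

# With the datascience package, the table could then be assembled as:
#   columns = build_columns()
#   tbl = Table().with_columns(
#       "Depth", columns["Depth"],
#       "Simulations", columns["Simulations"],
#       "Average Score", columns["Average Score"],
#       "Starting Position", columns["Starting Position"],
#   )
```

Calling build_columns() with the defaults runs one million games in total, so it may take a few seconds. Passing a smaller rows value is a quick way to check the structure first.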

Exploring the Data

We wonder if the choice of starting position has an effect on the score we receive. Write some code that results in the following table:

tbl.group("Starting Position", np.mean).select("Starting Position", "Average Score mean")
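
To make the grouping step concrete, here is a plain-Python sketch of what grouping by starting position and averaging does conceptually. The helper mean_by_start and the sample values are hypothetical and exist only for illustration; the solution itself is the group call above.

```python
from collections import defaultdict

def mean_by_start(starting_positions, average_scores):
    """Map each starting position to the mean of its average scores."""
    groups = defaultdict(list)
    for start, avg in zip(starting_positions, average_scores):
        # Collect every score that shares the same starting position.
        groups[start].append(avg)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}

# Made-up rows purely for illustration (not simulation output).
example = mean_by_start([1, 2, 1, 2], [0.5, 1.0, 0.25, 1.5])
```

Here example maps position 1 to 0.375 and position 2 to 1.25: one row per distinct starting position, each holding the mean for that group.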

Pick the best visualization

What would be the best visualization to see the relationship between starting position and the average score?

  • Horizontal Bar Chart
  • Histogram
  • Line plot
  • Scatter plot

Starting position is a categorical variable here (not a numerical one), since arithmetic between starting positions carries no real meaning. This rules out line plots and scatter plots.

Histograms are best suited for visualizing the distribution of one variable, not comparing two different ones. A horizontal bar chart is the best choice here.

Explaining the Data

Suppose we rerun the simulation, but with depth = 1, simulations = 100, and 10,000 rows in our dataset. What preliminary hypotheses might you draw from the resulting visualization?

It seems like a starting position of 2, 3, or 4 is more favorable compared to a starting position of 1 or 5.
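
This observation can be checked by hand. With a depth of 1 there are only two equally likely moves from each start, so the expected score is the average of the two resulting scores. The sketch below assumes the scoring rules and edge behavior described earlier; expected_score_depth_1 is a hypothetical helper name.

```python
def score(position):
    # Scores by ending position: 1 and 5 give 0, 2 and 4 give 1, 3 gives 2.
    return {1: 0, 2: 1, 3: 2, 4: 1, 5: 0}[position]

def expected_score_depth_1(start):
    """Exact expected score after one layer from the given start."""
    left = max(1, start - 1)    # a left move from 1 stays at 1
    right = min(5, start + 1)   # a right move from 5 stays at 5
    return (score(left) + score(right)) / 2

expected = {s: expected_score_depth_1(s) for s in range(1, 6)}
# expected == {1: 0.5, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.5}
```

Starts 2, 3, and 4 all have an expected score of 1.0, while the edge starts 1 and 5 average only 0.5, which matches the pattern described above.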

More Iteration

Summation

Briefly explain why this implementation of summation is incorrect.
