Review Section
Data 6 Fall 2024
Pre-Quiz 2
Developed by UC Berkeley Data 6 Course Staff
Announcements
Control
Let’s simulate a game of Plinko! Plinko is a game where you drop a ball down a series of pegs. Your score depends on where your ball ends up! Here is an image:
Suppose that a ball can end up in any one of five positions (1, 2, 3, 4, 5). Since it’s harder for a ball to end up in position 3, that position should be worth the most points.
To start off, let’s calculate the score a ball should receive given its ending position. If the ball ends in position 1 or 5, the score should be zero. If the ball ends in position 2 or 4, the score should be one. If the ball ends in position 3, the score should be two.
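The scoring rule above can be sketched with a conditional. This is one possible version, not necessarily the course solution; the function name plinko_score is a hypothetical choice, since the slides do not name it.

```python
def plinko_score(position):
    """Return the score for a ball that ends at `position` (1 through 5).

    The name `plinko_score` is an assumption made for this sketch.
    """
    if position == 1 or position == 5:
        return 0
    elif position == 2 or position == 4:
        return 1
    elif position == 3:
        return 2
    # Any other position falls through and returns None.


print([plinko_score(p) for p in range(1, 6)])  # [0, 1, 2, 1, 0]
```

Each branch covers one pair of positions from the rule, so the conditions can be checked against the problem statement line by line.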
Iteration
Playing the game
Now that we have a way of computing a score, let’s simulate one round of Plinko. Implement plinko, which takes in a starting position and the number of layers. At each layer, the ball moves one position left or right from its current position (smaller positions are to the left, larger positions are to the right). The function returns the final position of the ball.
One caveat: If the current position of the ball is at 1 (the leftmost position) and we try moving left, or the current position of the ball is at 5 (the rightmost position) and we try moving right, the updated position should stay the same.
Hint: The second line of code in the while loop should restrict current_position to stay between 1 and 5 (inclusive). Use max/min. random.choice([-1, 1]) returns either -1 or 1 with equal probability.
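Following the hint, one round might look like the sketch below. The parameter names starting_position and layers and the counter layers_remaining are assumptions; only plinko, current_position, max/min, and random.choice([-1, 1]) come from the problem.

```python
import random


def plinko(starting_position, layers):
    """Simulate one ball and return its final position (1 through 5)."""
    current_position = starting_position
    layers_remaining = layers  # hypothetical counter name
    while layers_remaining > 0:
        # Move one position left (-1) or right (+1) with equal probability.
        current_position = current_position + random.choice([-1, 1])
        # Clamp to the board: a move past 1 or 5 leaves the ball where it was.
        current_position = max(1, min(5, current_position))
        layers_remaining = layers_remaining - 1
    return current_position
```

Clamping with max(1, min(5, ...)) handles both edges in one line: a ball at 1 that tries to move left lands on 0 and is pulled back to 1, and a ball at 5 that tries to move right lands on 6 and is pulled back to 5.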
Running the simulation!
Let’s estimate the expected score of a game of Plinko. Implement run_simulation, which takes in a number n representing how many simulations to run, a depth d, and a starting position s, and returns the average score over n games of Plinko, each with a depth of d and a starting position of s.
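A minimal sketch of run_simulation, assuming it receives the starting position s as a third argument. The helper plinko_score is a hypothetical name, and both helpers are rewritten compactly here so the block runs on its own.

```python
import random


def plinko_score(position):
    # Compact form of the scoring rule: 1 and 5 give 0, 2 and 4 give 1, 3 gives 2.
    return 2 - abs(position - 3)


def plinko(starting_position, layers):
    # One ball: step left or right at each layer, clamped to positions 1..5.
    current_position = starting_position
    for _ in range(layers):
        step = random.choice([-1, 1])
        current_position = max(1, min(5, current_position + step))
    return current_position


def run_simulation(n, d, s):
    """Average score over n games with depth d and starting position s."""
    total = 0
    for _ in range(n):
        total += plinko_score(plinko(s, d))
    return total / n


print(run_simulation(1000, 4, 3))  # varies from run to run; always between 0 and 2
```

Accumulating a running total avoids storing every score; collecting the scores in an array and calling np.mean would give the same average.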
Visualization, Group, Pivot
14
Creating the dataset
Now, create a dataset called tbl with the following information:
tbl should have four columns: Depth, Simulations, Average Score, and Starting Position.
There should be the following information in the table:
Exploring the Data
We wonder if the choice of starting position has an effect on the score we receive. Write some code that results in the following table:
tbl.group("Starting Position", np.mean).select("Starting Position", "Average Score mean")
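To see what grouping by starting position and averaging computes, here is a plain-Python sketch that does not use the datascience library. The rows are made-up illustrative values, not the course dataset, and mean_score_by_start is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (starting position, average score) rows, for illustration only.
rows = [(1, 0.5), (1, 0.7), (3, 1.0), (3, 1.2), (5, 0.4)]


def mean_score_by_start(rows):
    """Collect scores per starting position, then average each group."""
    scores = defaultdict(list)
    for start, avg_score in rows:
        scores[start].append(avg_score)
    return {start: sum(vals) / len(vals) for start, vals in sorted(scores.items())}


print(mean_score_by_start(rows))
```

The group call does the same thing in one step: one output row per distinct starting position, with the mean of that group's Average Score values.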
Pick the best visualization
What would be the best visualization to see the relationship between starting position and the average score?
The choice of starting position is a categorical variable (not numerical), as arithmetic on starting positions carries no real meaning. This rules out line plots and scatter plots.
Histograms are best suited for visualizing the distribution of one variable, not comparing two different ones. A horizontal bar chart is the best choice here.
Explaining the Data
Suppose we rerun the simulation, but with depth = 1, simulations = 100, and 10,000 rows in our dataset. What preliminary hypotheses might you draw from this visualization?
It seems like a starting position of 2, 3, or 4 is more favorable compared to a starting position of 1 or 5.
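For depth = 1 the expected score can also be worked out exactly, since the ball makes a single move left or right with probability 1/2 each. The sketch below enumerates both outcomes for each starting position; the function names are hypothetical.

```python
def plinko_score(position):
    # 1 and 5 give 0, 2 and 4 give 1, 3 gives 2.
    return 2 - abs(position - 3)


def expected_score_depth_one(start):
    """Exact expected score after one layer from `start`."""
    left = max(1, start - 1)   # a ball at 1 that moves left stays at 1
    right = min(5, start + 1)  # a ball at 5 that moves right stays at 5
    return (plinko_score(left) + plinko_score(right)) / 2


expected = {s: expected_score_depth_one(s) for s in range(1, 6)}
print(expected)  # {1: 0.5, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.5}
```

Starting positions 2, 3, and 4 all have an expected score of 1.0, while positions 1 and 5 have 0.5, which agrees with the hypothesis drawn from the visualization.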
More Iteration
Summation
Briefly explain why this implementation of summation is incorrect.