Baseball and Integration: the Effects of MLB Integration on the Negro Leagues

L. Colby Bogie

Abstract

In 1947, Jackie Robinson became the first African-American to play Major League Baseball in the modern era. Before Robinson broke MLB’s color barrier, non-white baseball players competed in alternative professional baseball leagues, known collectively as the Negro Leagues. Using a Negro Leagues dataset from the website Retrosheet, I examined the Negro League seasons immediately before and immediately after Robinson’s debut to see how the integration of MLB affected Negro League baseball. I found that the scoring environment (total runs scored per game) remained relatively stable over these seasons, but attendance dropped precipitously in the 1948 and 1949 Negro League seasons.

Motivation

Major League Baseball recently announced that it was reclassifying certain Negro League seasons as “major league baseball,” opening the door for the statistics of non-white professionals from the early 20th century to take their place alongside the statistics of white players like Babe Ruth and Ty Cobb. MLB chose 1948 as the final season of Negro League baseball that would be considered “major league” quality. Some Negro Leagues continued to play for a few seasons after 1948, but during that time, more and more of the best black players were leaving the Negro Leagues to join MLB teams. My goal was to look at the statistical record of the Negro Leagues in the seasons around that 1948 cutoff to determine whether there is a noticeable difference in Negro League baseball before and after the integration of MLB.

Dataset(s)

I downloaded the Negro League datasets available at the baseball data website Retrosheet. Retrosheet is dedicated to recovering, preserving, and presenting as much accurate historical baseball data as possible. In the case of the Negro Leagues, record keeping was spotty, so this dataset contains many gaps and estimates, which I tried to filter out during my data cleaning. There’s a tremendous amount of granular, game-by-game data in the Retrosheet dataset, but for simplicity’s sake, I decided to look at two relatively simple measures: total runs scored per regular season game and attendance per regular season game.

Data Preparation and Cleaning

This was by far the most time-consuming part of my project. I had to do the following:

  • First, I filtered the overall dataset into four smaller datasets containing the regular season games for four seasons (1946, 1947, 1948, and 1949).
  • The original dataset contained game-by-game run values for the home team and visiting team in each game; I had to add a new column for total runs by summing these two values.
  • Finally, in the attendance data, I had to drop rows with “NaN”, filter out characters such as “<” and “?”, convert the datatype from strings to ints, and filter out all games with a zero in the attendance column.
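The three steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for the project: the sample rows, the field names (`date`, `visitor_runs`, `home_runs`, `attendance`), and the helper names are all assumptions made for the example and do not reflect the actual Retrosheet file layout.

```python
# Hypothetical sample rows; field names and values are invented for
# illustration and are NOT the real Retrosheet column layout.
games = [
    {"date": "19460505", "visitor_runs": 3, "home_runs": 5, "attendance": "8500"},
    {"date": "19470615", "visitor_runs": 7, "home_runs": 2, "attendance": "<4000"},
    {"date": "19480704", "visitor_runs": 1, "home_runs": 0, "attendance": "3000?"},
    {"date": "19480711", "visitor_runs": 4, "home_runs": 6, "attendance": "NaN"},
    {"date": "19490522", "visitor_runs": 2, "home_runs": 2, "attendance": "0"},
    {"date": "19450801", "visitor_runs": 9, "home_runs": 1, "attendance": "5000"},
]

SEASONS = (1946, 1947, 1948, 1949)


def add_total_runs(game):
    """Return a copy of the game with a new total_runs field."""
    return {**game, "total_runs": game["visitor_runs"] + game["home_runs"]}


def clean_attendance(raw):
    """Return attendance as a positive int, or None if it is unusable."""
    if raw is None:
        return None
    # Strip the estimate markers, then require a plain run of digits.
    text = str(raw).strip().replace("<", "").replace("?", "")
    if not text.isdigit():
        return None  # covers "NaN", empty strings, and other non-numbers
    value = int(text)
    return value if value > 0 else None  # zero means "not recorded"


# Step 1: split into one list of games per season (other years are dropped).
# Step 2: add the total_runs field to every game that is kept.
by_season = {year: [] for year in SEASONS}
for game in games:
    year = int(game["date"][:4])
    if year in by_season:
        by_season[year].append(add_total_runs(game))

# Step 3: keep only usable attendance figures for each season.
attendance_by_season = {}
for year, rows in by_season.items():
    cleaned = [clean_attendance(g["attendance"]) for g in rows]
    attendance_by_season[year] = [a for a in cleaned if a is not None]

print(attendance_by_season)
```

Keeping the runs data and the cleaned attendance data separate means a game with a missing attendance figure still counts toward the runs-per-game analysis.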

Research Question(s)

How did the run scoring environment (i.e. total runs per game) and average attendance in the Negro Leagues change in the seasons after Jackie Robinson broke the MLB color barrier in 1947?

Methods

Once I had cleaned the data and separated it into four regular season datasets for 1946, 1947, 1948, and 1949, I plotted the changes in run values and attendance using two methods: line plots of the average values and box-and-whisker plots that also showed the variability and distribution of the data.
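To make the difference between the two plot types concrete, here is a small sketch of the numbers each one draws. A line plot of averages uses only the per-season mean, while a box plot uses the median, the quartiles, and a rule for flagging outliers. The run totals below are invented for illustration, the helper name `summarize` is hypothetical, and the 1.5 × IQR outlier rule is the common box-plot convention, assumed here rather than taken from the project.

```python
import statistics

# Invented total-runs-per-game values, NOT real Retrosheet data.
runs_by_season = {
    1946: [8, 10, 6, 12, 9],
    1947: [7, 11, 9, 25, 8],
}


def summarize(values):
    """Return the statistics behind a line plot point and a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed convention: 1.5 * IQR whiskers
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),  # what the line plot shows
        "median": median,                 # center line of the box
        "q1": q1,                         # bottom of the box
        "q3": q3,                         # top of the box
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


summary = {year: summarize(values) for year, values in runs_by_season.items()}

for year, stats in summary.items():
    print(year, stats)
```

In this made-up sample, a single 25-run game lifts the 1947 mean while leaving the median unchanged, which is the kind of effect a box plot exposes and a line plot of averages hides.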

Findings, slide 1: Average Runs Per Game

This chart shows that there was very little change in the overall run environment of the Negro Leagues in the seasons before and after the integration of MLB.

Findings, slide 2: Runs Per Game Box Plot

When it’s displayed as a box plot, the runs-per-game data seems even less variable over this time period, as a few outlier games in 1947 seem to be partly responsible for the apparent slight increase in scoring that season.

Findings, slide 3: Average Attendance Per Game

While runs per game didn’t change much over these seasons, attendance really did! Jackie Robinson debuted for the Dodgers on Opening Day of the 1947 season (April 15); his debut coincided almost immediately with a marked decline in Negro League attendance.

Findings, slide 4: Attendance Per Game Box Plot

Finally, a box plot of the game-by-game attendance data shows a fuller picture of the attendance decline. In addition to the average decreasing, the highly attended outlier games at the upper range of the 1946 data rapidly became a thing of the past.

Limitations

There are several limitations here, especially in the attendance data, which is incomplete. After I filtered out the zeroes and “NaN” rows in the attendance column, my datasets shrank from about 400 rows to about 200 rows. Also, some of the 1948 attendance data was estimated: the original entries included characters such as “<” and “?”, which I stripped from the data before performing my analysis. I suspect that the overall trend I showed here would hold with more complete data, but I can’t know that for sure.

The runs-per-game data seems to be more complete and more reliable, but it’s ultimately a relatively shallow way to evaluate a baseball season. A more in-depth analysis could have looked at strikeout rate, home run rate, and walk rate to draw a more complete picture of the offensive environment of the Negro Leagues over these seasons.

Conclusions

In the seasons immediately before and immediately after Jackie Robinson’s MLB debut, I found that:

  • Total runs per game in the Negro League regular season barely changed, suggesting that the overall level of play in the league remained relatively stable, at least by that (admittedly somewhat narrow) metric.
  • However, average attendance at Negro League regular-season games plummeted, suggesting that the integration of the American and National Leagues dramatically reduced the public’s interest in Negro League baseball.

Acknowledgements

This analysis was made possible by the hard work of the volunteer researchers at Retrosheet, who comb through old newspapers in order to create the most complete and most reliable statistical record of historical baseball.

I would also like to thank my wife, Amy Bogie, for her perspective and feedback as I worked through the project.

References

The Negro Leagues dataset I used can be downloaded from Retrosheet at the following address: https://www.retrosheet.org/NegroLeagues/NegroLeagues.html