1 of 16

Auto-Pedestrian Crashes

Data Analysis

By: Savani Vaidya

2 of 16

The Dataset

  • The Data (from Kaggle)
  • Overview
    • Auto-pedestrian crashes in the United States from 2010 to 2018
  • Analysis and Visualizations
    • With R and Tableau
  • Attributes (15)
    • Crash Year
    • Crash Month
    • Crash Day
    • Time of Day
    • Day of Week
    • City or Township
    • Crash: Intersection → Intersection crash or non-intersection crash
    • Crash: Hit-and-Run → Hit-and-run or not hit-and-run
    • Lighting Conditions
    • Weather Conditions
    • Speed Limit at Crash Site
    • Worst Injury in Crash
    • Party Type → Motor Vehicle, Motorcycle, etc.
    • Person Age
    • Person Gender

3 of 16

Research Questions

  • ⭐ Is there a certain scenario where an auto-pedestrian accident is most likely to happen?
    • Specific weather condition, city/township, time of day, etc.
  • When are hit-and-runs the most frequent?
  • Where are auto-pedestrian accidents most common?
  • Which variables are correlated and how?

Note: Broader implications will be discussed later.

4 of 16

Data Cleaning

  • Cleaned dataset with Excel
    • Re-formatted data to make it easier to use in R
  • Cleaned dataset with R
    • Removed rows with:
      • NA values
      • “Unknown”
      • “Other / Unknown”
      • “Uncoded & errors”
  • Removing these rows made the visualizations and analysis more accurate
  • R Code for Cleaning Data
  • Clean Dataset (csv)
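
The row-removal step above can be sketched as follows. This is a minimal illustration in Python rather than the R script linked on this slide, which is not reproduced here. The column names, the helper names `is_clean` and `clean_rows`, and the sample rows are all hypothetical, and treating an empty string or the text "NA" as a missing value is an assumption about how the file encodes them.

```python
# Placeholder labels listed on the slide. "NA" as a literal string is an
# assumption about how missing values appear once the file is read as text.
PLACEHOLDERS = {"NA", "Unknown", "Other / Unknown", "Uncoded & errors"}


def is_clean(row):
    """Return True if no field in the row is missing or a placeholder label."""
    for value in row.values():
        if value is None:
            return False
        text = value.strip()
        if text == "" or text in PLACEHOLDERS:
            return False
    return True


def clean_rows(rows):
    """Keep only the rows in which every field holds a usable value."""
    return [row for row in rows if is_clean(row)]


# Made-up rows for illustration only; they are not taken from the dataset.
sample = [
    {"weather": "Clear", "lighting": "Daylight"},
    {"weather": "Unknown", "lighting": "Daylight"},
    {"weather": "Rain", "lighting": None},
    {"weather": "Snow", "lighting": "Other / Unknown"},
    {"weather": "Uncoded & errors", "lighting": "Dark lighted"},
]
cleaned = clean_rows(sample)
print(len(cleaned), "of", len(sample), "rows kept")
```

Dropping a whole row when any one field is unusable keeps every chart on the same set of crashes, at the cost of discarding rows that are complete for the attribute being plotted.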

5 of 16

This bubble graph shows the frequency of crashes based on weather conditions.

Disclaimer: A significant part of the data was entered as unknown, so we have excluded this data.

It’s often assumed that most crashes happen under suboptimal weather conditions (rain or snow). However, in this data, most crashes occurred under clear skies.

6 of 16

This bar graph shows the frequency of crashes based on the day of the month.

One of our initial visualizations.

Nothing special can be interpreted from this graph.

Generally, the number of crashes on any day of the month was about the same, with little variability.

7 of 16

This bar graph shows the frequency of crashes based on the day of the week.

Again, nothing too special can be interpreted from this graph.

Crashes were most frequent on Friday and least frequent on Sunday, which may reflect heavier traffic on Fridays and lighter traffic on Sundays. The rest of the days of the week show similar patterns.

8 of 16

This bar graph shows the frequency of crashes based on lighting conditions.

Although it may be assumed that most crashes would occur when it’s darker, most actually happen in broad daylight.

This is likely because traffic is heaviest during the day.

9 of 16

This bar graph shows the frequency of crashes for each of the 5 injury categories, split into hit-and-run and non-hit-and-run cases.

Evidently, there are more non-hit-and-run cases than hit-and-run cases.

The most common outcome was a possible injury. The distribution resembles a Normal distribution, but it is weighted toward the less severe injury categories. More crashes also happened at lower speed limits, which may help explain why more crashes resulted in less severe injuries.

10 of 16

This bar graph shows the frequency of crashes based on the speed limit.

The largest number of crashes happened where the speed limit was 25 mph. Generally, crashes occurred at lower speed limits.

Pedestrians are more likely to be in areas with lower speed limits. Further, 25 mph speed limits are typically found near residential areas.

11 of 16

12 of 16

Number of Crashes Based on Time of Day (cont.)

  • The top 5 time windows for the highest frequency of crashes are:
    • 6:00 PM - 7:00 PM
    • 3:00 PM - 4:00 PM
    • 5:00 PM - 6:00 PM
    • 7:00 PM - 8:00 PM
    • 4:00 PM - 5:00 PM
  • Crashes are most frequent in the afternoon and evening, from 3:00 PM to 8:00 PM
    • One might assume that most crashes happen early in the morning or late at night
    • However, these hours are the end of the workday for most working people, so heavier traffic could be a reason for the high frequency of crashes
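
A ranking like the top 5 above comes from counting crashes per one-hour window and sorting by count. The sketch below shows one way to do that in Python. It assumes each crash carries a 24-hour start hour, which is an assumption about the data's format. The helper names `top_windows` and `label` are hypothetical, and the hours are made up for illustration.

```python
from collections import Counter


def top_windows(hours, n=5):
    """Count crashes per one-hour window and return the n most frequent."""
    return Counter(hours).most_common(n)


def label(hour):
    """Format a 24-hour start hour as a window such as '6:00 PM - 7:00 PM'."""
    def fmt(h):
        h = h % 24
        suffix = "AM" if h < 12 else "PM"
        return f"{(h % 12) or 12}:00 {suffix}"
    return f"{fmt(hour)} - {fmt(hour + 1)}"


# Made-up crash hours for illustration only; not taken from the dataset.
hours = [18, 18, 18, 15, 15, 7, 18, 15, 17, 17, 9]
for hour, count in top_windows(hours, n=3):
    print(label(hour), count)
```

`Counter.most_common` orders windows by count, so the printed order is a frequency ranking and not a chronological one, which matches how the list above is ordered.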

13 of 16

The pie charts on the left show the share of hit-and-runs among intersection crashes and among non-intersection crashes.

Although most crashes were not hit-and-runs, the percentage of hit-and-runs is slightly higher in crashes that don’t happen at an intersection.

To be specific, 36.5% of non-intersection crashes were hit-and-runs, while 35.5% of intersection crashes were hit-and-runs.
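
Percentages like these are computed within each group: hit-and-runs at intersections divided by all intersection crashes, and likewise for non-intersection crashes. The Python sketch below illustrates that calculation. The function name `hit_and_run_share` and the pair-of-booleans input format are hypothetical, and the sample crashes are made up, so the printed numbers are not the figures from the slide.

```python
from collections import defaultdict


def hit_and_run_share(crashes):
    """Return, for each intersection flag, the percentage of that group's
    crashes that were hit-and-runs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for at_intersection, hit_and_run in crashes:
        totals[at_intersection] += 1
        if hit_and_run:
            hits[at_intersection] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


# Made-up (at_intersection, hit_and_run) pairs for illustration only.
crashes = [
    (True, True), (True, False), (True, False), (True, False),
    (False, True), (False, True), (False, False), (False, False), (False, False),
]
shares = hit_and_run_share(crashes)
print("Intersection:", shares[True], "% hit-and-run")
print("Non-intersection:", shares[False], "% hit-and-run")
```

Because each percentage uses its own group total as the denominator, the two values do not need to add up to 100.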

14 of 16

This bubble graph shows crash frequency based on city or township.

The largest bubble is Detroit, and it is far larger than the bubbles for the other cities and townships.

One likely reason that so many auto-pedestrian crashes happen in Detroit is that it is a large, bustling city.

15 of 16

All R Code

16 of 16

Significance and Broader Implications

  • Most likely scenario of an auto-pedestrian crash occurring:
    • In theory, a male on a Friday in October in a 25 mph speed limit zone between 3:00 PM and 8:00 PM, away from an intersection, in daylight with clear skies
    • They are most likely to face a possible injury; hit-and-runs are the minority of cases but are slightly more common away from intersections
      • Of course, this is a lighthearted assumption, but it is backed by our analysis
  • What Can This Data/Analysis Tell Us?
    • Where to put more security cameras and police
    • Which cities/townships should take more precautionary measures to prevent these crashes
    • Awareness for pedestrian safety
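
The "most likely scenario" above combines the most frequent value of each attribute. A minimal Python sketch of that idea follows. The function name `most_common_scenario`, the column names, and the sample rows are hypothetical and made up for illustration. Note the assumption built into this approach: each attribute is counted independently, so the combined scenario is not necessarily the single most frequent combination in the data.

```python
from collections import Counter


def most_common_scenario(rows):
    """Return the most frequent value of each attribute, counted independently."""
    if not rows:
        return {}
    columns = rows[0].keys()
    return {
        col: Counter(row[col] for row in rows).most_common(1)[0][0]
        for col in columns
    }


# Made-up rows for illustration only; not taken from the dataset.
rows = [
    {"day": "Friday", "weather": "Clear", "speed_limit": "25"},
    {"day": "Friday", "weather": "Rain", "speed_limit": "25"},
    {"day": "Sunday", "weather": "Clear", "speed_limit": "35"},
]
scenario = most_common_scenario(rows)
print(scenario)
```

To find the most frequent joint combination instead, one could count whole rows as tuples with the same `Counter` approach.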