TRAFFIC COLLISION DATA ANALYSIS

TEAM:

JASRIN KAUR

JOSE V. LAL

NADIA ENHAILI

PROJECT STEPS

Project Definition

Data Collection

    • Data Understanding

Data Cleaning

    • Data Preparation

Modeling

    • Machine Learning Model
    • SMOTE

Data Visualization/ Presentation

    • Dashboard
    • Infographic

PROJECT DEFINITION

  • Analyze the main factors contributing to fatal and non-fatal injuries in traffic collisions
  • Develop an interactive dashboard to display the findings

DATA COLLECTION/ CLEANING

  • National Collision Database (NCDB), from the Government of Canada open data portal
  • Approximately 290,000 observations and 40 attributes
    • Collision level data elements
    • Vehicle level data elements
    • Person level data elements
  • C_SEV (collision severity) attribute:
    • Fatal (4,468) / non-fatal (285,373)
    • Imbalanced dataset: fatal collisions make up about 1.5% of observations
  • Impute missing values:
    • Weighted random selection of values
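
One way to read "weighted random selection" is that each missing entry is filled by drawing from the values observed for that attribute, with probability proportional to how often each value occurs. The sketch below illustrates that reading using only the Python standard library. The function name, the example column, and the missing-value markers (None and "U") are hypothetical and are not taken from the project.

```python
import random
from collections import Counter


def impute_weighted_random(values, missing_codes=(None,), seed=None):
    """Fill missing entries by sampling observed values in proportion
    to their frequency in the same column."""
    rng = random.Random(seed)
    # Count only the entries that are actually observed.
    counts = Counter(v for v in values if v not in missing_codes)
    if not counts:
        raise ValueError("no observed values to sample from")
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return [
        rng.choices(categories, weights=weights, k=1)[0] if v in missing_codes else v
        for v in values
    ]


# Hypothetical categorical column; "U" and None stand in for missing entries.
weather = ["1", "1", "1", "2", "U", None, "1", "3", "U"]
filled = impute_weighted_random(weather, missing_codes=(None, "U"), seed=1)
```

Sampling by observed frequency keeps the attribute's overall distribution roughly unchanged, whereas filling every gap with the most common value would inflate that value.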

MODELING/ RESULTS

  • Binary Classification Problem
  • Explore the relationship between collision severity (fatal vs. non-fatal) and the other attributes
    • Dependent variable is dichotomous (binary)
    • Logistic Regression model
  • Results & Next Steps:
    • Because the dataset is imbalanced, the model performs poorly on the minority (fatal) class
    • Explore SMOTE (Synthetic Minority Over-sampling Technique) to synthesize new examples for the minority class
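
The core idea of SMOTE can be sketched in a few lines: pick a minority-class sample, pick one of its k nearest minority-class neighbours, and create a new point somewhere on the line segment between them. The code below is a minimal standard-library illustration of that idea, not the project's implementation. The function name, parameters, and sample points are hypothetical. It also assumes numeric features, so categorical attributes would need encoding or a categorical-aware variant first.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical numeric feature vectors for the minority (fatal) class.
fatal = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(fatal, n_new=6, k=2, seed=42)
```

Oversampling of this kind should be applied to the training split only, so that the test set still reflects the real class balance.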

DATA VISUALIZATION

  • Interactive dashboard on Tableau

https://public.tableau.com/views/TableauDashboardNCDB2017/Dashboard1?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link

NEXT STEPS

Data Collection

    • Fatal collisions – Toronto Police
    • Access using APIs

Data Manipulation

    • Identify locations of interest (updated every 30 days)

Data Deployment

    • Android Application
    • Alert users based on proximity to location of interest
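
The proximity check behind such an alert can be sketched with the haversine great-circle distance. The example below is in Python for illustration only, since the planned app targets Android. The function names, the 0.5 km radius, and the coordinates are hypothetical and do not come from the Toronto Police data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_locations(user, locations, radius_km=0.5):
    """Return the names of locations of interest within radius_km of the user."""
    user_lat, user_lon = user
    return [
        name
        for name, lat, lon in locations
        if haversine_km(user_lat, user_lon, lat, lon) <= radius_km
    ]


# Hypothetical locations of interest as (name, latitude, longitude).
locations = [
    ("Site A", 43.6532, -79.3832),
    ("Site B", 43.7000, -79.4000),
]
user_position = (43.6540, -79.3840)
alerts = nearby_locations(user_position, locations, radius_km=0.5)
```

With the list of locations refreshed every 30 days, a check like this would only need to run against a small, locally cached set of coordinates.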