Video surveillance for road traffic monitoring with computer vision techniques
Team 2
Guillem Delgado, Jordi Gené, Francisco Roldan, Victor Segura
Motivation
ROAD SAFETY
Outline
1. Introduction
Objective: track the speed of one or more cars and detect anomalies or speeding on the road
Track speed of vehicles
Track speed from POV
2. State of the art
Lidar[1]
Color Cameras[2]
Deep Learning[3]
[1] Chen, Xin, et al. "Next generation map making: geo-referenced ground-level LIDAR point clouds for automatic retro-reflective road feature extraction." Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2009.
[2] Yabo, Agustín, et al. "Vehicle classification and speed estimation using computer vision techniques." XXV Congreso Argentino de Control Automático (AADECA 2016)(Buenos Aires, 2016). 2016.
[3] Braham, Marc, and Marc Van Droogenbroeck. "Deep background subtraction with scene-specific convolutional neural networks." Systems, Signals and Image Processing (IWSSIP), 2016 International Conference on. IEEE, 2016.
3. Speed estimator using traditional CV techniques
Pipeline
Video Stabilization
Background subtraction
Car Tracking
Speed estimation
Video recording
3.1. Video Stabilization
Original
BM Stabilization
3.2. Background subtraction
| DataSet | ColorSpace | A_filt | Connectivity | SE1 | SE2 | Alpha |
|---------|------------|--------|--------------|-----|-----|-------|
| Highway | YCrCb      | 160    | 8            | 11  | 9   | 0.23  |
| Fall    | YCrCb      | 160    | 8            | 3   | 9   | 0.5   |
| Traffic | RGB        | 160    | 8            | 3   | 13  | 1.5   |
Gaussian Modelling
Area filtering
Dilation
(SE1)
Hole filling
(connectivity)
Erosion
(SE1)
Opening
(SE2)
*SE1 and SE2 are the structuring element sizes used in the morphological operators.
3.3. Tracking
Custom SORT: Simple, online, and realtime tracking of multiple objects in a video sequence
Ref paper https://arxiv.org/pdf/1602.00763.pdf
SORT Implementation https://github.com/abewley/sort
Detection
Estimation model
Data association
Background subtraction: filter detections by area, then apply mathematical morphology (erosions and dilations with specific connectivity and structuring elements) to remove background noise
From each detection, we take the centroid of the bounding box and compute the scale/area and aspect ratio to obtain the state of the target
[x1, y1, x2, y2] → [x, y, s, r]
Each detected vehicle is associated with a specific tracker by computing the IoU distance between the detection and all bounding boxes predicted from the existing targets
A Kalman filter is used to predict the bounding-box position of each target and track the detected vehicles
where:
x is the horizontal coordinate of the centroid
y is the vertical coordinate of the centroid
s is the scale (area)
r is the aspect ratio
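As a sketch, the observation conversion and the IoU used for data association can be written as follows (function names are ours):

```python
def bbox_to_state(bbox):
    """Convert a detection [x1, y1, x2, y2] into the SORT observation [x, y, s, r]."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    x = x1 + w / 2.0           # centroid, horizontal
    y = y1 + h / 2.0           # centroid, vertical
    s = w * h                  # scale (area)
    r = w / float(h)           # aspect ratio
    return [x, y, s, r]

def iou(a, b):
    """Intersection over Union between two [x1, y1, x2, y2] boxes."""
    xx1, yy1 = max(a[0], b[0]), max(a[1], b[1])
    xx2, yy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union
```

The IoU distance between a detection and a predicted box is then simply `1 - iou(det, pred)`, which is what the assignment step minimizes.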
3.3. Tracking
Other methods: Meanshift. Key idea: locate the maxima of a density function given discrete samples drawn from it
Opencv reference: https://docs.opencv.org/3.4.1/db/df8/tutorial_py_meanshift.html
Other interesting source: http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_mean_shift_tracking_segmentation.php
Detection
Set up detections
Predict positions
Background subtraction: filter detections by area, then apply mathematical morphology (erosions and dilations with specific connectivity and structuring elements) to remove background noise
Get the bounding-box location of each object detected as a car
Transform the detection from RGB to HSV color space
Compute a normalized histogram
Compute the back projection using the histogram from the set-up stage, then run the meanshift algorithm to predict the new location
Pros:
Cons:
3.4. Speed estimation
Traffic stabilized
During tracking, we can calculate the displacement (D) of an object between two frames; this displacement is known in pixels.
To obtain the velocity, we need the correspondence between pixels and real-world distance. In the Traffic dataset, we have assumed that 36 pixels in the image correspond to 1 m. Given the frame rate (fr = 20 Hz), the pixel-distance correspondence (pxD = 36 pixels/m) and the displacement of an object between two frames (D [pixels]), the velocity of an object can be computed as:
v [km/h] = 3.6 · (D · fr) / pxD
where 3.6 is the conversion factor from m/s to km/h.
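The formula can be sketched as a small helper; the default values fr = 20 Hz and pxD = 36 px/m are the Traffic-dataset assumptions stated above:

```python
def speed_kmh(displacement_px, fr=20.0, px_per_m=36.0):
    """Speed in km/h from the pixel displacement between consecutive frames.

    displacement_px: pixels moved between two consecutive frames (D).
    fr: frame rate in Hz; px_per_m: pixel-distance correspondence (pxD).
    """
    metres_per_frame = displacement_px / px_per_m  # D / pxD
    return 3.6 * metres_per_frame * fr             # m/s -> km/h
```

For example, a car moving 18 pixels per frame covers 0.5 m per frame, i.e. 10 m/s at 20 Hz, which is 36 km/h.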
3.4. Speed estimation
In our implementation, we have assumed that there is no distortion due to the projection. To take this distortion into account, we should apply a homography before speed estimation (idea from Team2 Class2016).
Assumptions:
width and height of the road
pixels to meters
Planar homography:
projection of the image onto a reference plane.
Slide credit: Team2 Class2016
3.5. Anomalies detection. Application
Hypothesis: when there is a road anomaly, the speed of the cars will not remain constant, so the standard deviation of each car's speed will be higher.
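This hypothesis can be sketched as a simple per-track test; the threshold value below is hypothetical and would need tuning per scene:

```python
import statistics

def is_anomalous(speeds_kmh, std_threshold=10.0):
    """Flag a track whose speed varies too much over its lifetime.

    speeds_kmh: per-frame speed estimates (km/h) of one tracked car.
    std_threshold: hypothetical cut-off on the sample standard deviation.
    """
    if len(speeds_kmh) < 2:
        return False  # not enough samples to estimate variability
    return statistics.stdev(speeds_kmh) > std_threshold
```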
4. Speed estimation using Deep Learning
Data Preprocessing
CNN
Data preparation
Speed Prediction
Pairs of consecutive frames are created and shuffled for training.
* Apply the same saturation value to both frames.
* Crop the upper and bottom parts of the image and resize.
* Compute Farneback's dense optical flow.
NVIDIA model for end-to-end autonomous driving.
4. Speed estimation using Deep Learning
NvidiaModel
10 m/s
4. Speed estimation using Deep Learning
Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., ... & Zhang, X. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
5. Datasets
Driving Point of View
Traffic Dataset
Own dataset, recorded from the driver's POV at 1080p@30fps, downscaled to 512x512@20fps for the neural network
6. Results
Highway
Fall
Traffic
Traffic stabilized
The anomaly detector did not work as intended on our own dataset, so we cannot provide qualitative results
6. Results
Generally, the model detects when the car is accelerating or decelerating. However, it struggles with repetitive scenes such as highways, where many pixels remain unchanged between frames. Moreover, when the car is stationary, the model is sensitive to the movement of other vehicles. The difference in hardware and camera placement is probably the determining factor for the performance gap between one sample and the other.
Jovan Sardinha validation video
7. Conclusions