1 of 20

GIDEON
Intelligent Video Analytics (IVA) Application
(with NVIDIA Jetson Nano as target hardware)

Group members:

Rana Hassan Shafiq (SP16-BCS-190)

Abdul Wahab Asif (FA16-BCS-019)

Muhammad Maroof Ismail Khan Niazi (FA16-BCS-085)

Supervised By:

Dr. Usama Ijaz Bajwa

Assistant Prof. & Associate Head of Department
Dept. of Computer Science, CUI Islamabad, Lahore Campus

Machine Perception & Visual Intelligence Research Group

2 of 20

Introduction

Near real-time (~25 fps) detection of suspicious or anomalous human activities and events in a live CCTV input feed

Embedded AI on the edge: the Inference Engine and core application are deployed on NVIDIA's Jetson Nano, an integrated-GPU powered platform

3D-CNN based deep neural network for computer vision, trained on Google Cloud Platform Compute Engine instances using Tesla K80 and T4 GPUs

Automates the intensive surveillance workload for single or multiple camera streams (up to 6 simultaneously)


3 of 20

Problem

# 1

Crime rates rise with a growing population. Even with CCTVs, a large number of events go undetected or are reported late

# 2

Traditional CCTV setups require constant monitoring, or a large human resource base for CCTV clusters

# 3

Even with proper monitoring, crimes remain undetected simply due to human error

# 4

For instance, 87,500+ cases were reported in Punjab alone (Jan to Feb 2020). Surveillance automation is therefore a necessity rather than a luxury


4 of 20

Our Solution: Gideon IVA

A live 720p @ 30fps video stream is fed to the NVIDIA Jetson Nano powered host over the MIPI CSI interface

IVA preprocesses the frames using OpenCV and the Python toolchain before feeding them to the inference engine

Processed batches of frames are fed to the pre-trained DNN based Inference Engine to obtain probability outputs

Probability outputs are compared against a threshold parameter to detect key events and extract the corresponding video snippets

The network-connected IVA uses a RESTful NodeJS backend server to deliver instant alerts on the Web and Android user interfaces

Pipeline: Camera -> Preprocessing -> Inferencing Model -> Anomaly Detection -> User Alert


5 of 20

Tasks

1. Learning our tool stack: Keras, TensorFlow, NumPy, OpenCV, 3D-CNNs, GCP AI Compute Engines, NVIDIA TensorRT, cuDNN, CUDA

2. Researching an implementable solution for our target hardware: NVIDIA Jetson Nano

3. Dataset preprocessing

4. Model architecture selection and training on GCP AI Compute Engines

5. Configuring our embedded AI target to host the 3D-DNN model for real-time inference, using NVIDIA's proprietary toolchain

6. Architectural adjustments to the model to meet memory-footprint requirements while preserving inference accuracy

7. Retraining and hyper-parameter fine-tuning after porting the model to the target

8. Inference on the test dataset, followed by retraining and hyper-parameter tuning based on performance analysis

9. RESTful NodeJS server development, MongoDB integration, and a Python HTTP client running the Inference Engine core

10. ReactJS based Web and Android UI development and application testing


6 of 20

Core System Modules

Inference Engine
- Python HTTP Client application
- Hosted on NVIDIA Jetson Nano
- Running DNN Inference Engine at core

NodeJS Backend Server
- RESTful API endpoint for backend services
- Integrated with MongoDB database server
- Routing key-event information to UIs

ReactJS Web UI
- Dynamic, Interactive Web Application for UI
- Server hosted on NVIDIA Jetson Nano
- Real-time synchronization with backend

Android UI
- Dynamic, Interactive Web Application for UI
- Server hosted on NVIDIA Jetson Nano
- Real-time synchronization with backend
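To illustrate how the Python HTTP Client module might hand a detected event to the NodeJS backend, here is a minimal sketch using only the Python standard library. The server URL, endpoint path, and JSON field names are hypothetical; the slides do not specify them. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port, and path are assumptions, not from the slides.
SERVER_URL = "http://localhost:3000/api/events"


def build_event_request(camera_id, label, probability, clip_path, token=None):
    """Build (but do not send) a POST request describing one key event."""
    payload = {
        "cameraId": camera_id,          # hypothetical field names
        "label": label,
        "probability": round(probability, 4),
        "clip": clip_path,
    }
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Token-based authentication, if the backend requires it.
        headers["Authorization"] = "Bearer " + token
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL, data=body, headers=headers, method="POST"
    )


req = build_event_request(0, "abnormal", 0.91, "clips/event_0001.mp4")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=5)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running server.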


7 of 20

Inference Engine - Dataset

UCF-Crime Dataset

1900 Videos
- 950 Normal
- 950 Abnormal

13 Categories:
- Crimes
- Anomalous Events

Training: 1330 Videos
Test: 570 Videos
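The training and test counts above correspond to a 70/30 split of the 1900 videos. A minimal sketch of such a split follows; the seeded shuffle and the helper name `split_videos` are assumptions for illustration, since the slides do not say how videos were assigned to each set.

```python
import random

TOTAL_VIDEOS = 1900   # from the slide
TRAIN_VIDEOS = 1330   # from the slide
TEST_VIDEOS = 570     # from the slide


def split_videos(video_ids, train_fraction, seed=0):
    """Shuffle deterministically and cut into (train, test) lists."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


train_fraction = TRAIN_VIDEOS / TOTAL_VIDEOS
train, test = split_videos(range(TOTAL_VIDEOS), train_fraction)
print(f"train fraction: {train_fraction:.2f}")
print(len(train), len(test))
```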


8 of 20

Target Hardware: NVIDIA Jetson Nano

Our embedded target is the NVIDIA Jetson Nano, a US$99 heterogeneous computing platform with:

128 Maxwell Architecture based CUDA Cores
Quad core ARM Cortex A57 based application processor @ 1.43 GHz
4GB 64-bit LPDDR4 RAM (25.6 GB/s), shared between CPU and GPU cores
1 x MIPI CSI-2 D-PHY camera interface
10/100/1000BASE-T Ethernet
HDMI 2.0 port (up to 4K UHD output)
4 x USB 3.0, 1 x USB 2.0 Micro-B
40 x GPIO, I2C, I2S, SPI, UART, microSD
Video Encoding: 4K @ 30fps (H.264/H.265)
Video Decoding: 4K @ 60fps (H.264/H.265)
Mechanical Form factor: 100mm x 80mm x 29mm.


9 of 20

Inference Engine - Architecture

Followed the C3D approach layer-wise to build a 3D Convolutional Neural Network as the Inference Engine core
3 layer-group architecture, implemented in TensorFlow (1.15) using the Keras API
Trained on Google Cloud Platform based AI Compute Engines with NVIDIA Tesla K80 and T4 GPUs
Current validation accuracy of 73% and validation loss of 1.32 (hyper-parameter tuning in progress)
Training strategy: frames are extracted to disk using a Python based directory/file parser and OpenCV, which also labels the dataset automatically (normal vs abnormal); the network is then trained with batch-size=16
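The labelling and batching step of the training strategy can be sketched as follows. This is a simplified illustration, not the project's code: it assumes a hypothetical directory layout in which the top-level folder name (`normal` or `abnormal`) determines the label, and it works on path strings only, leaving the actual OpenCV frame extraction out.

```python
from pathlib import PurePosixPath

BATCH_SIZE = 16  # batch size stated on the slide


def label_for(frame_path):
    """Return 0 for normal and 1 for abnormal, from the top-level directory."""
    top = PurePosixPath(frame_path).parts[0].lower()
    if top == "normal":
        return 0
    if top == "abnormal":
        return 1
    raise ValueError(f"cannot infer label from path: {frame_path}")


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical frame paths, as a directory parser might produce them.
paths = [f"normal/video_001/frame_{i:04d}.jpg" for i in range(20)]
paths += [f"abnormal/video_002/frame_{i:04d}.jpg" for i in range(20)]

labelled = [(p, label_for(p)) for p in paths]
for batch in batches(labelled):
    print(len(batch), batch[0])
```

Deriving labels from directory names keeps labelling automatic: adding a video to the right folder is enough to include it in training.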


10 of 20

Inference Engine - Implementation

The live camera feed is obtained through the MIPI CSI-2 port at 720p @ 30fps, keeping the video-decoding workload off the main CPU cores (quad-core ARM Cortex A57 @ 1.43 GHz)
Input stream frames are extracted using OpenCV and held temporarily in NumPy arrays
Preprocessing is applied, then frames are batched and fed to the Inference Engine (pre-trained model)
The Inference Engine performs inference in near real-time (~25fps), outputting class-wise probability arrays per input batch
A threshold criterion triggers event detection and key-event video extraction, generating statistics and alerts for the end-user Web and Android interfaces over the network
The Inference Engine is encapsulated inside a Python HTTP Client and hosted on the target, the NVIDIA Jetson Nano
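The threshold step can be sketched as grouping consecutive batches whose abnormal-class probability meets the threshold into one event. The function names, the threshold value, and the assumption that each batch holds 16 frames are illustrative choices, not details confirmed by the slides.

```python
def find_events(probabilities, threshold=0.5):
    """Return (first_batch, last_batch) index pairs for runs at or above threshold."""
    events = []
    start = None
    for i, p in enumerate(probabilities):
        if p >= threshold and start is None:
            start = i                      # a new event begins
        elif p < threshold and start is not None:
            events.append((start, i - 1))  # the running event ends
            start = None
    if start is not None:                  # event still open at end of stream
        events.append((start, len(probabilities) - 1))
    return events


def batch_range_to_frames(event, frames_per_batch):
    """Convert a (first_batch, last_batch) pair to inclusive frame indices."""
    first, last = event
    return first * frames_per_batch, (last + 1) * frames_per_batch - 1


# Hypothetical abnormal-class probabilities, one per input batch.
probs = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]
for event in find_events(probs, threshold=0.6):
    print(event, batch_range_to_frames(event, 16))
```

The resulting frame ranges are what a snippet extractor would need in order to cut the key-event clip from the stream.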


11 of 20

NodeJS Backend Server

Implements a RESTful API based NodeJS backend server for CRUD operations
Provides separate endpoints for GET, POST, PUT and DELETE HTTP requests from both the Python HTTP Client and the Web/Android UIs
Integrated with a MongoDB database server, providing bulk storage of event data, both for later use and for serving the Web/Android UIs
Uses the ExpressJS framework and its middleware stack for the request processing pipeline instead of a bare NodeJS implementation
Cryptographic authentication implemented in the form of JSON tokens, protecting user data and restricting unauthorized access and HTTP requests
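As an illustration of token-based authentication, the sketch below signs and verifies a token in the JSON Web Token format using HMAC-SHA256 (HS256) and only the Python standard library. It assumes that "JSON tokens" refers to JSON Web Tokens; the actual server is written in NodeJS and its library, algorithm, and claims are not stated in the slides. The secret and claim names are hypothetical, and expiry checks are omitted.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def sign_token(claims: dict, secret: bytes) -> str:
    """Return header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        _b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _signature(signing_input, secret)


def verify_token(token: str, secret: bytes) -> bool:
    """Check the signature only; expiry and claim validation are left out."""
    if token.count(".") != 2:
        return False
    signing_input, signature = token.rsplit(".", 1)
    expected = _signature(signing_input, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


secret = b"demo-secret"                      # hypothetical secret
token = sign_token({"sub": "operator"}, secret)
print(verify_token(token, secret))
print(verify_token(token, b"wrong-secret"))
```

In practice a maintained JWT library is preferable to hand-rolled signing; the sketch only shows why a request without a validly signed token can be rejected.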


12 of 20

ReactJS Web Application UI

Implemented using the ReactJS JavaScript library for building highly responsive, component-based, dynamic web application interfaces
Static content (extracted images and key-event videos) is served from the root of a separate web server running on the Jetson Nano, keeping multimedia content apart from JSON/HTML content
AJAX calls are implemented using Axios
On a new event, no page reload occurs; instead the DOM tree is updated dynamically to display the newly received block of information from the NodeJS server
Lightweight to load and responsive in displaying updated information


13 of 20

ReactJS Web Application UI - Interface


14 of 20

Android Application UI - Interface

Home Screen


15 of 20

Android Application UI - Interface

Alert Demo


16 of 20

Android Application - Technology

Implements GET, POST, PUT and DELETE HTTP requests using the Retrofit library, a REST client for Android
Delivers instant alert notifications for key events, along with a live stream of the DNN-processed CCTV feed
Offers the same UI features as the Web UI and talks to the same backend over HTTP


17 of 20

Problems and Challenges

Memory Footprint Limitations: The NVIDIA Jetson Nano shares 4.0 GB of LPDDR4 RAM between the CPU cores and the GPU, and the L4T OS occupies 1.5 GB for OS services. The inference engine runtime therefore had to be scaled down over repeated attempts.

Model Accuracy: Because the C3D 3D-CNN implementation was scaled down to fit the memory footprint, hyper-parameters had to be tuned repeatedly until a balance between performance and memory was reached.

Google Cloud AI Compute Engines: Compute requirements for model training were demanding. Google Colab offers only shared, time-constrained GPU resources, so Google Cloud Platform services had to be learned by working through the documentation exhaustively.

Linux for Tegra (L4T) Port: Because L4T is a Linux port for the ARM architecture, most mainstream libraries and frameworks were not directly deployable. They had to be built from source, or alternatives had to be learned and the code ported accordingly.
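A rough worked calculation shows why the memory budget forces downscaling. The 4.0 GB and 1.5 GB figures come from the slide; the 16-frame clip length, float32 storage, and the 112 x 112 input size (the size used by the original C3D network) are assumptions for illustration, since the slides do not state the model's input shape.

```python
TOTAL_RAM_GB = 4.0   # shared between CPU and GPU (from the slide)
OS_RAM_GB = 1.5      # used by L4T OS services (from the slide)

available_gb = TOTAL_RAM_GB - OS_RAM_GB


def clip_bytes(frames, height, width, channels=3, bytes_per_value=4):
    """Size of one input clip stored as float32 (4 bytes per value)."""
    return frames * height * width * channels * bytes_per_value


MIB = 1024 ** 2

# Assumed clip shapes: 16 frames at full 720p versus 16 frames resized to 112x112.
full_res = clip_bytes(16, 720, 1280)
resized = clip_bytes(16, 112, 112)

print(f"available for runtime: {available_gb} GB")
print(f"16 x 720p float32 clip: {full_res / MIB:.2f} MiB")
print(f"16 x 112x112 float32 clip: {resized / MIB:.2f} MiB")
```

Input tensors are only part of the footprint; model weights, intermediate activations, and the framework runtime must also fit in the same shared memory, which is why resizing frames during preprocessing matters.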


18 of 20

Testing – Video 1 (From Internet)

Video Input

Anomaly Detected


19 of 20

Testing – Video 2 (From Internet)

Video Input

Anomaly Detected


20 of 20

Thank You

Project Website

https://qrgo.page.link/KMoVB

Project GitHub

https://github.com/fyp-gideon
