1 of 18

How to use TensorRT C++ API for High Performance GPU Inference

A VCV Presentation by Cyrus Behroozi

Code: https://github.com/cyrusbehr/tensorrt-cpp-api

2 of 18

Hi, I’m Cyrus!

  • Senior software developer at Trueface
  • SDK Team Lead
  • Skilled in packaging & deploying machine learning inference solutions

3 of 18

What we will go over today

  • Installing TensorRT on Ubuntu 20.04
  • Generating a TRT engine file optimized for your GPU
  • Specifying a simple optimization profile
  • Copying data to and from GPU memory
  • Synchronous inference
  • Models with dynamic batch sizes

What we will not go over today

  • Switching between multiple optimization profiles at runtime
  • Asynchronous inference
  • CUDA streams

4 of 18

Why use TensorRT?

  • The highest-performance inference framework for NVIDIA GPUs
    • Faster inference (lower latency, higher throughput)
    • Lower GPU memory usage

5 of 18

Motivation for making this presentation

TensorRT's docs are not user friendly: 18 lines of code, 5 compiler errors.

6 of 18

Installing TensorRT

  • Download the TensorRT 8.x tarball from the NVIDIA developer site (a free developer account is required)
  • Extract it and note the install path: its include/ and lib/ directories are what CMake links against on the next slide

7 of 18

Set up the CMakeLists.txt to link TensorRT
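
The slide's snippet is not reproduced in this transcript; below is a minimal sketch of the relevant lines, assuming TensorRT was extracted to /opt/TensorRT-8.x (adjust the path and target names for your project):

    cmake_minimum_required(VERSION 3.13)
    project(tensorrt_cpp_api)

    find_package(CUDA REQUIRED)

    # Path to the extracted TensorRT tarball (adjust to your location).
    set(TENSORRT_DIR /opt/TensorRT-8.x)

    add_executable(run_inference src/main.cpp)

    target_include_directories(run_inference PRIVATE
        ${TENSORRT_DIR}/include ${CUDA_INCLUDE_DIRS})
    target_link_directories(run_inference PRIVATE ${TENSORRT_DIR}/lib)

    # nvinfer is the core TensorRT library; nvonnxparser parses ONNX models.
    target_link_libraries(run_inference nvinfer nvonnxparser ${CUDA_LIBRARIES})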

Note: the above is not a complete CMakeLists.txt file

8 of 18

TensorRT workflow

Export the trained model to ONNX, use the TensorRT builder to parse it and generate an engine file optimized for your GPU, then deserialize that engine at runtime to run inference.

9 of 18

Implementation overview

10 of 18

Avoid regenerating the engine file when not necessary
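
Engine files are expensive to build but cheap to load, so cache them on disk and skip the build when a matching file already exists. A sketch of one naming scheme (illustrative, not necessarily the repo's exact code): an engine is only valid on the device and TensorRT version that produced it, so the GPU name belongs in the filename.

    #include <cuda_runtime.h>
    #include <algorithm>
    #include <filesystem>
    #include <string>

    // Name the cached engine after the model and the GPU it was built for.
    std::string engineCachePath(const std::string& onnxPath) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, 0);  // device 0 for simplicity
        std::string gpuName = prop.name;
        std::replace(gpuName.begin(), gpuName.end(), ' ', '_');
        return onnxPath + "." + gpuName + ".engine";
    }

    // Only rebuild when no cached engine exists for this model + GPU pair.
    bool engineCached(const std::string& onnxPath) {
        return std::filesystem::exists(engineCachePath(onnxPath));
    }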

11 of 18

Build phase
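
The slide's code isn't captured in this transcript; what follows is a minimal sketch of the TensorRT 8 build phase, parsing an ONNX model and serializing an optimized engine to disk (error handling omitted; buildEngine and the paths are illustrative names):

    #include <NvInfer.h>
    #include <NvOnnxParser.h>
    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <string>

    // Minimal logger required by the TensorRT API; prints warnings and errors.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) noexcept override {
            if (severity <= Severity::kWARNING)
                std::cout << msg << std::endl;
        }
    } gLogger;

    void buildEngine(const std::string& onnxPath, const std::string& enginePath) {
        auto builder = std::unique_ptr<nvinfer1::IBuilder>(
            nvinfer1::createInferBuilder(gLogger));

        // ONNX models require an explicit-batch network definition.
        auto flags = 1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
        auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
            builder->createNetworkV2(flags));

        // Parse the ONNX model into the network definition.
        auto parser = std::unique_ptr<nvonnxparser::IParser>(
            nvonnxparser::createParser(*network, gLogger));
        parser->parseFromFile(onnxPath.c_str(),
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

        auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
            builder->createBuilderConfig());

        // Build the optimized engine and serialize it to disk.
        auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
            builder->buildSerializedNetwork(*network, *config));
        std::ofstream out(enginePath, std::ios::binary);
        out.write(static_cast<const char*>(serialized->data()),
                  serialized->size());
    }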

12 of 18

Build phase 2
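
As a sketch of the optimization-profile step (the input name "input" and the 3x224x224 shape are assumptions), the following continues inside the buildEngine() sketch above, before buildSerializedNetwork() is called:

    // Declare the minimum, optimal, and maximum input shapes so TensorRT
    // can optimize for a range of batch sizes.
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN,
                           nvinfer1::Dims4(1, 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT,
                           nvinfer1::Dims4(4, 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX,
                           nvinfer1::Dims4(16, 3, 224, 224));
    config->addOptimizationProfile(profile);

    // Cap the scratch GPU memory TensorRT may use while selecting tactics.
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB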

13 of 18

Loading the engine from disk
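
The slide's code isn't captured here; a minimal sketch of deserializing the engine, reusing the Logger from the build-phase sketch:

    #include <NvInfer.h>
    #include <fstream>
    #include <memory>
    #include <string>
    #include <vector>

    // Read the serialized engine file and deserialize it. The runtime must
    // outlive the engine, so it is kept as a function-local static here to
    // keep the sketch short; a real implementation would manage both together.
    std::unique_ptr<nvinfer1::ICudaEngine> loadEngine(const std::string& enginePath) {
        std::ifstream file(enginePath, std::ios::binary | std::ios::ate);
        const size_t size = file.tellg();
        file.seekg(0);
        std::vector<char> buffer(size);
        file.read(buffer.data(), size);

        static auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
            nvinfer1::createInferRuntime(gLogger));
        return std::unique_ptr<nvinfer1::ICudaEngine>(
            runtime->deserializeCudaEngine(buffer.data(), size));
    }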

14 of 18

Running inference
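
The inference code itself isn't captured in the transcript; here is a minimal synchronous sketch. The binding order (0 = input, 1 = output) and the 3x224x224 input shape are assumptions:

    #include <NvInfer.h>
    #include <cuda_runtime.h>
    #include <memory>
    #include <vector>

    // Synchronous inference against a deserialized engine.
    void runInference(nvinfer1::ICudaEngine& engine,
                      const std::vector<float>& input,
                      std::vector<float>& output, int batchSize) {
        auto context = std::unique_ptr<nvinfer1::IExecutionContext>(
            engine.createExecutionContext());

        // With a dynamic batch dimension, the concrete input shape must be
        // set before running inference.
        context->setBindingDimensions(0, nvinfer1::Dims4(batchSize, 3, 224, 224));

        // Allocate device memory for the input and output bindings.
        void* buffers[2];
        cudaMalloc(&buffers[0], input.size() * sizeof(float));
        cudaMalloc(&buffers[1], output.size() * sizeof(float));

        // Copy the preprocessed input into GPU memory.
        cudaMemcpy(buffers[0], input.data(), input.size() * sizeof(float),
                   cudaMemcpyHostToDevice);

        // executeV2 blocks until inference completes (the synchronous path).
        context->executeV2(buffers);

        // Copy the result back to host memory and release the device buffers.
        cudaMemcpy(output.data(), buffers[1], output.size() * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(buffers[0]);
        cudaFree(buffers[1]);
    }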

15 of 18

NHWC to NCHW conversion

  • NHWC (interleaved): for each pixel, the three channel values are stored together, e.g. in RGB order.
  • NCHW (planar): all the R channel values are stored first, then all the G values, then all the B values.
  • OpenCV stores images as NHWC, while TensorRT expects NCHW, so the input must be converted (see the sketch below).
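
A minimal conversion sketch using OpenCV (normalization to [0, 1] is an assumption; substitute your model's actual preprocessing):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Convert an OpenCV BGR image (interleaved, NHWC) into a planar NCHW
    // float buffer for TensorRT, batch size 1.
    std::vector<float> toNCHW(const cv::Mat& bgr) {
        cv::Mat rgb;
        cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);
        rgb.convertTo(rgb, CV_32FC3, 1.0 / 255.0);

        const int h = rgb.rows, w = rgb.cols;
        std::vector<float> nchw(3 * h * w);

        // Wrap each destination plane in a Mat header, then let cv::split
        // scatter the interleaved pixels into the three contiguous planes.
        std::vector<cv::Mat> planes;
        for (int c = 0; c < 3; ++c)
            planes.emplace_back(h, w, CV_32FC1, nchw.data() + c * h * w);
        cv::split(rgb, planes);
        return nchw;
    }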

16 of 18

NHWC to NCHW conversion cont.

17 of 18

Running inference cont.
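
The continuation slide's code isn't captured in this transcript; as a usage sketch tying the earlier pieces together (the file names and the 1000-logit output size are placeholders):

    int main() {
        auto engine = loadEngine("model.engine");                  // loading sketch
        std::vector<float> input = toNCHW(cv::imread("image.jpg")); // preprocessing sketch
        std::vector<float> output(1000);                           // e.g. classification logits
        runInference(*engine, input, output, /*batchSize=*/1);
        // output now holds the network's raw predictions.
        return 0;
    }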

18 of 18

Fin.

Any questions?

Code: https://github.com/cyrusbehr/tensorrt-cpp-api