1 of 18

How to use TensorRT C++ API for High Performance GPU Inference

A VCV Presentation by Cyrus Behroozi

Code: https://github.com/cyrusbehr/tensorrt-cpp-api

2 of 18

Hi, I’m Cyrus!

  • Senior software developer at Trueface
  • SDK Team Lead
  • Skilled in packaging & deploying machine learning inference solutions

3 of 18

What we will go over today

  • Installing TensorRT on Ubuntu 20.04
  • Generating a TRT engine file optimized for your GPU
  • Specifying a simple optimization profile
  • Copying data to and from GPU memory
  • Synchronous inference
  • Models with dynamic batch sizes

What we will not go over today

  • Switching between multiple optimization profiles at runtime
  • Asynchronous inference
  • CUDA streams

4 of 18

Why use TensorRT?

  • The highest-performance inference framework for NVIDIA GPUs
    • Faster inference (lower latency, higher throughput)
    • Lower GPU memory usage

5 of 18

Motivation for making this presentation

TensorRT's docs are not user friendly: 18 lines of code, 5 compiler errors.

6 of 18

Installing TensorRT

  • Download the TensorRT 8.x tarball from the NVIDIA developer site (a free developer account is required)
  • Extract it and note the install path: its include/ and lib/ directories are what CMake links against on the next slide

7 of 18

Set up the CMakeLists.txt to link TensorRT
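
The slide's snippet is not reproduced in this transcript; below is a minimal sketch of the relevant lines, assuming TensorRT was extracted to /opt/TensorRT-8.x (adjust the path and target names for your project):

    cmake_minimum_required(VERSION 3.13)
    project(tensorrt_cpp_api)

    find_package(CUDA REQUIRED)

    # Path to the extracted TensorRT tarball (adjust to your location).
    set(TENSORRT_DIR /opt/TensorRT-8.x)

    add_executable(run_inference src/main.cpp)

    target_include_directories(run_inference PRIVATE
        ${TENSORRT_DIR}/include ${CUDA_INCLUDE_DIRS})
    target_link_directories(run_inference PRIVATE ${TENSORRT_DIR}/lib)

    # nvinfer is the core TensorRT library; nvonnxparser parses ONNX models.
    target_link_libraries(run_inference nvinfer nvonnxparser ${CUDA_LIBRARIES})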

Note: the above is not a complete CMakeLists.txt file

8 of 18

TensorRT workflow

Export the trained model to ONNX, use the TensorRT builder to parse it and generate an engine file optimized for your GPU, then deserialize that engine at runtime to run inference.

9 of 18

Implementation overview

10 of 18

Avoid regenerating the engine file when not necessary
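
Engine files are expensive to build but cheap to load, so cache them on disk and skip the build when a matching file already exists. A sketch of one naming scheme (illustrative, not necessarily the repo's exact code): an engine is only valid on the device and TensorRT version that produced it, so the GPU name belongs in the filename.

    #include <cuda_runtime.h>
    #include <algorithm>
    #include <filesystem>
    #include <string>

    // Name the cached engine after the model and the GPU it was built for.
    std::string engineCachePath(const std::string& onnxPath) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, 0);  // device 0 for simplicity
        std::string gpuName = prop.name;
        std::replace(gpuName.begin(), gpuName.end(), ' ', '_');
        return onnxPath + "." + gpuName + ".engine";
    }

    // Only rebuild when no cached engine exists for this model + GPU pair.
    bool engineCached(const std::string& onnxPath) {
        return std::filesystem::exists(engineCachePath(onnxPath));
    }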

11 of 18

Build phase
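
The slide's code isn't captured in this transcript; what follows is a minimal sketch of the TensorRT 8 build phase, parsing an ONNX model and serializing an optimized engine to disk (error handling omitted; buildEngine and the paths are illustrative names):

    #include <NvInfer.h>
    #include <NvOnnxParser.h>
    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <string>

    // Minimal logger required by the TensorRT API; prints warnings and errors.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) noexcept override {
            if (severity <= Severity::kWARNING)
                std::cout << msg << std::endl;
        }
    } gLogger;

    void buildEngine(const std::string& onnxPath, const std::string& enginePath) {
        auto builder = std::unique_ptr<nvinfer1::IBuilder>(
            nvinfer1::createInferBuilder(gLogger));

        // ONNX models require an explicit-batch network definition.
        auto flags = 1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
        auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
            builder->createNetworkV2(flags));

        // Parse the ONNX model into the network definition.
        auto parser = std::unique_ptr<nvonnxparser::IParser>(
            nvonnxparser::createParser(*network, gLogger));
        parser->parseFromFile(onnxPath.c_str(),
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

        auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
            builder->createBuilderConfig());

        // Build the optimized engine and serialize it to disk.
        auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
            builder->buildSerializedNetwork(*network, *config));
        std::ofstream out(enginePath, std::ios::binary);
        out.write(static_cast<const char*>(serialized->data()),
                  serialized->size());
    }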

12 of 18

Build phase 2
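
As a sketch of the optimization-profile step (the input name "input" and the 3x224x224 shape are assumptions), the following continues inside the buildEngine() sketch above, before buildSerializedNetwork() is called:

    // Declare the minimum, optimal, and maximum input shapes so TensorRT
    // can optimize for a range of batch sizes.
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN,
                           nvinfer1::Dims4(1, 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT,
                           nvinfer1::Dims4(4, 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX,
                           nvinfer1::Dims4(16, 3, 224, 224));
    config->addOptimizationProfile(profile);

    // Cap the scratch GPU memory TensorRT may use while selecting tactics.
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB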

13 of 18

Loading the engine from disk
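
The slide's code isn't captured here; a minimal sketch of deserializing the engine, reusing the Logger from the build-phase sketch:

    #include <NvInfer.h>
    #include <fstream>
    #include <memory>
    #include <string>
    #include <vector>

    // Read the serialized engine file and deserialize it. The runtime must
    // outlive the engine, so it is kept as a function-local static here to
    // keep the sketch short; a real implementation would manage both together.
    std::unique_ptr<nvinfer1::ICudaEngine> loadEngine(const std::string& enginePath) {
        std::ifstream file(enginePath, std::ios::binary | std::ios::ate);
        const size_t size = file.tellg();
        file.seekg(0);
        std::vector<char> buffer(size);
        file.read(buffer.data(), size);

        static auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
            nvinfer1::createInferRuntime(gLogger));
        return std::unique_ptr<nvinfer1::ICudaEngine>(
            runtime->deserializeCudaEngine(buffer.data(), size));
    }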

14 of 18

Running inference
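
The inference code itself isn't captured in the transcript; here is a minimal synchronous sketch. The binding order (0 = input, 1 = output) and the 3x224x224 input shape are assumptions:

    #include <NvInfer.h>
    #include <cuda_runtime.h>
    #include <memory>
    #include <vector>

    // Synchronous inference against a deserialized engine.
    void runInference(nvinfer1::ICudaEngine& engine,
                      const std::vector<float>& input,
                      std::vector<float>& output, int batchSize) {
        auto context = std::unique_ptr<nvinfer1::IExecutionContext>(
            engine.createExecutionContext());

        // With a dynamic batch dimension, the concrete input shape must be
        // set before running inference.
        context->setBindingDimensions(0, nvinfer1::Dims4(batchSize, 3, 224, 224));

        // Allocate device memory for the input and output bindings.
        void* buffers[2];
        cudaMalloc(&buffers[0], input.size() * sizeof(float));
        cudaMalloc(&buffers[1], output.size() * sizeof(float));

        // Copy the preprocessed input into GPU memory.
        cudaMemcpy(buffers[0], input.data(), input.size() * sizeof(float),
                   cudaMemcpyHostToDevice);

        // executeV2 blocks until inference completes (the synchronous path).
        context->executeV2(buffers);

        // Copy the result back to host memory and release the device buffers.
        cudaMemcpy(output.data(), buffers[1], output.size() * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(buffers[0]);
        cudaFree(buffers[1]);
    }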

15 of 18

NHWC to NCHW conversion

  • NHWC (interleaved): for each pixel, the three channel values are stored together, e.g. in RGB order.
  • NCHW (planar): all the R channel values are stored first, then all the G values, then all the B values.
  • OpenCV stores images as NHWC, while TensorRT expects NCHW, so the input must be converted (see the sketch below).
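
A minimal conversion sketch using OpenCV (normalization to [0, 1] is an assumption; substitute your model's actual preprocessing):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Convert an OpenCV BGR image (interleaved, NHWC) into a planar NCHW
    // float buffer for TensorRT, batch size 1.
    std::vector<float> toNCHW(const cv::Mat& bgr) {
        cv::Mat rgb;
        cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);
        rgb.convertTo(rgb, CV_32FC3, 1.0 / 255.0);

        const int h = rgb.rows, w = rgb.cols;
        std::vector<float> nchw(3 * h * w);

        // Wrap each destination plane in a Mat header, then let cv::split
        // scatter the interleaved pixels into the three contiguous planes.
        std::vector<cv::Mat> planes;
        for (int c = 0; c < 3; ++c)
            planes.emplace_back(h, w, CV_32FC1, nchw.data() + c * h * w);
        cv::split(rgb, planes);
        return nchw;
    }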

16 of 18

NHWC to NCHW conversion cont.

17 of 18

Running inference cont.
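
The continuation slide's code isn't captured in this transcript; as a usage sketch tying the earlier pieces together (the file names and the 1000-logit output size are placeholders):

    int main() {
        auto engine = loadEngine("model.engine");                  // loading sketch
        std::vector<float> input = toNCHW(cv::imread("image.jpg")); // preprocessing sketch
        std::vector<float> output(1000);                           // e.g. classification logits
        runInference(*engine, input, output, /*batchSize=*/1);
        // output now holds the network's raw predictions.
        return 0;
    }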

18 of 18

Fin.

Any questions?

Code: https://github.com/cyrusbehr/tensorrt-cpp-api