1 of 52

Deep Learning and ML

Industry perspective

Unwrapping the computational nodes

David Cardozo

GDE Quebec

@davidcardozo

2 of 52

The field is changing rapidly

The following demo may not look impressive

"Give me the first few sentences of the speech delivered by Portia in The Merchant of Venice in modern Spanish; feel free to modernize the speech."

Large Language Model (Translator + Retrieval) -> Speech to Text -> Image generator

3 of 52

Understanding Images

Which capabilities came to exceed human performance between 2015 and 2020?

4 of 52

Videos

Newer models can even process videos

Embedding videos

Query
Generate

5 of 52

What was the state of the art in 1989?

https://www.youtube.com/watch?v=FwFduRA_L6Q

6 of 52

How did it start?

Hubel & Wiesel (1962)

7 of 52

A historic development

AlexNet (2012)

Revival of LeNet

GPUs for 2D Convolutions

8 of 52

Let us discuss this new accelerator in depth

9 of 52

CUDA and cuDNN

Start of cuDNN

Convolution forward and backward
Pooling forward and backward
Softmax forward and backward
Neuron activations forward and backward:

Rectified linear (ReLU)
Sigmoid
Hyperbolic tangent (tanh)

Tensor transformation functions
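
The "forward and backward" pairs listed above can be sketched in plain Python. This is not cuDNN's API; the function names are hypothetical and the code only illustrates the math each pair computes: the forward pass applies the function, and the backward pass multiplies the incoming gradient by the function's derivative.

```python
import math

def relu_forward(x):
    return [max(0.0, v) for v in x]

def relu_backward(x, grad_out):
    # The gradient passes through only where the input was positive.
    return [g if v > 0 else 0.0 for v, g in zip(x, grad_out)]

def sigmoid_forward(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def sigmoid_backward(x, grad_out):
    # d/dx sigmoid(x) = s * (1 - s)
    s = sigmoid_forward(x)
    return [g * si * (1.0 - si) for si, g in zip(s, grad_out)]

def tanh_forward(x):
    return [math.tanh(v) for v in x]

def tanh_backward(x, grad_out):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return [g * (1.0 - math.tanh(v) ** 2) for v, g in zip(x, grad_out)]

def softmax_forward(x):
    # Subtract the maximum for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward(x, grad_out):
    # Jacobian-vector product: s_i * (g_i - sum_j g_j * s_j)
    s = softmax_forward(x)
    dot = sum(g * si for g, si in zip(grad_out, s))
    return [si * (g - dot) for si, g in zip(s, grad_out)]
```

A library like cuDNN provides tuned GPU kernels for these same operations, so frameworks do not have to write them by hand.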

10 of 52

Linus and Nvidia

“Near the end of his talk, when asked by one of the attendees about NVIDIA's hardware support and lack of open-source driver enablement / documentation, he had a few choice words for the Santa Clara company.”

Link

11 of 52

Convolutional Networks

12 of 52

Computer Vision goes BRRR

13 of 52

14 of 52

Characteristics of images

Statistics of natural images obey invariants

…

Translation

Cutout

Dilation

Contrast

Rotation

Scale

Brightness

…

As I was saying, we use this hierarchical structure, with several layers in the neural network, to follow the intuition of how mammals process images. But how do we connect these layers? To answer that, we first need to understand how a computer represents an image captured by a camera. In a computer, an image is essentially a matrix in which each entry corresponds to a pixel and each value to that pixel's intensity. Our first instinct might be to treat every pixel as an input and connect it fully to the next layer of the network. That is very inefficient: an HD image has roughly a million pixels, which would mean about a million weights for every unit in the next layer. This is where we exploit the fact that the statistics of natural images obey several invariants. For example, if I take a photo of a cat and move the cat to a different position, it is still a cat; the same holds if the cat is rotated, or appears smaller or larger. This is where convolutions come in: we define a kernel of size n that slides across the image, so the weights are shared across the whole image and the number of parameters drops drastically.
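
A minimal sketch of the sliding kernel and the parameter savings it buys, in plain Python. The function names, the 1,000-unit dense layer, and the 64 filters of size 3x3 are assumptions chosen for illustration; biases and color channels are ignored. As in most deep learning libraries, the "convolution" here is technically a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # The same kernel weights are reused at every position (i, j).
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def dense_params(n_pixels, n_units):
    # Fully connected: one weight per (pixel, unit) pair.
    return n_pixels * n_units

def conv_params(kernel_size, n_filters):
    # Convolution: one small kernel per filter, shared across the whole image.
    return kernel_size * kernel_size * n_filters

def single_dot_image(size, row, col):
    """Hypothetical test image: all zeros except one bright pixel."""
    return [[1.0 if (r, c) == (row, col) else 0.0 for c in range(size)]
            for r in range(size)]

kernel = [[1.0, 1.0], [1.0, 1.0]]
response = conv2d_valid(single_dot_image(4, 1, 1), kernel)
shifted_response = conv2d_valid(single_dot_image(4, 2, 2), kernel)
```

Moving the bright pixel one step down and right moves the response by the same amount, which is the translation behavior the paragraph describes. On the parameter side, `dense_params(1_000_000, 1000)` gives a billion weights, while `conv_params(3, 64)` gives 576.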

15 of 52

Sobel Filters

Deep ConvNets

Layer 1
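
For reference, a Sobel filter is a fixed, hand-designed 3x3 kernel that responds to edges, the kind of pattern that first-layer filters in trained convolutional networks often end up resembling. The sketch below applies the two standard Sobel kernels to a tiny synthetic image; the helper name and the test image are assumptions made for illustration.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def correlate_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

# Hypothetical 5x5 image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

gx = correlate_valid(image, SOBEL_X)  # responds to horizontal intensity changes
gy = correlate_valid(image, SOBEL_Y)  # responds to vertical intensity changes
```

Here `gx` is large wherever the window straddles the edge and zero in the flat bright region, while `gy` is zero everywhere because every row of the image is identical.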

16 of 52

How does a Kiwibot work?

Nvidia’s Autopilot

17 of 52

End Project

Kiwibot dreaming

18 of 52

Statistical Methods

“Inference-only” (DL) Models

“Generalizable” (DL) Models

19 of 52

CLIP

20 of 52

Transformer

21 of 52

ChatGPT & GPT-4

Reinforcement Learning from Human Feedback (RLHF)

22 of 52

Setting the background…

Deep Learning for Generation

23 of 52

Setting the background…

What are Variational Autoencoders?

Encoder

Decoder

z

Input image

Hidden representation

Output image

24 of 52

Setting the background…

What are GANs?

Generator

Discriminator

True/False

Random Noise

Generated Image

Original Image
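
One way to make the diagram concrete is to write down what each network is trained to minimize. The sketch below assumes the common binary cross-entropy formulation with the non-saturating generator loss; the slides do not specify a loss, and the function names are hypothetical. The inputs are the discriminator's output probabilities that an image is real.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one prediction; `prob` is clamped to avoid log(0)."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants real images scored near 1 and generated images near 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """The generator wants its images scored near 1, i.e. mistaken for real ones."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

For example, `discriminator_loss([0.9], [0.1])` is lower than `discriminator_loss([0.5], [0.5])`, and `generator_loss([0.9])` is lower than `generator_loss([0.1])`: each network's loss falls as it wins the True/False game against the other.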

25 of 52

Panda mad scientist mixing sparkling chemicals, artstation

What can Diffusion models do?

Dog looking in the mirror, seeing a cat

A hedgehog wearing a leather jacket, playing a guitar on a beach

26 of 52

What is Diffusion?

The physics definition:

Is it possible to reverse this?

27 of 52

What is Diffusion?

The ML definition:

Diffusion Task:

Gradually add noise to the image over T steps (the forward process), then learn to recover the original image from the noisy image x_T (the reverse process).

Forward (noising) process: q(x_t | x_{t-1})

Reverse (denoising) process: p_θ(x_{t-1} | x_t)
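
A minimal sketch of the forward process only, assuming the Gaussian step used in DDPM-style models, q(x_t | x_{t-1}) = N(sqrt(1 - β_t) x_{t-1}, β_t I), with a linear β schedule. The schedule endpoints and function names are assumptions, not taken from the slides. The reverse model p_θ is a trained neural network and is not shown.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, increasing linearly (assumed values)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * t for t in range(T)]

def forward_step(x_prev, beta_t, rng):
    """Sample x_t from q(x_t | x_{t-1}): shrink the signal slightly, add Gaussian noise."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_process(x0, betas, rng):
    """Return the whole trajectory [x_0, x_1, ..., x_T]."""
    xs = [list(x0)]
    for beta_t in betas:
        xs.append(forward_step(xs[-1], beta_t, rng))
    return xs

def signal_fraction(betas):
    """Product of sqrt(1 - beta_t): how much of x_0 survives in the mean of x_T."""
    frac = 1.0
    for beta_t in betas:
        frac *= math.sqrt(1.0 - beta_t)
    return frac

betas = linear_beta_schedule(1000)
trajectory = forward_process([1.0, -1.0], betas, random.Random(0))
```

With these assumed settings, `signal_fraction(betas)` is close to zero, so x_T is almost pure noise. That is why generation can start from random noise and run the learned reverse steps.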

28 of 52

Deep Learning Frameworks

29 of 52

Hardware

GPUs

TPUs

30 of 52

Hubs

🤗 Hugging Face

TF Datasets, TF Hub, Torch Hub, Detectron2, ...

31 of 52

How does this actually work?

32 of 52

Data

calcPE(stock) {
  price = readPrice();
  earnings = readEarnings();
  return (price / earnings);
}

Rules

(Expressed in Code)

Answers

(Returned From Code)

33 of 52

if (ball.collide(brick)) {
  removeBrick();
  ball.dx = -1 * (ball.dx);
  ball.dy = -1 * (ball.dy);
}

34 of 52

Rules

Data

Traditional Programming

Answers

35 of 52

Rules

Data

Traditional Programming

Answers

Data

Rules

Machine Learning

36 of 52

Activity Recognition

if (speed < 4) {
  status = WALKING;
}

37 of 52

Activity Recognition

if (speed < 4) {
  status = WALKING;
}

if (speed < 4) {
  status = WALKING;
} else {
  status = RUNNING;
}

38 of 52

Activity Recognition

if (speed < 4) {
  status = WALKING;
}

if (speed < 4) {
  status = WALKING;
} else {
  status = RUNNING;
}

if (speed < 4) {
  status = WALKING;
} else if (speed < 12) {
  status = RUNNING;
} else {
  status = BIKING;
}

39 of 52

Activity Recognition

if (speed < 4) {
  status = WALKING;
}

if (speed < 4) {
  status = WALKING;
} else {
  status = RUNNING;
}

if (speed < 4) {
  status = WALKING;
} else if (speed < 12) {
  status = RUNNING;
} else {
  status = BIKING;
}

// ????

40 of 52

Rules

Data

Traditional Programming

Answers

Data

Rules

Machine Learning

41 of 52

Activity Recognition

0101001010100101010100101010100101110101001010100101010010101001010100101010

Label = WALKING

1010100101001010101010101001001001000100100111110101011111010100100111101011

Label = RUNNING

1001010011111010101110101011101010111010101011110101010111111110001111010101

Label = BIKING

1111111111010011101001111101011111010101011101010101011101010101010100111110

Label = GOLFING (Sort of)
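
To contrast with the hand-written speed thresholds, here is a minimal sketch of inferring the rule from labeled examples instead. It simplifies the raw sensor streams on the slide to a single speed reading; the training values are invented for illustration, and nearest-centroid classification is just one simple choice of learner.

```python
from collections import defaultdict

def fit_centroids(samples):
    """Learn the mean speed for each label from (speed, label) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for speed, label in samples:
        sums[label] += speed
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, speed):
    """Pick the label whose learned mean speed is closest to the reading."""
    return min(centroids, key=lambda label: abs(centroids[label] - speed))

# Hypothetical labeled readings (speed in arbitrary units), invented for illustration.
training_data = [
    (2.0, "WALKING"), (3.0, "WALKING"),
    (7.0, "RUNNING"), (9.0, "RUNNING"),
    (15.0, "BIKING"), (19.0, "BIKING"),
]

centroids = fit_centroids(training_data)
```

Nobody typed `speed < 4` here: the boundaries between activities fall out of the data, and adding a new activity means adding labeled examples rather than another `else if`.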

44 of 52

The Machine Learning Paradigm

Make a Guess!

45 of 52

The Machine Learning Paradigm

Make a Guess!

Measure your accuracy

46 of 52

The Machine Learning Paradigm

Make a Guess!

Measure your accuracy

Optimize your Guess

47 of 52

The Machine Learning Paradigm

Make a Guess!

Measure your accuracy

Optimize your Guess

Repeat
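
The four steps above can be sketched as a tiny training loop. The example fits a line y = w*x + b with gradient descent on mean squared error; the data (generated from y = 2x - 1), the learning rate, and the step count are assumptions chosen for illustration.

```python
def mse(w, b, xs, ys):
    """Measure: mean squared error of the current guess."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_linear(xs, ys, steps=500, lr=0.05):
    w, b = 0.0, 0.0  # Make a guess
    n = len(xs)
    for _ in range(steps):  # Repeat
        # Measure: gradient of the mean squared error with respect to w and b
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Optimize: move the guess a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data following y = 2x - 1.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

w, b = train_linear(xs, ys)
```

After training, `w` and `b` land close to 2 and -1, and `mse(w, b, xs, ys)` is far below the error of the initial guess. Deep learning frameworks automate the same loop at scale, computing the gradients automatically.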

48 of 52

The Machine Learning Paradigm

Machine Learning

Labels

Data

Rules

49 of 52

The Machine Learning Paradigm

Machine Learning

Answers

Data

Rules

Model

Data

Inferences

50 of 52

JAX Ecosystem

51 of 52

JAX Ecosystem

https://github.com/n2cholas/awesome-jax

52 of 52

JAX Neural Network Libraries

https://github.com/n2cholas/awesome-jax