Deep Learning in a Nutshell
Md Sohel Rana
(Graduate Researcher)
La Trobe University
1
How Do We Learn?
Slide from Zhen He
2
How Do We Learn?
Slide from Zhen He
3
How Do We Learn?
Slide from Zhen He
4
How Do We Learn?
Slide from Zhen He
5
Deep Learning versus Traditional Machine Learning
Slide from Zhen He
6
Image representation
7
Hand Engineered Features
Slide from Zhen He
8
Image Recognition: Hand Crafted Features
[Andrew Ng, UCLA deep learning summer school 2012]
9
Traditional Machine Learning
Input
Learning
algorithm
Feature Representation
Machine Learning
Slide from Zhen He
10
Traditional Machine Learning
Machine Learning Algorithm
prob(car) prob(motorbike)
Hand engineered features
Slide from Zhen He
11
Feature Learning
Input
Learning
algorithm
Feature Representation
Machine Learning
Slide from Zhen He
12
Feature Learning
prob(car) prob(motorbike)
X1 (number of lights)
X2 (number of handlebars)
Decision boundary
Cars
Motor bikes
13
Playing Lego
The second block turns the vector into text.
Encode Image
CNN
NN
Vector
Decode to text
Lego blocks connect via vectors
Small bird with orange chest
Slide from Zhen He
14
Deep Learning Lego
Encode English
Vector
Decode to Spanish
Spanish Sentence
English Sentence
Encode English
Vector
Decode to German
German Sentence
English Sentence
Slide from Zhen He
15
Many many Lego pieces
16
Who is Using Deep Learning?
17
Who is using Deep Learning?
Slide from Zhen He
18
Language Translation
Reduces translation errors by an average of 60% compared to Google's phrase-based production system.
Slide from Zhen He
19
Who is using Deep Learning?
Deep recurrent neural networks reduced word errors by more than 30%
Slide from Zhen He
20
Who is using Deep Learning?
21
Who is using Deep Learning?
22
Who is using Deep Learning?
23
Just getting started
24
Introduction to Neural Networks
25
Neural network
a(x)
h(x)
26
Activation Function
Usually drawn
this way
27
28
XOR solution
| x1 | x2 | y |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
4 examples
Weights: W11, W12, W21, W22, Wh1, Wh2
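A minimal runnable sketch of one known XOR solution (the ReLU hidden units and the specific weight values are illustrative assumptions, not taken from the slide):
import torch
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
W = torch.tensor([[1., 1.], [1., 1.]])     # input-to-hidden weights W11, W12, W21, W22
b = torch.tensor([0., -1.])
h = torch.relu(X @ W + b)                  # hidden layer activations
y = h @ torch.tensor([1., -2.])            # output weights Wh1, Wh2
print(y)                                   # tensor([0., 1., 1., 0.]): the XOR truth table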
29
Deep Learning
http://playground.tensorflow.org/
Inputs → W1 → W2 → … → WL → Outputs (many stacked layers)
Slide from Zhen He
30
Forward Pass
Slide from Zhen He
31
Loss Function
Compare output to ground truth labels y1 … yK
32
Back Propagation
Adjust
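A tiny autograd sketch (toy values, not from the slide) of how backpropagation computes the gradient used to adjust a weight:
import torch
w = torch.tensor(2.0, requires_grad=True)
loss = (w * 3 - 1) ** 2        # a toy loss
loss.backward()                # backpropagation: compute d(loss)/dw
print(w.grad)                  # tensor(30.) = 2 * (w*3 - 1) * 3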
33
Optimization: Gradient Descent
W
Loss(W)
learning rate
Convex optimization problem
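For illustration, a minimal gradient-descent loop on a hand-picked convex 1-D loss, Loss(w) = (w - 3)^2 (an assumed example, not from the slides):
w, lr = 0.0, 0.1               # initial weight and learning rate
for step in range(50):
    grad = 2 * (w - 3)         # d Loss / d w
    w = w - lr * grad          # move against the gradient
print(w)                       # very close to the minimum at w = 3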
34
The optimal learning rate
Very small learning rate (converges too slowly)
Very large learning rate (jumps out of the minimum)
35
Popular Optimization Method
The learning rate is decreased as training progresses
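One common way to decrease the learning rate in PyTorch is a scheduler; a sketch with an assumed placeholder model:
import torch.nn as nn
import torch.optim as optim
model = nn.Linear(4, 2)                                # placeholder model (assumption)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
for epoch in range(30):
    # ... one epoch of training would go here ...
    scheduler.step()   # lr: 0.1 -> 0.01 after epoch 10 -> 0.001 after epoch 20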
36
Popular Optimization Methods
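A sketch of instantiating a few popular optimizers (the placeholder model and learning rates are illustrative; pick one optimizer per training run):
import torch.nn as nn
import torch.optim as optim
model = nn.Linear(4, 2)                                            # placeholder model
sgd = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)          # SGD with momentum
rmsprop = optim.RMSprop(model.parameters(), lr=0.001)
adam = optim.Adam(model.parameters(), lr=0.001)                    # used later in these slides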
37
Tuning Initial Learning Rate
38
Three Types of Gradient Descent
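The three types are batch, stochastic, and mini-batch gradient descent; in practice the difference is just the batch size, sketched here with random placeholder data:
import torch
from torch.utils.data import DataLoader, TensorDataset
X, y = torch.randn(100, 4), torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)
batch_gd = DataLoader(dataset, batch_size=len(dataset))   # all examples per update
stochastic_gd = DataLoader(dataset, batch_size=1)         # one example per update
mini_batch_gd = DataLoader(dataset, batch_size=32)        # the usual compromise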
39
Back Propagation
red numbers are gradients
40
Gradient At Branches
p
q
c
Loss
41
The Softmax Layer and the Cross Entropy Loss Function
42
How to Remember Cross Entropy Loss
43
Consider the Following Classification Problem
X
W
Neural network
logits: a_b, a_g, a_p (one per class)
softmax
44
Consider the Following Classification Problem
X
W
Neural network
logits: a_b, a_g, a_p
softmax
probabilities: | 0.2 | 0.1 | 0.7 |
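In symbols (the standard definitions of softmax and cross entropy):
\[
p_i = \frac{e^{a_i}}{\sum_j e^{a_j}}, \qquad L = -\sum_i y_i \log p_i
\]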
45
What if I don’t care about the maths and just want to use it?
Neural network layers
softmax
logits (5.0, 2.0, 1.0) → probabilities (0.94, 0.04, 0.02)
Cross entropy loss against the one-hot encoded target (1, 0, 0): 0.067
PyTorch code:
import torch
import torch.nn as nn
criterion = nn.CrossEntropyLoss()  # applies softmax + cross entropy; takes class indices, so no one-hot encoding needed
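Checking the slide's numbers with a small example (values as above):
import torch
import torch.nn as nn
logits = torch.tensor([[5.0, 2.0, 1.0]])
print(torch.softmax(logits, dim=1))     # roughly [0.94, 0.05, 0.02]
criterion = nn.CrossEntropyLoss()
target = torch.tensor([0])              # class index 0 stands in for the one-hot (1, 0, 0)
print(criterion(logits, target))        # about 0.066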
46
Regression Loss Function
47
Activation Functions
48
Sigmoid Function
49
Sigmoid Function
Saturated neurons kill the gradients
50
Rectified Linear Unit (ReLU)
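A quick comparison of both activations on sample inputs (illustrative values):
import torch
x = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])
print(torch.sigmoid(x))   # tensor([0.0067, 0.2689, 0.5000, 0.7311, 0.9933]): saturates at the tails
print(torch.relu(x))      # tensor([0., 0., 0., 1., 5.]): zero for negatives, identity for positives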
51
Convolutional Neural Networks
52
Convolutional Neural Networks
A filter slides across the input; each position produces one value in the output feature map (e.g. 2, then 5, then 4 at successive positions).
53
What is the result of training?
The result of training is the values inside the filters (also called kernels or weights). The sliding computation that produces the output feature map stays the same; only the learned filter values change.
54
Spatial Invariance
Training Set
Testing Set
55
Multiple layers
A filter applied to the input produces output feature map layer 1, which becomes the input for the next layer; a second filter over layer 1 produces output feature map layer 2.
56
How to Compute a Convolution?
1.2 | 2.2 | 1.2 | 3.1 |
0.2 | 2.1 | 2.5 | -1.2 |
-1.1 | 2.3 | 1.1 | -0.2 |
1.2 | 2.1 | -2.3 | -1.1 |
0.2 | 1.2 |
3.1 | 1.1 |
Filter
Input Feature Map
5.81 | | |
| | |
| | |
Output Feature Map
0.2 * 1.2 + 1.2 * 2.2 +
3.1 * 0.2 + 1.1 * 2.1
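The same computation can be checked in PyTorch; F.conv2d performs exactly this sliding multiply-and-sum:
import torch
import torch.nn.functional as F
x = torch.tensor([[1.2, 2.2, 1.2, 3.1],
                  [0.2, 2.1, 2.5, -1.2],
                  [-1.1, 2.3, 1.1, -0.2],
                  [1.2, 2.1, -2.3, -1.1]]).view(1, 1, 4, 4)   # (batch, channels, H, W)
k = torch.tensor([[0.2, 1.2],
                  [3.1, 1.1]]).view(1, 1, 2, 2)
print(F.conv2d(x, k)[0, 0])   # 3x3 output; top-left value is 5.81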
57
How to Compute a Convolution?
1.2 | 2.2 | 1.2 | 3.1 |
0.2 | 2.1 | 2.5 | -1.2 |
-1.1 | 2.3 | 1.1 | -0.2 |
1.2 | 2.1 | -2.3 | -1.1 |
0.2 | 1.2 |
3.1 | 1.1 |
Filter
Input Feature Map
5.81 | 11.14 | |
| | |
| | |
Output Feature Map
0.2 * 2.2 + 1.2 * 1.2 +
3.1 * 2.1 + 1.1 * 2.5
58
Padding with Zeros
0 | 0 | 0 | 0 | 0 | 0 |
0 | 1.2 | 2.2 | 1.2 | 3.1 | 0 |
0 | 0.2 | 2.1 | 2.5 | -1.2 | 0 |
0 | -1.1 | 2.3 | 1.1 | -0.2 | 0 |
0 | 1.2 | 2.1 | -2.3 | -1.1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
Input Feature Map
Filter:
2.1 | 1.2 | 0.5 |
0.7 | 8.1 | 1.2 |
5.2 | 3.2 | 1.1 |
Output Feature Map (4 x 4: the same spatial size as the unpadded input)
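In code, padding adds a border of zeros so the output keeps the input's spatial size; a self-contained sketch with the 3x3 filter values from this slide and a stand-in 4x4 input:
import torch
import torch.nn.functional as F
x = torch.randn(1, 1, 4, 4)                 # stand-in for the 4x4 input feature map
k3 = torch.tensor([[2.1, 1.2, 0.5],
                   [0.7, 8.1, 1.2],
                   [5.2, 3.2, 1.1]]).view(1, 1, 3, 3)
out = F.conv2d(x, k3, padding=1)            # zero padding of width 1
print(out.shape)                            # torch.Size([1, 1, 4, 4]): same size as the input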
59
A convolution generates an output feature map
https://community.arm.com/graphics/b/blog/posts/when-parallelism-gets-tricky-accelerating-floyd-steinberg-on-the-mali-gpu
Output feature map
60
The depth represents three feature maps / channels concatenated together. In the first layer these three channels usually represent the three colour channels: R, G, B.
61
62
63
64
65
Layer 1 and 2 of CNN
source: https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
66
Layer 3 of CNN
source: https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
67
Layer 4 and 5 of CNN
source: https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
68
Convolution with Larger Stride
1.2 | 2.2 | 1.2 | 3.1 |
0.2 | 2.1 | 2.5 | -1.2 |
-1.1 | 2.3 | 1.1 | -0.2 |
1.2 | 2.1 | -2.3 | -1.1 |
0.2 | 1.2 |
3.1 | 1.1 |
Filter
Input Feature Map
5.81 | |
| |
Output Feature Map
0.2 * 1.2 + 1.2 * 2.2 +
3.1 * 0.2 + 1.1 * 2.1
69
Convolution with Larger Stride
1.2 | 2.2 | 1.2 | 3.1 |
0.2 | 2.1 | 2.5 | -1.2 |
-1.1 | 2.3 | 1.1 | -0.2 |
1.2 | 2.1 | -2.3 | -1.1 |
0.2 | 1.2 |
3.1 | 1.1 |
Filter
Input Feature Map
5.81 | 10.39 |
| |
Output Feature Map
0.2 * 1.2 + 1.2 * 3.1 +
3.1 * 2.5 + 1.1 * -1.2
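With stride 2 the filter jumps two cells at a time, halving the output size; continuing the earlier sketch:
import torch
import torch.nn.functional as F
x = torch.tensor([[1.2, 2.2, 1.2, 3.1],
                  [0.2, 2.1, 2.5, -1.2],
                  [-1.1, 2.3, 1.1, -0.2],
                  [1.2, 2.1, -2.3, -1.1]]).view(1, 1, 4, 4)
k = torch.tensor([[0.2, 1.2], [3.1, 1.1]]).view(1, 1, 2, 2)
print(F.conv2d(x, k, stride=2)[0, 0])   # 2x2 output; first row is [5.81, 10.39]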
70
Pooling / Subsampling
1.2 | 2.2 | 1.2 | 3.1 |
0.2 | 2.1 | 2.5 | -1.2 |
-1.1 | 2.3 | 1.1 | -0.2 |
1.2 | 2.1 | -2.3 | -1.1 |
Input Feature Map
2.2 | |
| |
Output Feature Map
71
Pooling / Subsampling
1.2 | 2.2 | 1.2 | 3.1 |
0.2 | 2.1 | 2.5 | -1.2 |
-1.1 | 2.3 | 1.1 | -0.2 |
1.2 | 2.1 | -2.3 | -1.1 |
Input Feature Map
2.2 | 3.1 |
| |
Output Feature Map
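Max pooling with a 2x2 window in code (same input as above):
import torch
import torch.nn.functional as F
x = torch.tensor([[1.2, 2.2, 1.2, 3.1],
                  [0.2, 2.1, 2.5, -1.2],
                  [-1.1, 2.3, 1.1, -0.2],
                  [1.2, 2.1, -2.3, -1.1]]).view(1, 1, 4, 4)
print(F.max_pool2d(x, kernel_size=2)[0, 0])   # tensor([[2.2, 3.1], [2.3, 1.1]])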
72
A Typical Convolutional Network
73
Where to use Fully Connected Layers (Linear Layers)?
Convolutional layers → linear layer → logits (5.0, 2.0, 1.0) → softmax → probabilities (0.94, 0.04, 0.02)
Prob(grapes) = 0.94, Prob(pear) = 0.04, Prob(banana) = 0.02
74
Fully Connected Network (Linear Layer)
Output Neurons
Input Neurons
wij
A weight for each connection
75
Convolutional layers are much more efficient than fully connected layers.
It is common for fully connected layers to account for 99% of a network's weights,
and for convolutional layers to account for only 1%.
76
Convolutional Layers Have Small Number of Weights
150,528 x 295,704 = 44,511,731,712 weights!!!
224
224
3
3 x 3 x 3 x 6 filter
222
222
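The counts can be reproduced with plain arithmetic (don't actually build a linear layer this large):
fc_weights = (224 * 224 * 3) * (222 * 222 * 6)   # 44,511,731,712 weights for a fully connected layer
conv_weights = 3 * 3 * 3 * 6                      # 162 weights: one 3x3x3 kernel for each of 6 filters
print(fc_weights, conv_weights)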
77
Three Famous Convolutional Architectures
78
AlexNet
79
AlexNet
80
VGG
81
VGG
82
VGG16 architecture
83
VGG
84
ResNet!
85
Deep Residual Learning for Image Recognition
86
Wouldn’t it be cool!
Input A
Input B
87
One Simple Awesome Trick! (Deep Residual Network)
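A minimal sketch of the trick (an illustrative block, not the exact block from the paper): the block's output is its input plus a learned residual.
import torch.nn as nn
import torch.nn.functional as F
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))
        return F.relu(x + residual)   # the skip connection adds the input back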
88
Generalization
89
Data Augmentation
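A torchvision sketch of common augmentations (the specific transforms and parameters are illustrative choices):
from torchvision import transforms
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),        # mirror images half the time
    transforms.RandomRotation(10),            # rotate up to +/- 10 degrees
    transforms.ColorJitter(brightness=0.2),   # vary brightness slightly
    transforms.ToTensor(),
])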
90
Dropout Method
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. R. (2012), Improving neural networks by preventing co-adaptation of feature detectors
91
Amount of Dropout
Dropout code in PyTorch: nn.Dropout(0.5)
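For example, dropout is usually placed between linear layers (the layer sizes here are illustrative):
import torch.nn as nn
classifier = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.5),        # randomly zeroes half the activations, during training only
    nn.Linear(256, 10),
)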
92
Use Dropout to Avoid Overfitting
93
Normalization is Important
Features include: number of bedrooms, land size, etc.
Features are of very different scales.
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).
94
batch size
Number of dimensions
95
96
Results
97
Batch Normalization Popularity
PyTorch code:
torch.nn.BatchNorm1d(shape_of_output_feature)
98
Both Theano and TensorFlow use the idea of computational graphs
99
PyTorch
100
Deep Dive into PyTorch
101
What is a Tensor?
v = [1.1, 2.2, 3.3]
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
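In PyTorch these become torch tensors; rank-3 and higher tensors follow the same pattern (shapes here are illustrative):
import torch
s = torch.tensor(1.1)                                   # scalar: rank 0
v = torch.tensor([1.1, 2.2, 3.3])                       # vector: rank 1
m = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])     # matrix: rank 2
t = torch.randn(2, 3, 4)                                # rank-3 tensor: 2 matrices of shape 3x4
print(t.shape)                                          # torch.Size([2, 3, 4])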
102
Some tensor processing
Permute:
x = torch.tensor([[4, 5, 8], [1, 2, 0]])
out = x.permute(1, 0)
print(out)
Output: tensor([[4, 1],
        [5, 2],
        [8, 0]])
Squeezing and unsqueezing:
x = torch.randn(4, 5)
out = x.unsqueeze(dim=0)
print(out.shape)
Output: torch.Size([1, 4, 5])
out = out.squeeze(dim=0)
print(out.shape)
Output: torch.Size([4, 5])
Stacking a list of tensors:
xs = []
xs.append(torch.tensor([4, 5]))
xs.append(torch.tensor([2, 1]))
xs.append(torch.tensor([0, 3]))
out = torch.stack(xs)
Output: tensor([[4, 5],
        [2, 1],
        [0, 3]])
Reshaping:
x = torch.randn(4, 5)
out = x.view(-1, 10)
print(out.shape)
Output: torch.Size([2, 10])
103
How do we train in PyTorch?
Dataset
Dataloader
Model
Train
104
Writing code in PyTorch
105
Custom dataloader structure
__init__: load all annotations (example paths and labels)
__getitem__: return a single example with its label
106
Library and configuration
import torch
import os
from PIL import Image
from torch.utils.data.dataset import Dataset
import numpy as np
import csv
from torch.utils.data import DataLoader
from torchvision import transforms
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
TRAIN_IMG_FILE_ANNOTATION = './dataset/train.txt'
TRAIN_IMG_DIR = './dataset/train_img'
TEST_IMG_FILE_ANNOTATION = './dataset/test.txt'
TEST_IMG_DIR = './dataset/test_img'
NLABELS = 5
batch_size = 32
num_epoch=4
107
Custom dataloader part
class DataPreparation(Dataset):
    def __init__(self, annotation_file, img_root_path, datatypes):
        self.img_root_path = img_root_path
        self.images = []
        self.labels = []
        if datatypes == "train":
            self.transforms = transforms.Compose([transforms.Resize((64, 64)), transforms.RandomRotation((-4, 4)), transforms.ToTensor()])
        if datatypes == "val":
            self.transforms = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
        with open(annotation_file, 'r') as annotation_reader:
            annotation = csv.reader(annotation_reader, delimiter=',')
            for row in annotation:
                self.images.append(row[0])
                self.labels.append(int(row[1]))

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_root_path, self.images[index]))
        img = img.convert('RGB')
        img = self.transforms(img)
        labels = self.labels[index]
        return {"img": img, "labels": labels}

    def __len__(self):
        return len(self.images)

train_dataset = DataPreparation(TRAIN_IMG_FILE_ANNOTATION, TRAIN_IMG_DIR, datatypes="train")
train_size = len(train_dataset)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
test_dataset = DataPreparation(TEST_IMG_FILE_ANNOTATION, TEST_IMG_DIR, datatypes="val")
test_size = len(test_dataset)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
dataloader = {"train": train_loader, "val": test_loader}
datasize = {"train": train_size, "val": test_size}
108
Model part
class MultiLabelNN(nn.Module):
    def __init__(self, nlabel):
        super(MultiLabelNN, self).__init__()
        self.nlabel = nlabel
        self.conv1 = nn.Conv2d(3, 6, 5)       # 3 input channels, 6 filters of 5x5
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(10816, 512)      # 16 * 26 * 26 = 10816 for 64x64 inputs
        self.fc2 = nn.Linear(512, nlabel)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = x.view(-1, 10816)                 # flatten for the linear layers
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x
109
Training part
model = MultiLabelNN(NLABELS)
optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer
criterion = nn.CrossEntropyLoss()                     # loss function
for epoch in range(num_epoch):
    print("Epoch: {0}/{1}".format(epoch + 1, num_epoch))
    for phase in ["train", "val"]:
        if phase == "train":
            model.train()
        if phase == "val":
            model.eval()
        running_loss = 0.0
        running_correct = 0.0
        for index, data in enumerate(dataloader[phase]):
            images = data["img"]
            labels = data["labels"]
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            _, predicted = torch.max(outputs.data, 1)
            if phase == "train":
                loss.backward()
                optimizer.step()
            running_loss += loss.item() * labels.shape[0]
            running_correct += (predicted == labels).sum().item()
        if phase == "train":
            train_loss = running_loss / datasize["train"]
            train_accuracy = running_correct / datasize["train"]
            print("Training loss: {0}".format(train_loss))
            print("Training accuracy: {0}".format(train_accuracy))
        if phase == "val":
            test_loss = running_loss / datasize["val"]
            test_accuracy = running_correct / datasize["val"]
            print("Validation loss: {0}".format(test_loss))
            print("Validation accuracy: {0}".format(test_accuracy))
110
Use GPU in PyTorch
Code:
model=model.cuda()
……….
……….
images=data["img"].cuda()
labels=data["labels"].cuda()
111
Tensorboard code
from tensorboardX import SummaryWriter          # add the library
train_writer = SummaryWriter("./logs/train")    # create these two writers at the top of the script
val_writer = SummaryWriter("./logs/val")

if phase == "train":
    train_loss = running_loss / datasize[phase]
    train_accuracy = running_correct / datasize[phase]
    train_writer.add_scalar("loss", train_loss, epoch)             # plot training loss
    train_writer.add_scalar("accuracy", train_accuracy, epoch)     # plot training accuracy
    print("Training loss: {0}".format(train_loss))
    print("Training accuracy: {0}".format(train_accuracy))
if phase == "val":
    validation_loss = running_loss / datasize[phase]
    validation_accuracy = running_correct / datasize[phase]
    val_writer.add_scalar("loss", validation_loss, epoch)          # plot validation loss
    val_writer.add_scalar("accuracy", validation_accuracy, epoch)  # plot validation accuracy
112
Tensorboard command
tensorboard --logdir="./logs" --port 6006
113
Tensorboard output & Early Stopping
Stop
114
Understand how a model is performing
Overfitting
Underfitting
Good fitting
115
Small data set?
116
Transfer learning
Slide source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
117
Slide source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
118
Transfer Learning Training
119
Transfer learning code
from torchvision import models

class TModel(nn.Module):   # model
    def __init__(self, nlabel):
        super(TModel, self).__init__()
        self.resnet = models.resnet18(pretrained=True)
        self.sliced_resnet = torch.nn.Sequential(*(list(self.resnet.children())[:-1]))
        self.fc = nn.Linear(512, nlabel)

    def forward(self, x):
        x = self.sliced_resnet(x)
        x = x.view(-1, x.shape[1])
        x = self.fc(x)
        return x

model = TModel(NLABELS)
for param in model.resnet.parameters():
    param.requires_grad = False   # freeze part of the model
120
How to combine information
121
Gating (one side acts like a switch)
Image
Attribute (e.g. age)
Linear layer to make the dimensionality the same
Gating function
- Element-wise multiply operation
Sigmoid layer (acts like a switch)
- values are between 0 and 1.
Because the sigmoid output acts like a switch, it has a large influence on which features pass through.
switch values:  | 1.0 | 0.2 | 1.0 | 0.8 | 0.1 | 0.3 | 0.0 |
(element-wise multiply)
feature values: | 0.2 | 2.1 | 4.1 | 2.2 | 5.5 | 6.1 | 1.2 |
122
Addition
Image
Attribute (e.g. age)
Linear layer to make the dimensionality the same
Element-wise add the two feature vectors
123
Concatenation
Image
Attribute (e.g. age)
Concatenate
124
Types of concatenation
125
Gating, addition and concatenation code
class TModel(nn.Module):   # model
    def __init__(self, nlabel):
        super(TModel, self).__init__()
        self.resnet = models.resnet18(pretrained=True)
        self.sliced_resnet = torch.nn.Sequential(*(list(self.resnet.children())[:-1]))
        self.fc = nn.Sequential(nn.Linear(1, 512), nn.ReLU(inplace=True))

    def forward(self, image, age):
        f1 = self.sliced_resnet(image)
        f1 = f1.view(-1, f1.shape[1])   # image features
        f2 = self.fc(age)               # attribute features
        out = f1 * f2                   # gating of features f1 and f2
        return out

Adding: out = f1 + f2
Concatenation: out = torch.cat((f1, f2), dim=1)   # dim=1 concatenates along the feature dimension
126
Reduced Dimensionality
(bottleneck layer)
(Convolutional Layers)
(UpConv Layers)
127
Autoencoders – Why Use Them?
Compressed latent space
Trained weights
Trained weights
Semantic segmentation
128
Autoencoder – Configuration and dataloader
import torch
import torchvision
from torch import nn
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.utils import save_image
from torchvision.datasets import MNIST
num_epoch = 100
batch_size = 128
img_transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,), (0.5,))])
train_dataset = MNIST('./data', transform=img_transform,train=True,download=True)
train_size=len(train_dataset)
train_loader = DataLoader(train_dataset,batch_size=batch_size,shuffle=True,num_workers=0)
test_dataset = MNIST('./data', transform=img_transform,train=False,download=True)
test_size=len(test_dataset)
test_loader = DataLoader(test_dataset,batch_size=batch_size,shuffle=True,num_workers=0)
dataloader={"train":train_loader,"val":test_loader}
datasize={"train":train_size,"val":test_size}
model=autoencoder()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,weight_decay=1e-5)
def to_img(x):
    x = 0.5 * (x + 1)
    x = x.clamp(0, 1)
    x = x.view(x.size(0), 1, 28, 28)
    return x
129
Autoencoder – Model
class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(16, 8, 3, stride=2, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2),
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 5, stride=3, padding=1),
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 1, 2, stride=2, padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
130
Autoencoder – Training
for epoch in range(num_epoch):
    print("Epoch: {0}/{1}".format(epoch + 1, num_epoch))
    for phase in ["train", "val"]:
        if phase == "train":
            model.train()
        if phase == "val":
            model.eval()
        running_loss = 0.0
        for index, data in enumerate(dataloader[phase]):
            img, _ = data
            optimizer.zero_grad()
            output = model(img)
            loss = criterion(output, img)
            if phase == "train":
                loss.backward()
                optimizer.step()
            running_loss += loss.item() * img.shape[0]
        if phase == "train":
            train_loss = running_loss / datasize[phase]
            print("Training loss: {0}".format(train_loss))
        if phase == "val":
            validation_loss = running_loss / datasize[phase]
            print("Validation loss: {0}".format(validation_loss))
    pic = to_img(output.data)   # reconstructions from the last batch of the epoch
    save_image(pic, './dc_img/image_{}.png'.format(epoch))
torch.save(model.state_dict(), './conv_autoencoder.pth')
131
Why use GANs
Style transfer
MuseGAN – Music generation
Generate new face
Image inpainting
Deep fake videos
132
How do GANs work?
source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
133
source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
134
source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
135
source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
136
source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
137
source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Change generator weights to make the generated examples look more real.
138
GAN – configuration and dataloader
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
from torchvision.utils import save_image
def to_img(x):
    out = 0.5 * (x + 1)
    out = out.clamp(0, 1)
    out = out.view(-1, 1, 28, 28)
    return out
batch_size = 128
num_epoch = 100
z_dimension = 100 # noise dimension
img_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])  # MNIST images have a single channel
mnist = datasets.MNIST('./data', transform=img_transform, download=True)
dataloader = DataLoader(mnist, batch_size=batch_size, shuffle=True,num_workers=0)
139
GAN – model
class discriminator(nn.Module):
    def __init__(self):
        super(discriminator, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2),   # batch, 32, 28, 28
            nn.LeakyReLU(0.2, True),
            nn.AvgPool2d(2, stride=2),        # batch, 32, 14, 14
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 64, 5, padding=2),  # batch, 64, 14, 14
            nn.LeakyReLU(0.2, True),
            nn.AvgPool2d(2, stride=2)         # batch, 64, 7, 7
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 7 * 7, 1024),
            nn.LeakyReLU(0.2, True),
            nn.Linear(1024, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        '''
        x: batch, channel=1, height, width
        '''
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

class generator(nn.Module):
    def __init__(self, input_size, num_feature):
        super(generator, self).__init__()
        self.fc = nn.Linear(input_size, num_feature)    # batch, 3136 = 1 x 56 x 56
        self.br = nn.Sequential(
            nn.BatchNorm2d(1),
            nn.ReLU(True)
        )
        self.downsample1 = nn.Sequential(
            nn.Conv2d(1, 50, 3, stride=1, padding=1),   # batch, 50, 56, 56
            nn.BatchNorm2d(50),
            nn.ReLU(True)
        )
        self.downsample2 = nn.Sequential(
            nn.Conv2d(50, 25, 3, stride=1, padding=1),  # batch, 25, 56, 56
            nn.BatchNorm2d(25),
            nn.ReLU(True)
        )
        self.downsample3 = nn.Sequential(
            nn.Conv2d(25, 1, 2, stride=2),              # batch, 1, 28, 28
            nn.Tanh()
        )

    def forward(self, x):
        x = self.fc(x)
        x = x.view(x.size(0), 1, 56, 56)
        x = self.br(x)
        x = self.downsample1(x)
        x = self.downsample2(x)
        x = self.downsample3(x)
        return x
140
GAN – Training
D = discriminator()                       # discriminator model
G = generator(z_dimension, 3136)          # generator model
criterion = nn.BCELoss()                  # binary cross entropy
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0003)
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0003)
# train
for epoch in range(num_epoch):
    for i, (img, _) in enumerate(dataloader):
        num_img = img.size(0)
        # ================= train discriminator
        real_img = img
        real_label = torch.ones(num_img, 1)
        fake_label = torch.zeros(num_img, 1)
        # compute loss of real_img
        real_out = D(real_img)
        d_loss_real = criterion(real_out, real_label)
        real_scores = real_out                 # closer to 1 means better
        # compute loss of fake_img
        z = torch.randn(num_img, z_dimension)
        fake_img = G(z)
        fake_out = D(fake_img.detach())        # detach so gradients stop at the generator
        d_loss_fake = criterion(fake_out, fake_label)
        fake_scores = fake_out                 # closer to 0 means better
        # bp and optimize
        d_loss = d_loss_real + d_loss_fake
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()
        # =============== train generator
        # compute loss of fake_img
        z = torch.randn(num_img, z_dimension)
        fake_img = G(z)
        output = D(fake_img)
        g_loss = criterion(output, real_label)
        # bp and optimize
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], d_loss: {:.6f}, g_loss: {:.6f} '
                  'D real: {:.6f}, D fake: {:.6f}'
                  .format(epoch, num_epoch, d_loss.item(), g_loss.item(),
                          real_scores.data.mean(), fake_scores.data.mean()))
    if epoch == 0:
        real_images = to_img(real_img.cpu().data)
        save_image(real_images, './dc_img/real_images.png')
    fake_images = to_img(fake_img.cpu().data)
    save_image(fake_images, './dc_img/fake_images-{}.png'.format(epoch + 1))
torch.save(G.state_dict(), './generator.pth')
torch.save(D.state_dict(), './discriminator.pth')
141
Understanding GAN loss
142
Object Detection Algorithms
Object detectors fall into two families:
Region proposal based: Faster R-CNN (2015), Mask R-CNN (2017), FPN (2017)
Regression based: AttentionNet (2015), YOLO (2016), SSD (2016)
143
Intersection over union (IOU)
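A small sketch of the computation (boxes as (x1, y1, x2, y2) corner coordinates; an illustrative helper, not from the slides):
def iou(box_a, box_b):
    # corners of the overlap rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)   # intersection / union
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))          # 1/7, about 0.143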
144
Faster RCNN Detector
145
Faster RCNN Detector
146
Long Short Term Memory (LSTM)
147
Recurrent Neural Network Configurations
One to one: like what regular neural networks can do
148
We will start with many to many
149
Many to Many (Generative Model)
The characters "h e l l" are fed in one step at a time; at each step the network predicts the next character, producing "e l l o" and so spelling "hello".
150
Many to Many (Generative Model)
Input words: "The cat sat down" → predicted next words: "cat sat down ."
151
Recurrent Neural Network Configurations
152
One to Many (Image Caption Generation)
153
Recurrent Neural Network Configurations
154
Many to One – example 1
The best acting ever
Positive
155
Many to One – example 2
Each video frame is passed through a CNN; the sequence of frame features is classified as "Playing Basketball".
156
Many to One – example 3
157
Recurrent Neural Network Configurations
158
Sequence to Sequence Transformations
159
Many to Many (Sequence to Sequence Learning)
Encoder
Decoder
Thought Vector
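A sketch of the idea with an LSTM encoder (the dimensions are illustrative): the final hidden state is the fixed-size thought vector handed to the decoder.
import torch
import torch.nn as nn
encoder = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(1, 5, 10)          # one sequence of 5 steps, 10 features each
out, (h, c) = encoder(x)
print(h.shape)                     # torch.Size([1, 1, 20]): the "thought vector"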
160
Problem with this model: the whole input sequence must be squeezed through a single fixed-size thought vector
Encoder
Decoder
Thought Vector
161
The solution is to use an attention model
162
163
END
164
Limitations of deep learning
165
Deep Learning Team
166
Deep Learning Team
167
Deep Learning Journal Club
168
Question
169