#Tensorflow @martin_gorner
deep Science !
deep Code ...
>TensorFlow, deep learning and \
recurrent neural networks
without a PhD_
The superpower: batch normalisation
Data “whitening”
Data: large values, different scales, skewed, correlated
Data “whitening”
Modified data: centered around zero, rescaled...
Subtract average
Divide by std dev
Data “whitening”
Modified data: … and decorrelated (that was almost a Principal Component Analysis)
[Diagram: the same data re-plotted along the new axes (A+B)/2 and A-B]
Data “whitening”
new_A, new_B = [A, B] x W + b

with, for example:

W = | 0.05  0.12 |        b = | -1.45  0.12 |
    | 0.61 -1.23 |

W ? B ? A network layer can do this !
Scale, rotate & shift.
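The "subtract average, divide by std dev" step is literally an affine layer x.W + b. A minimal numpy sketch (assuming uncorrelated data, so W stays diagonal; a full whitening W would also rotate):

import numpy as np

data = np.random.uniform(5.0, 15.0, size=(100, 2))  # large values, different scales
mean, std = data.mean(axis=0), data.std(axis=0)

W = np.diag(1.0 / std)    # scale (a rotation here would also decorrelate)
b = -mean / std           # shift
whitened = data @ W + b   # centered around zero, rescaled: (data - mean) / std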
Fully connected network
[Diagram: fully connected network, layer sizes 784 → 200 → 100 → 60 → 30 → 10, softmax output over the digits 0–9. The activation distributions look OK in the first layer, then OK ?, then OK ??? in the deeper layers.]
Without batch normalisation
[Plot: a sigmoid activation function overlaid with "my distribution of inputs": the inputs sit in the saturated tails of the sigmoid. boo-hoo]
Batch normalisation
Center and re-scale logits before the activation function
(decorrelate ? no, too complex)
Compute the average and variance on the mini-batch
Add a learnable scale α and offset β for each logit so as to restore expressiveness
("logit" = weighted sum + bias; one of each per neuron)
Try α = stdev(x) and β = avg(x) and you get BN(x) = x
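Written out per logit, over the mini-batch (a sketch; ε is added for numerical stability):

import numpy as np

def batch_norm(x, alpha, beta, eps=1e-5):
    # x: [batch, neurons] -- stats computed per neuron, over the mini-batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return alpha * (x - mean) / np.sqrt(var + eps) + beta

# and indeed, with alpha = x.std(axis=0) and beta = x.mean(axis=0),
# batch_norm(x, alpha, beta) == x (up to eps)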
Batch normalisation
The batch mean and variance depend on: weights, biases, images.
Each logit depends on: the same weights and biases, and the images
(there is only one set of weights and biases per mini-batch).
=> BN is differentiable with respect to weights, biases, α and β.
It can be used as a layer in the network; gradient calculations will still work.
[Diagram: x = weighted sum + bias → batch-norm (α, β) → activation fn]
With batch normalisation (sigmoid)
[Plot: with batch norm, the distribution of neuron outputs is re-centered onto the useful range of the sigmoid]
With batch normalisation (RELU)
[Plot: with batch norm, "my distribution of inputs" lands in the active region of the RELU]
Batch normalisation done right
[Diagram: x = weighted sum (no bias) → batch-norm (α, β) → activation fn]
biases: no longer useful, β replaces them
α is not useful when the activation fn is RELU: it does not modify the output distribution

Per neuron:  | relu | sigmoid |
without BN   | bias | bias    |
with BN      | β    | α, β    |

+ You can go faster: use a higher learning rate
+ BN also regularises: lower or remove dropout
Convolutional batch normalisation
W1[4, 4, 3]   W2[4, 4, 3]
Each neuron produces one value per image position:
=> compute avg and stdev across all batchsize x width x height values
b1 → α1, β1      b2 → α2, β2
Still one bias, scale or offset per neuron.
Batch normalisation at test time
Stats on what ? There is no mini-batch at test time: use exponential moving averages of the training means and variances (see the code below).
Batch normalisation with Tensorflow
def batchnorm_layer(Ylogits, is_test, Offset, Scale, iteration, convolutional=False):
    # moving averages of mean and variance, for use at test time
    exp_moving_avg = tf.train.ExponentialMovingAverage(0.9999, iteration)
    if convolutional:  # avg across batch, width, height
        mean, variance = tf.nn.moments(Ylogits, [0, 1, 2])
    else:              # avg across the batch only
        mean, variance = tf.nn.moments(Ylogits, [0])
    update_moving_averages = exp_moving_avg.apply([mean, variance])
    # batch stats during training, moving averages at test time
    m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean)
    v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance)
    Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, variance_epsilon=1e-5)
    return Ybn, update_moving_averages
Define one offset and/or scale per neuron
apply activation fn on Ybn
don’t forget to execute this (sess.run)
The code is on GitHub: goo.gl/DEOe7Z
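A sketch of how the layer is wired in (variable names here are illustrative, not from the codelab):

# one offset (β) and one scale (α) per neuron, e.g. for a 200-neuron layer
offset = tf.Variable(tf.zeros([200]))
scale = tf.Variable(tf.ones([200]))
tst = tf.placeholder(tf.bool)   # is_test: False during training, True at test time
it = tf.placeholder(tf.int32)   # training iteration, for the moving averages
Ybn, update_ema = batchnorm_layer(Ylogits, tst, offset, scale, it)
Y = tf.nn.relu(Ybn)             # apply the activation fn on Ybn
# in the training loop: sess.run(update_ema, feed_dict={...})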
Demo
99.5%
More superpowers
high level API
Layers
from tensorflow.contrib import layers
# this
Y = layers.relu(X, 200)
# instead of this
W = tf.Variable(tf.zeros([784, 200]))
b = tf.Variable(tf.zeros([200]))
Y = tf.nn.relu(tf.matmul(X,W) + b)
Sample: goo.gl/y1SSFy
Model function
from tensorflow.contrib import learn, layers, metrics, framework

def model_fn(X, Y_, mode):  # X, Y_ = "features" and "targets"; mode: TRAIN, EVAL or INFER
    Yn = …  # model layers
    prob = tf.nn.softmax(Yn)
    digi = tf.argmax(prob, 1)
    predictions = {"probabilities": prob, "digits": digi}   # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}  # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(…)
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")  # 0.003 = learning rate
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
Sample: goo.gl/y1SSFy
Estimator
estimator = learn.Estimator(model_fn=model_fn)

estimator.fit(input_fn=…, steps=10000)
estimator.evaluate(input_fn=…, steps=1)  # => {'accuracy': …}
estimator.predict(input_fn=…)            # => {"probabilities": …, "digits": …}

# input_fn: feeds in batches of features and targets
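A minimal input_fn sketch for data that fits in memory (it mirrors the dummy implementation shown later in the Cloud ML section; mnist is assumed already loaded):

def train_input_fn():
    images = tf.constant(mnist.train.images)
    labels = tf.constant(mnist.train.labels)
    # batches of 100, shuffled
    return tf.train.shuffle_batch([images, labels], 100, 1100, 1000, enqueue_many=True)

estimator.fit(input_fn=train_input_fn, steps=10000)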
Sample: goo.gl/y1SSFy
Convolutional network
def conv_model(X, Y_, mode):
    XX = tf.reshape(X, [-1, 28, 28, 1])
    Y1 = layers.conv2d(XX, num_outputs=6,  kernel_size=[6, 6])
    Y2 = layers.conv2d(Y1, num_outputs=12, kernel_size=[5, 5], stride=2)
    Y3 = layers.conv2d(Y2, num_outputs=24, kernel_size=[4, 4], stride=2)
    Y4 = layers.flatten(Y3)
    Y5 = layers.relu(Y4, 200)
    Ylogits = layers.linear(Y5, 10)
    prob = tf.nn.softmax(Ylogits)
    digi = tf.cast(tf.argmax(prob, 1), tf.uint8)
    predictions = {"probabilities": prob, "digits": digi}   # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}  # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=tf.one_hot(Y_, 10))
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)

estimator = learn.Estimator(model_fn=conv_model)
Sample: goo.gl/y1SSFy
Recurrent Neural Networks
#Tensorflow @martin_gorner
deep Science !
deep Code ...
>TensorFlow, Keras and \
recurrent neural networks
without a PhD_
bit.ly/keras-rnn-codelab
Neural network 101 (reminder)
[Diagram: input image 20x20x3 = 1200 values, then fully connected layers of 200, 20 and 2 neurons]
Activation functions (reminder)
[Diagram: a neuron computes a weighted sum of its inputs plus a bias, then applies an activation function: relu, sigmoid or tanh]
On the last layer: softmax (classification) or nothing (regression)
RNN
[Diagram: an RNN cell with input Xt, output Yt and internal state H fed back into the cell; tanh inside, softmax on the output]
X: inputs
Y: outputs
H: internal state
N: internal size
RNN
X = Xt | Ht-1          (| is concatenation)
Ht = tanh(X.WH + bH)
Yt = softmax(Ht.W + b)
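The same cell as a numpy sketch (p inputs, internal size n, so WH is [(p+n), n]; the softmax readout is computed separately from Ht):

import numpy as np

def rnn_step(x_t, h_prev, WH, bH):
    X = np.concatenate([x_t, h_prev])  # X = Xt | Ht-1
    return np.tanh(X @ WH + bH)        # Ht = tanh(X.WH + bH)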
RNN training
[Diagram: the cell unrolled over six time steps: (X0, H-1) → (Y0, H0), (X1, H0) → (Y1, H1), …, (X5, H4) → (Y5, H5)]
The same weights and biases shared across iterations
Deep RNN
[Diagram: cells stacked in layers; at each time step the first layer turns (Xt, Ht-1) into Ht, the next layer turns (Ht, H't-1) into H't and the output Yt; initial states are 0]
L: number of layers
Michel C. was born in Paris, France. He is married and has three children. He received a M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987, and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He specialized in child and adolescent psychiatry and his first field of research was severe mood disorders in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ?
Long term dependencies: a problem
Short context: English, German, Russian, French …
Long context: problems…
[Diagram: to predict "French", the state Hn must still carry "Michel C. was born in", seen many time steps earlier]
RNN cell types
[Diagrams: three cell types, each taking Xt and Ht-1 and producing Ht = Yt]
Simple RNN cell: a single tanh layer
GRU cell: "Gated Recurrent Unit", two σ gates and a tanh
LSTM cell: "Long Short Term Memory", three σ gates, a tanh, and a separate cell state Ct-1 → Ct
LSTM
LSTM = Long Short Term Memory
[Diagram: the LSTM cell; σ and tanh boxes are neural net layers, × and + are element-wise operations, | is concatenation]

X  = Xt | Ht-1
f  = σ(X.Wf + bf)
u  = σ(X.Wu + bu)
r  = σ(X.Wr + br)
X' = tanh(X.Wc + bc)
Ct = f * Ct-1 + u * X'
Ht = r * tanh(Ct)
Yt = softmax(Ht.W + b)
LSTM
                                           vector sizes
concatenate :   X  = Xt | Ht-1                 p+n
forget gate :   f  = σ(X.Wf + bf)              n
update gate :   u  = σ(X.Wu + bu)              n
result gate :   r  = σ(X.Wr + br)              n
input :         X' = tanh(X.Wc + bc)           n
new C :         Ct = f * Ct-1 + u * X'         n
new H :         Ht = r * tanh(Ct)              n
output :        Yt = softmax(Ht.W + b)         m

[Diagram: the same LSTM cell, with each equation mapped to its gate]
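With the constants used later in this deck (p = ALPHASIZE = 98 inputs, n = CELLSIZE = 512), a quick parameter count for one LSTM cell (a sketch; the softmax readout W, b not included):

p, n = 98, 512
# four weight matrices Wf, Wu, Wr, Wc of shape [(p+n), n], plus four bias vectors of size n
params = 4 * ((p + n) * n + n)  # = 1,251,328 parameters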
Gru !
GRU
GRU = Gated Recurrent Unit: 2 gates instead of 3 => cheaper

                                  vector sizes
X  = Xt | Ht-1                        p+n
z  = σ(X.Wz + bz)                     n
r  = σ(X.Wr + br)                     n
X' = Xt | r * Ht-1                    p+n
X" = tanh(X'.Wc + bc)                 n
Ht = (1-z) * Ht-1 + z * X"            n
Yt = softmax(Ht.W + b)                m

[Diagram: GRU cell with input Xt, previous state Ht-1 and output Yt = Ht]
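The GRU equations as a numpy sketch (shapes as in the vector-sizes column: Wz, Wr and Wc are [(p+n), n]):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, bz, Wr, br, Wc, bc):
    X = np.concatenate([x_t, h_prev])       # X = Xt | Ht-1
    z = sigmoid(X @ Wz + bz)                # update gate
    r = sigmoid(X @ Wr + br)                # reset gate
    Xp = np.concatenate([x_t, r * h_prev])  # X' = Xt | r * Ht-1
    Xpp = np.tanh(Xp @ Wc + bc)             # X" = tanh(X'.Wc + bc)
    return (1 - z) * h_prev + z * Xpp       # Ht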
Language model in Tensorflow
[Diagram: character-based language model; the one-hot encoded characters "S t _ J o h" go in, the network is trained to predict the next characters "t _ J o h n"; the state flows through H0 … H5]
Language model in Tensorflow
[Diagram: NLAYERS = 3 layers of GRU cells, unrolled over SEQLEN time steps; the input state Hin and output state H stack the states of all three layers]

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)  # defines weights and biases internally
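Tensor shapes through this stack (as detailed on the next slides):

# X   : [ BATCHSIZE, SEQLEN, ALPHASIZE ]    one-hot encoded characters
# Hin : [ BATCHSIZE, CELLSIZE x NLAYERS ]   input state, all layers stacked
# Hr  : [ BATCHSIZE, SEQLEN, CELLSIZE ]     outputs of the top layer, all time steps
# H   : [ BATCHSIZE, CELLSIZE x NLAYERS ]   output state (state_is_tuple=False)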
Softmax readout layer
Tip: handle sequence and batch elements the same.

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

# Hr: [ BATCHSIZE, SEQLEN, CELLSIZE ]
Hf = tf.reshape(Hr, [-1, CELLSIZE])       # [ BATCHSIZE x SEQLEN, CELLSIZE ]
Ylogits = tf.layers.dense(Hf, ALPHASIZE)  # [ BATCHSIZE x SEQLEN, ALPHASIZE ]
Y = tf.nn.softmax(Ylogits)                # [ BATCHSIZE x SEQLEN, ALPHASIZE ]
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)

[Diagram: each output H0 … H7 of the top GRU layer goes through the same softmax readout layer to produce Y0 … Y7]
Inputs and outputs
[Diagram: the input characters "S t _ A n d r e" predict the output characters "t _ A n d r e w" through the unrolled 3-layer GRU network]

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

inputs X:            [ BATCHSIZE, SEQLEN ]
expected outputs Y_: [ BATCHSIZE, SEQLEN, ALPHASIZE ]
state H:             [ BATCHSIZE, CELLSIZE x NLAYERS ]
Placeholders, and the rest...
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Xd = tf.placeholder(tf.uint8, [None, None])   # [ BATCHSIZE, SEQLEN ]
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)       # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
Yd_ = tf.placeholder(tf.uint8, [None, None])  # [ BATCHSIZE, SEQLEN ]
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)     # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS])  # [ BATCHSIZE, CELLSIZE x NLAYERS ]

# Y, loss, Hout = my_model(X, Y_, Hin)        # Y: [ BATCHSIZE x SEQLEN, ALPHASIZE ]

predictions = tf.argmax(Y, 1)                           # [ BATCHSIZE x SEQLEN ]
predictions = tf.reshape(predictions, [batchsize, -1])  # [ BATCHSIZE, SEQLEN ]

train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
Bitchin’ batchin’
[Diagram: the text is cut into fixed-length chunks that continue across batches. Batch 1 holds "The quic", "seventh", "Mr. Herm"; batch 2 continues each line with "k brown", "heaven o", "ann Zapf"; batch 3 with "fox jump", "f typogr", "was the". This way the output state Ht of one batch is the correct input state Ht+1 for the next.]
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
Language model in Tensorflow
ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Xd = tf.placeholder(tf.uint8, [None, None])
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)
Yd_ = tf.placeholder(tf.uint8, [None, None])
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)
Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS])

# the model
cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)

# softmax output layer
Hf = tf.reshape(Hr, [-1, CELLSIZE])
Ylogits = layers.linear(Hf, ALPHASIZE)
Y = tf.nn.softmax(Ylogits)
Yp = tf.argmax(Y, 1)
Yp = tf.reshape(Yp, [batchsize, -1])

# loss and training step (optimizer)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# training loop: the state flows from one batch to the next through inH
inH = np.zeros([BATCHSIZE, CELLSIZE*NLAYERS])
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=30):
    dic = {Xd: x, Yd_: y_, Hin: inH}
    _, y, outH = sess.run([train_step, Yp, H], feed_dict=dic)
    inH = outH
The code is on GitHub: github.com/martin-gorner/tensorflow-rnn-shakespeare
ee o no nonnaoter s ee seih iae r t i r io i ro s sierota tsohoreroneo rsa esia anehereeo hensh rho etnrhhs iti saoitns t et rsearh tshseoeh ta oirhroren e eaetetnesnareeeoaraihss nshtano eter e oooaoaeee nonn is heh easren ieson httn nihensont t e n a ooe oerhi neaeehteriseat tiet i i ntsh orhi e ohhsiea e aht ohr er ra eeo oeeitrot hethisesaaei o saeii straieiteoeresorh e ooeri e ninesh sort a es h rs hattnteseato sonoanr sniaase s rshninsasi na sntennn oti r etnsnrse oh n r e tiathhnaeeano trrr hhohooon rrt eernre e rnoh
Shakespeare, 0.03 epochs (C1)
II WERENI Are I I wos the wheer boaer. Tin thim mh cals sate bauut site tar oue tinl an bsisonetoal yer an fimireeren. L[IO SI Hns oret bsllssts aaau ton hete me toer frurtor sheus aed trat A faler bis tote oadt tou than male, tel mou ce an cime. ais fauto ws cien whus yas. Ande fert te a ut wond aal sinr be at saar
Shakespeare, 0.1 epochs (C3)
BERENS Hall hat in she the hir meres. Perstr in ame not of heard, me thin hild of shear and ant on of mare. I lore wes lour. DOCHES The chaster'd on not fenst The laldoos more.
[Ixeln thrish]
And tho priines sith of hamdeling the san wind
Shakespeare, 0.2 epochs (C5). Stage directions ?
KING LEAR Alas, I am not forsworn both to bod! And let the firm I have to'st trainoured. KING HENRY VIII I love not my father. PORDIA He tash you will have it. HENRY BLUTIUS Work, thou lovest my son here, thy father's fath! CLIOND Why, then, would say, the beasts are
Shakespeare, 1 epoch (C6). Invented names !
TITUS ANDRONICUS  ACT I  SCENE III An ante-chamber. The COUNT's palace. [Enter CLEOMENES, with the Lord SAY] Chamberlain Let me see your worshing in my hands.
LUCETTA I am a sign of me, and sorrow sounds it.
Shakespeare, 30 epochs (B10)
And sorrow far into the stars of men, Without a second tears to seek the best and bed, With a strange service, and the foul prince of Rome [Exeunt MARK ANTONY and LEPIDUS] Well said, my lord,-- MENENIUS I do not say so. Well, I will not have no better ways; But not a woman's misery, and yonder to her
Shakespeare, 30 epochs (B10)
diassts_= =tlns==eti.s=tessn_((
sie_s_nts_ens= dondtnenroe dnar taonte srst anttntoilonttiteaen
detrtstinsenoaolsesnesoairt(
arssserleeeerltrdlesssoeeslslrlslie(e
drnnaleeretteaelreesioe niennoarens dssnstssaorns sreeoeslrteasntotnnai(ar dsopelntederlalesdanserl
lts(sitae(e)
Python code, 0.03 epochs (A1)
with self.essors_sigeater(output_dits_allss,
self._train.
for sampated to than ubtexsormations.
expeddions = np.randim(natched_collection, ranger, mang_ops, samplering)
def assestErrorume_gens(assignex) as and(sampled_veases):
eved.
Python code, 0.1 epochs (A2). Python keywords appear.
def testGiddenSelfBeShareMecress(self):
with self.test_session() as sess:
tat = tf.contrib.matrix.cast_column_variable([1, 1], [0, 1, 1], [1, 7]],
[[1, 1, 1]].file(file, line_state_will_file))
with self.test_session():
self.assertAllEqual(1, l.ex6)
self.assertEqual(output_graph_def is_output_tensors_op(
tf.pro_context_name.sqrt(sess)
def test_shape(self):
res = values=value_rns[0].eval())
def tempDimpleSeriesGredicsIothasedWouthAverageData(self):
self._testDirector(self):
self._test_inv3_size = 5
with tf.train.ConvolutioBailLors_startswith("save_dir_context.PutIsprint().eval())
return tf.contrib.learn.RUCISLCCS:
# Check the orfloating so that the nimesting object mumputable othersifier.
# dense_keys.tokens_prefix/statch_size of the input1 tensors.
@property
Python code, 0.4 epochs (A3). Wrong ([]) nesting; correct use of colons; hallucinated function names.
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in [0.1, 2.0, 3.0]]
def __init__(self, expected):
return np.array([[0, 0, 0], [0, 0, 0]])
self.assertAllEqual(tf.placeholder(tf.float32, shape=(3, 3)),(shape, prior.pack(),
tf.float32))
for keys in tensor_list:
return np.array([[0, 0, 0]]).astype(np.float32)
# Check that we have both scalar tensor for being invalid to a vector of 1 indicating
# the total loss of the same shape as the shape of the tensor.
sharded_weights = [[0.0, 1.0]]
# Create the string op to apply gradient terms that also batch.
# The original any operation as a code when we should alw infer to the session case.
Python code, 12 epochs (B10). Correct triple ([]) nesting; recites the Apache license; Tensorflow tips!
...and more
Tensorflow: save, restore
saver = tf.train.Saver(keep_checkpoint_every_n_hours=0.1, max_to_keep=5)

with tf.Session() as sess:
    # ... training loop ...
    saver.save(sess, 'file_', global_step=iter)

=> saves the variables in file_200 and the graph in file_200.meta

with tf.Session() as sess:
    resto = tf.train.import_meta_graph('file_200.meta')
    resto.restore(sess, 'file_200')

=> restores the graph and the variable values

Must name variables explicitly !!!

# when saving
X = tf.placeholder(tf.uint8, name='X')
Y = tf.nn.softmax(Ylogits, name='Y')

# when using the restored graph
y, h = sess.run(['Y:0', 'H:0'], feed_dict={'X:0': x})
Shakespeare generation
with tf.Session() as sess:
    resto = tf.train.import_meta_graph('shake_200.meta')
    resto.restore(sess, 'shake_200')

    # initial values
    x = np.array([[0]])  # [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
    h = np.zeros([1, CELLSIZE * NLAYERS], dtype=np.float32)

    for i in range(100000):
        dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
        y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
        c = my_txtutils.sample_from_probabilities(y, topn=5)
        x = np.array([[c]])  # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
        print(chr(my_txtutils.convert_to_ascii(c)), end="")

[Diagram: one char at a time; each output character is fed back as the next input, the state Ht-1 → Ht carries the context]
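my_txtutils comes from the GitHub repo above; a minimal sketch of what a top-n sampler like sample_from_probabilities can do (an assumption, not the repo's exact code):

import numpy as np

def sample_from_probabilities(probabilities, topn=5):
    p = np.squeeze(probabilities)         # probabilities for the next char, [ALPHASIZE]
    p[np.argsort(p)[:-topn]] = 0.0        # zero out all but the topn most likely chars
    p = p / np.sum(p)                     # re-normalize
    return np.random.choice(len(p), p=p)  # sample one character index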
Tensorboard
summary_writer = tf.train.SummaryWriter("log/train_" + time)

loss_summary = tf.scalar_summary("batch_loss", loss)
summaries = tf.merge_all_summaries()  # merge all defined summaries into one op

# in the training loop:
smm = sess.run(summaries, feed_dict=dic)
summary_writer.add_summary(smm, iteration)
Tip: use time in logdir name
Tip: use a second SummaryWriter for validation results
RNN shapes
[Diagram (reminder): the character-based model, one-hot encoded characters in, shifted characters out]
RNN shapes
Text classification
[Diagram: the words "The USA and China have agreed …" go in, encoded as vectors ("embeddings"); the last output predicts the topic: "geopolitics"]
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))
X = tf.nn.embedding_lookup(embeddings, train_inputs)
Tensorflow sample: goo.gl/m41mNp
Or constant => see Word2Vec
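Shape-wise (a sketch):

# embeddings   : [ vocab_size, embed_size ]
# train_inputs : [ BATCHSIZE, SEQLEN ]              integer word ids
# X            : [ BATCHSIZE, SEQLEN, embed_size ]  after the lookup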
Bitchin’ batchin’
China and the USA have agreed to a new round of talks    12
The quick brown fox jumps over the lazy dog .            10
Boys will be boys .                                       5
Tom , get your coat . We are going out .                 11
Math rules the world . Men rule math .                    9

Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=slen)

[Diagram: shorter sequences are padded with ∅ up to the longest one; sequence_length (the "seq len" column above) tells dynamic_rnn where each sequence really ends, so that the correct final state Hn is used, e.g. to predict "geopolitics"]
RNN shapes
Text translation
[Diagram: encoder-decoder. The words "The red cat ate the mouse" go in, encoded as vectors; the decoder then produces "Le chat rouge a mangé la souris ∅", each output word fed back as the next decoder input, starting from ∅]

tf.nn.sampled_softmax_loss(…)  # fast: samples the softmax instead of computing it over the full vocabulary (slow)

Tensorflow sample: goo.gl/KyKLDv
RNN shapes
Image captioning (simplified)
[Diagram: the image, encoded as a vector (for ex. the output of a convolutional network or auto-encoder), seeds the RNN state; the decoder then produces "A man on a beach flying a kite ∅", each output word fed back as the next input]

Google's neural net for image captioning: goo.gl/VgZUQZ
Image captioning
A person riding a motorcycle on a dirt road.
A herd of elephants walking across a dry grass field.
A refrigerator filled with lots of food and drinks.
A yellow school bus parked in a parking lot.
Cloud Machine Learning Engine
Data-parallel distributed training
[Diagram: model replicas each train on a shard of the data and send weight updates W' = W + ∆W to the parameter servers, asynchronously]
Asynchronous updates: I ♡ noise
TF high level API
from tensorflow.contrib import learn

def model_fn(X, Y_, mode):  # X, Y_ = "features" and "targets"
    Yn = …  # model layers
    predictions = {"probabilities": …, "digits": …}   # free-form
    evaluations = {'accuracy': metrics.accuracy(…)}   # free-form
    loss = …
    train = layers.optimize_loss(loss, …)
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
Estimator, Experiment, learn_runner
from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

def experiment_fn(job_dir):
    return learn.Experiment(
        estimator=learn.Estimator(model_fn, model_dir=job_dir,
                                  config=learn.RunConfig(save_checkpoints_secs=None,
                                                         save_checkpoints_steps=1000)),
        train_input_fn=…,  # data feed
        eval_input_fn=…,   # data feed
        train_steps=10000,
        eval_steps=1,
        export_strategies=saved_model_export_utils.make_export_strategy(
            export_input_fn=serving_input_fn))

def main(argv=None):
    job_dir = …  # parse argument --job-dir
    learn_runner.run(experiment_fn, job_dir)

if __name__ == '__main__': main()

Free stuff !!! Tensorboard graphs, resume on fail, parallel data feeds, serving model export, distributed training:

trainingInput:
  scaleTier: STANDARD_1
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
Data queues for distributed training
# dummy implementation for data that fits in memory
def train_data_input_fn(mnist):
    images = tf.constant(mnist.train.images)
    labels = tf.constant(mnist.train.labels)
    # batch size 100; inserts queue nodes into the TF graph
    return tf.train.shuffle_batch([images, labels], 100, 1100, 1000, enqueue_many=True)

# dummy implementation for data that fits in memory
def eval_data_input_fn(mnist):
    return tf.constant(mnist.test.images), tf.constant(mnist.test.labels)

For practical data queuing use the TF Records format.
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
Serving input function
# Online predictions on Cloud ML Engine
def serving_input_fn():
    # placeholder for a batch of MNIST images, deserialised from JSON
    inputs = {'A': tf.placeholder(tf.uint8, [None, 28, 28])}
    # transform the data as needed
    features = [tf.cast(inputs['A'], tf.float32)]
    return input_fn_utils.InputFnOps(features, None, inputs)
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
Run it
# model checkpoints and tensorboard summaries are written to --job-dir
gcloud ml-engine jobs submit training job22 \
    --job-dir=gs://mybucket/job22 \
    --package-path=trainer \
    --module-name=trainer.task \
    --config=config.yaml \
    -- \
    --<custom model arguments here>

# config.yaml:
trainingInput:
  scaleTier: STANDARD_1

Deploy the trained model to prod = click click click. Then, autoscaled serving:

gcloud ml-engine predict \
    --model <model_name> \
    --json-instances mydigits.json
Samples: goo.gl/F3i3bf, goo.gl/CofxFM
Demo: aucnet
Retrain Inception yourself: goo.gl/Z9eNek
Have fun !
Cloud ML Engine: your TensorFlow models trained in Google's cloud.

Pre-trained models:
Cloud Vision API
Cloud Speech API
Google Translate API
Natural Language API
Video Intelligence API
Cloud Jobs API (PRIVATE BETA)
Cloud AutoML Vision (ALPHA): just bring your data
Cloud TPU (BETA): ML supercomputing

That's all folks...
Tensorflow and deep learning without a PhD
@martin_gorner