1 of 75

#Tensorflow @martin_gorner

deep Science !

deep Code ...

>TensorFlow, deep learning and \

recurrent neural networks

without a PhD_


2 of 75

The superpower: batch normalisation


3 of 75

Data “whitening”

Data: large values, different scales, skewed, correlated


4 of 75

Data “whitening”

Modified data: centered around zero, rescaled...

Subtract average

Divide by std dev


5 of 75

Data “whitening”

Modified data: … and decorrelated by projecting onto the new axes (A+B)/2 and A-B (that was almost a Principal Component Analysis).

6 of 75

Data “whitening”

[new A, new B] = [A, B] x W + b

with, for example, W = [[0.05, 0.12], [0.61, -1.23]] (scale & rotate) and b = [-1.45, 0.12] (shift).

W ? b ? A network layer can do this !
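In numpy terms this is a single affine transform, which is exactly what a dense layer computes. A minimal sketch (the raw data is made up; the matrix and offset reuse the example numbers above):

import numpy as np

# raw 2-column data (A, B): large values, different scales (made-up example)
data = np.random.randn(1000, 2) * [10.0, 3.0] + [50.0, -7.0]

W = np.array([[0.05, 0.12],
              [0.61, -1.23]])   # scale & rotate
b = np.array([-1.45, 0.12])     # shift

whitened = data @ W + b         # [new A, new B] = [A, B] x W + b, row by row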

7 of 75

Fully connected network

Diagram: fully connected network for MNIST, input 784 pixels, layers of 200, 100, 60, 30 and 10 neurons, softmax readout over the digits 0 … 9. With sigmoid activations the first layer looks OK, the deeper layers less and less so (OK ? … OK ???).

8 of 75

Without batch normalisation

Figure: the sigmoid activation function, with "my distribution of inputs" sitting in its saturated tails. Boo-hoo.

9 of 75

Batch normalisation

Center and re-scale logits before the activation function (decorrelate ? no, too complex):

  • compute the average and variance on the mini-batch
  • add a learnable scale α and offset β for each logit so as to restore expressiveness

"logit" = weighted sum + bias; one α and one β per neuron.

Try α = stdev(x) and β = avg(x) and you have BN(x) = x
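A standard way to write the transform (not verbatim from the slide): for a logit x, with mini-batch statistics

$$\mu = \operatorname{avg}_{batch}(x),\quad \sigma^2 = \operatorname{var}_{batch}(x),\qquad \mathrm{BN}(x) = \alpha\,\frac{x-\mu}{\sqrt{\sigma^2+\epsilon}} + \beta$$

so α = stdev(x) and β = avg(x) indeed give BN(x) ≈ x (ε is a small constant for numerical stability).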


10 of 75

Batch normalisation

The mini-batch mean and variance depend on the weights, biases and images, and there is only one set of weights and biases per mini-batch, so BN is differentiable with respect to the weights, the biases, α and β. It can therefore be used as a layer in the network and gradient calculations will still work.

Per neuron: x = weighted sum + bias -> batch-norm (α, β) -> activation fn

11 of 75

With batch normalisation (sigmoid)

Figure: sigmoid activation with batch norm; the distribution of neuron outputs now stays in the sigmoid's useful range.

12 of 75

With batch normalisation (RELU)

Figure: the RELU activation function and "my distribution of inputs" with batch normalisation.

13 of 75

Batch normalisation done right

Per neuron: x = weighted sum (no bias) -> batch-norm (α, β) -> activation fn

Biases are no longer useful: the batch-norm offset β plays their role.
When the activation fn is RELU, the scale α is not useful either: it does not modify the output distribution.

Per neuron:      without BN    with BN
  relu           bias          β
  sigmoid        bias          α, β

+ You can go faster: use a higher learning rate
+ BN also regularises: lower or remove dropout

14 of 75

Convolutional batch normalisation

Convolutional layers (weights e.g. W1[4, 4, 3], W2[4, 4, 3]): each neuron (patch) has a value

  • per image in the batch
  • per x position
  • per y position

=> compute avg and stdev across all batchsize x width x height values.
Still, one bias, scale or offset per neuron (b1, α1, β1 for the first filter; b2, α2, β2 for the second).

15 of 75

Batch normalisation at test time

Stats on what ?

  • Last batch: no
  • all images: yes (but not practical)
  • => Exponential moving average during training
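A small sketch of the exponential moving average update, i.e. the rule that tf.train.ExponentialMovingAverage applies on the next slide (the decay value is the one used there; the numeric values are assumptions):

def ema_update(moving_value, batch_value, decay=0.9999):
    # lean heavily on the accumulated value, a little on the current mini-batch
    return decay * moving_value + (1.0 - decay) * batch_value

moving_mean, moving_variance = 0.0, 1.0   # assumed initial values
batch_mean, batch_variance = 0.2, 0.8     # assumed statistics of the current mini-batch
moving_mean = ema_update(moving_mean, batch_mean)
moving_variance = ema_update(moving_variance, batch_variance)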


16 of 75

Batch normalisation with Tensorflow

def batchnorm_layer(Ylogits, is_test, Offset, Scale, iteration, convolutional=False):
    exp_moving_avg = tf.train.ExponentialMovingAverage(0.9999, iteration)
    if convolutional:  # avg across batch, width, height
        mean, variance = tf.nn.moments(Ylogits, [0, 1, 2])
    else:
        mean, variance = tf.nn.moments(Ylogits, [0])
    update_moving_averages = exp_moving_avg.apply([mean, variance])
    m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean)
    v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance)
    Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, variance_epsilon=1e-5)
    return Ybn, update_moving_averages

Define one offset and/or scale per neuron

apply activation fn on Ybn

don’t forget to execute this (sess.run)

The code is on GitHub: goo.gl/DEOe7Z
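A hypothetical usage of the layer above for one 200-neuron fully connected layer (a sketch: the placeholder and variable names below are assumptions in the deck's style, not the repo's exact code):

import tensorflow as tf

X1 = tf.placeholder(tf.float32, [None, 784])                 # assumed input
W1 = tf.Variable(tf.truncated_normal([784, 200], stddev=0.1))
Ylogits1 = tf.matmul(X1, W1)                                  # weighted sum, no bias needed with BN
offset1 = tf.Variable(tf.zeros([200]))                        # β, one per neuron
scale1 = tf.Variable(tf.ones([200]))                          # α (can be omitted with RELU)
tst = tf.placeholder(tf.bool)                                 # is_test flag
itr = tf.placeholder(tf.int32)                                # iteration counter for the moving average
Y1bn, update1 = batchnorm_layer(Ylogits1, tst, offset1, scale1, itr, convolutional=False)
Y1 = tf.nn.relu(Y1bn)                                         # apply the activation fn on Ybn
# remember to sess.run(update1) alongside the training step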


17 of 75

Demo


18 of 75

99.5%


19 of 75

More superpowers

high level API


20 of 75

Layers

import tensorflow as tf
from tensorflow.contrib import layers

# this
Y = layers.relu(X, 200)

# instead of this
W = tf.Variable(tf.zeros([784, 200]))  # (in practice, initialise weights randomly, not with zeros)
b = tf.Variable(tf.zeros([200]))
Y = tf.nn.relu(tf.matmul(X, W) + b)


21 of 75

Model function

from tensorflow.contrib import learn, layers, metrics, framework

def model_fn(X, Y_, mode):   # X, Y_: "features" and "targets"; mode: TRAIN, EVAL or INFER
    Yn = …  # model layers
    prob = tf.nn.softmax(Yn)
    digi = tf.argmax(prob, 1)
    predictions = {"probabilities": prob, "digits": digi}    # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}   # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(…)
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")  # 0.003: learning rate
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)


22 of 75

Estimator

estimator = learn.Estimator(model_fn=model_fn)

estimator.fit(input_fn=, steps=10000)

estimator.evaluate(input_fn=, steps=1)

# => {'accuracy': … }

estimator.predict(input_fn=)

# => {"probabilities":…, "digits":…}

# input_fn: feeds in batches of features and targets
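A minimal sketch of such an input_fn, assuming MNIST data already loaded in memory (the same pattern appears later in the deck for distributed training; names are assumptions):

def train_input_fn():
    images = tf.constant(mnist.train.images)   # features
    labels = tf.constant(mnist.train.labels)   # targets
    return tf.train.shuffle_batch([images, labels], 100,   # batch size
                                  capacity=1100, min_after_dequeue=1000,
                                  enqueue_many=True)

estimator.fit(input_fn=train_input_fn, steps=10000)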


23 of 75

Convolutional network

def conv_model(X, Y_, mode):
    XX = tf.reshape(X, [-1, 28, 28, 1])
    Y1 = layers.conv2d(XX, num_outputs=6, kernel_size=[6, 6])
    Y2 = layers.conv2d(Y1, num_outputs=12, kernel_size=[5, 5], stride=2)
    Y3 = layers.conv2d(Y2, num_outputs=24, kernel_size=[4, 4], stride=2)
    Y4 = layers.flatten(Y3)
    Y5 = layers.relu(Y4, 200)
    Ylogits = layers.linear(Y5, 10)
    prob = tf.nn.softmax(Ylogits)
    digi = tf.cast(tf.argmax(prob, 1), tf.uint8)
    predictions = {"probabilities": prob, "digits": digi}    # free-form
    evaluations = {'accuracy': metrics.accuracy(digi, Y_)}   # free-form
    loss = tf.nn.softmax_cross_entropy_with_logits(Ylogits, tf.one_hot(Y_, 10))
    train = layers.optimize_loss(loss, framework.get_global_step(), 0.003, "Adam")
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)

estimator = learn.Estimator(model_fn=conv_model)

24 of 75

Recurrent Neural Networks


25 of 75

#Tensorflow @martin_gorner

deep Science !

deep Code ...

>TensorFlow, Keras and \

recurrent neural networks

without a PhD_


bit.ly/keras-rnn-codelab


26 of 75

Neural network 101 (reminder)

Diagram: a 20x20x3 input image (1200 values) fed through fully connected layers of 200, 20 and 2 neurons.

27 of 75

Activation functions (reminder)

Diagram: a neuron computes a weighted sum of its inputs plus a bias, optionally a normalisation step, then an activation function. Typical activations: relu, sigmoid, tanh. On the last layer: softmax for classification, nothing for regression.

28 of 75

RNN

Diagram: the RNN cell. X: inputs, Y: outputs, H: internal state, N: internal size. At each step the cell combines Xt with the previous state, applies tanh to produce the new internal state H, and a softmax readout produces Yt.

29 of 75

RNN

X = Xt | Ht-1            (| = concatenation)
Ht = tanh(X.WH + bH)
Yt = softmax(Ht.W + b)
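These three lines translate almost directly into code. A minimal numpy sketch of one RNN step (sizes and initialisation are assumptions; the deck's actual models use TensorFlow cells):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p, n, m = 8, 16, 10                    # input size, internal size N, output size (assumed)
WH = np.random.randn(p + n, n) * 0.1   # weights for the internal state
bH = np.zeros(n)
W = np.random.randn(n, m) * 0.1        # weights for the softmax readout
b = np.zeros(m)

def rnn_step(Xt, Ht_1):
    X = np.concatenate([Xt, Ht_1])     # X = Xt | Ht-1 (concatenation)
    Ht = np.tanh(X @ WH + bH)          # Ht = tanh(X.WH + bH)
    Yt = softmax(Ht @ W + b)           # Yt = softmax(Ht.W + b)
    return Yt, Ht

Yt, Ht = rnn_step(np.zeros(p), np.zeros(n))   # one step from a zero state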

30 of 75

RNN training

Diagram: the cell unrolled over time. Starting from state H-1, inputs X0 … X5 produce states H0 … H5 and outputs Y0 … Y5. The same weights and biases are shared across iterations.

31 of 75

Deep RNN

Diagram: stacked ("deep") RNN. At each time step the input Xt goes through several layers of cells; the first layer's states H0 … H5 feed the second layer, whose states H'0 … H'5 produce the outputs Y0 … Y5. L: number of layers.

32 of 75

Michel C. was born in Paris, France. He is married and has three children. He received a M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987, and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He specialized in child and adolescent psychiatry and his first field of research was severe mood disorders in adolescents, topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ?

Long term dependencies: a problem

Short context: looking only at the last few words ("… His mother tongue is"), the next word could be English, German, Russian, French …
Long context: the clue ("born in Paris, France") is hundreds of words back, so the state Hn has to carry it across the whole sequence, and that is a problem for a simple RNN cell.

Diagram: the cell unrolled over "Michel C. was born in …", eventually having to output "French".

33 of 75

RNN cell types

Diagram: three cell types, each taking Xt and the previous state Ht-1 and producing Ht and Yt:

  • Simple RNN cell: a single tanh layer
  • GRU cell ("Gated Recurrent Unit"): two sigmoid gates combined with a tanh layer
  • LSTM cell ("Long Short Term Memory"): three sigmoid gates, tanh layers and an additional cell state Ct

34 of 75

LSTM

LSTM = Long Short Term Memory

Diagram: the LSTM cell takes Xt, Ht-1 and the cell state Ct-1; sigmoid and tanh neural-net layers plus element-wise operations produce Ct, Ht and Yt.

X = Xt | Ht-1            (| = concatenation)
f = σ(X.Wf + bf)
u = σ(X.Wu + bu)
r = σ(X.Wr + br)
X = tanh(X.Wc + bc)
Ct = f * Ct-1 + u * X
Ht = r * tanh(Ct)
Yt = softmax(Ht.W + b)

35 of 75

LSTM

Vector sizes (p: input size, n: internal size, m: output size):

concatenate :  X = Xt | Ht-1              p+n
forget gate :  f = σ(X.Wf + bf)           n
update gate :  u = σ(X.Wu + bu)           n
result gate :  r = σ(X.Wr + br)           n
input :        X = tanh(X.Wc + bc)        n
new C :        Ct = f * Ct-1 + u * X      n
new H :        Ht = r * tanh(Ct)          n
output :       Yt = softmax(Ht.W + b)     m

36 of 75

Gru !


37 of 75

GRU

GRU = Gated Recurrent Unit: 2 gates instead of 3 => cheaper

Vector sizes (p: input size, n: internal size, m: output size):

X = Xt | Ht-1               p+n
z = σ(X.Wz + bz)            n
r = σ(X.Wr + br)            n
X = Xt | r * Ht-1           p+n
X = tanh(X.Wc + bc)         n
Ht = (1-z) * Ht-1 + z * X   n
Yt = softmax(Ht.W + b)      m
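As a cross-check of the equations, a minimal numpy sketch of one GRU step (sizes and initialisation are assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p, n, m = 8, 16, 10                         # input, internal and output sizes (assumed)
Wz = np.random.randn(p + n, n) * 0.1
Wr = np.random.randn(p + n, n) * 0.1
Wc = np.random.randn(p + n, n) * 0.1
bz, br, bc = np.zeros(n), np.zeros(n), np.zeros(n)
W, b = np.random.randn(n, m) * 0.1, np.zeros(m)

def gru_step(Xt, Ht_1):
    X = np.concatenate([Xt, Ht_1])          # X = Xt | Ht-1
    z = sigmoid(X @ Wz + bz)                # update gate
    r = sigmoid(X @ Wr + br)                # reset gate
    Xc = np.concatenate([Xt, r * Ht_1])     # X = Xt | r * Ht-1
    Xc = np.tanh(Xc @ Wc + bc)              # candidate state
    Ht = (1 - z) * Ht_1 + z * Xc            # Ht = (1-z) * Ht-1 + z * X
    Yt = softmax(Ht @ W + b)                # Yt = softmax(Ht.W + b)
    return Yt, Ht

Yt, Ht = gru_step(np.zeros(p), np.zeros(n))   # one step from a zero state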

38 of 75

Language model in Tensorflow

Diagram: character-based model, characters one-hot encoded. The input sequence "S t _ J o h" is trained to predict the same text shifted by one character, "t _ J o h n"; the internal state starts at 0 and ends at H5.

39 of 75

Language model in Tensorflow

Diagram: NLAYERS layers of GRU cells unrolled over the sequence; Hin is the input state, H the output state, and Hr collects the top layer's output at every step.

cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)   # defines weights and biases internally

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

40 of 75

Softmax readout layer

Tip: handle sequence and batch elements the same.

Hf = tf.reshape(Hr, [-1, CELLSIZE])       # Hr: [ BATCHSIZE, SEQLEN, CELLSIZE ] -> Hf: [ BATCHSIZE x SEQLEN, CELLSIZE ]
Ylogits = tf.layers.dense(Hf, ALPHASIZE)  # [ BATCHSIZE x SEQLEN, ALPHASIZE ]
Y = tf.nn.softmax(Ylogits)                # [ BATCHSIZE x SEQLEN, ALPHASIZE ]
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

41 of 75

Inputs and outputs

Diagram: the training inputs are the characters "S t _ A n d r e" and the targets are the same text shifted by one character, "t _ A n d r e w".

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Inputs:  [ BATCHSIZE, SEQLEN ]
Outputs: [ BATCHSIZE, SEQLEN, ALPHASIZE ]
State H: [ BATCHSIZE, CELLSIZE x NLAYERS ]

42 of 75

Placeholders, and the rest...

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Xd = tf.placeholder(tf.uint8, [None, None])    # [ BATCHSIZE, SEQLEN ]
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)        # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
Yd_ = tf.placeholder(tf.uint8, [None, None])   # [ BATCHSIZE, SEQLEN ]
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)      # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS])   # [ BATCHSIZE, CELLSIZE x NLAYERS ]

# Y, loss, Hout = my_model(X, Y_, Hin)          # Y: [ BATCHSIZE x SEQLEN, ALPHASIZE ]

predictions = tf.argmax(Y, 1)                             # [ BATCHSIZE x SEQLEN ]
predictions = tf.reshape(predictions, [batchsize, -1])    # [ BATCHSIZE, SEQLEN ]
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

43 of 75

Bitchin’ batchin’

Diagram: the text is cut into BATCHSIZE parallel streams ("The quic | k brown | fox jump…", "seventh | heaven o | f typogr…", "Mr. Herm | ann Zapf | was the…") so that batch 2 starts exactly where batch 1 ended. The output states Ht of one batch can then be fed back as the input states of the next batch.

for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
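A rough sketch of what such a sequencer does (simplified; the real utils.rnn_minibatch_sequencer in the GitHub repo also handles epochs and leftover characters):

import numpy as np

def minibatch_sequencer(text, batch_size, seqlen):
    data = np.array(list(text))
    nb_batches = (len(data) - 1) // (batch_size * seqlen)
    rounded = nb_batches * batch_size * seqlen
    xdata = data[:rounded].reshape([batch_size, nb_batches * seqlen])
    ydata = data[1:rounded + 1].reshape([batch_size, nb_batches * seqlen])
    for b in range(nb_batches):
        x = xdata[:, b * seqlen:(b + 1) * seqlen]   # [BATCHSIZE, SEQLEN] inputs
        y = ydata[:, b * seqlen:(b + 1) * seqlen]   # same text shifted by one character
        yield x, y                                  # batch b+1 continues where batch b stopped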

44 of 75

Language model in Tensorflow

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30

Xd = tf.placeholder(tf.uint8, [None, None])
X = tf.one_hot(Xd, ALPHASIZE, 1.0, 0.0)
Yd_ = tf.placeholder(tf.uint8, [None, None])
Y_ = tf.one_hot(Yd_, ALPHASIZE, 1.0, 0.0)
Hin = tf.placeholder(tf.float32, [None, CELLSIZE*NLAYERS])

# the model
cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for i in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)

# softmax output layer
Hf = tf.reshape(Hr, [-1, CELLSIZE])
Ylogits = layers.linear(Hf, ALPHASIZE)
Y = tf.nn.softmax(Ylogits)
Yp = tf.argmax(Y, 1)
Yp = tf.reshape(Yp, [batchsize, -1])

# loss and training step (optimizer)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# training loop
inH = np.zeros([BATCHSIZE, CELLSIZE*NLAYERS])   # initial zero state
for x, y_ in utils.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=30):
    dic = {Xd: x, Yd_: y_, Hin: inH}
    _, y, outH = sess.run([train_step, Yp, H], feed_dict=dic)
    inH = outH   # carry the state over to the next minibatch

45 of 75

ee o no nonnaoter s ee seih iae r t i r io i ro s sierota tsohoreroneo rsa esia anehereeo hensh�rho etnrhhs iti saoitns t et rsearh tshseoeh ta oirhroren e eaetetnesnareeeoaraihss nshtano eter �e oooaoaeee nonn is heh easren ieson httn nihensont t e n a ooe oerhi neaeehteriseat tiet i i ntsh�orhi e ohhsiea e aht ohr er ra eeo oeeitrot hethisesaaei o saeii straieiteoeresorh e ooeri � e ninesh sort a es h rs hattnteseato sonoanr sniaase s rshninsasi na sntennn oti r etnsnrse oh n� r e tiathhnaeeano trrr hhohooon rrt eernre e rnoh

Shakespeare

0.03

epochs

C1


46 of 75

Shakespeare

II WERENI� Are I I wos the wheer boaer.� Tin thim mh cals sate bauut site tar oue tinl an bsisonetoal yer an fimireeren.��L[IO SI Hns oret bsllssts aaau ton hete me toer frurtor sheus aed trat�� A faler bis tote oadt tou than male, tel mou ce an cime. ais fauto ws cien whus yas. Ande fert te a�ut wond aal sinr be at saar

0.1

epochs

C3


47 of 75

BERENS Hall hat in she the hir meres.��Perstr in ame not of heard, me thin hild of shear and� ant on of mare. I lore wes lour.��DOCHES The chaster'd on not fenst� The laldoos more.

� [Ixeln thrish]

And tho priines sith of hamdeling the san wind

Shakespeare

0.2

epochs

C5

Stage directions ?


48 of 75

KING LEAR Alas, I am not forsworn both to bod!� And let the firm I have to'st trainoured.��KING HENRY VIII I love not my father.��PORDIA He tash you will have it.��HENRY BLUTIUS Work, thou lovest my son here, thy father's fath!��CLIOND Why, then, would say, the beasts are

Shakespeare

1

epoch

C6

Invented names !


49 of 75

Shakespeare

30

epochs

TITUS ANDRONICUS��ACT I��SCENE III An ante-chamber. The COUNT's palace.�� [Enter CLEOMENES, with the Lord SAY]��Chamberlain Let me see your worshing in my hands.

�LUCETTA I am a sign of me, and sorrow sounds it.

B10


50 of 75

Shakespeare

30

epochs

And sorrow far into the stars of men,� Without a second tears to seek the best and bed,� With a strange service, and the foul prince of Rome�� [Exeunt MARK ANTONY and LEPIDUS]�� Well said, my lord,--��MENENIUS I do not say so.� Well, I will not have no better ways;� But not a woman's misery, and yonder to her

B10


51 of 75

diassts_= =tlns==eti.s=tessn_((

sie_s_nts_ens= dondtnenroe dnar taonte srst anttntoilonttiteaen

detrtstinsenoaolsesnesoairt(

arssserleeeerltrdlesssoeeslslrlslie(e

drnnaleeretteaelreesioe niennoarens dssnstssaorns sreeoeslrteasntotnnai(ar dsopelntederlalesdanserl

lts(sitae(e)

Python code

0.03

epochs

A1


52 of 75

with self.essors_sigeater(output_dits_allss,

self._train.

for sampated to than ubtexsormations.

expeddions = np.randim(natched_collection, ranger, mang_ops, samplering)

def assestErrorume_gens(assignex) as and(sampled_veases):

eved.

Python code

0.1

epochs

A2

Python

keywords


53 of 75

def testGiddenSelfBeShareMecress(self):

with self.test_session() as sess:

tat = tf.contrib.matrix.cast_column_variable([1, 1], [0, 1, 1], [1, 7]],

[[1, 1, 1]].file(file, line_state_will_file))

with self.test_session():

self.assertAllEqual(1, l.ex6)

self.assertEqual(output_graph_def is_output_tensors_op(

tf.pro_context_name.sqrt(sess)

def test_shape(self):

res = values=value_rns[0].eval())

def tempDimpleSeriesGredicsIothasedWouthAverageData(self):

self._testDirector(self):

self._test_inv3_size = 5

with tf.train.ConvolutioBailLors_startswith("save_dir_context.PutIsprint().eval())

return tf.contrib.learn.RUCISLCCS:

# Check the orfloating so that the nimesting object mumputable othersifier.

# dense_keys.tokens_prefix/statch_size of the input1 tensors.

@property

Python code

0.4

epochs

A3

Wrong ([]) nesting

Correct use of colons:

Hallucinated function names


54 of 75

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in [0.1, 2.0, 3.0]]

def __init__(self, expected):

return np.array([[0, 0, 0], [0, 0, 0]])

self.assertAllEqual(tf.placeholder(tf.float32, shape=(3, 3)),(shape, prior.pack(),

tf.float32))

for keys in tensor_list:

return np.array([[0, 0, 0]]).astype(np.float32)

# Check that we have both scalar tensor for being invalid to a vector of 1 indicating

# the total loss of the same shape as the shape of the tensor.

sharded_weights = [[0.0, 1.0]]

# Create the string op to apply gradient terms that also batch.

# The original any operation as a code when we should alw infer to the session case.

Python code

12

epochs

B10

Correct triple ([]) nesting

Recites Apache license

Tensorflow tips!


55 of 75

...and more


56 of 75

Tensorflow: save, restore

saver = tf.train.Saver(keep_checkpoint_every_n_hours=0.1, max_to_keep=5)

with tf.Session() as sess:
    # ... training loop ...
    saver.save(sess, 'file_', global_step=iter)
# => saves the variables in file_200 and the graph in file_200.meta

with tf.Session() as sess:
    resto = tf.train.import_meta_graph('file_200.meta')
    resto.restore(sess, 'file_200')
# => restores the graph and the variable values

Must name variables explicitly !!!

# when saving
X = tf.placeholder(tf.uint8, name='X')
Y = tf.nn.softmax(Ylogits, name='Y')

# when using the restored graph
y, h = sess.run(['Y:0', 'H:0'], feed_dict={'X:0': x})

57 of 75

Shakespeare generation

with tf.Session() as sess:
    resto = tf.train.import_meta_graph('shake_200.meta')
    resto.restore(sess, 'shake_200')

    # initial values
    x = np.array([[0]])  # [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
    h = np.zeros([1, INTERNALSIZE * NLAYERS], dtype=np.float32)

    for i in range(100000):
        dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
        y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
        c = my_txtutils.sample_from_probabilities(y, topn=5)
        x = np.array([[c]])  # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
        print(chr(my_txtutils.convert_to_ascii(c)), end="")

One char at a time: each predicted character is fed back in as the next input.
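A rough sketch of the top-n sampling that my_txtutils.sample_from_probabilities performs (a simplified reconstruction, not the repo's exact code):

import numpy as np

def sample_from_probabilities(probabilities, topn=5):
    p = np.squeeze(probabilities).copy()   # [ALPHASIZE] probabilities for the next character
    p[np.argsort(p)[:-topn]] = 0.0         # keep only the topn most likely characters
    p = p / np.sum(p)                      # re-normalise
    return np.random.choice(len(p), p=p)   # sample one character code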

58 of 75

Tensorboard

summary_writer = tf.train.SummaryWriter("log/train_" + time)
loss_summary = tf.scalar_summary("batch_loss", loss)
summaries = tf.merge_all_summaries()   # gather all defined summaries

# in training loop:
smm = sess.run(summaries, feed_dict=dic)
summary_writer.add_summary(smm, iteration)

Tip: use time in logdir name

Tip: use a second SummaryWriter for validation results


59 of 75

RNN shapes

Diagram (recap): character-based model, characters one-hot encoded; the input "S t _ J o h" predicts "t _ J o h n", and H5 is the state after the sequence.

60 of 75

RNN shapes

Text classification: the words "The USA and China have agreed …" are fed one per time step, encoded as vectors ("embeddings"), and the final state is used to classify the text (here: "geopolitics").

embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))

X = tf.nn.embedding_lookup(embeddings, train_inputs)

Tensorflow sample: goo.gl/m41mNp

Or constant => see Word2Vec
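For the shapes involved, a hypothetical illustration (the vocab_size and embed_size values are assumptions):

import tensorflow as tf

vocab_size, embed_size = 50000, 128                     # assumed values
train_inputs = tf.placeholder(tf.int32, [None, None])   # [BATCHSIZE, SEQLEN] of word ids
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_size]))
X = tf.nn.embedding_lookup(embeddings, train_inputs)    # [BATCHSIZE, SEQLEN, embed_size]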


61 of 75

Bitchin’ batchin’

Example batch (sequences of different lengths, with their true lengths):

  China and the USA have agreed to a new round of talks   12
  The quick brown fox jumps over the lazy dog .           10
  Boys will be boys .                                       5
  Tom , get your coat . We are going out .                 11
  Math rules the world . Men rule math .                    9

Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=slen)

With sequence_length, dynamic_rnn stops updating the state at each sequence's true length, so the final state Hn used for classification (e.g. "geopolitics") corresponds to the last real word of each sentence.

62 of 75

RNN shapes

Text translation (words encoded as vectors): the encoder reads "The red cat ate the mouse"; the decoder then generates "Le chat rouge a mangé la souris", each produced word being fed back as the next decoder input.

Tensorflow sample: goo.gl/KyKLDv

tf.nn.sampled_softmax_loss(…): a full softmax over the whole vocabulary is slow; the sampled softmax loss is fast.

63 of 75

RNN shapes

Image captioning (simplified): the image is encoded as a vector (for example the output of a convolutional network or auto-encoder) and fed into the RNN, which generates "A man on a beach flying a kite", each produced word being fed back as the next input.

Google's neural net for image captioning: goo.gl/VgZUQZ

64 of 75

Image captioning

Google’s neural net for image captioning: goo.gl/VgZUQZ

A person riding a motorcycle on a dirt road.

A herd of elephants walking across a dry grass field.


65 of 75

Image captioning

Google’s neural net for image captioning: goo.gl/VgZUQZ

A refrigerator filled with lots of food and drinks.

A yellow school bus parked in a parking lot.


66 of 75

Cloud Machine Learning Engine


67 of 75

Data-parallel distributed training

Diagram: parameter servers hold the weights; model replicas each train on a slice of the data and send their updates W' = W + ∆W back asynchronously. The asynchronous updates add a little noise.

68 of 75

TF high level API

from tensorflow.contrib import learn

def model_fn(X, Y_, mode):   # X, Y_: "features" and "targets"
    Yn = …  # model layers
    predictions = {"probabilities": …, "digits": …}    # free-form
    evaluations = {'accuracy': metrics.accuracy(…)}    # free-form
    loss = …
    train = layers.optimize_loss(loss, …)
    return learn.ModelFnOps(mode, predictions, loss, train, evaluations)

69 of 75

Estimator, Experiment, learn_runner

from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

def experiment_fn(job_dir):
    return learn.Experiment(
        estimator=learn.Estimator(model_fn, model_dir=job_dir,
                                  config=learn.RunConfig(save_checkpoints_secs=None,
                                                         save_checkpoints_steps=1000)),
        train_input_fn=…,  # data feed
        eval_input_fn=…,   # data feed
        train_steps=10000,
        eval_steps=1,
        export_strategies=make_export_strategy(export_input_fn=serving_input_fn))

def main(argv=None):
    job_dir = …  # parse argument --job-dir
    learn_runner.run(experiment_fn, job_dir)

if __name__ == '__main__': main()

Free stuff !!! Tensorboard graphs, resume on fail, parallel data feeds, serving model export, distributed training.

trainingInput:
  scaleTier: STANDARD_1

70 of 75

Data queues for distributed training

# dummy implementation for data that fits in memory
def train_data_input_fn(mnist):
    images = tf.constant(mnist.train.images)
    labels = tf.constant(mnist.train.labels)
    return tf.train.shuffle_batch([images, labels],
                                  100,  # batch size
                                  1100, 1000, enqueue_many=True)

# dummy implementation for data that fits in memory
def eval_data_input_fn(mnist):
    return tf.constant(mnist.test.images), tf.constant(mnist.test.labels)

shuffle_batch inserts queue nodes into the TF graph. For practical data queuing use the TF Records format.

trainingInput:
  scaleTier: STANDARD_1

71 of 75

Serving input function

# Online predictions on Cloud ML Engine
def serving_input_fn():
    # Placeholder for data deserialised from JSON (a batch of images, for MNIST)
    inputs = {'A': tf.placeholder(tf.uint8, [None, 28, 28])}
    # Transform the data as needed
    features = [tf.cast(inputs['A'], tf.float32)]
    return input_fn_utils.InputFnOps(features, None, inputs)

trainingInput:
  scaleTier: STANDARD_1

72 of 75

Run it

gcloud ml-engine jobs submit training job22 \
    --job-dir=gs://mybucket/job22 \
    --package-path=trainer \
    --module-name=trainer.task \
    --config=config.yaml \
    -- \
    --<custom model arguments here>

The --job-dir receives the model checkpoints and the tensorboard summaries.

Deploy the trained model to prod = click click click (autoscaled serving)

gcloud ml-engine predict \
    --model <model_name> \
    --json-instances mydigits.json

trainingInput:
  scaleTier: STANDARD_1

73 of 75

Demo: aucnet

Retrain Inception yourself: goo.gl/Z9eNek


74 of 75

Have fun !

Cloud ML Engine: your TensorFlow models trained in Google's cloud.

Pre-trained models:
  • Cloud Vision API
  • Cloud Speech API
  • Google Translate API
  • Natural Language API
  • Video Intelligence API
  • Cloud Jobs API (private beta)
  • Cloud AutoML Vision (alpha): just bring your data
  • Cloud TPU (beta): ML supercomputing

That's all folks...

Martin Görner
Google Developer relations
@martin_gorner

75 of 75

Tensorflow and deep learning without a PhD

@martin_gorner