Introduction to Artificial Neural Networks
XXI Seminar on Software for Nuclear, Subnuclear and Applied Physics
Alghero, 9-14 June 2024
This lecture
Recap of this morning's lecture
Artificial Neural Networks
(Artificial) neural networks: the “Model”
Brief history, highs and lows
Complexity growth
=> increasing complexity of the network (number of neurons and connections)
Performance on classic problems
Speech recognition
Image classification
My favorite performance examples
OpenAI GPT-3
Generative Pre-trained Transformer
A 12M$ autocomplete (that does not really understand what it is talking about, but can still write better than most of us)
2020
Generate images from descriptions
2021
June 2022: solving simple math problems
2022
2023: GPT-3.5 / GPT-4
Neural Nets: Basic Elements
A neural network node: the artificial neuron
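A minimal NumPy sketch (not from the slides) of what a single artificial neuron computes: a weighted sum of the inputs plus a bias, passed through a non-linear activation. The input and weight values below are arbitrary.

import numpy as np

def sigmoid(z):
    # logistic activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # a single artificial neuron: weighted sum of the inputs plus a bias,
    # followed by a non-linear activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # example inputs
w = np.array([0.1, 0.4, -0.2])   # example weights (these are what training adjusts)
b = 0.05                         # bias term
print(neuron(x, w, b))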
The MLP model
Universal approximation theorem
“One hidden layer is enough to represent (not learn) an approximation of any function to an arbitrary degree of accuracy” (I. Goodfellow et al. 2016)
Example (1-D input)
Approximate this function
With a weighted sum of functions like this one
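A hedged Keras sketch of this idea: one hidden layer of sigmoid units, combined through a learned weighted sum, is fitted to a 1-D function. sin(x) below is only a stand-in for the function shown on the slide, and the layer size is arbitrary.

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

# target function to approximate (stand-in for the one on the slide)
x_train = np.linspace(-np.pi, np.pi, 1000).reshape(-1, 1)
y_train = np.sin(x_train)

inp = Input(shape=(1,))
hid = Dense(20, activation='sigmoid')(inp)   # the "functions like this one"
out = Dense(1, activation='linear')(hid)     # their weighted sum
model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, epochs=200, batch_size=32, verbose=0)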
Example
Training of an MLP
Training a NN
(Figure: the loss as a function of the position in the space of the network weights.)
How to find a minimum?
Gradient Descent
Stochastic Gradient Descent (SGD): estimate the gradient on a random mini-batch of the data and update the weights as w → w − η ∇w L
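A minimal NumPy sketch of the SGD loop described above, on an assumed toy problem (fit y = w·x by minimising the mean squared error); the learning rate and batch size are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = 3.0 * x + rng.normal(0, 0.1, 1000)   # toy data generated with true w = 3

w = 0.0          # initial weight
lr = 0.1         # learning rate
batch_size = 32

for epoch in range(20):
    perm = rng.permutation(len(x))            # shuffle the data once per epoch
    for start in range(0, len(x), batch_size):
        idx = perm[start:start + batch_size]  # one mini-batch
        grad = np.mean(2 * (w * x[idx] - y[idx]) * x[idx])  # dL/dw on the batch
        w -= lr * grad                        # SGD update: w -> w - lr * grad
print(w)   # ends up close to the true value 3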
Not as simple as you would imagine
Learning rate, epochs and batches
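A hedged sketch of where these three knobs enter in Keras; the toy data and model below are invented only to make the snippet self-contained.

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense
from keras.optimizers import Adam

# toy binary-classification data (assumption, not from the slides)
X_train = np.random.normal(size=(1000, 8))
y_train = (X_train.sum(axis=1) > 0).astype(int)

inp = Input(shape=(8,))
hid = Dense(16, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(hid)
model = Model(inputs=inp, outputs=out)

model.compile(optimizer=Adam(learning_rate=1e-3),   # the learning rate
              loss='binary_crossentropy')
model.fit(X_train, y_train,
          epochs=20,        # number of passes over the full dataset
          batch_size=128)   # samples used for each gradient update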
In reality
Training and overfitting
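A hedged sketch of how overfitting is usually spotted in practice: hold out part of the sample for validation and watch the training and validation losses diverge. The dataset and network below are invented for illustration only.

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

# toy dataset (assumption, only to make the example self-contained)
X = np.random.normal(size=(2000, 10))
y = (X[:, 0] + 0.1 * np.random.normal(size=2000) > 0).astype(int)

inp = Input(shape=(10,))
hid = Dense(64, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(hid)
model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')

# keep 20% of the events for validation and record the loss history
history = model.fit(X, y, epochs=50, batch_size=64,
                    validation_split=0.2, verbose=0)

# the training loss keeps decreasing; if the validation loss starts rising
# instead, the network is memorising the training sample, not generalising
print(history.history['loss'][-1], history.history['val_loss'][-1])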
Neural Networks, computers and mathematics
Back-propagation
Calculating the gradient in complex networks can be computationally expensive: back-propagation applies the chain rule layer by layer, reusing the intermediate results of the forward pass so that all partial derivatives are obtained in a single backward pass.
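A hedged NumPy sketch of back-propagation on a tiny one-hidden-layer network, showing how the chain rule reuses the intermediate activations of the forward pass; all the numbers are random and only illustrate the bookkeeping.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))        # one input example with 4 features
t = 1.0                          # its target label

W1 = rng.normal(size=(3, 4))     # hidden-layer weights
W2 = rng.normal(size=(1, 3))     # output-layer weights

# forward pass, keeping the intermediate activations
h = sigmoid(W1 @ x)              # hidden activations
y = sigmoid(W2 @ h)[0]           # network output
loss = 0.5 * (y - t) ** 2

# backward pass: chain rule, from the output back towards the input
dy  = (y - t) * y * (1 - y)                  # dL/d(pre-activation of the output)
dW2 = np.outer([dy], h)                      # gradient of the output weights
dh  = (W2.T @ np.array([dy])) * h * (1 - h)  # error propagated to the hidden layer
dW1 = np.outer(dh, x)                        # gradient of the hidden weights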
Deep networks
Deep Feed Forward networks
The simplest extension to the MLP is to just add more hidden layers
Other names for this network architecture
(Diagram: stacked hidden layers; the number of layers is the network's depth.)
Why go deeper?
Hold on… wasn't there a theorem saying that an MLP is good enough? Yes, but…
Advantages of Deep architectures
Activation functions
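A quick sketch of the most common activation functions, written out in NumPy so the formulas are explicit (in Keras they are selected by name, e.g. activation='relu').

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1); saturates for large |z|

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1); zero-centred

def relu(z):
    return np.maximum(0.0, z)         # cheap, and its gradient does not vanish for z > 0

z = np.linspace(-5, 5, 11)
print(sigmoid(z), tanh(z), relu(z), sep='\n')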
Deep architectures
Dropout and regularization methods
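A hedged Keras sketch of two common regularisation handles, dropout and an L2 weight penalty; the layer sizes and rates below are arbitrary.

from keras.models import Model
from keras.layers import Input, Dense, Dropout
from keras.regularizers import l2

x = Input(shape=(32,))
h = Dense(64, activation='relu', kernel_regularizer=l2(1e-4))(x)  # L2 penalty on the weights
h = Dropout(0.5)(h)              # randomly switch off 50% of the units at each training step
out = Dense(1, activation='sigmoid')(h)
model = Model(inputs=x, outputs=out)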
(Batch) normalization / standard scaling
A typical observable, e.g. the invariant mass of a pair of leptons
Normalized version
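A small NumPy sketch of the standard scaling applied above: subtract the mean and divide by the standard deviation so that all inputs live on comparable scales (BatchNormalization layers do the equivalent inside the network, batch by batch). The toy mass spectrum is an assumption, not the plot from the slide.

import numpy as np

mass = np.random.normal(91.0, 5.0, 10000)        # toy di-lepton invariant mass, GeV
mass_scaled = (mass - mass.mean()) / mass.std()  # now mean 0 and unit variance
print(mass_scaled.mean(), mass_scaled.std())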
DNN Tools
Keras
PS: another popular DNN toolset is PyTorch, not covered here
Other common tools
Common alternatives to Keras
Keras Sequential example
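A minimal sketch of the Sequential API, where layers are simply stacked in order; the sizes below are arbitrary and the model matches the shallow MLP built with the Functional API on the following slides.

from keras.models import Sequential
from keras.layers import Input, Dense

model = Sequential([
    Input(shape=(32,)),              # input with 32 features
    Dense(32, activation='relu'),    # hidden layer
    Dense(1, activation='sigmoid'),  # output layer
])
model.summary()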
Keras “Model” Functional API
An NN can be seen as the composition of multiple functions (one per layer), e.g.
x = Input()
layer1 = FirstLayerType(parameters)(x)
layer2 = SecondLayerType(parameters)(layer1)
layer3 = ThirdLayerType(parameters)(x)
layer4 = FourthLayerType(parameters)([layer2, layer3])
(Diagram: the input x feeds layer1, whose output goes into layer2; in parallel, x also feeds layer3; layer4 merges the outputs of layer2 and layer3.)
y = f4( f2(f1(x)), f3(x) )
A (modernized) MLP in Keras
from keras.models import Model
from keras.layers import Input, Dense
x = Input(shape=(32,))
hid = Dense(32, activation="relu")(x)
out = Dense(1, activation="sigmoid")(hid)
model = Model(inputs=x, outputs=out)
model.summary()
from keras.utils import plot_model
plot_model(model, to_file='model.png')
From ~1995 to ~2010
from keras.models import Model
from keras.layers import Input, Dense
x = Input(shape=(32,))
hid = Dense(32, activation="sigmoid")(x)
out = Dense(1, activation="sigmoid")(hid)
model = Model(inputs=x, outputs=out)
from keras.models import Model
from keras.layers import Input, Dense
x = Input(shape=(32,))
b = Dense(32, activation="relu")(x)
c = Dense(32, activation="relu")(b)
d = Dense(32, activation="relu")(c)
e = Dense(32, activation="sigmoid")(d)
model = Model(inputs=x, outputs=e)
Training a model with Keras
from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels) # starts training
data and labels are NumPy arrays holding your inputs and targets
Keras layers
Keras basic layers
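A few of the basic building-block layers, shown only for their call signatures; the sizes and rates below are arbitrary.

from keras.layers import Dense, Activation, Dropout, BatchNormalization, Flatten

Dense(64, activation='relu')   # fully connected layer with 64 units
Activation('tanh')             # apply an activation function on its own
Dropout(0.2)                   # randomly zero 20% of the units during training
BatchNormalization()           # normalise the activations batch by batch
Flatten()                      # e.g. flatten an image into a 1-D vector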
Callbacks
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
# train
history = model.fit(X_train, y_train, epochs=n_epochs, batch_size=batch_size, verbose=2,
                    validation_data=(X_test, y_test),
                    callbacks=[
                        EarlyStopping(monitor='val_loss', patience=10, verbose=1),
                        ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, verbose=1)
                    ])
Assignment 1
Start from this notebook: Exercise 1
Assignment 2
Start from this notebook: Exercise 2