DL in NLP 2020. Spring. Quiz 3
Neural Networks. Part 2
Some questions may not be mentioned explicitly in the lecture, but you can still use logic and Google.
Credit for some questions:
What is the derivative of a sigmoid function?
sigmoid(x) * (1 - sigmoid(x))
x^2 * sigmoid(x)
sigmoid(x)^2 - sigmoid(x)
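The correct option can be checked numerically. A minimal NumPy sketch comparing the analytic derivative sigmoid(x) * (1 - sigmoid(x)) against a central finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Analytic derivative: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite difference as an independent check.
x = np.array([-2.0, 0.0, 1.5])
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
assert np.allclose(numeric, sigmoid_grad(x), atol=1e-8)
```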
How do we compute gradients in backpropagation algorithm?
They are estimated with finite differences
They are computed symbolically and represented in a closed form
They are computed with the chain rule (the derivative of a composition of functions)
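Backpropagation applies the chain rule mechanically: each node's local derivative is multiplied by the downstream gradient. A minimal sketch for the composition loss = sigmoid(w * x)^2 (the particular functions are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, x):
    # Forward pass through the composition.
    z = w * x        # node 1
    s = sigmoid(z)   # node 2
    loss = s ** 2    # node 3
    # Backward pass: multiply local derivatives along the chain.
    dloss_ds = 2 * s
    ds_dz = s * (1 - s)
    dz_dw = x
    return loss, dloss_ds * ds_dz * dz_dw

loss, g = loss_and_grad(0.5, 2.0)
# Sanity check against a finite difference in w.
eps = 1e-6
numeric = (loss_and_grad(0.5 + eps, 2.0)[0]
           - loss_and_grad(0.5 - eps, 2.0)[0]) / (2 * eps)
assert abs(g - numeric) < 1e-8
```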
Why is ReLU the default choice of nonlinearity?
It is symmetric about the x-axis
It is symmetric about the y-axis
It produces more complex functions with fewer layers
It does not saturate in the positive region
It converges faster (in practice)
What's the main drawback of the ReLU activation function?
it's not symmetric around 0
it's not smooth and cannot be differentiated
it can zero out all the gradients from some point in the training process (the "dying ReLU" problem)
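The "dying ReLU" failure mode is easy to demonstrate: once a unit's pre-activation is negative for every input, both its output and its local gradient are all zeros, so no update can revive it. A small sketch (the drifted bias value is illustrative):

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

# A unit whose bias has drifted far negative: the pre-activation is < 0
# for every input in a typical range, so the output and the ReLU
# gradient mask are both all zeros -- the unit has "died".
x = np.linspace(-3.0, 3.0, 100)
b = -10.0
z = x + b
out = relu(z)
grad_mask = (z > 0).astype(float)   # local gradient of ReLU
assert out.sum() == 0.0
assert grad_mask.sum() == 0.0       # no gradient ever flows back
```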
y = max(0, x @ W + b); dout is the downstream gradient, @ is matrix multiplication, and all other operations are element-wise. What is d(loss)/dW?
x.T @ dout * (y > 0) + dy / db
x.T @ dout * (y > 0)
W @ max(0, y) * dout + dy / db
W @ max(0, y) * dout
x @ W
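The gradient can be verified with a numeric check. Note the grouping: the ReLU mask is applied to dout first, then the matmul backward contracts over the batch axis, i.e. dW = x.T @ (dout * (y > 0)). A minimal sketch with a surrogate loss = sum(y * dout) (the shapes and random values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, D_out = 4, 3, 2
x = rng.standard_normal((N, D_in))
W = rng.standard_normal((D_in, D_out))
b = rng.standard_normal(D_out)
dout = rng.standard_normal((N, D_out))   # downstream gradient

y = np.maximum(0.0, x @ W + b)
# Analytic gradient: ReLU passes dout only where y > 0, then the
# matmul backward contracts over the batch axis.
dW = x.T @ (dout * (y > 0))

# Numeric check with central differences on loss = sum(y * dout).
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(D_in):
    for j in range(D_out):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        fp = np.sum(np.maximum(0.0, x @ Wp + b) * dout)
        fm = np.sum(np.maximum(0.0, x @ Wm + b) * dout)
        dW_num[i, j] = (fp - fm) / (2 * eps)

assert np.allclose(dW, dW_num, atol=1e-6)
```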
Compute the gradients with respect to x, y, z, and w in the computational graph figure. Green numbers are the forward pass; the red number is the downstream gradient. Format your answer according to the pattern: x.xx, y.yy, z.zz, w.ww. Example answer: -3.00, 9.60, 18.66, -1.00
What is a good way of weights initialization?
Small random numbers
All weights set to the same constant > 0
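Small random values break symmetry: with a constant init, every unit in a layer computes the same output and receives the same gradient, so the units never diversify. A minimal sketch of one common "small random numbers" scheme, Glorot/Xavier uniform initialization (the fan sizes are illustrative):

```python
import numpy as np

def xavier_init(fan_in, fan_out, seed=0):
    """Glorot/Xavier uniform init: a 'small random numbers' scheme whose
    limit keeps activation variance roughly constant across layers."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(256, 128)
# Random init breaks symmetry: unlike a constant init, the columns
# differ, so different units can learn different features.
assert not np.allclose(W[:, 0], W[:, 1])
```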
Where is the place of the BatchNorm layer in the FFNN architecture?
Before the activation function
After the activation function
Before the input layer
The BatchNorm layer normalizes data over which axis?
Over the instance (batch) axis
Over the feature axis
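For a (batch, features) input, BatchNorm computes each feature's statistics across the instances in the batch, i.e. along axis 0. A minimal sketch of the forward pass (the learned scale/shift parameters gamma and beta are omitted for brevity):

```python
import numpy as np

def batchnorm_forward(x, eps=1e-5):
    """Normalize each feature over the batch (instance) axis, axis=0.
    The learned gamma/beta scale-and-shift is omitted for brevity."""
    mean = x.mean(axis=0)   # per-feature mean over the batch
    var = x.var(axis=0)     # per-feature variance over the batch
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((32, 4)) * 5.0 + 3.0
xn = batchnorm_forward(x)
# Each feature column now has ~zero mean and ~unit variance.
assert np.allclose(xn.mean(axis=0), 0.0, atol=1e-7)
assert np.allclose(xn.var(axis=0), 1.0, atol=1e-3)
```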
Why is random search a better way of searching for a neural network's hyperparameters?
Good combinations of hyperparameters are not probable
Random search produces more diverse sets of hyperparameters
Gradient search allows the algorithm to converge faster
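Random search samples each hyperparameter independently, so every trial explores a fresh value along every axis, whereas a grid reuses only a few values per axis. A minimal sketch of one trial; the particular hyperparameters and ranges are illustrative assumptions, not prescriptions from the lecture:

```python
import random

random.seed(0)

def sample_config():
    """One random-search trial: each hyperparameter is sampled
    independently (log-uniform for the learning rate, since good
    values span orders of magnitude)."""
    return {
        "lr": 10 ** random.uniform(-5, -1),   # log-uniform in [1e-5, 1e-1]
        "dropout": random.uniform(0.0, 0.5),
        "hidden": random.choice([128, 256, 512]),
    }

trials = [sample_config() for _ in range(20)]
# Every trial explores a distinct learning-rate value; a 20-point grid
# over three axes would reuse only a handful of values per axis.
assert len({t["lr"] for t in trials}) == 20
```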
Can we use different learning rates at different layers of a neural network?
When should neural network training be stopped?
When train loss becomes constant
When train loss is zero
When validation loss starts to increase
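Stopping when the validation loss starts to increase is usually implemented as early stopping with a patience window, so that a single noisy epoch does not halt training. A minimal sketch (the `patience` mechanism is a common convention, not something stated in the quiz):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training should stop: the first
    epoch after the validation loss has failed to improve for
    `patience` consecutive epochs (None if training never stops)."""
    best = float("inf")
    bad = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad = loss, 0   # new best: reset the patience counter
        else:
            bad += 1
            if bad >= patience:
                return epoch
    return None

# Validation loss starts rising at epoch 4; with patience=3 we stop at 6.
assert early_stop_epoch([1.0, 0.8, 0.7, 0.65, 0.7, 0.75, 0.8]) == 6
```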
What should be done if train loss is much less than validation loss?
Probably nothing at all: the model has learned everything it can, and the gap may just be noise in the dataset
Collect more data
Check that your train one-hot encoding is consistent with your validation one-hot encoding
Check for a data leak
Run a hyperparameter search
Reduce model capacity
Check that all labels are in the training data
Try changing the learning rate or scheduling it differently
Check your data preprocessing algorithm
Your questions about the lecture (if any)