DL in NLP, Spring 2020. Quiz 6
Some questions may not be mentioned explicitly in the lecture, but you can still use logic and Google.
Credit for some questions:
cs231n.stanford.edu
* Required
Email address
*
Your email
Github account
*
Your answer
Why is the vanishing gradient a problem?
Gradient signal from far away is lost because it is much smaller than the gradient signal from close by
Actually, it is not a problem, because linguistically we do not have long-distance dependencies
It messes up the gradient update from the past and the future
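The first option above can be illustrated numerically. A minimal sketch (the recurrent matrix and the 20-step horizon are made up for illustration): backpropagating through a linear recurrence h_t = W h_{t-1} multiplies the gradient by W^T once per timestep, so when the spectral norm of W is below 1 the signal from far away decays exponentially.

```python
import numpy as np

# Toy recurrence h_t = W h_{t-1} with spectral norm 0.5 (< 1),
# so backpropagated gradients shrink by half at every step.
W = 0.5 * np.eye(4)          # recurrent matrix
grad = np.ones(4)            # gradient arriving at the last timestep

norms = []
for _ in range(20):          # backpropagate 20 steps into the past
    grad = W.T @ grad
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])   # the faraway signal is vastly smaller
```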
Give any example of a sentence with a long-distance dependency where the short-term dependency leads to a wrong answer. You may use English or Russian.
Your answer
How can the exploding gradients problem be detected?
Loss does not change at all
Loss changes chaotically
The network systematically ignores short-term dependencies
Gradients change their direction, but have constant norm
What is the optimal choice for h in the gradient clipping formula?
http://proceedings.mlr.press/v28/pascanu13.pdf
0.5
[0.5..10] × the average norm over a sufficiently large number of updates
the maximal visible norm so far
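The clipping rule from the linked paper (Pascanu et al., 2013) rescales the gradient to the threshold norm while keeping its direction. A minimal sketch (the example gradient values are made up):

```python
import numpy as np

def clip_gradient(grad, threshold):
    """Norm clipping as in Pascanu et al. 2013: if the gradient norm
    exceeds the threshold, rescale it to that norm; otherwise leave it."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0])          # norm 5
print(clip_gradient(g, 1.0))      # rescaled to norm 1: [0.6, 0.8]
print(clip_gradient(g, 10.0))     # below the threshold, unchanged
```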
Suppose the reset and update gate values are scalars, not vectors. Pick the situations in which the previous hidden state of the GRU cannot affect the current one, whatever its value
Reset gate is close to infinity
Reset gate is close to 1
Update gate is close to 0
Update gate is close to 1, reset gate is close to 0
Update gate is close to 1
Update gate is close to infinity
It is impossible
Consider the LSTM and GRU formulas given in the lecture. What is the possible range of the values of the hidden state components?
Any real number
From 0 to 1
From −1 to 1
It depends on the timestep
It depends on the hidden state dimension
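For the LSTM the range can be verified numerically: in the usual formulas h_t = o_t ⊙ tanh(c_t), where o_t = σ(...) lies in (0, 1) and tanh(c_t) lies in (−1, 1). A sketch with random pre-activations and cell values (made up for illustration):

```python
import numpy as np

def lstm_hidden(o_pre, c):
    """h_t = o_t * tanh(c_t): output gate (sigmoid) times squashed cell."""
    o = 1.0 / (1.0 + np.exp(-o_pre))   # sigmoid -> (0, 1)
    return o * np.tanh(c)              # component-wise in (-1, 1)

rng = np.random.default_rng(1)
h = lstm_hidden(rng.normal(scale=3, size=1000), rng.normal(scale=3, size=1000))
print(h.min(), h.max())                # always strictly within (-1, 1)
```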
When is LSTM a good default choice?
lots of training data
particularly long dependencies
no prepared embeddings
short sequences
lack of computational resources
Select potential solutions to the vanishing/exploding gradient problem in other networks.
residual connection
skip connection
highway connection
dense connection
internet connection
proxy connection
A direct connection without additional weights between the first and the last layers of the network is called
proxy connection
highway connection
residual connection
skip connection
dense connection
Your questions about the lecture (if any)
Your answer