RAAI Summer School 2019
"Neural Networks with Attention Mechanism for Efficient Paraphrase Retrieval" quiz
The shape of an embedding matrix depends on:
It is better to use a batch size equal to the number of samples in the training set.
List your favorite optimizers.
A TF-IDF representation is always better than bag-of-words (BoW) for text classification tasks.
The L2 norm of a vector is always larger than its L1 norm.
Multiplying two rectangular matrices is always possible.
Why do you need to call optimizer.zero_grad() in PyTorch?
What is the shape of the output?
