NLP. Quiz 1
Intro to NLP and Deep Learning. Word embeddings.
Some questions may not be covered explicitly in the lecture, but you can still use logic and Google.
What are the advantages of deep learning approach over classical machine learning approach?
It works well with almost raw data and requires much less feature engineering
Deep learning models have higher capacity
It always performs better given the same dataset
It works better with complex feature representations: a lot of categorical and continuous variables
Models are faster
Should one have domain-specific knowledge in, say, pharmacology, to predict possible drugs against a given disease using deep learning?
yes, one should have a Ph.D. in pharmacology
no, it's not necessary
What is the main difficulty of processing natural language?
Because one cannot use gradient descent methods (cost function is not differentiable)
The need to build complex formal models
Because resolving ambiguity in language requires understanding the context
How many verbs are in the sentence: "Can you can a can as a canner can can a can?"
What is the most likely solution of the equation: word2vec("king") + word2vec("woman") - word2vec("man") = x?
vector that is close, but not equal to word2vec("queen")
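The analogy can be sketched with tiny made-up embeddings (the 2-d vectors and the "gender"/"royalty" axes below are hypothetical, purely for illustration): subtract the "man" direction, add the "woman" direction, and look for the nearest remaining word by cosine similarity.

```python
import numpy as np

# Toy 2-d embeddings (hypothetical, for illustration only):
# dimension 0 ~ "gender", dimension 1 ~ "royalty".
vectors = {
    "king":   np.array([ 1.0,  1.0]),
    "queen":  np.array([-1.0,  1.0]),
    "man":    np.array([ 1.0,  0.0]),
    "woman":  np.array([-1.0,  0.0]),
    "throne": np.array([ 0.0,  0.8]),
    "apple":  np.array([ 0.1, -0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman lands near queen; with real embeddings the result
# is only *close* to word2vec("queen"), never exactly equal to it.
x = vectors["king"] - vectors["man"] + vectors["woman"]
answer = max(
    (w for w in vectors if w not in ("king", "man", "woman")),
    key=lambda w: cosine(x, vectors[w]),
)
print(answer)  # queen
```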
Let the vector representation for the word "jungle" be [-0.123 0.432 1.453 -0.003]. Which of these vectors are likely to be representations of the word "forest"?
[0 0 0 0 0 0 0 1]
[-0.120 0.410 1.312 -0.012]
[0 0 0 1 0 1]
[-0.140 0.5 1.479 0.002]
[-1.453 0.002 0.132 -0.231]
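One way to check the options is to compute cosine similarity with "jungle" for the candidates of matching dimensionality. A minimal sketch, using the vectors from the question:

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

jungle = [-0.123, 0.432, 1.453, -0.003]

# 4-d candidates from the question
close_a  = [-0.120, 0.410, 1.312, -0.012]  # nearly parallel to "jungle"
close_b  = [-0.140, 0.5,   1.479,  0.002]  # also nearly parallel
distant  = [-1.453, 0.002, 0.132, -0.231]  # same magnitudes, different directions

print(round(cosine(jungle, close_a), 3))  # 1.0
print(round(cosine(jungle, distant), 3))  # 0.166
```

Words with similar meanings get vectors pointing in nearly the same direction, so the two "close" candidates are plausible for "forest", while the shuffled vector is not (and the binary vectors of other lengths cannot be compared at all).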
What are the advantages of using small dense vector representations (e.g. word2vec) compared to large sparse vectors (e.g. TF-IDF)?
Better gradient flow
Better semantic and syntactic properties
Linear models perform better with dense representations in practice
More information in the vector
Faster to train
Check all true statements about Negative Sampling
It speeds up computations by simplifying the normalization coefficient of the softmax in the CBOW model
It greatly reduces the number of iterations required to reach convergence in the Skip-Gram model
It works better to sample words from the unigram distribution (word frequency) raised to the power of 3/4 than to the power of 1
It works best to sample words from a uniform distribution
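The effect of the 3/4 power can be seen on a tiny hypothetical vocabulary (the counts below are made up): raising unigram counts to 0.75 damps very frequent words and boosts rare ones, so sampled negatives are not dominated by stopwords.

```python
import numpy as np

# Hypothetical corpus counts, e.g. for "the", "cat", "aardvark"
counts = np.array([1000.0, 100.0, 10.0])

def noise_dist(counts, power):
    """Normalized noise distribution: counts ** power / sum."""
    p = counts ** power
    return p / p.sum()

raw      = noise_dist(counts, 1.0)   # plain unigram frequencies
smoothed = noise_dist(counts, 0.75)  # the distribution word2vec samples negatives from

print(raw)       # most of the mass on the most frequent word
print(smoothed)  # noticeably flatter

# Drawing negatives from the smoothed distribution:
negatives = np.random.choice(len(counts), size=5, p=smoothed)
```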
Which of the following tasks can be used for *intrinsic* word vector evaluation?
Part of speech tagging
Named entity recognition
Correlation with human evaluation of word similarity
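Intrinsic evaluation of this kind is typically a rank correlation (e.g. Spearman's rho) between model cosine similarities and human judgments on word-pair datasets such as WordSim-353. A minimal sketch with made-up scores (the pairs and numbers below are illustrative, not real dataset values):

```python
import numpy as np

# Hypothetical human similarity judgments vs. model cosine similarities
# for four word pairs, e.g. (cat, feline), (car, auto), ...
human = np.array([9.0, 7.5, 5.0, 2.0])
model = np.array([0.92, 0.80, 0.55, 0.10])

def spearman(x, y):
    # Ranks via double argsort (assumes no ties),
    # then the classic formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    rx = x.argsort().argsort().astype(float)
    ry = y.argsort().argsort().astype(float)
    d = rx - ry
    n = len(x)
    return 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print(spearman(human, model))  # 1.0 -- model ordering matches the humans' exactly
```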
Your questions about the lecture (if any, you may write in Russian as well)
Any suggestions on how to make this course better