CS 451 Quiz 16
Large-scale machine learning
In mini-batch gradient descent, a typical choice of the mini-batch size b is
b = 10
b = sqrt(m)
b = m/10
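For reference, a minimal sketch of mini-batch gradient descent on linear regression with b = 10 (NumPy; the function name and hyperparameter values are illustrative, not part of the quiz):

import numpy as np

def minibatch_gd(X, y, alpha=0.01, b=10, epochs=10):
    """Mini-batch gradient descent for linear regression (MSE cost)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = np.random.permutation(m)      # shuffle each epoch
        for start in range(0, m, b):
            idx = order[start:start + b]      # mini-batch of (up to) b examples
            grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
            theta -= alpha * grad             # one step per mini-batch
    return theta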
What is a typical training set size for a modern "large dataset"?
m = 200,000
m = 5,000,000
m = 100,000,000
Non-linearly-separable data can be handled with a linear classifier if it is first mapped to a higher-dimensional feature space
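A small illustration of this idea, under an assumed setup (points inside vs. outside the unit circle, a classic non-linearly-separable case):

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)   # class = outside unit circle

# Not linearly separable in 2-D, but adding the feature x1^2 + x2^2
# makes the classes separable by the plane x1^2 + x2^2 = 1 in 3-D,
# so a linear classifier on X_mapped can now do the job.
X_mapped = np.column_stack([X, X[:, 0]**2 + X[:, 1]**2])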
In order to check stochastic gradient descent for convergence, we can compute the average of, say, the last 1000 cost values. For each training example, the cost value should be computed
before making the gradient descent step
after making the gradient descent step
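A sketch of this monitoring scheme for linear-regression SGD (names and window size are illustrative); note that the cost is recorded before the parameter update, since the update would otherwise already reflect that example:

import numpy as np

def sgd_with_cost_monitoring(X, y, alpha=0.01, window=1000):
    """SGD that prints the average cost over each window of examples."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = []
    for t, i in enumerate(np.random.permutation(m), start=1):
        err = X[i] @ theta - y[i]
        costs.append(0.5 * err**2)     # cost BEFORE making the step
        theta -= alpha * err * X[i]    # then take the gradient step
        if t % window == 0:
            print(f"avg cost, last {window} examples: {np.mean(costs):.4f}")
            costs = []
    return theta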
How can you tell that training with a large data set will give better performance than training with just a small subset (m = 1000) of the data?
If the learning curves (training and validation costs as a function of m) indicate high bias for small m
If the learning curves (training and validation costs as a function of m) indicate high variance for small m
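A hedged sketch of how such learning curves could be computed; train_fn and cost_fn are hypothetical placeholders for whatever model and cost function are being diagnosed:

def learning_curves(train_fn, cost_fn, X, y, X_val, y_val, sizes):
    """Train on growing subsets of the data, record (m, J_train, J_val)."""
    curves = []
    for m_sub in sizes:
        theta = train_fn(X[:m_sub], y[:m_sub])
        curves.append((m_sub,
                       cost_fn(theta, X[:m_sub], y[:m_sub]),
                       cost_fn(theta, X_val, y_val)))
    return curves

# High variance: J_val >> J_train with the gap still closing as m grows,
# so more data is likely to help.  High bias: both curves plateau close
# together at a high cost, so more data alone will not help.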
Batch gradient descent means making a single gradient descent step after looking at
one training example
several/many training examples
all training examples
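For contrast with the mini-batch sketch above, a minimal batch-gradient-descent loop (again illustrative, for linear regression):

import numpy as np

def batch_gd(X, y, alpha=0.01, iters=100):
    """Batch gradient descent: every single step uses all m examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # full pass over the data
        theta -= alpha * grad              # one step per full pass
    return theta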
In order for stochastic gradient descent to converge, it can be a good idea to decrease the learning rate with the number of iterations.
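One common schedule for this, as a sketch (the constants are illustrative and must be tuned per problem):

def decayed_alpha(t, const1=1.0, const2=50.0):
    """Learning rate alpha_t = const1 / (t + const2), shrinking with
    iteration number t so SGD settles near the minimum instead of
    oscillating around it."""
    return const1 / (t + const2)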
For large training sets, stochastic gradient descent can be much faster than batch gradient descent.
K nearest neighbors is an algorithm for
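For reference, a minimal KNN classification sketch (assumes non-negative integer class labels; names and k are illustrative):

import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Predict the class of x by majority vote among the k training
    points closest to x (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y_train[nearest]).argmax()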
The "Kernel trick" refers to
the fact that Gaussian kernels only need to be applied to the support vectors
an efficient approximation of the L2 norm between two features
a web site with lifestyle trends and advice for machine learning practitioners
the fact that we don't have to compute the high-dimensional mapping for each feature, but only a function of the dot product between pairs of features
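Two standard kernels as a sketch of that idea (sigma, d, and the function names are illustrative):

import numpy as np

def polynomial_kernel(x, z, d=2):
    # Equals phi(x) . phi(z) in the high-dimensional polynomial feature
    # space, computed from just the dot product of the original features.
    return (x @ z + 1) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # Implicit infinite-dimensional feature space; only the original
    # low-dimensional features are ever touched.
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))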