CS 451 Quiz 21
PCA, Gaussian distributions, and density estimation
PCA
Which 3 of the following 6 quantities are identical?
The number of principal components
The original dimensionality
The reduced dimensionality
k
m
n
How do we measure the total variation in the data?
Average magnitude of all data points
Average squared magnitude of all data points
Maximum distance between two data points
Average squared projection error
How can we choose a good value for k?
Use a fixed fraction of n, for instance k = 0.1 * n
Use the "elbow method"
Use the smallest k such that a fixed fraction of the variance is retained
How can we efficiently compute the ratio of the average squared projection error to the total variation in the data for different values of k?
Compute the determinant of Sigma
Compute [U, S, V] = svd(Sigma), then compute the determinant of U
Compute [U, S, V] = svd(Sigma), then compute the determinant of S
Compute [U, S, V] = svd(Sigma), then sum up the diagonal elements of S
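As a study aid for the two questions above, here is a minimal sketch. It assumes the data are mean-normalized, that Sigma is the covariance matrix, and that the diagonal of S from svd(Sigma) has already been copied into a plain list. Under those assumptions the fraction of variance retained by the first k components is the sum of the first k diagonal entries divided by the sum of all n, and the error-to-variation ratio is one minus that fraction. The names retained_fraction and smallest_k and the sample values are hypothetical, chosen only for illustration.

```python
def retained_fraction(s_diag, k):
    """Fraction of the total variation kept by the first k components."""
    return sum(s_diag[:k]) / sum(s_diag)


def smallest_k(s_diag, retain=0.99):
    """Smallest k whose retained fraction reaches `retain`."""
    for k in range(1, len(s_diag) + 1):
        if retained_fraction(s_diag, k) >= retain:
            return k
    return len(s_diag)


# Made-up diagonal of S, in descending order (the order svd returns).
s_diag = [8.0, 4.0, 2.0, 1.0, 1.0]

print(smallest_k(s_diag, retain=0.9))
# Error-to-variation ratio for k = 2 is one minus the retained fraction.
print(1.0 - retained_fraction(s_diag, 2))
```

Because every k reuses the same diagonal, the decomposition only has to be computed once, which is what makes trying many values of k cheap.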
If we want to use PCA to speed up supervised learning, on which dataset should we compute the principal components?
The positive training examples
All (unlabeled) training examples
All (unlabeled) data (both training and test examples)
Why is PCA not a good way to address overfitting?
Because it is slower than regularization
Because it is difficult to pick the right k
Because it does not make use of the labels
Anomaly detection and Gaussian distributions
In anomaly detection, as in classification, we try to learn a decision boundary, but we are given only one type of training example (e.g., only negative examples).
True
False
The maximum value of a Gaussian probability density function (for any mu and any sigma) is always 1.
True
False
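To check the statement above numerically, the following sketch evaluates the univariate Gaussian density at its mean, where it peaks, for a few values of sigma. The function names gaussian_pdf and peak_height and the sample parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std. dev. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def peak_height(mu, sigma):
    """The density is largest at x = mu, so evaluate it there."""
    return gaussian_pdf(mu, mu, sigma)


# Compare the peak for a wide, a standard, and a narrow distribution.
for sigma in (2.0, 1.0, 0.1):
    print(sigma, peak_height(0.0, sigma))
```

The peak depends only on sigma, not on mu: the area under the curve stays at 1, so a narrower curve has to be taller.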
Given a dataset {x1, x2, ..., xn}, where each xi is a real number, what does (Gaussian) density estimation do?
Estimate the parameters mu and sigma of the Gaussian distribution that might have generated the data
Given mu and sigma, estimate the probability that the data was generated by the Gaussian distribution with these parameters
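For reference, here is a minimal sketch of fitting a univariate Gaussian to a list of real numbers with the maximum-likelihood estimates: the sample mean, and the standard deviation with a 1/n divisor. The name fit_gaussian and the sample data are hypothetical, and some courses use the 1/(n - 1) divisor instead.

```python
import math


def fit_gaussian(xs):
    """Return (mu, sigma) estimated from the samples xs."""
    n = len(xs)
    mu = sum(xs) / n
    variance = sum((x - mu) ** 2 for x in xs) / n  # 1/n divisor (assumption)
    return mu, math.sqrt(variance)


# Made-up one-dimensional dataset.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
mu, sigma = fit_gaussian(data)
print(mu, sigma)
```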
For which problems would anomaly detection be a suitable algorithm?
Given unlabeled credit card transactions, group them into similar categories (e.g., groceries, paying utility bills, vacation charges)
Given medical records, identify patients with unusual health conditions
Predicting a stock price given the price histories of similar stocks
Predicting whether an electronic component might fail given a series of test results