1. Load the Iris dataset, drop the target column species, and standardise the feature values using StandardScaler.
Then run PCA on the standardised dataset and report the top (i.e. largest) eigenvalue, rounded to 2 decimal places. Hint: An eigenvalue represents the variance captured (explained) in the direction of its eigenvector.
2 points
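One possible way to carry out the steps in Question 1 is sketched below. It assumes the copy of Iris bundled with scikit-learn (`load_iris`), where the features and the species target are already stored separately; if the dataset is loaded from a CSV instead, the species column would have to be dropped explicitly. Note that `explained_variance_` is computed with an n - 1 denominator, so the eigenvalues sum to slightly more than the number of features.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Iris data. Taking only `.data`
# is equivalent to dropping the species (target) column.
X = load_iris().data

# Standardise every feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Fit PCA and keep all components.
pca = PCA().fit(X_std)

# explained_variance_ holds the eigenvalues of the sample covariance
# matrix, sorted from largest to smallest.
top_eigenvalue = pca.explained_variance_[0]
print(round(top_eigenvalue, 2))
```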
2. Using the PCA fitted in Question 1, report the explained variance ratio of the first component, rounded to 2 decimal places.
2 points
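A minimal sketch for Question 2, again assuming scikit-learn's bundled Iris data. It refits the same pipeline so the snippet runs on its own, then reads the ratio from `explained_variance_ratio_`, which is each eigenvalue divided by the sum of all eigenvalues.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing as in Question 1 (assumed data source: load_iris).
X_std = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X_std)

# Fraction of total variance explained by each component, largest first.
ratios = pca.explained_variance_ratio_
first_ratio = ratios[0]
print(round(first_ratio, 2))
```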
3. Based on the explained variance ratio from Question 2, how many principal components should be kept to preserve at least 90% of the variance in the data?
2 points
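The counting step in Question 3 amounts to accumulating the explained variance ratios until the threshold is reached. The helper below, `components_needed`, is a hypothetical name introduced for illustration, and the example ratios are made up; they are not the Iris result.

```python
from itertools import accumulate


def components_needed(ratios, threshold=0.90):
    """Return the smallest k whose first k ratios sum to at least `threshold`."""
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold:
            return k
    raise ValueError("threshold is not reached even with all components")


# Hypothetical ratios for illustration only (NOT the Iris values).
example_ratios = [0.5, 0.3, 0.15, 0.05]
print(components_needed(example_ratios))
```

With a fitted scikit-learn PCA, the same function could be called on `pca.explained_variance_ratio_`. scikit-learn's `PCA` also accepts a float between 0 and 1 as `n_components` to select the number of components by explained variance.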
4. Perform K-means clustering with K = 2 on the standardised data from Question 1. Then, taking the first data sample of the standardised data from Question 1, report the distance from that data point to each cluster centre (write the values comma-separated, rounded to 2 decimal places). Note: Use 2022 as the random seed value.
3 points
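A sketch of Question 4 under two assumptions: the data comes from scikit-learn's `load_iris`, and `n_init=10` is passed explicitly because the default for this parameter differs between scikit-learn versions (the question does not say which setting was intended). The order of the two distances follows the cluster numbering chosen by the fit.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Same standardised features as in Question 1 (assumed source: load_iris).
X_std = StandardScaler().fit_transform(load_iris().data)

# Assumption: n_init=10; the seed 2022 comes from the question.
kmeans = KMeans(n_clusters=2, random_state=2022, n_init=10).fit(X_std)

# transform() returns the Euclidean distance from each sample
# to every cluster centre; here only the first sample is passed.
distances = kmeans.transform(X_std[:1])[0]
print(", ".join(f"{d:.2f}" for d in distances))
```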
5. PCA looks to find homogeneous subgroups among the observations.
1 point
6. In K-means clustering, we seek to partition the observations into a pre-specified number of clusters.
1 point
7. Consider a dataset with four observations: A, B, C, and D, and the following pairwise distances between them (shown in the image below).
Perform agglomerative hierarchical clustering using complete linkage and visualise the dendrogram. Now, if you cut the dendrogram at a height of 3, how many distinct clusters will be formed?
2 points
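The procedure in Question 7 can be sketched in plain Python. The distance table from the image is not available here, so `hypothetical_dist` below contains made-up values purely for illustration, and the function names are likewise hypothetical. The sketch merges the two closest clusters repeatedly, measuring cluster distance as the largest pairwise distance (complete linkage), then counts the clusters that remain when only merges below the cut height are applied.

```python
from itertools import combinations


def complete_linkage_merges(labels, dist):
    """Return (height, merged_cluster) for every merge, in order."""
    clusters = [frozenset([label]) for label in labels]
    merges = []

    def linkage_distance(a, b):
        # Complete linkage: largest pairwise distance between the clusters.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage_distance(*pair))
        height = linkage_distance(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((height, a | b))
    return merges


def clusters_at_cut(labels, dist, cut_height):
    """Count clusters left when only merges below `cut_height` are applied."""
    merges = complete_linkage_merges(labels, dist)
    return len(labels) - sum(1 for height, _ in merges if height < cut_height)


# Hypothetical distances (NOT the values from the question's image).
hypothetical_dist = {
    frozenset("AB"): 1.0, frozenset("CD"): 2.0, frozenset("AC"): 4.0,
    frozenset("AD"): 5.0, frozenset("BC"): 6.0, frozenset("BD"): 7.0,
}
print(clusters_at_cut(list("ABCD"), hypothetical_dist, cut_height=3))
```

For real data, SciPy's `scipy.cluster.hierarchy` module provides `linkage`, `dendrogram`, and `fcluster` for the same task, including the plot.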