CSCI 5521: Introduction to Machine Learning Homework 2 solution


Original Work ?


5/5 - (4 votes)

1. (30 points) Consider the following 2 sets of points in the plane:
A :




B :




3 2 1 0 1 2 3
(a) What is the first principal component w1 (use the unbiased estimation of covariance)? Draw the first principal component direction w1 on the plot, anchored at
the origin.
(b) Consider a more general case (not specific to the aforementioned samples): PCA
performs linear dimensionality reduction with z
t = WT x
, where x
t ∈ R
D is
the original data for the t-th sample, z
t ∈ R
is the low-dimensional projection
(d < D), W ∈ R D×d is the PCA projection matrix (each column is a principal component). Professor HighLowHigh claims that we can reconstruct the original data with v t = Wzt , so that ∀t v t = x t . Is the claim correct? Explain your answer with necessary details (you can use formulations if it helps explain). (c) Compute the Within-Class Scatter matrix SW and Between-Class Scatter matrix SB. (d) What is the Fisher projection direction w found by the Fisher Linear Discriminant Analysis(LDA)? Normalize w to have unit length, that is ||w||2 = 1, and draw such w on the plot, anchored at the origin. 1 Instructor: Catherine Qi Zhao. TA: Prithvi Raj Botcha, Shi Chen, Suzie Hoops, James Yang, Yifeng Zhang. Email: 1 Note: You need to show all your calculations in order to get full credits. 2. (30 points) Given the following data points in 1D: x1 = 1, x2 = 4, x3 = 5, x4 = 6, x5 = 7, x6 = 8, x7 = 10, x8 = 12, x9 = 14, perform k-means clustering algorithm for k = 3. (a) Start from initial cluster centers c1 = 0, c2 = 5, c3 = 10. Show your steps for all iterations: (1) the cluster assignments y1, · · · , y9; (2) the updated cluster centers at the end of that iteration. (b) How many iterations does it take for k-means algorithm to converge (i.e., number of iterations includes all iterations you perform to find convergence)? What is the reconstruction error (i.e., distortion measure J, equation 9.1 of the Bishop’s textbook) at the end of that iteration? (c) Repeat the above steps with initial cluster centers c1 = 2, c2 = 7, c3 = 12. (d) How many iterations does it take for k-means algorithm to converge in this case? What is the reconstruction error at the end of that iteration? (e) Comparing (a) with (c), which solution is better? Why? 3. (40 points) In this programming exercise you will implement k-means and Principal Component Analysis algorithms: (a) You will first apply k-means algorithm (K = 8) to the provided dataset Digits089.csv. The dataset contains 3000 samples, where each sample has 784 features. The first column of the data file contains flags that are used for separating training and test samples, the second column includes class labels (i.e., 0, 8, 9), and the rest of the columns store features. Your program should initialize the centers with a set of pre-selected samples (see the template for detailed initialization), iteratively update the center of each cluster based on the training samples, and record the reconstruction error after each iteration. After reaching convergence, it should assign a class label to each cluster based on majority voting (i.e., use the most dominant label in the cluster as its class label). To estimate the quality of clustering, you will use the clusters for classification of test samples. More specifically, you should determine the cluster for each test sample, and use the class label of the selected cluster for prediction. Report the number of iterations for convergence and the classification accuracy on the test samples. Plot the history of reconstruction errors. Is the plot shape following what you expect? The code for plotting and computing classification accuracy is included in, and you do not need to modify the file. (b) Repeat the above, but use low-dimensional data obtained from PCA. Your PCA algorithm should reduce the original training samples to dimensions needed to capture > 90% of the variance. Note that you should use the means and projection
matrix learned from training samples for projecting test samples. How many
dimensions are necessary in this case? Does PCA help clustering?
Explain. (Hint: Consider both the classification accuracy and the runtime of
the algorithm.)
(c) Repeat (b), but using only the first principal component. Are the results better? Explain.
We have provided the skeleton code and for implementing the
algorithms. They are written in a scikit-learn convention, where you have a fit function
for model training and a predict function for generating predictions on given samples.
To verify your implementation, call the main function, which automatically
generates plots for the reconstruction errors and computes the classification accuracy.
• Things to submit:
1. hw2 sol.pdf: a document containing all your answers for the written questions
(including answers and plots for problem 3).
2. a Python source file containing your implementation of the kmeans algorithm with header class Kmeans. Use the skeleton file
found with the data on the class web site, and fill in the missing parts. The fit
function should take the training samples and training labels as inputs, initialize
the centers, update the model parameters and record the reconstruction errors.
It should also assign a class label to each cluster, and return the number of
iterations for convergence as well as the history of reconstruction errors. The
predict function should take the test samples as inputs, and predict their class
labels based on clustering results. The compute error function should take the
samples and cluster assignment (a vector specifying the cluster ID assigned to each
sample) as inputs, and return the reconstruction error for the current assignment.
3. a Python source file containing your implementation of the Principal Component Analysis algorithm with header class PCA. Given the original
samples and an optional parameter num dim as inputs, the fit function should
determine the projection matrix based on the training samples, and return the
low-dimensional projection as well as the dimension of features. If num dim is
specified during initialization of the class, it should project the samples to d dimensional data, where d=num dim. Otherwise, it will use the dimensions that
capture > 90% of the variance. The predict function should return the lowdimensional projection of test samples based on the projection matrix and means
learned from the training samples.
• Submit: All material must be submitted electronically via Gradescope. Note that
there are two entries for the assignment, i.e., Hw2-Written (for hw2 sol.pdf)
and Hw2-Programming (for a zipped file containing the Python code),
please submit your files accordingly. We will grade the assignment with vanilla
Python, and code submitted as iPython notebooks will not be graded.