Description
1. (a) Why is it more efficient to process data points if they are lower-dimensional vectors? State one reason.
(b) What is a potential drawback of reducing the dimensionality of input vectors before training a classifier? State one reason.
2. (a) Given a training set $D = \{x_1, \ldots, x_N\}$, show that the reconstruction error of principal component analysis (PCA) can be written as
\[
\frac{1}{N} \sum_{n=1}^{N} \left\| x_n - \hat{x}_n \right\|_2^2 \;=\; \sum_{j=q+1}^{d} w_j^\top C \, w_j,
\]
where $w_j$ is the $j$-th principal component, i.e., the eigenvector of the input covariance matrix $C$.
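As a sanity check (not a proof), the identity above can be verified numerically. The sketch below is an assumption-laden illustration using NumPy: data are centered, $C$ is formed with the $1/N$ normalization matching the left-hand side, and the columns of `W` are the eigenvectors of $C$ sorted by decreasing eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, q = 500, 5, 2

# Synthetic data, centered so the covariance has no mean term
X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))
X -= X.mean(axis=0)

# Input covariance matrix C with 1/N normalization (matches the identity)
C = X.T @ X / N

# Eigenvectors of C, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order]            # column j is principal component w_{j+1}

# Project onto the top-q components and reconstruct
Wq = W[:, :q]
X_hat = X @ Wq @ Wq.T

# LHS: average squared reconstruction error
lhs = np.mean(np.sum((X - X_hat) ** 2, axis=1))

# RHS: sum of w_j^T C w_j over the discarded components j = q+1, ..., d
rhs = sum(W[:, j] @ C @ W[:, j] for j in range(q, d))

assert np.isclose(lhs, rhs)
```

Since $C w_j = \lambda_j w_j$, each discarded term $w_j^\top C w_j$ is just the eigenvalue $\lambda_j$, which is why both sides agree.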
(b) Show that
\[
\Sigma = W C W^\top \iff \sigma_j^2 = w_j^\top C \, w_j \quad \text{for all } j = 1, \ldots, q,
\]
where $W$ is the weight matrix of PCA, $C$ is the input covariance matrix, and
\[
\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_q^2) =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_q^2
\end{pmatrix}
\]
is the covariance matrix of the code vectors.
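This relation can also be checked numerically. The sketch below assumes the convention that the rows of $W$ are the top-$q$ eigenvectors of $C$, so the code vectors are $z = W x$; under that assumption $\Sigma = W C W^\top$ comes out diagonal, with $\Sigma_{jj} = w_j^\top C w_j$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, q = 400, 4, 2

# Centered synthetic data and its covariance
X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))
X -= X.mean(axis=0)
C = X.T @ X / N

# Rows of W are the top-q eigenvectors of C (q x d weight matrix)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:q]].T

# Covariance of the code vectors z = W x
Sigma = W @ C @ W.T

# Diagonal entries equal w_j^T C w_j; off-diagonals vanish because
# eigenvectors of C are orthogonal
for j in range(q):
    assert np.isclose(Sigma[j, j], W[j] @ C @ W[j])
assert np.allclose(Sigma - np.diag(np.diag(Sigma)), 0)
```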
3. (Programming Assignment) Complete the implementation of PCA and NMF using Python and scikit-learn. The completed notebooks must be submitted together with the answers to the questions above. When submitting Jupyter notebooks, make sure to save printed outputs as well.
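The assignment's notebooks are not shown here, so the following is only a minimal standalone sketch of the two scikit-learn estimators involved (`PCA` and `NMF`), on synthetic nonnegative data; the component counts and solver settings are illustrative assumptions, not the assignment's specification.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(0)
X = rng.random((100, 6))         # nonnegative data so NMF applies

# PCA: fit, project to 2 dimensions, then reconstruct
pca = PCA(n_components=2)
Z = pca.fit_transform(X)         # code vectors, shape (100, 2)
X_hat = pca.inverse_transform(Z)
pca_err = np.mean(np.sum((X - X_hat) ** 2, axis=1))

# NMF: nonnegative factorization X ~ W H
nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)         # shape (100, 2), nonnegative
H = nmf.components_              # shape (2, 6), nonnegative
nmf_err = np.linalg.norm(X - W @ H)

print(pca_err, nmf_err)
```

Unlike PCA, NMF constrains both factors to be nonnegative and is fit iteratively, so its reconstruction is approximate and depends on the initialization (`init`) and `random_state`.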