## Description

1. (15 points) Given a symmetric matrix $A \in \mathbb{R}^{3\times 3}$, suppose its eigen-decomposition can be written as

$$A = \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ u_{21} & u_{22} & u_{23} \\ u_{31} & u_{32} & u_{33} \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} u_{11} & u_{21} & u_{31} \\ u_{12} & u_{22} & u_{32} \\ u_{13} & u_{23} & u_{33} \end{bmatrix}. \quad (1)$$

What is the singular value decomposition of this matrix?
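(As a general reminder of the relationship being tested here — not the answer to (1) — the singular values of any symmetric matrix are the absolute values of its eigenvalues, which the following NumPy spot check illustrates on a random symmetric matrix:)

```python
import numpy as np

# Spot check on a generic symmetric matrix (illustration only, not the
# answer to Problem 1): the singular values of a symmetric matrix are the
# absolute values of its eigenvalues; a negative eigenvalue is absorbed by
# flipping the sign of the corresponding singular vector.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                           # random symmetric matrix

eigvals = np.linalg.eigvalsh(A)             # eigenvalues, ascending
svals = np.linalg.svd(A, compute_uv=False)  # singular values, descending

print(np.sort(np.abs(eigvals))[::-1])       # matches svals
print(svals)
```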


2. (25 points) Provide a complete proof of the Ky Fan Theorem given on page 4 of the notes "Principal Component Analysis and Autoencoders". The Theorem is also given below:

Theorem. (Ky Fan) Let $H \in \mathbb{R}^{n\times n}$ be a symmetric matrix with eigenvalues

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n,$$

and the corresponding eigenvectors $U = [u_1, \dots, u_n]$. Then

$$\lambda_1 + \cdots + \lambda_k = \max_{A \in \mathbb{R}^{n\times k}:\, A^{T}A = I_k} \operatorname{trace}\left(A^{T} H A\right).$$

And the optimal $A^*$ is given by $A^* = [u_1, \dots, u_k]\,Q$ with $Q$ an arbitrary orthogonal matrix.
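(Before writing the proof, it can help to verify the statement numerically. The sketch below checks, on a random symmetric matrix, that the top-k eigenvectors attain the bound and that random orthonormal matrices never exceed it — a spot check, not a proof:)

```python
import numpy as np

# Numerical sanity check of the Ky Fan theorem on a random symmetric matrix.
rng = np.random.default_rng(1)
n, k = 6, 3
M = rng.standard_normal((n, n))
H = (M + M.T) / 2

lam, U = np.linalg.eigh(H)         # ascending eigenvalues/eigenvectors
lam, U = lam[::-1], U[:, ::-1]     # reorder to descending
best = lam[:k].sum()               # lambda_1 + ... + lambda_k

A_star = U[:, :k]                  # top-k eigenvectors attain the maximum
assert np.isclose(np.trace(A_star.T @ H @ A_star), best)

for _ in range(100):               # random orthonormal A never exceeds it
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert np.trace(Q.T @ H @ Q) <= best + 1e-9
```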

3. (40 points) (Coding Task) Kernel vs Neural Network: In this assignment, you will apply

two kernel models and a neural network model to the binary classification task on a partial

dataset from MNIST. In this classification task, the model will take a 16 × 16 image of a handwritten digit as input and classify the image into one of two classes. Please check the starting

code in folder “kernel/code” and follow the instructions. The “data” folder contains the

dataset, which has already been split into a training set and a test set. All data examples are saved in dictionary-like objects in "npz" files. For each data sample, the dictionary key "x" gives its raw features, a 256-dimensional vector whose values in [−1, 1] are the grayscale pixel values of a 16 × 16 image. The key "y" is the label of the example, which can be 0, 1, or 2. In this homework, we

only use the samples with the label 1 or 2. Note that you will need to use PyTorch in this

assignment. Please read the “Readme” file carefully before getting started. You are

expected to implement the solutions based on the starting code. The files you need to modify

are “network.py” and “main.py”. You will test your solution by modifying and running the

“main.py” file.

(a) (10 points) Implement and test the kernel logistic regression model — complete the forward() function of the class Kernel_Layer() and the __init__() function of the class Kernel_LR(); test your implementation in "main.py".

(b) (15 points) Implement and test the radial basis function network model — complete the k_means() function of the class Kernel_Layer() and the __init__() function of the class RBF(); test your implementation in "main.py".

(c) (15 points) Implement and test the feed-forward neural network model — complete the __init__() function of the class FFN(); test your implementation in "main.py".

Tune the hyper-parameters in "main.py" to achieve the best classification performance for the three models. Report your hyper-parameters and the test accuracy of the three models, then compare their performance. Note that for the comparison, you need to use the same hidden size for the radial basis function network model and the feed-forward network model.
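(An illustrative PyTorch sketch of the kernel-layer idea behind (a) is given below. The class and method names here are hypothetical — the starting code's Kernel_Layer() and Kernel_LR() may require different signatures, so treat this only as a picture of the computation:)

```python
import torch
import torch.nn as nn

# Illustrative sketch only; the starting code's Kernel_Layer / Kernel_LR
# classes may differ. An RBF kernel layer maps each input to its kernel
# values against a stored set of centers (e.g. the training points).
class RBFKernelLayer(nn.Module):
    def __init__(self, centers, sigma=1.0):
        super().__init__()
        self.register_buffer("centers", centers)  # (m, 256) stored points
        self.sigma = sigma

    def forward(self, x):
        d2 = torch.cdist(x, self.centers).pow(2)  # pairwise squared dists
        return torch.exp(-d2 / (2 * self.sigma ** 2))  # (n, m) kernel values

class KernelLogisticRegression(nn.Module):
    """Kernel values followed by a linear layer: the weights play the
    role of the dual coefficients in kernel logistic regression."""
    def __init__(self, centers, sigma=1.0):
        super().__init__()
        self.kernel = RBFKernelLayer(centers, sigma)
        self.linear = nn.Linear(centers.shape[0], 1)

    def forward(self, x):
        return self.linear(self.kernel(x)).squeeze(-1)  # logits
```

The logits can be trained with `nn.BCEWithLogitsLoss` against the 0/1 labels.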

4. (70 points) (Coding Task) PCA vs Autoencoder: In this assignment, you will apply the

PCA and the autoencoder (AE) to a collection of handwritten digit images from the USPS

dataset. The data file is stored in the “PCA/data” folder as “USPS.mat”. Please check

the starting code in folder “PCA/code” and follow the instructions. The whole dataset is

already loaded and stored in the matrix A with shape 3000 × 256. Each row of matrix A represents a 16 × 16 handwritten digit image (of digits 0 through 9), which is flattened to a 256-dimensional vector. Note that you will need to use PyTorch in this assignment. Please read

the “Readme” file carefully before getting started. You are expected to implement

the solutions based on the starting code. The files you need to modify are “solution.py” and

“main.py”. You will test your solution by modifying and running the “main.py” file.

(a) (10 points) In the class PCA(), complete the do_pca() function.

(b) (5 points) In the class PCA(), complete the reconstruction() function to perform data reconstruction. Please evaluate your code by testing different numbers of principal components, p = 32, 64, 128.

(c) (10 points) In the class AE(), complete the network() and forward() functions. Please follow the note (http://people.tamu.edu/~sji/classes/PCA.pdf) to implement your network. Note that for problems (c), (e), and (f), the weights need to be shared between the encoder and the decoder, with their weight matrices transposed to each other.

(d) (5 points) In the class AE(), complete the reconstruction() function to perform data reconstruction. Please test your function using three different dimensions for the hidden representation, d = 32, 64, 128.

(e) (10 points) Compare the reconstruction errors from PCA and AE. Note that you need to

set p = d for comparisons. Please evaluate the errors using p = d = 32, 64, 128. Report

the reconstruction errors and provide a brief analysis.

(f) (10 points) Experimentally justify the relation between the projection matrix G in PCA and the optimized weight matrix W in AE. Note that you need to set p = d for valid comparisons. Please explore three different cases: p = d = 32, 64, 128. We recommend first using frobenius_norm_error() to verify whether W and G are the same. If not, please follow the note (http://people.tamu.edu/~sji/classes/PCA.pdf) to implement the necessary transformations for the two matrices G and W and explore their relation. You need to modify the code in "main.py".

(g) (10 points) Please modify the network() and forward() functions so that the weights are not shared between the encoder and the decoder. Report the reconstruction errors for d = 32, 64, 128. Please compare with the weight-sharing case and briefly analyze your results.

(h) (10 points) Please modify the network() and forward() functions to include more network layers and nonlinear functions. Please set d = 64 and explore different hyper-parameters. Report the hyper-parameters of the best model and its reconstruction error. Please analyze and report your conclusions.
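(For intuition about parts (a)–(b), the PCA reconstruction pipeline can be sketched in a few lines of NumPy — the assignment itself asks for a PyTorch implementation in "solution.py", so this is only a reference picture of the computation:)

```python
import numpy as np

# Minimal PCA-reconstruction sketch (NumPy, for intuition only):
# center the data, take the top-p right singular vectors as the
# projection matrix G, then project and map back.
def pca_reconstruct(A, p):
    mean = A.mean(axis=0, keepdims=True)
    X = A - mean                                  # centered data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    G = Vt[:p].T                                  # projection matrix (256, p)
    return X @ G @ G.T + mean                     # project, then reconstruct

A = np.random.default_rng(2).standard_normal((300, 256))
errs = {p: np.linalg.norm(A - pca_reconstruct(A, p)) for p in (32, 64, 128)}
print(errs)   # reconstruction error shrinks as p grows
```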

5. (30 points) Bonus question (You may earn extra points if you provide correct answers.)

Let us consider a binary classification problem. In view of the kernel methods we

described in class, many methods (logistic regression, support vector machines) can be written

in either the primal form or the dual form. For example, in the case of logistic regression, the

primal formulation (in Eqn. (6) of notes) is the one that we described at the very beginning of

this class. That is, we use the original data and class labels $\{x_i, y_i\}_{i=1}^{n}$ for training, and the test data $\{x_i\}_{i=n+1}^{n+m}$ for prediction. The dual formulation (in Eqn. (13) of notes) is the one that uses kernel values (instead of the original data) described in the context of kernel methods in class. Specifically, the dual formulation uses the kernel values $k(x_i, x_j)$, $i, j = 1, 2, \cdots, n$, and class labels $\{y_i\}_{i=1}^{n}$ for training, and the kernel values $k(x_i, x_j)$, $i = 1, 2, \cdots, n$, $j = n+1, n+2, \cdots, n+m$ for prediction.
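(The dual-form data setup just described can be sketched concretely; the RBF kernel below is only an illustrative choice — any valid kernel produces the same n × n training block and n × m prediction block:)

```python
import numpy as np

# Sketch of the dual formulation's inputs: kernel values among the n
# training points, and between training and the m test points.
def rbf_kernel(X, Z, sigma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise sq dists
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(3)
n, m, d = 20, 5, 4
X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((m, d))

K_train = rbf_kernel(X_train, X_train)  # k(x_i, x_j), i, j = 1..n  (training)
K_test = rbf_kernel(X_train, X_test)    # k(x_i, x_j), j = n+1..n+m (prediction)
```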

Suppose that we only have a solver that can solve logistic regression in the primal form with $\ell_2$-norm regularization. On the other hand, we only have the kernel values $k(x_i, x_j)$, $i = 1, 2, \cdots, n$, $j = 1, 2, \cdots, n, n+1, \cdots, n+m$, and the class labels $\{y_i\}_{i=1}^{n}$. Note that the original training and test data $\{x_i\}_{i=1}^{n+m}$ are NOT available. Please describe how we can train a logistic regression classifier and perform testing using only the solver and data given above.
