CS 464: Homework Assignment 3 solution


1 Multi-class Fisher Discriminant Analysis [20 pts]
In the lecture, you have seen the derivation of Fisher Discriminant Analysis (FDA) equations for
two classes. Derive the multi-class FDA case. Show the steps of your derivation in detail.
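For orientation only (the notation below is one standard setup, not the derivation you are asked to produce), the multi-class case is usually formulated through within-class and between-class scatter matrices:

S_W = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{C}_c} (x_i - m_c)(x_i - m_c)^T, \qquad
S_B = \sum_{c=1}^{C} N_c \, (m_c - m)(m_c - m)^T,

where m_c and N_c are the mean and the size of class c and m is the overall mean; the projection matrix W is then chosen to maximize a criterion such as \mathrm{tr}\big( (W^T S_W W)^{-1} W^T S_B W \big), which leads to a generalized eigenvalue problem.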
2 Clustering [55 pts]
In this question, you will implement the k-means and kernel k-means algorithms. You will work with
a toy example to observe the difference between the two algorithms. Download clustering.csv from
Moodle. It contains 200 samples and their labels. Note that clustering is unsupervised, that is, we
do not have access to the labels of the samples; however, in this synthetic experiment the labels are
provided so that you can assess the difference between the two algorithms.
[Figure 1: The visualization of the given dataset]
a) [25 pts] Implement the k-means algorithm and apply it to the dataset. Since k-means may get
stuck in local minima, you should repeat the clustering many times with random restarts and pick
the best run. Calculate the confusion matrix using your clustering results and the true labels. Plot
the predicted clusters in two colors, similar to Figure 1, and mark the cluster centroids clearly.
How does k-means perform? Discuss the results.
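As a rough illustration of what such a run might look like (this is a sketch under assumed names, not the required implementation; clustering.csv is assumed to be purely numeric with the label in the last column, and pdist2 from the Statistics and Machine Learning Toolbox is used for the distance computation):

data = csvread( 'clustering.csv' );            % assumed layout: features in columns 1-2, label in column 3
X = data( :, 1:2 );
K = 2; nRestarts = 20; maxIter = 100;
bestCost = inf;
for r = 1:nRestarts
    C = X( randperm( size(X,1), K ), : );      % random initial centroids
    for it = 1:maxIter
        [~, idx] = min( pdist2( X, C ), [], 2 );     % assign each sample to its nearest centroid
        for k = 1:K
            C( k, : ) = mean( X( idx == k, : ), 1 ); % recompute centroids (empty clusters not handled)
        end
    end
    cost = sum( min( pdist2( X, C ), [], 2 ).^2 );   % total squared distance for this restart
    if cost < bestCost
        bestCost = cost; bestIdx = idx; bestC = C;   % keep the best restart
    end
end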
Kernel k-means is the kernelized version of the k-means algorithm and similarly aims to assign
samples to the closest centroid so that the total distance of the samples to their centroids is
minimized. The difference between the two algorithms is in how the distance between samples is
calculated: in k-means the distance is computed in the original feature space, whereas in kernel
k-means it is computed in the transformed space using the kernel trick. The kernel k-means
objective is:
J(D) = \sum_{i=1}^{N} \sum_{k=1}^{K} \delta_{ik} \, \| \Phi(x_i) - m_k \|^2
     = \sum_{i=1}^{N} \sum_{k=1}^{K} \delta_{ik} \left( \Phi(x_i)^T \Phi(x_i) - 2 \, \Phi(x_i)^T m_k + m_k^T m_k \right)   (1)
where N is the number of samples, K is the number of clusters, and D = \{x_1, x_2, \ldots, x_N\} is
the set of samples in the input space. \delta_{ik} is 1 if the i-th sample is in the k-th cluster and 0
otherwise; \Phi is a mapping function \Phi : X \to V, where X is the input space and V is a new
feature space; m_k is the centroid of the k-th cluster, which can be calculated as follows:

m_k = \frac{\sum_{i=1}^{N} \delta_{ik} \, \Phi(x_i)}{\sum_{i=1}^{N} \delta_{ik}} .   (2)
Most of the time, we do not have the explicit mapping function to calculate J(D). This is where
the kernel function comes into play. A kernel function is a function such that

K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle = \Phi(x_i)^T \Phi(x_j)   (3)

where K : X \times X \to \mathbb{R} maps a pair of input-space points to a real number. K(x_i, x_j)
corresponds to the (i, j) and (j, i) entries of the kernel matrix. Note that K is a symmetric
function, that is, K(x_i, x_j) = K(x_j, x_i).
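For example, the full kernel matrix for the degree-2 polynomial kernel used in part (c) can be precomputed in one step; the offset c = 1 below is an assumption, since the assignment does not specify one, and X is the N-by-2 data matrix:

c = 1;                          % assumed kernel offset
Kmat = ( X * X' + c ) .^ 2;     % (i,j) entry is K(xi, xj) = (xi' * xj + c)^2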
b) [5 pts] As you can see, J(D) has components that can easily be calculated using the kernel
matrix, i.e., the matrix whose entries are the kernel function evaluated on pairs of examples.
Rewrite the equation so that the kernel matrix is used in the calculation instead of \Phi.
c) [25 pts] Implement the kernel k-means algorithm and apply it to the dataset using a polynomial
kernel with degree 2. Use random restarts here as well. Give the resulting confusion matrix and
calculate the accuracy. Plot the predictions of polynomial kernel k-means by coloring the examples
and marking the centroids. How does it perform? Is it better than the k-means results? Discuss
the results.
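One possible way to organize the assignment step, once the objective has been rewritten in terms of the kernel matrix as asked in part (b), is sketched below; Kmat and the N-by-K indicator matrix delta are assumed names, and in a full implementation this step would be repeated until the assignments stop changing:

N = size( Kmat, 1 ); K = size( delta, 2 );
dist2 = zeros( N, K );
for k = 1:K
    ck = find( delta( :, k ) );            % samples currently in cluster k
    nk = numel( ck );
    % squared distance of every sample to the implicit centroid mk,
    % expressed only through kernel matrix entries
    dist2( :, k ) = diag( Kmat ) ...
        - ( 2 / nk ) * sum( Kmat( :, ck ), 2 ) ...
        + sum( sum( Kmat( ck, ck ) ) ) / nk^2;
end
[~, idx] = min( dist2, [], 2 );                    % new cluster index for each sample
delta = full( sparse( (1:N)', idx, 1, N, K ) );    % rebuild the indicator matrix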
3 Principal Component Analysis [25 pts]
In this question, you are going to apply principal component analysis (PCA) on the MNIST digit
dataset. It contains 5000 images of handwritten digits from 0 to 9. You can find the digits.csv file
on Moodle. The first 400 columns are the features of a sample and the last column is its label,
which you should ignore.
a) [15 pts] Calculate the principal components; you may use available library functions. Plot the
resulting eigenvalues in descending order. Decide on a number of principal components to keep
and state the rationale behind your selection.
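A minimal sketch of the eigenvalue computation, assuming the features have been read into a 5000-by-400 matrix and using the covariance/eig route (file and variable names are assumptions):

data = csvread( 'digits.csv' );
X = data( :, 1:400 );                      % features only; the label column is ignored
Xc = X - mean( X, 1 );                     % center the data
[V, D] = eig( cov( Xc ) );                 % eigenvectors / eigenvalues of the covariance matrix
[eigvals, order] = sort( diag( D ), 'descend' );
V = V( :, order );                         % columns of V are the principal components
figure, plot( eigvals, 'o-' );             % eigenvalues in descending order
xlabel( 'component' ); ylabel( 'eigenvalue' );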
b) [10 pts] Display the top and bottom 5 principal components. Discuss the results. In MATLAB,
you can generate images of the digits using the following code snippet:
load digits.mat
i = 1; %select the digit (row) to display
I = digits( i, : );
figure, imagesc( reshape( I, 20, 20 ) );
colormap( gray );
axis image;
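Along the same lines, an individual principal component (a column of V from the earlier sketch, an assumed variable) can be shown with the same reshape trick:

j = 1;   % index of the principal component to display
figure, imagesc( reshape( V( :, j ), 20, 20 ) );
colormap( gray );
axis image;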
NOTES
• This homework will be graded by your TA, Ali Burak Ünal. Please ask for clarifications on
Moodle; for other matters, you may ask questions at his office hours or by e-mail.
• Do not forget to add a README file that contains execution details for your code.