## Description

1. The Matlab data file 0 1 2.mat¹ contains 300 images of handwritten 0’s, 1’s and 2’s (one hundred of each), scanned from postal envelopes, like the ones shown below.

Figure 1: Sample scanned digit images.

These images are stored as a 64 × 300 matrix. Each column of the matrix is an 8 × 8 greyscale image (the pixel intensities are between 0 and 1). Figure 2 illustrates the two most significant dimensions found by PCA.

(a) Reproduce Figure 2. You need to find the low-dimensional mapping $Y_{PCA}$ by PCA, and then call the function plotimages provided on the course web page. You need to implement PCA yourself. You may use the svd, eig, and eigs functions in Matlab, but you cannot use the Matlab built-in functions pca and princomp.

¹ The .mat file can be imported in Python with scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
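Part (a) can be sketched in Python/NumPy as follows. This is a hedged sketch, not the graded solution: random data stands in for the 64 × 300 digit matrix, the names `pca` and `Y_pca` are illustrative, and the course's plotimages function is not reproduced.

```python
import numpy as np

def pca(X, k=2):
    """Project a d x n data matrix X onto its top-k principal components.

    Columns of X are observations. Returns a k x n matrix of scores.
    """
    Xc = X - X.mean(axis=1, keepdims=True)       # center each pixel dimension
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k].T @ Xc                       # k x n low-dimensional mapping

# Illustrative stand-in for the real 64 x 300 digit data:
rng = np.random.default_rng(0)
X = rng.random((64, 300))
Y_pca = pca(X, k=2)
print(Y_pca.shape)  # (2, 300)
```

With the real data, the two rows of `Y_pca` would be the x and y coordinates passed to the plotting routine.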



Figure 2: A canonical dimensionality reduction problem from visual perception. The input consists of a sequence of 64-dimensional vectors, representing the brightness values of 8 pixel by 8 pixel images of digits 0, 1 and 2. PCA is applied to n = 300 raw images. A two-dimensional projection is shown, with the original input images.

(b) Produce a figure similar to Figure 2, but this time use FDA to map the data into a 2-dimensional space.
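A minimal FDA (Fisher discriminant analysis) sketch, assuming the same column-per-image layout as above; the small regularization term and the synthetic three-class data are illustrative, not part of the assignment:

```python
import numpy as np

def fda(X, labels, k=2):
    """Fisher discriminant mapping of d x n data X to k dimensions.

    Columns of X are observations; labels[i] is the class of column i.
    """
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))                 # within-class scatter
    Sb = np.zeros((d, d))                 # between-class scatter
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T
    # top-k eigenvectors of Sw^{-1} Sb (tiny ridge guards a singular Sw)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:k]
    W = evecs[:, order].real
    return W.T @ X

# Illustrative synthetic data: three shifted 64-dimensional clusters.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
X = rng.normal(size=(64, 300)) + labels * 0.5
Y_fda = fda(X, labels, k=2)
print(Y_fda.shape)  # (2, 300)
```

Note that with 3 classes the between-class scatter has rank at most 2, so k = 2 captures all of the discriminative directions.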

(c) Use the two-dimensional data $Y_{PCA}$ as the covariates. Use LDA and QDA to compute a linear and a quadratic decision boundary, and report the analytic form of the boundaries.
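One way to read off the analytic coefficients is from the Gaussian discriminant functions. A sketch for two classes, assuming each class is passed as a d × n_i array with priors estimated from class sizes (the function names and the synthetic data are illustrative):

```python
import numpy as np

def lda_boundary(X0, X1):
    """Coefficients (w, b) of the LDA boundary w @ x + b = 0 between two
    classes given as d x n_i arrays; priors are the class proportions."""
    mu0, mu1 = X0.mean(axis=1), X1.mean(axis=1)
    n0, n1 = X0.shape[1], X1.shape[1]
    S = (np.cov(X0) * (n0 - 1) + np.cov(X1) * (n1 - 1)) / (n0 + n1 - 2)
    Sinv = np.linalg.inv(S)
    w = Sinv @ (mu1 - mu0)
    b = -0.5 * (mu1 @ Sinv @ mu1 - mu0 @ Sinv @ mu0) + np.log(n1 / n0)
    return w, b

def qda_boundary(X0, X1):
    """Coefficients (A, w, c) of the QDA boundary x @ A @ x + w @ x + c = 0."""
    mu0, mu1 = X0.mean(axis=1), X1.mean(axis=1)
    S0, S1 = np.cov(X0), np.cov(X1)
    S0inv, S1inv = np.linalg.inv(S0), np.linalg.inv(S1)
    A = -0.5 * (S1inv - S0inv)
    w = S1inv @ mu1 - S0inv @ mu0
    c = (-0.5 * (mu1 @ S1inv @ mu1 - mu0 @ S0inv @ mu0)
         - 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S0))
         + np.log(X1.shape[1] / X0.shape[1]))
    return A, w, c

# Illustrative well-separated two-class data in 2-D:
rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(2, 100))
X1 = rng.normal(0.0, 1.0, size=(2, 100)) + np.array([[4.0], [4.0]])
w, b = lda_boundary(X0, X1)
A, wq, c = qda_boundary(X0, X1)
```

The sign of the boundary expression at a point then gives the predicted class, which is also what part (d) plots as a contour.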

(d) Plot the decision boundaries on the figure that you produced in part (a).

(e) Implement LDA as explained in class (based on the Euclidean distance between points and the mean of each class). Report the error rates when LDA is applied to this data set.
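The Euclidean-distance rule in part (e) can be sketched as a nearest-class-mean classifier; the synthetic clusters below stand in for the digit data, and the function name is illustrative:

```python
import numpy as np

def nearest_mean_classify(X_train, labels_train, X_test):
    """Assign each column of X_test to the class whose training-set mean
    is nearest in Euclidean distance."""
    classes = np.unique(labels_train)
    means = np.stack([X_train[:, labels_train == c].mean(axis=1)
                      for c in classes])              # n_classes x d
    diffs = X_test[:, None, :] - means.T[:, :, None]  # d x n_classes x n_test
    d2 = (diffs ** 2).sum(axis=0)                     # squared distances
    return classes[np.argmin(d2, axis=0)]

# Illustrative well-separated clusters standing in for the digit data:
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.hstack([rng.normal(m[:, None], 1.0, size=(2, 100)) for m in centers])
labels = np.repeat([0, 1, 2], 100)

pred = nearest_mean_classify(X, labels, X)
err = (pred != labels).mean()
print(err)
```

Here the training error rate is computed on the same data that defined the means, which is what "error rates ... applied to this data set" asks for.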

(f) Can we implement QDA based on the Euclidean distance between points and the mean of each class? If your answer is yes, explain how. Is this identical to the QDA that you computed in part (c)?

2. Prove that if $X \mid Y = 0 \sim N(\mu_0, \Sigma_0)$ and $X \mid Y = 1 \sim N(\mu_1, \Sigma_1)$, then the Bayes rule is

$$h^*(x) = \begin{cases} 1 & \text{if } r_1^2 < r_0^2 + 2\log\dfrac{\pi_1}{\pi_0} + \log\dfrac{|\Sigma_0|}{|\Sigma_1|} \\ 0 & \text{otherwise,} \end{cases}$$

where

$$r_i^2 = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i), \quad i = 0, 1,$$

and $|A|$ denotes the determinant of a matrix $A$.
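A quick numerical sanity check of the stated rule (not a proof): the $r^2$-based comparison should agree with directly comparing $\pi_c N(x \mid \mu_c, \Sigma_c)$ at every point. The parameters below are arbitrary illustrative choices:

```python
import numpy as np

def log_gauss(x, mu, S):
    """Log density of N(mu, S) at x."""
    diff = x - mu
    return -0.5 * (len(mu) * np.log(2 * np.pi) + np.log(np.linalg.det(S))
                   + diff @ np.linalg.solve(S, diff))

rng = np.random.default_rng(0)
mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
A0, A1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
S0, S1 = A0 @ A0.T + np.eye(2), A1 @ A1.T + np.eye(2)   # SPD covariances
pi0, pi1 = 0.3, 0.7

agree = True
for _ in range(1000):
    x = rng.normal(scale=3.0, size=2)
    r0 = (x - mu0) @ np.linalg.solve(S0, x - mu0)
    r1 = (x - mu1) @ np.linalg.solve(S1, x - mu1)
    h_formula = int(r1 < r0 + 2 * np.log(pi1 / pi0)
                    + np.log(np.linalg.det(S0) / np.linalg.det(S1)))
    h_bayes = int(np.log(pi1) + log_gauss(x, mu1, S1)
                  > np.log(pi0) + log_gauss(x, mu0, S0))
    agree = agree and (h_formula == h_bayes)
print(agree)
```

The proof itself amounts to taking logs of the posterior comparison and multiplying by 2, which is exactly the algebra the loop exercises.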

3. Consider a classifier with class conditional densities of the form $N(x \mid \mu_c, \Sigma_c)$. In LDA, we assume $\Sigma_c = \Sigma$, and in QDA, each $\Sigma_c$ is arbitrary. Here we consider the 2-class case in which $\Sigma_1 = k\Sigma_2$, for $k > 1$. That is, the Gaussian ellipsoids have the same shape, but the one for class 1 is wider. Derive an expression for the decision boundary.
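As a numerical sanity check (not the requested derivation): with $\Sigma_1 = k\Sigma_2$ the QDA log-ratio should reduce to a single shared quadratic form, since $|k\Sigma| = k^d|\Sigma|$ and $(k\Sigma)^{-1} = \Sigma^{-1}/k$. All parameter values below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 2.5
A = rng.normal(size=(d, d))
S2 = A @ A.T + np.eye(d)        # Sigma_2 (symmetric positive definite)
S1 = k * S2                     # Sigma_1 = k * Sigma_2, same shape but wider
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
pi1, pi2 = 0.4, 0.6

def log_gauss(x, mu, S):
    diff = x - mu
    return -0.5 * (len(mu) * np.log(2 * np.pi) + np.log(np.linalg.det(S))
                   + diff @ np.linalg.solve(S, diff))

x = rng.normal(size=d)
# full QDA log-ratio vs. the reduced form using one shared quadratic q_i
full = np.log(pi1) + log_gauss(x, mu1, S1) - np.log(pi2) - log_gauss(x, mu2, S2)
q1 = (x - mu1) @ np.linalg.solve(S2, x - mu1)
q2 = (x - mu2) @ np.linalg.solve(S2, x - mu2)
reduced = np.log(pi1 / pi2) - 0.5 * d * np.log(k) - q1 / (2 * k) + q2 / 2
print(np.isclose(full, reduced))
```

Setting the reduced expression to zero gives the decision boundary, which is a quadric in the shared Mahalanobis metric.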

### Only for Grad Students

4. Suppose we have features $x \in \mathbb{R}^d$, a two-class response, with class sizes $n_1, n_2$, and the target coded as $-n/n_1$, $n/n_2$.

a) Show that the LDA rule classifies to class 2 if

$$x^T \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) > \frac{1}{2}\hat{\mu}_2^T \hat{\Sigma}^{-1} \hat{\mu}_2 - \frac{1}{2}\hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + \log\Big(\frac{n_1}{n}\Big) - \log\Big(\frac{n_2}{n}\Big),$$

and class 1 otherwise.
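A numerical check of the claimed inequality (not a proof): it should agree with comparing the two LDA discriminants $\delta_c(x) = x^T\hat{\Sigma}^{-1}\hat{\mu}_c - \frac{1}{2}\hat{\mu}_c^T\hat{\Sigma}^{-1}\hat{\mu}_c + \log(n_c/n)$ at every test point. The data below are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n1, n2 = 2, 40, 60
n = n1 + n2
X1 = rng.normal(0.0, 1.0, size=(n1, d))   # class 1 sample (rows = points)
X2 = rng.normal(2.0, 1.0, size=(n2, d))   # class 2 sample
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
# pooled within-class covariance with divisor n - 2
Sigma = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (n - 2)
Sinv = np.linalg.inv(Sigma)

def delta(x, mu, nc):
    """LDA discriminant for one class."""
    return x @ Sinv @ mu - 0.5 * mu @ Sinv @ mu + np.log(nc / n)

ok = True
for _ in range(500):
    x = rng.normal(1.0, 2.0, size=d)
    lhs = x @ Sinv @ (mu2 - mu1)
    rhs = (0.5 * mu2 @ Sinv @ mu2 - 0.5 * mu1 @ Sinv @ mu1
           + np.log(n1 / n) - np.log(n2 / n))
    ok = ok and ((lhs > rhs) == (delta(x, mu2, n2) > delta(x, mu1, n1)))
print(ok)
```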

b) Consider minimization of the least squares criterion

$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta^T x_i)^2.$$

Show that the solution $\hat{\beta}$ satisfies

$$\Big[(n-2)\hat{\Sigma} + \frac{n_1 n_2}{n}\hat{\Sigma}_B\Big]\beta = n(\hat{\mu}_2 - \hat{\mu}_1),$$

where $\hat{\Sigma}_B = (\hat{\mu}_2 - \hat{\mu}_1)(\hat{\mu}_2 - \hat{\mu}_1)^T$.

c) Show that $\hat{\Sigma}_B \beta$ is in the direction $(\hat{\mu}_2 - \hat{\mu}_1)$ and thus

$$\hat{\beta} \propto \hat{\Sigma}^{-1}(\hat{\mu}_2 - \hat{\mu}_1).$$

Therefore the least squares regression coefficient is identical to the LDA coefficient, up to a scalar multiple.
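Both identities in (b) and (c) can be checked numerically (again, a sanity check, not a proof). Here $\hat{\Sigma}$ is the pooled within-class covariance with divisor $n - 2$, and the synthetic data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n1, n2 = 3, 50, 70
n = n1 + n2
X1 = rng.normal(0.0, 1.0, size=(n1, d))
X2 = rng.normal(1.5, 1.0, size=(n2, d))
X = np.vstack([X1, X2])
y = np.concatenate([np.full(n1, -n / n1), np.full(n2, n / n2)])  # target coding

# least squares with intercept; drop the intercept coefficient afterwards
Xa = np.hstack([np.ones((n, 1)), X])
beta = np.linalg.lstsq(Xa, y, rcond=None)[0][1:]

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sigma = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (n - 2)
SigmaB = np.outer(mu2 - mu1, mu2 - mu1)

# (b): [(n-2) Sigma + (n1 n2 / n) SigmaB] beta = n (mu2 - mu1)
lhs = ((n - 2) * Sigma + (n1 * n2 / n) * SigmaB) @ beta
print(np.allclose(lhs, n * (mu2 - mu1)))

# (c): beta is proportional to Sigma^{-1} (mu2 - mu1)
lda_dir = np.linalg.solve(Sigma, mu2 - mu1)
cos = beta @ lda_dir / (np.linalg.norm(beta) * np.linalg.norm(lda_dir))
print(abs(cos))
```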


5. The true error rate of a classifier $h$ is

$$L(h) = P(\{h(X) \neq Y\}).$$

Consider the special case where $Y \in \mathcal{Y} = \{0, 1\}$. Let

$$r(x) = P(Y = 1 \mid X = x).$$

In this case the Bayes classification rule $h^*$ is

$$h^*(x) = \begin{cases} 1 & \text{if } r(x) > \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases}$$

Prove that the Bayes classification rule is optimal; that is, if $h$ is any other classification rule, then $L(h^*) \le L(h)$.
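For a small discrete $X$ the claim can be verified exhaustively, since $L(h)$ can be computed exactly for every possible rule $h$ (a sanity check, not the requested proof; the probabilities below are arbitrary illustrative choices):

```python
import numpy as np
from itertools import product

# X takes 4 values with probabilities p(x); r(x) = P(Y = 1 | X = x).
p = np.array([0.1, 0.2, 0.3, 0.4])
r = np.array([0.9, 0.4, 0.55, 0.2])

def error(h):
    """Exact L(h) = P(h(X) != Y) for a rule given as a 0/1 vector over X.

    The contribution of x is p(x) * r(x) if h(x) = 0, else p(x) * (1 - r(x)).
    """
    return float(np.sum(p * np.where(np.array(h) == 1, 1 - r, r)))

bayes = tuple(int(rx > 0.5) for rx in r)
errors = {h: error(h) for h in product([0, 1], repeat=4)}  # all 16 rules
print(errors[bayes] == min(errors.values()))
```

Flipping the Bayes rule at any x increases $L(h)$ by $p(x)\,|2r(x) - 1|$, which is the pointwise inequality the proof formalizes.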
