## Description

1. (20 points) Let $\mathcal{X} = \{x_1, \ldots, x_n\}$ be a set of $n$ samples drawn i.i.d. from a univariate distribution with density function $p(x|\theta)$, where $\theta$ is an unknown parameter. In general, $\theta$ will belong to a specified subset of $\mathbb{R}$, the set of real numbers. For the following choices of $p(x|\theta)$, derive the maximum likelihood estimate of $\theta$ based on the samples $\mathcal{X}$:¹

(a) (5 points) $p(x|\theta) = \frac{1}{\sqrt{2\pi}\,\theta} \exp\left(-\frac{x^2}{2\theta^2}\right)$, $\theta > 0$.

(b) (5 points) $p(x|\theta) = \frac{1}{\theta} \exp\left(-\frac{x}{\theta}\right)$, $0 \le x < \infty$, $\theta > 0$.

(c) (5 points) $p(x|\theta) = \theta x^{\theta - 1}$, $0 \le x \le 1$, $0 < \theta < \infty$.

(d) (5 points) $p(x|\theta) = \frac{1}{\theta}$, $0 \le x \le \theta$, $\theta > 0$.
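As a reminder of the general setup only (not a solution to any of the parts), the likelihood of i.i.d. samples factorizes, so the estimate is usually found by maximizing the log-likelihood. The symbols $L$ and $\ell$ are notation introduced here for illustration and do not appear in the problem statement.

```latex
L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta), \qquad
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta), \qquad
\hat{\theta} = \arg\max_{\theta} \, \ell(\theta).
```

When $\ell$ is differentiable in $\theta$ and the support of $p(x|\theta)$ does not depend on $\theta$, setting $d\ell/d\theta = 0$ is the usual first step. When the support does depend on $\theta$, the maximizer may have to be found by inspecting the likelihood directly.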

2. (20 points) Let $\mathcal{X} = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, be a set of $n$ samples drawn i.i.d. from a multivariate Gaussian distribution in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. Recall that the density function of a multivariate Gaussian distribution is given by:

$$p(x|\mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right).$$

(a) (10 points) Derive the maximum likelihood estimates for the mean $\mu$ and covariance $\Sigma$ based on the sample set $\mathcal{X}$.¹,²

(b) (5 points) Let $\hat{\mu}_n$ be the maximum likelihood estimate of the mean. Is $\hat{\mu}_n$ a biased estimate of the true mean $\mu$? Clearly justify your answer by computing $E[\hat{\mu}_n]$.

(c) (5 points) Let $\hat{\Sigma}_n$ be the maximum likelihood estimate of the covariance matrix. Is $\hat{\Sigma}_n$ a biased estimate of the true covariance $\Sigma$? Clearly justify your answer by computing $E[\hat{\Sigma}_n]$.
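For reference, the standard definition of bias that parts (b) and (c) rely on can be written as follows. The generic symbols $\hat{\theta}_n$ and $\theta$ are placeholders introduced here; the expectation is taken over the random draw of the $n$ samples.

```latex
\operatorname{Bias}(\hat{\theta}_n) = \mathbb{E}[\hat{\theta}_n] - \theta,
\qquad
\hat{\theta}_n \text{ is unbiased} \iff \mathbb{E}[\hat{\theta}_n] = \theta .
```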

3. (10 points) Table 1 specifies the misclassification costs for a 3-class problem including a ‘Reject’ option. Assume that a model has been trained using training data, and the model can output posterior probabilities $P(C_1|x_{\text{test}})$, $P(C_2|x_{\text{test}})$, $P(C_3|x_{\text{test}})$ for any given test point $x_{\text{test}}$.

(a) (5 points) Assume $\lambda = 10$. For a given $x_{\text{test}}$, let the posterior probabilities for the three classes be: $P(C_1|x_{\text{test}}) = 0.5$, $P(C_2|x_{\text{test}}) = 0.25$, $P(C_3|x_{\text{test}}) = 0.25$. Using Table 1, compute the risks for predicting $x_{\text{test}}$ to be $C_1$, $C_2$, $C_3$, and ‘Reject’ respectively. Including ‘Reject’ as a possible option, what would your predicted class for $x_{\text{test}}$ be? You have to show details of your computation and justify your answer.

¹ You have to show the details of your derivation. A correct answer without the details will not get any credit.

² You can use material from the Matrix Cookbook and/or the textbook for your derivation.

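| True class | Predicted $C_1$ | Predicted $C_2$ | Predicted $C_3$ | Predicted ‘Reject’ |
|---|---|---|---|---|
| $C_1$ | 0 | 1 | 1 | $\lambda$ |
| $C_2$ | 10 | 0 | 10 | $\lambda$ |
| $C_3$ | 100 | 100 | 0 | $\lambda$ |

Table 1: Misclassification costs for a 3-class problem including a ‘Reject’ option.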

(b) (5 points) Assume $\lambda = 5$. For a given $x_{\text{test}}$, let the posterior probabilities for the three classes be: $P(C_1|x_{\text{test}}) = 0.4$, $P(C_2|x_{\text{test}}) = 0.5$, $P(C_3|x_{\text{test}}) = 0.1$. Using Table 1, compute the risks for predicting $x_{\text{test}}$ to be $C_1$, $C_2$, $C_3$, and ‘Reject’ respectively. Including ‘Reject’ as a possible option, what would your predicted class for $x_{\text{test}}$ be? You have to show details of your computation and justify your answer.
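The expected risk of an action is the posterior-weighted sum of its costs: for action $j$, $R(\alpha_j|x) = \sum_i \lambda_{ij} P(C_i|x)$, where $\lambda_{ij}$ is the cost of taking action $j$ when the true class is $C_i$. The sketch below shows that computation on a made-up 2-class problem with a reject action. The function name `expected_risks`, the cost values, and the posteriors are all hypothetical and deliberately different from Table 1.

```python
def expected_risks(loss, posteriors):
    """Return the expected risk of each action.

    loss[i][j] is the cost of taking action j when the true class is i;
    posteriors[i] is P(C_i | x).
    """
    n_actions = len(loss[0])
    return [
        sum(loss[i][j] * posteriors[i] for i in range(len(posteriors)))
        for j in range(n_actions)
    ]


# Hypothetical 2-class problem with a reject action (NOT the values in Table 1).
loss = [
    [0.0, 1.0, 0.5],  # true class C1: predict C1, predict C2, reject
    [4.0, 0.0, 0.5],  # true class C2: predict C1, predict C2, reject
]
posteriors = [0.75, 0.25]

risks = expected_risks(loss, posteriors)
# The minimum-risk decision is the action with the smallest expected risk.
best_action = min(range(len(risks)), key=risks.__getitem__)
print(risks, best_action)
```

The decision rule is then simply to pick the action with the smallest expected risk, with ‘Reject’ treated as one more column of the cost matrix.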

Programming assignment:

The next problem involves programming. For Question 3, we will be using the 2-class classification datasets Boston50 and Boston75, and the 10-class classification dataset Digits, which were used in Homework 1.

3. (50 points) We will develop two parametric classifiers by modeling each class’s conditional distribution $p(x|C_i)$ as a multivariate Gaussian with (a) a full covariance matrix $\Sigma_i$ and (b) a diagonal covariance matrix $\Sigma_i$. In particular, using the training data, we will compute the maximum likelihood estimate of the class prior probabilities $p(C_i)$ and the class conditional probabilities $p(x|C_i)$ based on the maximum likelihood estimates of the mean $\hat{\mu}_i$ and the (full/diagonal) covariance $\hat{\Sigma}_i$ for each class $C_i$. The classification will be done based on the following discriminant function:

$$g_i(x) = \log p(C_i) + \log p(x|C_i).$$
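Substituting the multivariate Gaussian density from Problem 2 into this discriminant gives the expanded form below. It is written with the estimated parameters $\hat{\mu}_i$ and $\hat{\Sigma}_i$; the per-feature notation $\hat{\sigma}_{ij}^2$ and $\hat{\mu}_{ij}$ in the diagonal case is introduced here for illustration.

```latex
g_i(x) = \log p(C_i) - \frac{d}{2}\log(2\pi) - \frac{1}{2}\log|\hat{\Sigma}_i|
         - \frac{1}{2}(x - \hat{\mu}_i)^T \hat{\Sigma}_i^{-1} (x - \hat{\mu}_i)

% Diagonal case, \hat{\Sigma}_i = \operatorname{diag}(\hat{\sigma}_{i1}^2, \ldots, \hat{\sigma}_{id}^2):
\log|\hat{\Sigma}_i| = \sum_{j=1}^{d} \log \hat{\sigma}_{ij}^2, \qquad
(x - \hat{\mu}_i)^T \hat{\Sigma}_i^{-1} (x - \hat{\mu}_i)
  = \sum_{j=1}^{d} \frac{(x_j - \hat{\mu}_{ij})^2}{\hat{\sigma}_{ij}^2}
```

The term $-\frac{d}{2}\log(2\pi)$ is the same for every class, so it does not affect which class attains the largest $g_i(x)$.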

We will develop code for a class `MultiGaussClassify` with two key functions: `MultiGaussClassify.fit(self, X, y, diag)` and `MultiGaussClassify.predict(self, X)`.

For `fit(self, X, y, diag)`, the inputs `(X, y)` are respectively the feature matrix and class labels, and `diag` is a boolean (`True` or `False`) which indicates whether the estimated class covariance matrices should be full matrices (`diag=False`) or diagonal matrices (`diag=True`).

For `predict(self, X)`, the input `X` is the feature matrix corresponding to the test set and the output should be the predicted labels for each point in the test set.

For the class, the `__init__(self, k, d)` function can initialize the parameters for each class to be uniform prior, zero mean, and identity covariance, i.e., $p(C_i) = 1/k$, $\mu_i = 0$ and $\Sigma_i = I$, $i = 1, \ldots, k$. Here, the number of classes $k$ and the dimensionality $d$ of the features are passed as arguments to the constructor of `MultiGaussClassify`.
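To make the default values concrete, here is a minimal sketch of those initial parameters using plain Python lists. The helper name `initial_parameters` and the list-based representation are assumptions made for illustration; the assignment itself only specifies the values, not how they are stored.

```python
def initial_parameters(k, d):
    """Return default parameters for k classes in d dimensions (hypothetical helper)."""
    # Uniform prior: p(C_i) = 1/k for every class.
    priors = [1.0 / k] * k
    # One zero mean vector of length d per class.
    means = [[0.0] * d for _ in range(k)]
    # Identity covariance: 1 on the diagonal, 0 elsewhere.
    identity = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    # Copy the rows so that each class owns an independent matrix.
    covariances = [[row[:] for row in identity] for _ in range(k)]
    return priors, means, covariances


priors, means, covariances = initial_parameters(k=3, d=2)
print(priors, means[0], covariances[0])
```

With NumPy, the same values could be built with `np.full(k, 1.0 / k)`, `np.zeros((k, d))`, and one `np.eye(d)` per class.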

We will compare the performance of three models:

(i) MultiGaussClassify with full class covariance matrices,

(ii) MultiGaussClassify with diagonal covariance matrices, and

(iii) LogisticRegression³

applied to three datasets: Boston50, Boston75, and Digits. Using `my_cross_val` with 5-fold cross-validation, report the error rates in each fold as well as the mean and standard deviation of error rates across folds for the three models applied to the three classification datasets.

You will have to submit (a) code and (b) summary of results:

(a) Code: You will have to submit code for `MultiGaussClassify` as well as a wrapper code `hw2q3()`. For the class, please use the following template:

    class MultiGaussClassify:
        def __init__(self, k, d):
            ...

        def fit(self, X, y, diag=False):
            ...

        def predict(self, X):
            ...

Your class `MultiGaussClassify` should not inherit any base class in sklearn. Again, the three functions you must implement in the `MultiGaussClassify` class are `__init__`, `fit`, and `predict`.

The wrapper code `hw2q3()` (main file) takes no input and is used to prepare the datasets and make calls to `my_cross_val(method, X, y, k)` to generate the error rate results for each dataset and each method. The code for `my_cross_val(method, X, y, k)` must be yours (e.g., code you developed in HW1 with modifications as needed) and you cannot use `cross_val_score()` in sklearn. For the `method` argument in `my_cross_val`, you can call the method corresponding to MultiGaussClassify with full covariance matrix just ‘multigaussclassify’ and the method corresponding to MultiGaussClassify with diagonal covariance matrix ‘multigaussdiagclassify’.
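The core of any k-fold routine is partitioning the sample indices into k disjoint folds and using each fold once as the test set. The sketch below shows only that index bookkeeping and an error-rate helper. The names `kfold_indices`, `train_test_splits`, and `error_rate`, the shuffling, and the `seed` parameter are assumptions made for illustration; they are not the required `my_cross_val` interface, and your submitted code must still be your own.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds whose sizes differ by at most 1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index starting at position j.
    return [indices[j::k] for j in range(k)]


def train_test_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs, using each fold once as the test set."""
    folds = kfold_indices(n, k, seed)
    for j in range(k):
        test_idx = folds[j]
        train_idx = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train_idx, test_idx


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


for train_idx, test_idx in train_test_splits(n=10, k=5):
    print(len(train_idx), len(test_idx))
```

Inside a full cross-validation routine, each `(train_idx, test_idx)` pair would be used to fit the model on the training rows and compute `error_rate` on the held-out rows.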

The results should be printed to the terminal (without generating an additional file in the folder). Make sure the calls to `my_cross_val(method, X, y, k)` are made in the following order, and add a print to the terminal before each call to show which method and dataset are being used:

1. MultiGaussClassify with full covariance matrix on Boston50,

2. MultiGaussClassify with full covariance matrix on Boston75,

3. MultiGaussClassify with full covariance matrix on Digits,

4. MultiGaussClassify with diagonal covariance matrix on Boston50,

5. MultiGaussClassify with diagonal covariance matrix on Boston75,

6. MultiGaussClassify with diagonal covariance matrix on Digits,

7. LogisticRegression with Boston50,

8. LogisticRegression with Boston75, and

9. LogisticRegression with Digits.

³ You should use LogisticRegression from scikit-learn, similar to HW1.


For example, the first call to `my_cross_val(method, X, y, k)` should result in the following output:

    Error rates for MultiGaussClassify with full covariance matrix on Boston50:
    Fold 1: ###
    Fold 2: ###
    …
    Fold 5: ###
    Mean: ###
    Standard Deviation: ###
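One way to produce output in this shape is sketched below. The helper name `format_report`, the four-decimal formatting, and the use of the population standard deviation (`statistics.pstdev`) are assumptions, since the handout does not specify precision or which standard deviation to use. The fold errors passed in are placeholder numbers, not real results.

```python
import statistics


def format_report(title, fold_errors):
    """Build the per-fold report text for one method/dataset pair."""
    lines = [f"Error rates for {title}:"]
    for j, err in enumerate(fold_errors, start=1):
        lines.append(f"Fold {j}: {err:.4f}")
    lines.append(f"Mean: {statistics.mean(fold_errors):.4f}")
    # Assumption: population SD; use statistics.stdev for the sample SD instead.
    lines.append(f"Standard Deviation: {statistics.pstdev(fold_errors):.4f}")
    return "\n".join(lines)


# Placeholder error rates for illustration only.
report = format_report(
    "MultiGaussClassify with full covariance matrix on Boston50",
    [0.10, 0.20, 0.10, 0.20, 0.15],
)
print(report)
```

Returning the text instead of printing inside the helper keeps the formatting easy to check and lets the wrapper decide when to print.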

(b) Summary of results: For each dataset and each method, report the test set error rates for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation of the error rates over the k folds. Make a table to present the results for each method and each dataset (9 tables in total). Each column of the table represents a fold, with two columns added at the end to show the overall mean error rate and standard deviation over the k folds. For example:

Error rates for MGC with full cov matrix on Boston50

| Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean | SD |
|---|---|---|---|---|---|---|
| # | # | # | # | # | # | # |

Additional instructions: Code can only be written in Python (not IPython notebook); no other programming languages will be accepted. One should be able to execute all programs directly from the command prompt (e.g., “python3 hw2q3.py”) without the need to run the Python interactive shell first. Test your code yourself before submission and suppress any warning messages that may be printed. Your code must run on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu). Please make sure you specify the version of Python you are using as well as instructions on how to run your program in the README file (must be readable through a text editor such as Notepad). Information on the size of the datasets, including number of data points and dimensionality of features, as well as number of classes can be readily extracted from the datasets in scikit-learn. Each function must take the inputs in the order specified in the problem and display the output via the terminal or as specified.

For each part, you can submit additional files/functions (as needed) which will be used by the main file. Please put comments in your code so that one can follow the key parts and steps in your code.

Follow the rules strictly. If we cannot run your code, you will not get any credit.

• Things to submit

1. hw2.pdf: A document which contains the solutions to Problems 1, 2, and 3, including the summary of results for Problem 3. This document must be in PDF format (Word documents, photos, etc. are not accepted). If you submit a scanned copy of a hand-written document, make sure the copy is clearly readable; otherwise no credit may be given.

2. Python code for Problem 3 (must include the required hw2q3.py).

3. README.txt: README file that contains your name, student ID, email, instructions on how to run your code, the full Python version you are using, any assumptions you are making, and any other necessary details. The file must be readable by a text editor such as Notepad.


4. Any other files, except the data, which are necessary for your code.
