## Description

Linear Models for Handwritten Digits Classification: In this assignment, you will implement the binary logistic regression model and the multi-class logistic regression model on a partial dataset from MNIST. In this classification task, the model takes a 16 × 16 image of a handwritten digit as input and classifies it into one of several classes. For the binary case, the classes are 1 and 2, while for the multi-class case, the classes are 0, 1, and 2. The “data” folder contains the dataset, which has already been split into a training set and a testing set. All data examples are saved in dictionary-like objects in “npz” files. For each data sample, the dictionary key ‘x’ holds its raw features, a 256-dimensional vector whose values lie in [−1, 1] and give the grayscale pixel values of a 16 × 16 image. The key ‘y’ holds the label of the data example, which can be 0, 1, or 2. The “code” folder provides the starting code. You must implement the models using the starting code.
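
As a rough illustration of the data layout described above, the following sketch reads an “npz” file with NumPy and pulls out the ‘x’ and ‘y’ arrays. The helper name `load_npz` and the file name `demo.npz` are hypothetical; the sketch writes its own small stand-in file, so it does not depend on the actual files in the “data” folder.

```python
import os
import tempfile

import numpy as np


def load_npz(path):
    """Return the arrays stored under the keys 'x' and 'y' of an .npz file."""
    with np.load(path) as data:
        return data["x"], data["y"]


# Build a tiny stand-in file so the sketch runs without the real dataset.
rng = np.random.default_rng(0)
demo_x = rng.uniform(-1.0, 1.0, size=(5, 256))  # 5 samples, 256 pixel values each
demo_y = np.array([0, 1, 2, 1, 2])              # labels in {0, 1, 2}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")  # hypothetical file name
np.savez(demo_path, x=demo_x, y=demo_y)

x, y = load_npz(demo_path)
print(x.shape, y.shape)
```

Each row of `x` can be reshaped to 16 × 16 to recover the image, provided the pixel order used when the data was flattened is known.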

1. Data Preprocessing [15 points]: In this problem, you need to finish “code/DataReader.py”.

(a) Explain what the function train_valid_split does and why we need this step.

(b) Before testing, is it correct to re-train the model on the whole training set? Explain your answer.

(c) In this assignment, we use two hand-crafted features:

The first feature is a measure of symmetry. For a 16 × 16 image x, it is defined as

$$F_{\text{symmetry}} = -\frac{\sum_{\text{pixel}} |x - \mathrm{flip}(x)|}{256},$$

where 256 is the number of pixels and flip(·) means left and right flipping.

The second feature is a measure of intensity. For a 16 × 16 image x, it is defined as

$$F_{\text{intensity}} = \frac{\sum_{\text{pixel}} x}{256},$$

which is simply the average of pixel values.

Implement them in the function prepare_X.

(d) In the function prepare_X, there is a third feature which is always 1. Explain why we need it.

(e) The function prepare_y is already finished. Note that the returned indices store the positions of the data from classes 1 and 2. Only use these two classes for binary classification and convert the labels to +1 and −1 if necessary.

(f) Test your code in “code/main.py” and visualize the training data from classes 1 and 2 by implementing the function visualize_features. The visualization should not include the third feature, so it is a 2-D scatter plot. Include the figure in your submission.
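
The two hand-crafted features and the label conversion from this problem can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names are hypothetical (they are not the ones in the starting code), the 256-vector is assumed to be flattened in row-major order, the constant feature is placed in the first column, and class 1 is mapped to +1 while class 2 is mapped to −1.

```python
import numpy as np


def symmetry_feature(images):
    """F_symmetry: minus the mean absolute difference between an image and its mirror."""
    mirrored = images[:, :, ::-1]  # reverse the column axis (left-right flip)
    return -np.abs(images - mirrored).sum(axis=(1, 2)) / 256.0


def intensity_feature(images):
    """F_intensity: the average pixel value."""
    return images.sum(axis=(1, 2)) / 256.0


def build_features(raw_X):
    """Map (n, 256) raw vectors to (n, 3) rows of [1, symmetry, intensity]."""
    images = raw_X.reshape(-1, 16, 16)  # assumption: row-major pixel order
    bias = np.ones(len(images))         # the constant feature (column order is assumed)
    return np.column_stack([bias, symmetry_feature(images), intensity_feature(images)])


def to_signed_labels(y):
    """Assumed mapping: class 1 -> +1, class 2 -> -1."""
    return np.where(y == 1, 1, -1)


# Two hand-made images: one perfectly symmetric, one maximally asymmetric.
uniform = np.ones((16, 16))
half = np.ones((16, 16))
half[:, 8:] = -1.0  # left half +1, right half -1
raw_X = np.stack([uniform.ravel(), half.ravel()])

features = build_features(raw_X)
labels = to_signed_labels(np.array([1, 2]))
print(features)
print(labels)
```

The uniform image has symmetry 0 and intensity 1, while the half-and-half image has symmetry −2 (every pixel differs from its mirror by 2) and intensity 0, which matches the two definitions.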

2. Cross-entropy loss [20 points]: In logistic regression, we use the cross-entropy loss.

(a) Write the loss function E(w) for one training data sample (x, y). Note that the binary labels are 1 and −1.

(b) Compute the gradient ∇E(w). Please provide intermediate steps of derivation.

(c) Once the optimal w is obtained, it can be used to make predictions as follows:

$$\text{Predicted class of } x = \begin{cases} 1 & \text{if } \theta(w^T x) \geq 0.5 \\ -1 & \text{if } \theta(w^T x) < 0.5 \end{cases}$$

where $\theta(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function. However, this is not the most efficient way since the decision boundary is linear. Why? Explain it. When will we need to use the sigmoid function in prediction?

(d) Is the decision boundary still linear if the prediction rule is changed to the following? Justify briefly.

$$\text{Predicted label of } x = \begin{cases} 1 & \text{if } \theta(w^T x) \geq 0.9 \\ -1 & \text{if } \theta(w^T x) < 0.9 \end{cases}$$

(e) In light of your answers to the above two questions, what is the essential property of logistic regression that results in the linear decision boundary?
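
One way to sanity-check a derivation for this problem is numerically. The sketch below assumes the usual single-sample loss for labels in {+1, −1}, E(w) = ln(1 + exp(−y wᵀx)), compares its analytic gradient against central finite differences, and then checks that thresholding the sigmoid gives the same labels as thresholding the linear score wᵀx. The function names and the random test data are illustrative assumptions, not part of the starting code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(w, x, y):
    """Assumed cross-entropy loss for one sample with y in {+1, -1}."""
    return np.log1p(np.exp(-y * (w @ x)))


def gradient(w, x, y):
    """Analytic gradient of the loss above with respect to w."""
    return -y * x / (1.0 + np.exp(y * (w @ x)))


rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
y = -1

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (loss(w + eps * e, x, y) - loss(w - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])
max_gap = np.max(np.abs(numeric - gradient(w, x, y)))

# Thresholding the sigmoid is equivalent to thresholding the linear score.
X = rng.normal(size=(200, 3))
scores = X @ w
same_at_half = np.array_equal(sigmoid(scores) >= 0.5, scores >= 0.0)
same_at_09 = np.array_equal(sigmoid(scores) >= 0.9, scores >= np.log(9.0))
print(max_gap, same_at_half, same_at_09)
```

Because the sigmoid is strictly increasing, any threshold on θ(wᵀx) corresponds to a threshold on wᵀx; for 0.9 that threshold is ln 9.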

3. Sigmoid logistic regression [25 points]: In this problem, you need to finish “code/LogisticRegression.py”. Please follow the instructions in the starting code. Please use data from classes 1 and 2 for the binary classification.

(a) Based on (b) in the last problem, implement the function gradient.

(b) There are different ways to train a logistic regression model. In this assignment, you need to implement gradient descent, stochastic gradient descent and batch gradient descent in the functions fit_GD, fit_SGD and fit_BGD, respectively. Note that GD and SGD are actually special cases of BGD, with the batch size equal to the number of training samples and to 1, respectively.

(c) Implement the functions predict and score for prediction and evaluation, respectively. Additionally, please implement the function predict_proba, which outputs the probabilities of both classes.

(d) Test your code in “code/main.py” and visualize the results after training by using the function visualize_results. Include the figure in your submission.

(e) Implement the testing process and report the test accuracy of your best logistic regression model.
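
The relationship between the three training modes in this problem can be sketched with a single mini-batch loop whose batch size is a parameter. This is a minimal sketch, not the interface of the starting code: the function names, the toy data, the learning rate, and the epoch count are all assumptions, and the per-sample gradient is the one for the loss ln(1 + exp(−y wᵀx)).

```python
import numpy as np


def batch_gradient(w, X, y):
    """Average gradient of ln(1 + exp(-y * w.x)) over the rows of X."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))
    return (coeff[:, None] * X).mean(axis=0)


def fit_minibatch(X, y, learning_rate, max_iter, batch_size, seed=0):
    """Mini-batch gradient descent; batch_size=len(X) is GD, batch_size=1 is SGD."""
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        order = rng.permutation(n)  # reshuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= learning_rate * batch_gradient(w, X[idx], y[idx])
    return w


def predict_proba(w, X):
    """Columns: P(y = +1 | x) and P(y = -1 | x)."""
    p_pos = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.column_stack([p_pos, 1.0 - p_pos])


def predict(w, X):
    return np.where(X @ w >= 0, 1, -1)


def score(w, X, y):
    return float(np.mean(predict(w, X) == y))


# Toy, well-separated data with a leading constant column.
rng = np.random.default_rng(1)
pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X = np.column_stack([np.ones(100), np.vstack([pos, neg])])
y = np.concatenate([np.ones(50), -np.ones(50)])

w_gd = fit_minibatch(X, y, 0.5, 50, batch_size=len(X))  # full-batch GD
w_sgd = fit_minibatch(X, y, 0.5, 50, batch_size=1)      # SGD
w_bgd = fit_minibatch(X, y, 0.5, 50, batch_size=10)     # mini-batch
print(score(w_gd, X, y), score(w_sgd, X, y), score(w_bgd, X, y))
```

Writing the loop once and varying only the batch size makes it easy to compare the three variants on identical data and initial weights.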

4. Softmax logistic regression [20 points]: In this problem, you need to finish “code/LRM.py”. Please follow the instructions in the starting code.

(a) Based on the course notes, implement the function gradient.

(b) In this assignment, you only need to implement batch gradient descent in the function fit_BGD.

(c) Implement the functions predict and score for prediction and evaluation, respectively.

(d) Test your code in “code/main.py” and visualize the results after training by using the function visualize_results_multi. Include the figure in your submission.

(e) Implement the testing process and report the test accuracy of your best logistic regression model.
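
For the softmax model, a gradient implementation can be checked the same way as in the binary case. The sketch below assumes one-hot labels, a weight matrix W of shape (features × classes), and the averaged cross-entropy loss, for which the gradient is Xᵀ(softmax(XW) − Y) / n. The course notes may use a different layout or scaling, so treat the names and shapes here as assumptions.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


def cross_entropy(W, X, Y):
    """Mean cross-entropy between one-hot rows of Y and softmax(X @ W)."""
    return -np.mean(np.sum(Y * np.log(softmax(X @ W)), axis=1))


def softmax_gradient(W, X, Y):
    """Gradient of the mean cross-entropy with respect to W."""
    return X.T @ (softmax(X @ W) - Y) / len(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
labels = np.array([0, 1, 2, 0, 1, 2])
Y = np.eye(3)[labels]  # one-hot encoding
W = rng.normal(size=(3, 3))

# Central finite differences over every entry of W.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        numeric[i, j] = (cross_entropy(W + step, X, Y)
                         - cross_entropy(W - step, X, Y)) / (2 * eps)

max_gap = np.max(np.abs(numeric - softmax_gradient(W, X, Y)))
predicted = np.argmax(X @ W, axis=1)  # predicted class = largest score
print(max_gap, predicted)
```

Prediction only needs the argmax of the scores XW, since softmax preserves their ordering within each row.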

5. Softmax logistic vs Sigmoid logistic [20 points]: In this problem, you need to experimentally compare these two methods. Please follow the instructions in the starting code. Use data examples from classes 1 and 2 for classification.

(a) Train the softmax logistic classifier and the sigmoid logistic classifier using the same data until convergence. Compare these two classifiers and report your observations and insights.

(b) Explore the training of these two classifiers and monitor the gradients/weights. How can we set the learning rates so that w1 − w2 = w holds for all training steps?
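
The condition w1 − w2 = w can be read as follows, assuming w1 and w2 are the softmax classifier's weight vectors for the two classes and w is the sigmoid classifier's weight vector: a two-class softmax assigns the first class the probability θ((w1 − w2)ᵀx). The sketch below checks this identity numerically on random data; the variable names and data are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w1 = rng.normal(size=3)  # softmax weights for the first class (assumed naming)
w2 = rng.normal(size=3)  # softmax weights for the second class

two_class = softmax(np.column_stack([X @ w1, X @ w2]))
via_sigmoid = sigmoid(X @ (w1 - w2))

# Probability of the first class under both parameterizations.
max_gap = np.max(np.abs(two_class[:, 0] - via_sigmoid))
print(max_gap)
```

If the two models start from weights satisfying the condition, the remaining question is how their update steps must be scaled so that the condition is preserved after every step.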
