## Description

## Question 1 [10 marks]

(a) [5 points] Consider using linear regression for binary classification with labels in $\{0, 1\}$. Here, we use a linear model

$$h_\theta(x) = \theta_1 x + \theta_0$$

and the squared error loss

$$L = \frac{1}{2}\left(h_\theta(x) - y\right)^2.$$

The prediction threshold is set to 0.5, which means the predicted label is 1 if $h_\theta(x) \ge 0.5$ and 0 if $h_\theta(x) < 0.5$.

However, this loss has the problem that it penalizes confident correct predictions, i.e., cases where $h_\theta(x)$ is larger than 1 or less than 0. Some students try to fix this problem by using an absolute error loss $L = |h_\theta(x) - y|$.

The question is: will this fix the problem? Please answer the question and explain your answer. Furthermore, some other students try designing another loss function as follows:

$$L = \begin{cases} \max(0, h_\theta(x)), & y = 0 \\ \cdots, & y = 1 \end{cases}$$

Although it is not complete yet, if it is correct in principle, please complete it and explain how it can fix the problem. Otherwise, please explain the reason.
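As a small illustration of the setup in part (a), the sketch below evaluates the squared and absolute error losses for a single prediction. The function names (`squared_loss`, `absolute_loss`, `predict`) and the example value 1.5 are assumptions chosen for illustration; they do not come from the assignment.

```python
def squared_loss(h, y):
    """Squared error loss L = 1/2 * (h - y)^2 for model output h and label y."""
    return 0.5 * (h - y) ** 2


def absolute_loss(h, y):
    """Absolute error loss L = |h - y|."""
    return abs(h - y)


def predict(h, threshold=0.5):
    """Thresholded prediction: 1 if h >= threshold, otherwise 0."""
    return 1 if h >= threshold else 0


# Hypothetical example: the true label is 1 and the linear model outputs 1.5,
# so the thresholded prediction is correct and "confident".
h_example, y_example = 1.5, 1
example_prediction = predict(h_example)
example_squared = squared_loss(h_example, y_example)
example_absolute = absolute_loss(h_example, y_example)
print(example_prediction, example_squared, example_absolute)
```

Trying other outputs, for instance values below 0 with label 0, gives a feel for how each loss treats predictions that are correct but lie outside $[0, 1]$.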

(b) [5 points] Consider the logistic regression model $h_\theta(x) = \sigma(\theta^T x)$, trained using the binary cross entropy loss function, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.

Some students try modifying the original sigmoid function into the following one:

$$\sigma(z) = \frac{e^{-z}}{1 + e^{-z}}.$$

The model would still be trained using the binary cross entropy loss. How would the model prediction rule, as well as the learnt model parameters $\theta$, differ from conventional logistic regression? Please show your answer and explanation.
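To explore part (b) numerically, the following sketch implements both activation functions so they can be compared at a few inputs. The names `sigmoid` and `modified_sigmoid` and the sample points are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Conventional sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def modified_sigmoid(z):
    """Modified function from part (b): exp(-z) / (1 + exp(-z))."""
    return math.exp(-z) / (1.0 + math.exp(-z))


# Tabulate both functions at a few hypothetical sample points.
for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  modified={modified_sigmoid(z):.3f}")
```

Comparing the two columns, and in particular their sum at each $z$, is one way to build intuition before writing the formal explanation.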

## Question 2 [20 marks]

Consider using logistic regression for classification problems. Four 3-dimensional data points $(x_1, x_2, x_3)^T$ and the corresponding labels $y_i$ are given as follows.

| Data point | $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|---|
| D1 | -0.120 | 0.300 | -0.010 | 1 |
| D2 | 0.200 | -0.030 | -0.350 | -1 |
| D3 | -0.370 | 0.250 | 0.070 | -1 |
| D4 | -0.100 | 0.140 | -0.520 | 1 |

The learning rate $\eta$ is set to 0.2 and the initial parameter $\theta[0]$ is set to $[-0.09, 0, -0.19, -0.21]$. Please answer the following questions.
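The data and initial parameter above could be encoded in Python roughly as follows. This is a minimal sketch under two assumptions that the assignment does not state: the first entry of $\theta[0]$ is treated as the bias term (the parameter has four entries for three features), and a point is labelled $+1$ when the sigmoid of its linear score is at least 0.5, otherwise $-1$. The names `DATA`, `THETA_INIT`, `score`, and `predict_label` are hypothetical.

```python
import math

# Data table from Question 2: features (x1, x2, x3) and label y in {-1, +1}.
DATA = [
    ((-0.120, 0.300, -0.010), 1),   # D1
    ((0.200, -0.030, -0.350), -1),  # D2
    ((-0.370, 0.250, 0.070), -1),   # D3
    ((-0.100, 0.140, -0.520), 1),   # D4
]

# ASSUMPTION: the first entry is the bias, the remaining three are the weights.
THETA_INIT = [-0.09, 0.0, -0.19, -0.21]


def score(theta, x):
    """Linear score theta_0 + theta_1*x1 + theta_2*x2 + theta_3*x3."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def sigmoid(z):
    """Sigmoid function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(theta, x):
    """ASSUMED rule: predict +1 if sigmoid(score) >= 0.5, otherwise -1."""
    return 1 if sigmoid(score(theta, x)) >= 0.5 else -1


scores = [score(THETA_INIT, x) for x, _ in DATA]
labels = [predict_label(THETA_INIT, x) for x, _ in DATA]
for name, s, label in zip(("D1", "D2", "D3", "D4"), scores, labels):
    print(f"{name}: score={s:.3f}, predicted label={label}")
```

If the course uses a different parameter ordering or decision rule, only `THETA_INIT` and `predict_label` need to change.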

a) [5 points] Calculate the initial predicted label for each data point.

b) [10 points] Calculate the parameters after the first and second iterations, i.e., $\theta[1]$ and $\theta[2]$, using the gradient descent algorithm.

c) [5 points] Implement the gradient descent algorithm to update the parameters $\theta$ in Python. Please plot the trend of the loss function $J(\theta)$ over 50000 rounds and upload the source code file.

Note: For a) and b), the detailed calculation process is required, and the intermediate and final results should be rounded to 3 decimal places.
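One possible starting point for the implementation task is sketched below in plain Python. It rests on assumptions the assignment does not spell out: the loss is taken to be the logistic loss for $\pm 1$ labels, $J(\theta) = \frac{1}{N}\sum_n \ln\left(1 + e^{-y_n \theta^T x_n}\right)$, the first parameter entry is treated as the bias, and the update is plain batch gradient descent with step size 0.2. All function and variable names are hypothetical; adapt the loss if the course defines $J(\theta)$ differently.

```python
import math

# Data from the Question 2 table: features (x1, x2, x3) and label y in {-1, +1}.
DATA = [
    ((-0.120, 0.300, -0.010), 1),
    ((0.200, -0.030, -0.350), -1),
    ((-0.370, 0.250, 0.070), -1),
    ((-0.100, 0.140, -0.520), 1),
]
# ASSUMPTION: the first entry of the initial parameter is the bias term.
THETA_INIT = [-0.09, 0.0, -0.19, -0.21]


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def margins(theta):
    """y * theta^T [1, x] for every data point."""
    return [y * (theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))
            for x, y in DATA]


def loss(theta):
    """ASSUMED loss: mean of log(1 + exp(-y * theta^T x))."""
    ms = margins(theta)
    return sum(softplus(-m) for m in ms) / len(ms)


def gradient(theta):
    """Gradient of the assumed loss: mean of -y * x * sigmoid(-margin)."""
    grad = [0.0] * len(theta)
    for (x, y), m in zip(DATA, margins(theta)):
        weight = math.exp(-softplus(m))  # equals sigmoid(-m)
        for j, xj in enumerate((1.0,) + x):
            grad[j] -= y * xj * weight / len(DATA)
    return grad


def gradient_descent(theta, lr=0.2, rounds=50000):
    """Batch gradient descent; records the loss before each update."""
    theta = list(theta)
    history = []
    for _ in range(rounds):
        history.append(loss(theta))
        theta = [t - lr * g for t, g in zip(theta, gradient(theta))]
    return theta, history


theta_final, history = gradient_descent(THETA_INIT)
print("final parameters:", [round(t, 3) for t in theta_final])
print("first and last loss:", round(history[0], 3), round(history[-1], 3))
```

The returned `history` list holds one loss value per round, so it can be passed to a plotting library (for example `matplotlib.pyplot.plot(history)`) to produce the requested trend diagram.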