Assignment #2: COMP4434 Big Data Analytics solution


Question 1 [10 marks]

(a). [5 points] Consider using linear regression for binary classification on the labels {0, 1}.

Here, we use a linear model $h_\theta(x) = \theta_1 x + \theta_0$ and the squared error loss $L = (h_\theta(x) - y)^2$. The prediction threshold is set to 0.5, which means the prediction is 1 if $h_\theta(x) \ge 0.5$ and 0 if $h_\theta(x) < 0.5$.

However, this loss has the problem that it penalizes confident correct predictions, i.e., cases where $h_\theta(x)$ is larger than 1 or less than 0. Some students try to fix this problem by using an absolute error loss $L = |h_\theta(x) - y|$.

The question is: Will it fix the problem? Please answer the question and explain it. Furthermore, some other students try designing another loss function as follows:

$$L = \begin{cases} \max(0, h_\theta(x)), & y = 0 \\ \cdots, & y = 1 \end{cases}$$

Although it is not complete yet, if it is correct in principle, please complete it and explain
how it can fix the problem. Otherwise, please explain the reason.
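
The three losses discussed in (a) can be compared numerically on a confident, correct prediction. This is only an illustrative sketch: the `y = 1` branch of `one_sided_loss` is an assumed completion (mirroring the given `y = 0` branch), not something stated in the assignment, and the sample values of `h` are hypothetical.

```python
def squared_loss(h, y):
    """Squared error loss L = (h - y)^2."""
    return (h - y) ** 2


def absolute_loss(h, y):
    """Absolute error loss L = |h - y|."""
    return abs(h - y)


def one_sided_loss(h, y):
    """Piecewise loss from the question.

    The y == 0 branch is given in the assignment. The y == 1 branch is an
    ASSUMED completion that mirrors it; other completions are possible.
    """
    if y == 0:
        return max(0.0, h)
    return max(0.0, 1.0 - h)  # assumption, not from the source


# Hypothetical confident and correct predictions: h > 1 for y = 1, h < 0 for y = 0.
for h, y in [(1.5, 1), (-0.5, 0)]:
    print(
        f"h={h:+.1f}, y={y}: "
        f"squared={squared_loss(h, y):.2f}, "
        f"absolute={absolute_loss(h, y):.2f}, "
        f"one_sided={one_sided_loss(h, y):.2f}"
    )
```

Under these assumptions, both the squared and the absolute loss stay positive for outputs beyond the label on the correct side, while the one-sided loss is zero there.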

(b). [5 points] Consider the logistic regression model $h_\theta(x) = g(\theta^T x)$, trained using the binary cross entropy loss function, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.

Some students try modifying the original sigmoid function into the following one
$g(z) =$

The model would still be trained using the binary cross entropy loss. How would the model prediction rule, as well as the learnt model parameters $\theta$, differ from conventional logistic regression? Please show your answer and explanation.
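
As a baseline for the comparison asked in (b), conventional logistic regression can be sketched as follows. The sketch covers only the standard sigmoid; the modified function is not reproduced here. The parameter vector and inputs are hypothetical values chosen for illustration, and the 0.5 threshold is the usual convention, assumed rather than stated in (b).

```python
import math


def sigmoid(z):
    """Standard sigmoid g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def bce(h, y):
    """Binary cross entropy for one example with label y in {0, 1}."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))


def predict(theta, x):
    """Predict 1 if h_theta(x) >= 0.5, which is the same as theta^T x >= 0."""
    return 1 if hypothesis(theta, x) >= 0.5 else 0


theta = [0.5, -1.0]          # hypothetical parameters
for x in ([1.0, 0.2], [1.0, 0.8]):  # hypothetical inputs, x[0] = 1 acts as bias
    h = hypothesis(theta, x)
    print(x, round(h, 3), predict(theta, x), round(bce(h, 1), 3))
```

Because the sigmoid is monotonically increasing and $g(0) = 0.5$, the threshold on $h_\theta(x)$ translates directly into a sign test on $\theta^T x$.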

Question 2 [20 marks]

Consider using logistic regression for classification problems. Four 3-dimensional data points $(x_1, x_2, x_3)$ and the corresponding labels $y$ are given as follows.

Data point    $x_1$     $x_2$     $x_3$     $y$
D1            -0.120     0.300    -0.010     1
D2             0.200    -0.030    -0.350    -1
D3            -0.370     0.250     0.070    -1
D4            -0.100     0.140    -0.520     1

The learning rate $\eta$ is set to 0.2 and the initial parameter $\theta[0]$ is set to [-0.09, 0, -0.19, -0.21]. Please answer the following questions.

a) [5 points] Calculate the initial predicted label for each data point.

b) [10 points] Calculate the parameters in the first and second iterations, i.e., $\theta[1]$, $\theta[2]$, by using the gradient descent algorithm.

c) [5 points] Implement the gradient descent algorithm to update the parameters $\theta$ using Python. Please plot the trend of the loss function $J(\theta)$ over 50000 rounds and upload the source code file.
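
One possible shape for the implementation in c) is sketched below in plain Python. Several choices are assumptions rather than part of the assignment: the parameter order `[bias, x1, x2, x3]`, the mean (rather than summed) loss over the four points, and the loss written as $\log(1 + e^{-y\,\theta^T x})$ for labels in {-1, +1}, which is equivalent to binary cross entropy with -1 mapped to 0. The helper `plot_loss` and the file name `loss.png` are hypothetical, and plotting needs the third-party matplotlib package.

```python
import math

X = [(-0.120, 0.300, -0.010), (0.200, -0.030, -0.350),
     (-0.370, 0.250, 0.070), (-0.100, 0.140, -0.520)]
Y = [1, -1, -1, 1]


def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(a):
    """Numerically stable log(1 + exp(a))."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def loss_and_gradient(theta):
    """Mean logistic loss J(theta) and its gradient (assumed 1/m scaling)."""
    m = len(X)
    total, grad = 0.0, [0.0] * len(theta)
    for x, y in zip(X, Y):
        features = (1.0,) + x  # x0 = 1 for the bias term (assumed layout)
        z = sum(t * f for t, f in zip(theta, features))
        total += softplus(-y * z)
        coeff = -y * sigmoid(-y * z)
        for j, f in enumerate(features):
            grad[j] += coeff * f / m
    return total / m, grad


def gradient_descent(theta, eta=0.2, rounds=50000):
    """Run batch gradient descent and record J(theta) before each update."""
    history = []
    for _ in range(rounds):
        value, grad = loss_and_gradient(theta)
        history.append(value)
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta, history


def plot_loss(history, path="loss.png"):
    """Save the loss curve; requires matplotlib (imported lazily)."""
    import matplotlib.pyplot as plt
    plt.plot(history)
    plt.xlabel("round")
    plt.ylabel("J(theta)")
    plt.savefig(path)


theta_final, history = gradient_descent([-0.09, 0.0, -0.19, -0.21])
print("first loss:", round(history[0], 3), "last loss:", round(history[-1], 3))
print("theta:", [round(t, 3) for t in theta_final])
```

Calling `plot_loss(history)` afterwards would write the trend diagram to disk; it is left out of the main run so the script works without matplotlib installed.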

P.S. For a) and b), the detailed calculation process is required, and the intermediate and final results should be rounded to 3 decimal places.
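
The hand calculations for a) and b) can be cross-checked with a short script. This is a sketch under stated assumptions, not the official solution: $\theta$ is read as `[bias, x1, x2, x3]`, the label -1 is treated as class 0 in the cross entropy gradient, the gradient is averaged over the four points, and the predicted label is +1 when $h_\theta(x) \ge 0.5$ and -1 otherwise. The script rounds only when printing, so values may differ slightly from a hand calculation that rounds every intermediate result.

```python
import math

# Data from the table; each row is (x1, x2, x3), labels are +1 / -1.
X = [
    (-0.120, 0.300, -0.010),
    (0.200, -0.030, -0.350),
    (-0.370, 0.250, 0.070),
    (-0.100, 0.140, -0.520),
]
Y = [1, -1, -1, 1]
ETA = 0.2
THETA0 = [-0.09, 0.0, -0.19, -0.21]  # assumed order: [bias, x1, x2, x3]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(theta, x):
    """theta^T x with an implicit x0 = 1 for the bias term."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def predict(theta, x):
    """Label +1 if h_theta(x) >= 0.5, otherwise -1 (assumed rule)."""
    return 1 if sigmoid(score(theta, x)) >= 0.5 else -1


def gradient_step(theta, eta=ETA):
    """One batch gradient descent step on the mean cross entropy loss."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        target = 1.0 if y == 1 else 0.0  # assumption: map label -1 to 0
        err = sigmoid(score(theta, x)) - target
        for j, xj in enumerate((1.0,) + x):
            grad[j] += err * xj / len(X)
    return [t - eta * g for t, g in zip(theta, grad)]


scores = [score(THETA0, x) for x in X]
labels = [predict(THETA0, x) for x in X]
theta1 = gradient_step(THETA0)
theta2 = gradient_step(theta1)

print("theta^T x:", [round(s, 3) for s in scores])
print("h_theta(x):", [round(sigmoid(s), 3) for s in scores])
print("initial labels:", labels)
print("theta[1]:", [round(t, 3) for t in theta1])
print("theta[2]:", [round(t, 3) for t in theta2])
```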