Assignment #3: COMP4434 Big Data Analytics solution

Q1. Regularization

[15 points] We use polynomial regression for a prediction task on a dataset. The given dataset
includes a training set (train.csv) and a test set (test.csv).

To illustrate the effect of regularization, first implement the following regression models
in Python (third-party packages are allowed).

Then plot the data points of the training set together with the regression curves of the
trained models. Finally, compute the RMSE of each trained model on the test set and discuss
the results comparatively in terms of underfitting and overfitting.

• Polynomial regression without regularization (polynomial of degree 5)
• L1-regularized polynomial regression: 𝜆 = 1 and 𝜆 = 100
• L2-regularized polynomial regression: 𝜆 = 1 and 𝜆 = 100

The given datasets can be downloaded at:
https://drive.google.com/drive/folders/1LSZNIEWf6XKnQtRw8L01tS6yAB67Aad2?usp=sharing
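
As a rough starting point for Q1, the sketch below fits a degree-5 polynomial with and without an L2 penalty and reports the test RMSE. It uses only NumPy and the closed-form ridge solution; the helper names (`poly_features`, `fit_ridge`, `predict`, `rmse`) and the synthetic sine data are illustrative assumptions, not part of the assignment, and the real train.csv / test.csv would replace the generated arrays. The L1 case has no closed form and is not covered here; a library solver such as scikit-learn's `Lasso` could be used for it, keeping in mind that its `alpha` is scaled differently from 𝜆 in a plain sum-of-squares objective.

```python
import numpy as np

def poly_features(x, degree=5):
    """Return the columns [x, x^2, ..., x^degree] (no bias column)."""
    return np.vander(x, degree + 1, increasing=True)[:, 1:]

def fit_ridge(x, y, degree=5, lam=0.0):
    """Least squares with penalty lam * ||w||^2 (lam = 0 gives the plain fit).
    The intercept is left unpenalized by centering features and target."""
    P = poly_features(x, degree)
    mu, y_mean = P.mean(axis=0), y.mean()
    Pc = P - mu
    A = Pc.T @ Pc + lam * np.eye(degree)
    w = np.linalg.solve(A, Pc.T @ (y - y_mean))
    b = y_mean - mu @ w
    return w, b

def predict(x, w, b):
    return poly_features(x, len(w)) @ w + b

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (an assumption; NOT the assignment's CSV files).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 30)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, 30)
x_test = rng.uniform(-1, 1, 30)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.1, 30)

results = {}
for lam in (0.0, 1.0, 100.0):
    w, b = fit_ridge(x_train, y_train, lam=lam)
    results[lam] = rmse(y_test, predict(x_test, w, b))
    print(f"lambda={lam:6.1f}  test RMSE={results[lam]:.3f}")
```

A large 𝜆 shrinks the coefficients toward zero and tends toward underfitting, while 𝜆 = 0 leaves the degree-5 fit free to follow noise in the training set.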

Q2. Recommender System

Build a collaborative-filtering recommender system to provide effective hotel
recommendations.

The training dataset, shown in the table below, contains the ratings given by 4 users
to 3 hotels. The ratings range from 1 point to 5 points.

          Hotel 1   Hotel 2   Hotel 3
User 1       5         1         ?
User 2       4         ?         3
User 3       ?         4         5
User 4       3         3         4

We use gradient descent to minimize the cost function of the collaborative filtering
model. The settings are as follows.

• The constant learning rate 𝛼 = 0.0002
• The regularization parameter 𝜆 = 0.02
• The dimension for user/item feature vectors 𝐾 = 2
• The initial values for parameters:

$$x = \begin{bmatrix} 0.77 & 0.43 & 0.31 \\ 0.48 & 0.44 & 0.51 \end{bmatrix}, \qquad \theta^T = \begin{bmatrix} 0.19 & 0.62 \\ 0.68 & 0.78 \\ 0.18 & 0.08 \\ 0.36 & 0.92 \end{bmatrix}$$

a) [5 points] If we finally obtain $x^{(1)} = [1.268 \;\; 0.994]^T$ and $\theta^{(3)} = [0.271 \;\; 0.694]^T$ after the
training procedure, what is the rating of user 3 on hotel 1?
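
For part a), a minimal sketch follows, assuming the usual collaborative filtering model in which the predicted rating is the inner product of the user and item feature vectors (the sheet does not state the prediction rule explicitly). The variable names are illustrative.

```python
import numpy as np

x1 = np.array([1.268, 0.994])      # item feature vector of hotel 1
theta3 = np.array([0.271, 0.694])  # user feature vector of user 3

# Assumed prediction rule: rating = theta^(3)T x^(1)
rating = float(theta3 @ x1)
print(round(rating, 3))  # 1.033
```

Written out, this is 0.271 × 1.268 + 0.694 × 0.994 = 0.343628 + 0.689836.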

b) [10 points] Calculate the values of $x_1^{(1)}$ (i.e., the first element of the item feature vector of
hotel 1) and $\theta_1^{(2)}$ (i.e., the first element of the user feature vector of user 2) after the first
iteration.
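
The single update in part b) can be checked numerically with the sketch below. It assumes the standard regularized squared-error cost, so that each parameter moves by 𝛼 times (sum of prediction errors over observed ratings times the paired feature, plus 𝜆 times the parameter itself); the sheet does not spell out the cost function, so this rule is an assumption. It also assumes that column i of x is the feature vector of hotel i and row j of 𝜃ᵀ is the feature vector of user j, which matches K = 2, 3 hotels and 4 users. The code works at full precision, so hand calculations that round intermediates to 3 decimals may differ slightly in the last digit.

```python
import numpy as np

alpha, lam = 0.0002, 0.02
nan = np.nan

# Ratings: rows = users 1..4, columns = hotels 1..3, nan = unknown.
Y = np.array([[5, 1, nan],
              [4, nan, 3],
              [nan, 4, 5],
              [3, 3, 4]], dtype=float)
# Item features, one row per hotel (transpose of x as printed).
X = np.array([[0.77, 0.48], [0.43, 0.44], [0.31, 0.51]])
# User features, one row per user (theta^T as printed).
Theta = np.array([[0.19, 0.62], [0.68, 0.78], [0.18, 0.08], [0.36, 0.92]])

def error(user, hotel):
    """Prediction error theta^(user)T x^(hotel) - y for one observed rating."""
    return Theta[user] @ X[hotel] - Y[user, hotel]

# x_1^(1): sum over the users who rated hotel 1 (index 0).
grad_x11 = sum(error(u, 0) * Theta[u, 0]
               for u in range(4) if not np.isnan(Y[u, 0])) + lam * X[0, 0]
x11_new = X[0, 0] - alpha * grad_x11

# theta_1^(2): sum over the hotels rated by user 2 (index 1).
grad_t21 = sum(error(1, h) * X[h, 0]
               for h in range(3) if not np.isnan(Y[1, h])) + lam * Theta[1, 0]
theta21_new = Theta[1, 0] - alpha * grad_t21

print(round(x11_new, 3), round(theta21_new, 3))  # 0.771 0.681
```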

c) [5 points] Implement the gradient descent algorithm in Python to update the parameters $x$ and $\theta$.
Calculate the rating of user 2 on hotel 2 after 50 iterations and upload
the source code file.
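
One possible shape for part c) is sketched below: a vectorized batch gradient descent loop over all parameters. The function name `train_cf` and the array layout (users as rows of `Theta0`, hotels as rows of `X0`) are assumptions made for illustration, as is the regularized squared-error cost with simultaneous updates of x and 𝜃.

```python
import numpy as np

def train_cf(Y, X, Theta, alpha=0.0002, lam=0.02, n_iters=50):
    """Batch gradient descent for collaborative filtering.
    Y: users x items, np.nan where unrated; X: items x K; Theta: users x K."""
    X, Theta = X.copy(), Theta.copy()
    mask = ~np.isnan(Y)
    Y_filled = np.where(mask, Y, 0.0)
    for _ in range(n_iters):
        # Errors on observed ratings only; unrated entries contribute 0.
        E = (Theta @ X.T - Y_filled) * mask
        grad_X = E.T @ Theta + lam * X
        grad_Theta = E @ X + lam * Theta
        # Both gradients use the old values, so the update is simultaneous.
        X -= alpha * grad_X
        Theta -= alpha * grad_Theta
    return X, Theta

nan = np.nan
Y = np.array([[5, 1, nan], [4, nan, 3], [nan, 4, 5], [3, 3, 4]], dtype=float)
X0 = np.array([[0.77, 0.48], [0.43, 0.44], [0.31, 0.51]])
Theta0 = np.array([[0.19, 0.62], [0.68, 0.78], [0.18, 0.08], [0.36, 0.92]])

X50, Theta50 = train_cf(Y, X0, Theta0, n_iters=50)
pred_22 = float(Theta50[1] @ X50[1])  # user 2 on hotel 2
print(f"predicted rating of user 2 on hotel 2: {pred_22:.3f}")
```

With a learning rate this small, 50 iterations move the parameters only slightly from their initial values, so the prediction stays far below the observed ratings.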

Note: For a) and b), the detailed calculation process is required, and the intermediate and final
results should be rounded to 3 decimal places.

Q3. Neural Network

[10 points] Consider the following neural network:

where $a_i = \sum_j w_j^i z_j$, $z_i = f_i(a_i)$ for $i = 1, 2, 3, 4$, $z_0 = a_0$ (an input neuron), $f_3(x) = \mathrm{relu}(x)$,
and $f_1(x) = f_2(x) = f_4(x) = \mathrm{sigmoid}(x)$. Here $\mathrm{relu}(x)$ is the rectified linear unit transfer
function, defined as $\mathrm{relu}(x) = \max\{0, x\}$.

The cost function is defined as $J(w) = \frac{1}{2}(z_4 - y)^2$.

(a) Write a function 𝐹 to simulate the neural network.
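
A generic sketch of such a function is given below. Since the network diagram and its weights are not reproduced in this text, the connections are passed in as a dictionary `w` mapping `(j, i)` to the weight on the edge from neuron j to neuron i; the code assumes that edges only run from lower-numbered to higher-numbered neurons. The example weights at the bottom are purely hypothetical placeholders, not the values from the assignment.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Activation of each non-input neuron, as specified in the question.
ACTIVATIONS = {1: sigmoid, 2: sigmoid, 3: relu, 4: sigmoid}

def F(x, w):
    """Forward pass. w[(j, i)] is w_j^i, the weight from neuron j to neuron i.
    Assumes every edge satisfies j < i. Returns the dicts (a, z)."""
    a = {0: x}
    z = {0: x}  # input neuron: z_0 = a_0
    for i in (1, 2, 3, 4):
        a[i] = sum(w_ji * z[j] for (j, tgt), w_ji in w.items() if tgt == i)
        z[i] = ACTIVATIONS[i](a[i])
    return a, z

# Hypothetical topology and weights, for illustration only.
w_example = {(0, 1): 0.5, (0, 2): -0.3, (1, 3): 1.0, (2, 3): 1.0, (3, 4): 0.7}
a, z = F(1.0, w_example)
print(z[4])
```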

(b) Assume that we are given a training example $x = 1.0$, $y = 0.1$. What is the value of $\frac{\partial J}{\partial w_3^4}$?
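
By the chain rule, and reading $w_3^4$ as the weight on the edge from neuron 3 into the sigmoid output neuron 4, the derivative factors as (z4 − y) · z4 · (1 − z4) · z3. The sketch below encodes that expression and compares it with a finite-difference estimate on the last layer alone. The numbers used for z3, the weight, and the contribution of any other inputs to neuron 4 are hypothetical, because the actual weights come from the network diagram.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dJ_dw34(z3, z4, y):
    """Chain rule: dJ/dz4 = (z4 - y), dz4/da4 = z4 * (1 - z4), da4/dw_3^4 = z3."""
    return (z4 - y) * z4 * (1.0 - z4) * z3

def cost(w34, z3, other, y):
    """J as a function of w_3^4 only; `other` is the summed input to
    neuron 4 from any neurons besides neuron 3 (hypothetical)."""
    z4 = sigmoid(w34 * z3 + other)
    return 0.5 * (z4 - y) ** 2

# Hypothetical values for illustration; y = 0.1 is from the question.
z3, w34, other, y = 0.8, 0.5, 0.2, 0.1

z4 = sigmoid(w34 * z3 + other)
analytic = dJ_dw34(z3, z4, y)

eps = 1e-6
numeric = (cost(w34 + eps, z3, other, y) - cost(w34 - eps, z3, other, y)) / (2 * eps)
print(analytic, numeric)
```

Once the forward pass has produced z3 and z4 for x = 1.0 with the real weights, the same function gives the requested value.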