CS5590 Assignment 4 Foundations of Machine Learning solution


Questions: Theory
1. Non-Uniform Weights in Linear Regression: (6 marks) You are given a dataset in
which the data points are denoted by $(x_n, t_n)$, $n = 1, \dots, N$. Each data point is associated
with a positive weighting factor $g_n > 0$. The error function is thus modified to:
$$E_D(w) = \frac{1}{2} \sum_{n=1}^{N} g_n \left( t_n - w^T \Phi(x_n) \right)^2$$
where $\Phi(\cdot)$ is any representation of the data.
(a) (3 marks) Find an expression for the solution $w^*$ that minimizes the above error
function.
(b) (3 marks) Give two alternative interpretations of the above weighted sum-of-squares
error function in terms of: (i) data-dependent noise variance and (ii) replicated data
points.
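To make the notation concrete, the weighted error function in Question 1 can be evaluated numerically. The sketch below is only an illustration: the helper names `weighted_sse` and `phi_linear`, the feature map Φ(x) = [1, x], and the toy data are all assumptions that do not come from the assignment.

```python
def phi_linear(x):
    """Hypothetical representation Phi(x) = [1, x]; any feature map could be used."""
    return [1.0, x]


def weighted_sse(w, phi, xs, ts, gs):
    """Evaluate E_D(w) = 1/2 * sum_n g_n * (t_n - w^T Phi(x_n))^2."""
    total = 0.0
    for x, t, g in zip(xs, ts, gs):
        features = phi(x)
        prediction = sum(w_j * f_j for w_j, f_j in zip(w, features))
        total += g * (t - prediction) ** 2
    return 0.5 * total


# Toy data (made up for illustration): targets follow t = 1 + 2x exactly.
xs = [0.0, 1.0, 2.0]
ts = [1.0, 3.0, 5.0]
gs = [1.0, 2.0, 0.5]  # one positive weight per data point

error_exact_fit = weighted_sse([1.0, 2.0], phi_linear, xs, ts, gs)
error_biased_fit = weighted_sse([0.0, 2.0], phi_linear, xs, ts, gs)
```

With these toy values every residual of the second weight vector equals 1, so its error is half the sum of the weights, which shows how a larger g_n makes the corresponding point count more.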
2. Bayes Optimal Classifier: (2 marks) Let there be 5 hypotheses h1 through h5 that
could guide a robot to move either Forward(F) or Left(L) or Right(R):
h_i   P(h_i|D)   P(F|h_i)   P(L|h_i)   P(R|h_i)
h1    0.4        1          0          0
h2    0.2        0          1          0
h3    0.1        0          0          1
h4    0.1        0          1          0
h5    0.2        0          1          0
Compute the MAP estimate and Bayes optimal estimate using the data provided in the
table. Are they the same? Justify your answer.
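The two estimates in Question 2 differ in how they use the posterior: the MAP estimate follows only the single most probable hypothesis, while the Bayes optimal estimate averages the action probabilities over all hypotheses, weighted by P(h_i|D). A minimal sketch of both definitions, assuming the table rows are listed in the order h1 to h5 (the function names are hypothetical):

```python
ACTIONS = ("F", "L", "R")

# Rows of the table in Question 2, assumed to be in the order h1 to h5:
# (P(h|D), (P(F|h), P(L|h), P(R|h))).
HYPOTHESES = [
    (0.4, (1, 0, 0)),
    (0.2, (0, 1, 0)),
    (0.1, (0, 0, 1)),
    (0.1, (0, 1, 0)),
    (0.2, (0, 1, 0)),
]


def map_action(hypotheses):
    """Pick the most probable hypothesis, then its most probable action."""
    _, action_probs = max(hypotheses, key=lambda h: h[0])
    best = max(range(len(ACTIONS)), key=lambda a: action_probs[a])
    return ACTIONS[best]


def bayes_optimal_action(hypotheses):
    """Average each action's probability over all hypotheses, weighted by the posterior."""
    totals = [
        sum(posterior * probs[a] for posterior, probs in hypotheses)
        for a in range(len(ACTIONS))
    ]
    best = max(range(len(ACTIONS)), key=lambda a: totals[a])
    return ACTIONS[best], totals


map_choice = map_action(HYPOTHESES)
bayes_choice, action_totals = bayes_optimal_action(HYPOTHESES)
```

Comparing `map_choice` with `bayes_choice` and inspecting `action_totals` gives the numbers needed for the justification the question asks for.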
3. VC-Dimension: (2 marks) Consider a data setup of one-dimensional data $x \in \mathbb{R}^1$, where
the hypothesis space H is parametrized by {p, q} where x is classified as 1 iff p < x < q.
Find the VC-dimension of H.
4. Regularizer: (4 marks) Given D-dimensional data $x = [x_1, x_2, \dots, x_D]$, consider a
linear model of the form:
$$y(x, w) = w_0 + \sum_{k=1}^{D} w_k x_k$$
Now, for N such data samples with their corresponding labels $(x_i, t_i)$, $i = 1, 2, \dots, N$, the
sum-of-squares error (or mean-squared-error) function is given by:
$$E(w) = \frac{1}{2} \sum_{i=1}^{N} \left( y(x_i, w) - t_i \right)^2$$
Now, suppose that Gaussian noise $\epsilon_k \sim \mathcal{N}(0, \sigma^2)$ (i.e. zero mean and variance $\sigma^2$) is added
independently to each of the input variables $x_k$. Find a relation between: minimizing the
above sum-of-squares error averaged over the noisy data, and minimizing the standard
sum-of-squares error (averaged over noise-free input data) with an L2 weight-decay
regularization term, in which the bias parameter $w_0$ is omitted from the regularizer.

Questions: Programming
5. Logistic Regression: (7 marks)
(a) (3 marks) Implement your own code for a logistic regression classifier, which is trained
using gradient descent and cross-entropy error as the error function.

Table 1: Train Set
Index  x1     x2     y
1      0.346  0.780  0
2      0.303  0.439  0
3      0.358  0.729  0
4      0.602  0.863  1
5      0.790  0.753  1
6      0.611  0.965  1

Table 2: Test Set
Index  x1     x2     y
1      0.959  0.382  0
2      0.750  0.306  0
3      0.395  0.760  0
4      0.823  0.764  1
5      0.761  0.874  1
6      0.844  0.435  1

(b) Consider the training set and test set given in Tables 1 and 2. We use the linear model
$f_\theta(x_1, x_2) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ and the logistic regression function
$\sigma(f_\theta(x_1, x_2)) = \frac{1}{1 + \exp(-f_\theta(x_1, x_2))}$. Consider the initial weights as
$\theta_0 = -1$, $\theta_1 = 1.5$, $\theta_2 = 0.5$, and learning rate as 0.1 (for gradient descent).
i. (1 mark) What is the logistic model $P(\hat{y} = 1 | x_1, x_2)$ and its cross-entropy error
function?
ii. (1 mark) Use gradient descent to update $\theta_0, \theta_1, \theta_2$ for one iteration. Write down the
updated logistic regression model.
iii. (2 marks) At convergence of gradient descent, use the model to make predictions for all
the samples in the test dataset. Calculate and report the accuracy, precision and recall to
evaluate this model.
Deliverables:
• Code
• Brief report with answers to above questions.
6. Kaggle - Taxi Fare Prediction: (9 marks) The next task of this assignment is to work
on a (completed) Kaggle challenge on taxi fare prediction. As part of this task, please visit
https://www.kaggle.com/c/new-york-city-taxi-fare-prediction to know more about this
problem, and download the data. (You now know how to download data from Kaggle.) You
are allowed to use any machine learning library of your choice: scikit-learn, pandas, Weka
(we recommend scikit-learn), and any regression method too. Use train.csv to train your
model. Predict the fares on the data in test.csv, and report your best 2 scores in your
report. (We will also randomly upload your code to confirm the scores.)
Deliverables:
• Code
• Brief report with top-2 scores of your methods, and a brief description of the methods that
resulted in the top 2 scores.
• Your report should also include your analysis of why your best 2 methods performed better
than others you tried.
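Returning to Question 5, the pieces named there (the logistic model, the cross-entropy error, a gradient-descent update, and the evaluation metrics) can be sketched in plain Python. This is a rough illustration, not a reference solution: the function names are hypothetical, and averaging the error over the samples (rather than summing it) is an assumption the assignment does not specify.

```python
import math

# Training set copied from Table 1: (x1, x2, y).
TRAIN = [
    (0.346, 0.780, 0), (0.303, 0.439, 0), (0.358, 0.729, 0),
    (0.602, 0.863, 1), (0.790, 0.753, 1), (0.611, 0.965, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x1, x2):
    """P(y = 1 | x1, x2) for the linear model theta0 + theta1*x1 + theta2*x2."""
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)


def cross_entropy(theta, data):
    """Mean cross-entropy error (averaging is an assumption; a sum differs by a factor N)."""
    total = 0.0
    for x1, x2, y in data:
        p = predict_proba(theta, x1, x2)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def gradient_step(theta, data, lr):
    """One batch gradient-descent update on the mean cross-entropy."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in data:
        err = predict_proba(theta, x1, x2) - y
        for j, feature in enumerate((1.0, x1, x2)):
            grad[j] += err * feature / len(data)
    return [t - lr * g for t, g in zip(theta, grad)]


def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Initial weights and learning rate as stated in part (b).
initial_theta = [-1.0, 1.5, 0.5]
updated_theta = gradient_step(initial_theta, TRAIN, lr=0.1)
```

Calling `gradient_step` repeatedly until the error stops changing gives the converged model; thresholding `predict_proba` at 0.5 (a common convention, assumed here) turns probabilities into the 0/1 predictions that `classification_metrics` expects.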