Description


1 Ridge Regression
1.1 Question 1

In class, you learned about using k-fold cross validation as a way to estimate the true error of a learning algorithm and to tune parameters. The preferred solution is Leave-One-Out Cross Validation (LOOCV), which provides an almost unbiased estimate of this true error, but it can take a really long time to compute. In this problem, you will derive an algorithm for efficiently computing the LOOCV error for a particular regression algorithm.

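We will now introduce a simple extension of the Least Squares algorithm. Given a set of $m$ data points and associated labels $\{x_i, y_i\}_{i=1}^{m}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, Ridge Regression finds the weight vector $w$ and a bias term $b$ to optimize the following:

$$\min_{w,\,b}\; \lambda \|w\|^2 + \sum_{i=1}^{m} (w^\top x_i + b - y_i)^2 \qquad (1)$$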

Note that with $\lambda = 0$ Ridge Regression is exactly the Least Squares formulation.
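As a quick numerical illustration of objective (1) and of the $\lambda = 0$ case, the sketch below evaluates the objective on synthetic data. The helper name `objective`, the random data, and the use of `np.linalg.lstsq` on a design matrix with an appended column of ones are assumptions made for this example, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(3)
m, d_feat = 30, 2
X = rng.normal(size=(d_feat, m))   # assumed layout: one data point per column
y = rng.normal(size=m)


def objective(w, b, lam):
    """Objective (1): lam * ||w||^2 plus the sum of squared residuals."""
    residuals = X.T @ w + b - y
    return lam * (w @ w) + residuals @ residuals


# Ordinary least squares with a bias, via a column of ones appended to X^T.
A = np.hstack([X.T, np.ones((m, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls, b_ls = coef[:-1], coef[-1]

# With lam = 0 the objective is just the least-squares error.
sse = np.sum((A @ coef - y) ** 2)
print(np.isclose(objective(w_ls, b_ls, 0.0), sse))

# Raising lam adds lam * ||w||^2 and leaves the bias unpenalized.
penalty = objective(w_ls, b_ls, 1.0) - objective(w_ls, b_ls, 0.0)
print(np.isclose(penalty, w_ls @ w_ls))
```

Both checks compare quantities that should agree up to floating-point error.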
• Using the Matlab notation, let $\bar{w} = [w; b]$, $\bar{X} = [X; 1_m^\top]$, $\bar{I} = [I_d, 0_d; 0_d^\top, 0]$, $C = \bar{X}\bar{X}^\top + \lambda \bar{I}$, and $d = \bar{X} y$.

Show that the solution of Ridge regression is: $\bar{w} = C^{-1} d$.

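We know Ridge regression finds the weight vector as

$$\min_{w,\,b}\; \lambda \|w\|^2 + \sum_{i=1}^{m} (w^\top x_i + b - y_i)^2$$

We also know $\bar{w} = [w; b]$, $\bar{X} = [X; 1_m^\top]$, $\bar{I} = [I_d, 0_d; 0_d^\top, 0]$, $C = \bar{X}\bar{X}^\top + \lambda \bar{I}$, and $d = \bar{X} y$, where the columns of $X$ are the data points $x_i$.

Therefore, $\|w\|^2 = \|\bar{I}\bar{w}\|^2$ and $w^\top x_i + b$ is the $i$-th entry of $\bar{X}^\top \bar{w}$, and we get the ridge expression in vector form as follows:

$$\lambda \|\bar{I}\bar{w}\|^2 + \|\bar{X}^\top \bar{w} - y\|^2$$

But we know $\bar{I}^\top \bar{I} = \bar{I}$, thus we have:

$$\lambda\, \bar{w}^\top \bar{I} \bar{w} + \|\bar{X}^\top \bar{w} - y\|^2$$

Using the property of the norm $\|v\|^2 = v^\top v$, we have:

$$\lambda\, \bar{w}^\top \bar{I} \bar{w} + \bar{w}^\top \bar{X}\bar{X}^\top \bar{w} - 2\, y^\top \bar{X}^\top \bar{w} + y^\top y$$

As the ridge expression is a convex function of $\bar{w}$, it suffices to find the $\bar{w}$ where the gradient is zero. Therefore, on taking the first derivative w.r.t. $\bar{w}$, we get:

$$2\lambda \bar{I}\bar{w} + 2\bar{X}\bar{X}^\top \bar{w} - 2\bar{X} y = 0$$

Thus, we have:

$$(\bar{X}\bar{X}^\top + \lambda \bar{I})\,\bar{w} = \bar{X} y \quad\Longrightarrow\quad C\bar{w} = d$$

Hence $\bar{w} = C^{-1} d$, as was to be shown.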
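The closed form can be checked numerically. The following is a minimal sketch, assuming NumPy and a data matrix `X` of shape d x m with one data point per column; the function names `ridge_closed_form` and `ridge_objective` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Solve C w_bar = d for ridge regression with an unpenalized bias.

    X is d x m (one data point per column), y has length m.
    Returns the pair (w, b).
    """
    d_feat, m = X.shape
    X_bar = np.vstack([X, np.ones((1, m))])   # X_bar = [X; 1^T]
    I_bar = np.eye(d_feat + 1)
    I_bar[-1, -1] = 0.0                       # the bias term is not penalized
    C = X_bar @ X_bar.T + lam * I_bar
    d = X_bar @ y
    w_bar = np.linalg.solve(C, d)             # solves C w_bar = d without forming C^{-1}
    return w_bar[:-1], w_bar[-1]


def ridge_objective(X, y, w, b, lam):
    """Objective (1): lam * ||w||^2 plus the sum of squared residuals."""
    residuals = X.T @ w + b - y
    return lam * (w @ w) + residuals @ residuals


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))
y = rng.normal(size=20)
lam = 0.5
w, b = ridge_closed_form(X, y, lam)

# The gradient of the objective should vanish at the solution.
residuals = X.T @ w + b - y
grad_w = 2 * lam * w + 2 * X @ residuals
grad_b = 2 * residuals.sum()
print(np.allclose(grad_w, 0.0, atol=1e-8), abs(grad_b) < 1e-8)

# A small perturbation should not lower the objective.
base = ridge_objective(X, y, w, b, lam)
perturbed = ridge_objective(X, y, w + 1e-3, b - 1e-3, lam)
print(base <= perturbed)
```

Using `np.linalg.solve` rather than an explicit inverse is a common numerical choice here; the explicit inverse only becomes useful when it is reused across many leave-one-out updates.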

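• Now suppose we remove $x_i$ from the training data, and let $C_{(i)}$, $d_{(i)}$, $\bar{w}_{(i)}$ be the corresponding matrices for removing $x_i$. Express $C_{(i)}$ in terms of $C$ and $\bar{x}_i$. Express $d_{(i)}$ in terms of $d$ and $\bar{x}_i$. Here $\bar{x}_i = [x_i; 1]$ is the $i$-th column of $\bar{X}$.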

We know $C = \bar{X}\bar{X}^\top + \lambda \bar{I}$. But $\bar{X}\bar{X}^\top$ is the sum over all samples of the outer products $\bar{x}_i \bar{x}_i^\top$: the entry in row $j$ and column $k$ accumulates the product of feature $j$ and feature $k$ over the samples. The term $\lambda \bar{I}$ adds a constant to the diagonal entries and doesn't depend on the samples.

Thus, we can say that

$$C_{(i)} = C - \bar{x}_i \bar{x}_i^\top$$

Similarly, we know that $d = \bar{X} y$. Thus, $d$ is the sum of $\bar{x}_i y_i$ over all samples. So when a sample is removed, we just don't add the term $\bar{x}_i y_i$ for that sample. Therefore, we can say that:

$$d_{(i)} = d - \bar{x}_i y_i$$
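The two downdate formulas can be compared against a direct recomputation on the reduced data set. This is a sketch on synthetic data, assuming NumPy; the helper `build_C_d` and the chosen index `i` are hypothetical.

```python
import numpy as np


def build_C_d(X, y, lam):
    """Return C = X_bar X_bar^T + lam * I_bar and d = X_bar y."""
    d_feat, m = X.shape
    X_bar = np.vstack([X, np.ones((1, m))])
    I_bar = np.eye(d_feat + 1)
    I_bar[-1, -1] = 0.0          # the bias term is not penalized
    return X_bar @ X_bar.T + lam * I_bar, X_bar @ y


rng = np.random.default_rng(1)
d_feat, m, lam = 3, 10, 0.5
X = rng.normal(size=(d_feat, m))   # one data point per column
y = rng.normal(size=m)
C, d = build_C_d(X, y, lam)

i = 4                               # index of the sample to remove
x_bar_i = np.append(X[:, i], 1.0)   # augmented sample [x_i; 1]

# Rank-one downdates derived above.
C_i = C - np.outer(x_bar_i, x_bar_i)
d_i = d - x_bar_i * y[i]

# Direct recomputation without sample i, for comparison.
keep = np.arange(m) != i
C_direct, d_direct = build_C_d(X[:, keep], y[keep], lam)

print(np.allclose(C_i, C_direct), np.allclose(d_i, d_direct))
```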

• Express $C_{(i)}^{-1}$ in terms of $C^{-1}$ and $\bar{x}_i$. Hint: use the Sherman-Morrison formula

$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}$$

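From 1)a and 1)b, we have $C_{(i)} = C - \bar{x}_i \bar{x}_i^\top$. Applying the Sherman-Morrison formula with $A = C$, $u = -\bar{x}_i$ and $v = \bar{x}_i$ gives:

$$C_{(i)}^{-1} = C^{-1} + \frac{C^{-1} \bar{x}_i \bar{x}_i^\top C^{-1}}{1 - \bar{x}_i^\top C^{-1} \bar{x}_i}$$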
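The Sherman-Morrison update can be sanity-checked against a direct inverse. The sketch below assumes NumPy and synthetic data; all variable names are illustrative. It also forms $\bar{w}_{(i)} = C_{(i)}^{-1} d_{(i)}$ from the updated inverse and compares it with a refit on the reduced data, which is one plausible way to use the result and is not taken from the original solution.

```python
import numpy as np

rng = np.random.default_rng(2)
d_feat, m, lam = 3, 12, 0.5
X = rng.normal(size=(d_feat, m))   # one data point per column
y = rng.normal(size=m)

X_bar = np.vstack([X, np.ones((1, m))])
I_bar = np.eye(d_feat + 1)
I_bar[-1, -1] = 0.0                # the bias term is not penalized
C = X_bar @ X_bar.T + lam * I_bar
d = X_bar @ y
C_inv = np.linalg.inv(C)           # computed once, reusable for every i

i = 5
x_bar_i = X_bar[:, i]

# Sherman-Morrison with A = C, u = -x_bar_i, v = x_bar_i.
# C_inv is symmetric, so C_inv x x^T C_inv equals outer(Cx, Cx).
Cx = C_inv @ x_bar_i
C_i_inv = C_inv + np.outer(Cx, Cx) / (1.0 - x_bar_i @ Cx)

# Reference: invert the downdated matrix directly.
C_i_inv_direct = np.linalg.inv(C - np.outer(x_bar_i, x_bar_i))
print(np.allclose(C_i_inv, C_i_inv_direct))

# Leave-one-out weights from the updated inverse versus a full refit.
w_bar_i = C_i_inv @ (d - x_bar_i * y[i])
keep = np.arange(m) != i
X_keep = X_bar[:, keep]
w_bar_refit = np.linalg.solve(X_keep @ X_keep.T + lam * I_bar, X_keep @ y[keep])
print(np.allclose(w_bar_i, w_bar_refit))
```

The update costs a matrix-vector product and an outer product per removed sample, instead of a fresh inversion of a (d+1) x (d+1) matrix.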


CSE 512 Homework 2 solution
