Description
1 Ridge Regression
1.1 Question 1
In class, you learned about using k-fold cross-validation as a way to estimate the true error of a learning algorithm and to tune parameters. The preferred solution is Leave-One-Out Cross-Validation (LOOCV), which provides an almost unbiased estimate of this true error, but it can take a really long time to compute. In this problem, you will derive an algorithm for efficiently computing the LOOCV error for a particular regression algorithm.
We will now introduce a simple extension of the Least Squares algorithm. Given a set of $m$ data points and associated labels $\{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\; y_i \in \mathbb{R}\}_{i=1}^{m}$, Ridge Regression finds the weight vector $w$ and a bias term $b$ to optimize the following:
$$\min_{w,b} \; \lambda\|w\|^2 + \sum_{i=1}^{m} (w^\top x_i + b - y_i)^2 \qquad (1)$$
Note that with $\lambda = 0$ Ridge Regression is exactly the Least Squares formulation.
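To make the objective concrete, here is a minimal sketch that evaluates the ridge objective for a given weight vector and bias. The function name `ridge_objective` and the storage layout (data points as the columns of a `(d, m)` array) are assumptions made for illustration, not something fixed by the problem statement.

```python
import numpy as np

def ridge_objective(w, b, X, y, lam):
    """Value of lam * ||w||^2 + sum_i (w^T x_i + b - y_i)^2.

    Assumed layout: X has shape (d, m), one data point per column.
    """
    residuals = X.T @ w + b - y          # shape (m,)
    return lam * (w @ w) + residuals @ residuals

X = np.array([[1.0, 2.0, 3.0]])          # d = 1, m = 3
y = np.array([2.0, 4.0, 6.0])
w = np.array([2.0])                      # fits the toy data exactly with b = 0

# With a perfect fit only the penalty term lam * ||w||^2 remains.
print(ridge_objective(w, 0.0, X, y, lam=0.5))
# With lam = 0 the objective reduces to the plain least-squares loss.
print(ridge_objective(w, 0.0, X, y, lam=0.0))
```

Note that the bias `b` does not appear in the penalty term, which is why the later matrix form uses a modified identity with a zero in its last diagonal entry.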
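• Using the Matlab notation, let $\bar{w} = [w; b]$, $\bar{X} = [X; 1_m^\top]$, $\bar{I} = [I_d, 0_d; 0_d^\top, 0]$, $C = \bar{X}\bar{X}^\top + \lambda\bar{I}$, and $d = \bar{X}y$.
Show that the solution of Ridge Regression is:
$$\bar{w} = C^{-1}d$$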
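We know that Ridge Regression finds the weight vector and bias as
$$\min_{w,b} \; \lambda\|w\|^2 + \sum_{i=1}^{m} (w^\top x_i + b - y_i)^2$$
We also know $\bar{w} = [w; b]$, $\bar{X} = [X; 1_m^\top]$, $\bar{I} = [I_d, 0_d; 0_d^\top, 0]$, $C = \bar{X}\bar{X}^\top + \lambda\bar{I}$, and $d = \bar{X}y$.
Therefore, $\|w\|^2 = \|\bar{I}\bar{w}\|^2$ and $\sum_{i=1}^{m} (w^\top x_i + b - y_i)^2 = \|\bar{X}^\top\bar{w} - y\|^2$, and we get the ridge expression in vector form as follows:
$$\min_{\bar{w}} \; \lambda\|\bar{I}\bar{w}\|^2 + \|\bar{X}^\top\bar{w} - y\|^2$$
But we know $\bar{I}^\top\bar{I} = \bar{I}$, thus we have:
$$\min_{\bar{w}} \; \lambda\,\bar{w}^\top\bar{I}\bar{w} + \|\bar{X}^\top\bar{w} - y\|^2$$
Using the property of the norm $\|v\|^2 = v^\top v$, we have:
$$\min_{\bar{w}} \; \lambda\,\bar{w}^\top\bar{I}\bar{w} + \bar{w}^\top\bar{X}\bar{X}^\top\bar{w} - 2\,\bar{w}^\top\bar{X}y + y^\top y$$
As the ridge expression is a convex function of $\bar{w}$, it suffices to find the $\bar{w}$ where the gradient is zero. Therefore, on taking the first derivative w.r.t. $\bar{w}$, we get:
$$2\lambda\bar{I}\bar{w} + 2\bar{X}\bar{X}^\top\bar{w} - 2\bar{X}y = 0$$
Thus, we have:
$$(\bar{X}\bar{X}^\top + \lambda\bar{I})\,\bar{w} = \bar{X}y$$
$$C\bar{w} = d$$
Hence $\bar{w} = C^{-1}d$, as was to be shown.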
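The closed-form result $\bar{w} = C^{-1}d$ can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the helper name `ridge_closed_form`, the random test data, and the `(d, m)` column layout of `X` are all hypothetical choices, and the linear system is solved with `np.linalg.solve` rather than by forming the inverse.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve C w_bar = d for w_bar = [w; b].

    Assumed layout: X has shape (d, m), one data point per column.
    """
    d_feat, m = X.shape
    X_bar = np.vstack([X, np.ones((1, m))])   # append the row of ones
    I_bar = np.eye(d_feat + 1)
    I_bar[-1, -1] = 0.0                        # the bias entry is not penalised
    C = X_bar @ X_bar.T + lam * I_bar
    d = X_bar @ y
    w_bar = np.linalg.solve(C, d)              # avoids forming C^{-1} explicitly
    return w_bar, C, d, X_bar, I_bar

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))                   # hypothetical data: d = 3, m = 20
y = rng.normal(size=20)
lam = 0.7
w_bar, C, d, X_bar, I_bar = ridge_closed_form(X, y, lam)

# Gradient of the vector-form objective at the solution; it should vanish
# up to floating-point error.
grad = 2 * lam * I_bar @ w_bar + 2 * X_bar @ (X_bar.T @ w_bar - y)
print(np.max(np.abs(grad)))
```

The last entry of `w_bar` is the bias $b$ and the remaining entries are $w$.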
• Now suppose we remove $x_i$ from the training data, and let $C_{(i)}$, $d_{(i)}$, $\bar{w}_{(i)}$ be the corresponding matrices for removing $x_i$. Express $C_{(i)}$ in terms of $C$ and $\bar{x}_i$. Express $d_{(i)}$ in terms of $d$, $\bar{x}_i$, and $y_i$.
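We know $C = \bar{X}\bar{X}^\top + \lambda\bar{I}$. But $\bar{X}\bar{X}^\top = \sum_{j=1}^{m} \bar{x}_j\bar{x}_j^\top$, where $\bar{x}_j = [x_j; 1]$ is the $j$-th column of $\bar{X}$. It is the sum over samples of the outer products $\bar{x}_j\bar{x}_j^\top$, so the entry at a given row and column accumulates the product of the corresponding pair of features. The term $\lambda\bar{I}$ adds a constant to the diagonal entries of the non-bias features and does not depend on the samples.
Thus, we can say that:
$$C_{(i)} = C - \bar{x}_i\bar{x}_i^\top$$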
Similarly, we know that $d = \bar{X}y = \sum_{j=1}^{m} \bar{x}_j y_j$. Thus, $d$ is the sum of $\bar{x}_j y_j$ over all samples, so removing sample $i$ simply drops its term $\bar{x}_i y_i$ from the sum. Therefore, we can say that:
$$d_{(i)} = d - \bar{x}_i y_i$$
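As a quick numerical check of the downdates $C_{(i)} = C - \bar{x}_i\bar{x}_i^\top$ and $d_{(i)} = d - \bar{x}_i y_i$, the sketch below compares them with the same quantities rebuilt from scratch on the reduced data set. The random data, the removed index `i = 4`, and the `(d, m)` column layout are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d_feat, m, lam = 3, 10, 0.5                    # hypothetical sizes and lambda
X = rng.normal(size=(d_feat, m))               # one data point per column
y = rng.normal(size=m)

X_bar = np.vstack([X, np.ones((1, m))])
I_bar = np.eye(d_feat + 1)
I_bar[-1, -1] = 0.0
C = X_bar @ X_bar.T + lam * I_bar
d = X_bar @ y

i = 4                                          # index of the removed point (arbitrary)
x_bar_i = X_bar[:, i]

# Downdated quantities: subtract the removed sample's contribution.
C_down = C - np.outer(x_bar_i, x_bar_i)
d_down = d - x_bar_i * y[i]

# The same quantities rebuilt from scratch without column i.
X_bar_rest = np.delete(X_bar, i, axis=1)
y_rest = np.delete(y, i)
C_direct = X_bar_rest @ X_bar_rest.T + lam * I_bar
d_direct = X_bar_rest @ y_rest

# Both comparisons are expected to agree up to floating-point error.
print(np.allclose(C_down, C_direct), np.allclose(d_down, d_direct))
```

The downdate costs $O(d^2)$ per removed point, whereas rebuilding $C_{(i)}$ from the remaining $m - 1$ samples costs $O(m d^2)$.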
• Express $C_{(i)}^{-1}$ in terms of $C^{-1}$ and $\bar{x}_i$. Hint: use the Sherman-Morrison formula
$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1}uv^\top A^{-1}}{1 + v^\top A^{-1}u}$$
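The Sherman-Morrison formula from the hint can be verified numerically before it is applied. In the sketch below, the function name `sherman_morrison_inverse` and the random test matrices are hypothetical; the code assumes $A$ is invertible and that the denominator $1 + v^\top A^{-1}u$ is nonzero.

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Inverse of (A + u v^T) given A^{-1}; assumes 1 + v^T A^{-1} u != 0."""
    Au = A_inv @ u                       # A^{-1} u
    vA = v @ A_inv                       # v^T A^{-1}
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(2)
n = 4
# A is a small perturbation of n * I, so it stays well conditioned.
A = n * np.eye(n) + 0.1 * rng.normal(size=(n, n))
# u and v are kept small so the denominator stays away from zero.
u = 0.5 * rng.normal(size=n)
v = 0.5 * rng.normal(size=n)

updated = sherman_morrison_inverse(np.linalg.inv(A), u, v)
direct = np.linalg.inv(A + np.outer(u, v))

# The two inverses are expected to agree up to floating-point error.
print(np.allclose(updated, direct))
```

Given $A^{-1}$, the update needs only matrix-vector products and one outer product, so it costs $O(n^2)$ instead of the $O(n^3)$ of a fresh inversion.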
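From 1(a) and 1(b), we have $C_{(i)} = C - \bar{x}_i\bar{x}_i^\top$. Applying the Sherman-Morrison formula with $A = C$, $u = -\bar{x}_i$, and $v = \bar{x}_i$ gives:
$$C_{(i)}^{-1} = C^{-1} + \frac{C^{-1}\bar{x}_i\bar{x}_i^\top C^{-1}}{1 - \bar{x}_i^\top C^{-1}\bar{x}_i}$$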