Description
Question 1 (ridge regression) [5 points]
On page 27 of the lecture note Penalized, we introduced a formula for calculating the degrees of freedom of
a ridge regression. From the previous kNN model, we also know that its degrees of freedom are N/k. Use the
Boston housing data again for this analysis.
Let’s use a cross-validation approach to evaluate the performance
of the ridge regression and kNN for a grid of degrees of freedom. To do this, first choose a sequence of k values
such that the degrees of freedom are within the range of 1 to 14.
Then choose the corresponding λ values in
the ridge regression such that the degrees of freedom of these two models are matched. Then use a 10-fold
cross-validation to evaluate and compare their performance. You can use the caret package, as well as functions
such as lm.ridge, knn, or any built-in functions to perform the model fitting. You should consider using
plots to demonstrate the results.
Note: You should be careful about three things: 1) how scaling in ridge affects the degrees of freedom; 2) how
kNN handles categorical variables; 3) the intercept in the ridge regression is not penalized, so it accounts for 1
df.
Explain or justify your approach.
data(Boston, package="MASS")
# head(Boston)
useLog = c(1,3,5,6,8,9,10,14)
Boston[,useLog] = log(Boston[,useLog])
Boston[,2] = Boston[,2] / 10
Boston[,7] = Boston[,7]^2.5 / 10^4
Boston[,11] = exp(0.4 * Boston[,11])/1000
Boston[,12] = Boston[,12] / 100
Boston[,13] = sqrt(Boston[,13])
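As a rough illustration of the df matching in Question 1, the sketch below assumes the usual ridge formula df(λ) = Σ d_j^2 / (d_j^2 + λ), where d_j are the singular values of the centered and scaled design matrix, and adds 1 for the unpenalized intercept. The helper names ridge_df and lambda_for_df and the particular k grid are hypothetical, the data are loaded fresh so the block runs on its own, and chas is simply treated as numeric. The λ values are tied to the standardization used here; a fitting function that standardizes or scales λ differently would need a matching adjustment.

```r
# Sketch: match ridge degrees of freedom to kNN's N/k (helper names are hypothetical).
data(Boston, package = "MASS")
X <- scale(as.matrix(Boston[, -14]))   # center and scale the 13 predictors
N <- nrow(X)

d2 <- svd(X)$d^2                       # squared singular values of the standardized X

# Effective df of ridge at penalty lambda, plus 1 for the unpenalized intercept
ridge_df <- function(lambda) sum(d2 / (d2 + lambda)) + 1

# Solve ridge_df(lambda) = target numerically; valid for targets strictly between 1 and 14
lambda_for_df <- function(target) {
  uniroot(function(l) ridge_df(l) - target,
          lower = 0, upper = 1e8, tol = 1e-8)$root
}

k_grid      <- c(40, 50, 75, 100, 150, 250)   # assumed grid; N/k stays inside (1, 14)
df_target   <- N / k_grid                     # kNN degrees of freedom
lambda_grid <- sapply(df_target, lambda_for_df)

data.frame(k = k_grid, df = df_target, lambda = lambda_grid)
```

Each row of the resulting table pairs a k with the λ that gives the same degrees of freedom, so the two models can be compared at matched complexity in the 10-fold cross-validation.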
Question 2 (Lasso regression) [5 points]
Use the Boston housing data again to perform the Lasso regression. For this question, you should consider
using glmnet and the corresponding cross-validation version cv.glmnet to tune the parameters. Perform
a complete Lasso regression analysis of this data, such as plotting the cross-validation errors, and how the
estimated parameters change as a function of λ.
Select the tuning parameter that minimizes the cross-validation
error and report the selected variables. Compare this result to the best subset selection with AIC penalty.
Based on what we have learned, comment on how these two methods trade off bias and variance differently.
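The glmnet workflow described in Question 2 might look roughly like the sketch below. It assumes the glmnet package is installed, loads the data fresh so the block is self-contained (the transformations from Question 1 are not repeated here), and uses an arbitrary seed because the fold assignment is random. It covers only the Lasso part, not the best subset comparison.

```r
# Sketch of a Lasso workflow with glmnet (assumes the glmnet package is installed).
library(glmnet)
data(Boston, package = "MASS")

x <- as.matrix(Boston[, -14])   # 13 predictors
y <- Boston[, 14]               # response (medv)

set.seed(1)                                        # arbitrary seed; folds are random
cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)   # alpha = 1 gives the Lasso penalty

plot(cvfit)                                # cross-validation error against log(lambda)
plot(cvfit$glmnet.fit, xvar = "lambda")    # coefficient paths against log(lambda)

# Coefficients at the lambda that minimizes the cross-validation error
b <- as.matrix(coef(cvfit, s = "lambda.min"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected
```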
Extra-Credit Question [4 points]
On pages 8-9 of lecture 5.1, we discussed the coin tossing example that violates the strong likelihood principle.
The scientist tosses the coin 12 times, of which only 3 are heads. The statistician says: “you tossed the coin
12 times and you got 3 heads. The one-sided p-value is 0.07”. Then the scientist says: “Well, it wasn’t exactly
like that... I actually repeated the coin tossing experiment until I got 3 heads and then I stopped”. The
statistician says: “In that case, your p-value is 0.03”. Give explanations for the two different conclusions.
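The two quoted p-values can be reproduced numerically. The sketch below assumes the null hypothesis is a fair coin (p = 0.5) and the one-sided alternative is that heads are less likely; these assumptions are not stated explicitly in the question. The first design fixes the number of tosses at 12, and the second fixes the number of heads at 3.

```r
# Assumed null: fair coin (p = 0.5); one-sided alternative: heads are less likely.

# Design A: 12 tosses fixed in advance, X = number of heads ~ Binomial(12, 0.5).
p_binom <- pbinom(3, size = 12, prob = 0.5)   # P(X <= 3)

# Design B: toss until the 3rd head; Y = number of tails before it ~ NegBinomial(3, 0.5).
# Needing 12 tosses means Y = 9, and "as or more extreme" means Y >= 9.
p_negbin <- pnbinom(8, size = 3, prob = 0.5, lower.tail = FALSE)   # P(Y > 8) = P(Y >= 9)

round(c(binomial = p_binom, negative_binomial = p_negbin), 4)
```

Under these assumptions the values are 299/4096 (about 0.073) and 67/2048 (about 0.033), which round to the 0.07 and 0.03 in the dialogue.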