COM S 573: Homework 4 solution


1. [8 points] A cubic regression spline with one knot at $\xi$ can be obtained using a basis of the form $1, x, x^2, x^3, (x - \xi)^3_+$, where $(x - \xi)^3_+ = (x - \xi)^3$ if $x > \xi$ and equals 0 otherwise. Show that a function of the form
$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)^3_+$$
is indeed a cubic regression spline, regardless of the values of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$.
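
One way to see the required verification: split $f$ at the knot and check smoothness. For $x \le \xi$, $f(x) = f_1(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$, a cubic; for $x > \xi$, $f(x) = f_2(x) = f_1(x) + \beta_4 (x - \xi)^3$, also a cubic. At the knot,
$$f_2(\xi) - f_1(\xi) = \beta_4 (\xi - \xi)^3 = 0,$$
$$f_2'(\xi) - f_1'(\xi) = 3\beta_4 (\xi - \xi)^2 = 0,$$
$$f_2''(\xi) - f_1''(\xi) = 6\beta_4 (\xi - \xi) = 0,$$
so $f$, $f'$, and $f''$ are continuous at $\xi$ and $f$ is a cubic polynomial on each side of $\xi$: precisely the definition of a cubic spline with one knot.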
2. It was mentioned that GAMs are generally fit using a backfitting approach. The idea behind backfitting is actually quite simple. We will now explore backfitting in the context of multiple linear regression. Suppose that we would like to perform multiple linear regression, but we do not have software to do so; instead, we only have software to perform simple linear regression. Therefore, we take the following iterative approach: we repeatedly hold all but one coefficient estimate fixed at its current value, and update only that coefficient estimate using a simple linear regression. The process continues until convergence, that is, until the coefficient estimates stop changing. The process flow is sketched next.
1. Download the adv.dat data set (n = 200), with response Y and two predictors X1 and X2, from BlackBoard.
2. Initialize $\hat\beta_1$ (the estimated coefficient of X1) to take on a value of your choice, say 0.
3. Keeping $\hat\beta_1$ fixed, fit the model $Y - \hat\beta_1 X_1 = \beta_0 + \beta_2 X_2 + e$.
4. Keeping $\hat\beta_2$ fixed, fit the model $Y - \hat\beta_2 X_2 = \beta_0 + \beta_1 X_1 + e$.

(a) [6 points] Write a for loop to repeat steps (3) and (4) 1,000 times. Report the estimates of $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ at each iteration of the for loop, and create a plot in which each of these values is displayed, with $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ each shown in a different color. (An R sketch for this part appears after part (c).)
(b) [2 points] Compare your answer in (a) to the results of simply performing multiple linear regression to predict Y using X1 and X2. Use the abline() function to overlay those multiple linear regression coefficient estimates on the plot obtained in (a).
(c) [1 point] On this data set, how many backfitting iterations were required in order to obtain a “good” approximation to the multiple regression coefficient estimates? What would be a good stopping criterion?
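
A minimal R sketch for parts (a) and (b), assuming adv.dat is a whitespace-delimited file with a header row naming the columns Y, X1, and X2 (the file layout is an assumption; adjust read.table() to match the actual file):

# Backfitting sketch: alternate simple linear regressions until the
# coefficient estimates stabilize. Column names and file layout are assumed.
adv <- read.table("adv.dat", header = TRUE)  # expects columns Y, X1, X2

n_iter <- 1000
beta0 <- beta1 <- beta2 <- numeric(n_iter)
b1 <- 0  # step 2: initialize beta1-hat to 0

for (i in 1:n_iter) {
  # step 3: hold beta1-hat fixed; regress the partial residual on X2
  a <- adv$Y - b1 * adv$X1
  b2 <- coef(lm(a ~ adv$X2))[2]
  # step 4: hold beta2-hat fixed; regress the partial residual on X1
  a <- adv$Y - b2 * adv$X2
  fit <- lm(a ~ adv$X1)
  b1 <- coef(fit)[2]
  beta0[i] <- coef(fit)[1]
  beta1[i] <- b1
  beta2[i] <- b2
}

# part (a): trace each estimate across iterations, one color per coefficient
plot(1:n_iter, beta0, type = "l", col = "black",
     ylim = range(c(beta0, beta1, beta2)),
     xlab = "iteration", ylab = "coefficient estimate")
lines(1:n_iter, beta1, col = "red")
lines(1:n_iter, beta2, col = "blue")

# part (b): overlay the multiple linear regression estimates with abline()
mlr <- coef(lm(Y ~ X1 + X2, data = adv))
abline(h = mlr[1], col = "black", lty = 2)
abline(h = mlr[2], col = "red", lty = 2)
abline(h = mlr[3], col = "blue", lty = 2)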
3. [5 points] Show that the Nadaraya-Watson estimator is equal to local constant fitting. Hint: start from the local polynomial cost function and adapt it where necessary.
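
One way to set up the argument: specialize the local polynomial (weighted least squares) cost function to degree zero and minimize over the constant $\beta_0$,
$$\hat\beta_0(x) = \arg\min_{\beta_0} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) (Y_i - \beta_0)^2.$$
Setting the derivative with respect to $\beta_0$ to zero yields
$$\hat\beta_0(x) = \frac{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) Y_i}{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)},$$
which is exactly the Nadaraya-Watson estimator: the local constant fit at $x$ is the kernel-weighted average of the responses.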
4. [3 points] Show that the kernel density estimate
$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),$$
with kernel $K$ and bandwidth $h > 0$, is a bona fide density. Did you need any condition(s) on $K$? If so, which one(s)?
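
For orientation, a sketch of what the verification involves: a bona fide density must be nonnegative and integrate to one. Nonnegativity of $\hat f$ requires $K \ge 0$, and with the substitution $u = (x - X_i)/h$ (so $dx = h\,du$),
$$\int \hat f(x)\,dx = \frac{1}{nh} \sum_{i=1}^{n} \int K\!\left(\frac{x - X_i}{h}\right) dx = \frac{1}{n} \sum_{i=1}^{n} \int K(u)\,du,$$
which equals 1 provided $\int K(u)\,du = 1$. So the needed conditions on $K$ are $K \ge 0$ and $\int K = 1$, i.e., $K$ is itself a density.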