Description
Problem 1 (written) – 25 points
Imagine we have a sequence of N observations (x1, . . . , xN), where each xi ∈ {0, 1}. We model this
sequence as i.i.d. random variables from a Bernoulli distribution with unknown parameter π ∈ [0, 1],
where
$p(x_i \mid \pi) = \pi^{x_i} (1 - \pi)^{1 - x_i}.$
(a) What is the joint likelihood of the data (x1, . . . , xN )?
(b) Derive the maximum likelihood estimate $\hat{\pi}_{ML}$ for π.
To help learn π, you use a prior distribution. You select the distribution p(π) = beta(a, b).
(c) Derive the maximum a posteriori (MAP) estimate $\hat{\pi}_{MAP}$ for π.
(d) Use Bayes rule to derive the posterior distribution of π and identify the name of this distribution.
(e) What are the mean and variance of π under this posterior? Discuss how they relate to $\hat{\pi}_{ML}$ and $\hat{\pi}_{MAP}$.
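One way to sanity-check the answers to (b) and (c) is to maximize the log-likelihood and the log-posterior numerically on a grid and compare the results with the closed forms you derive. The sketch below does this in plain Python. The observations `xs`, the prior parameters `a` and `b`, and all function names are made up for illustration and are not part of the assignment.

```python
import math

def log_likelihood(pi, xs):
    """Bernoulli log-likelihood of binary data xs at parameter pi, 0 < pi < 1."""
    k = sum(xs)          # number of ones
    n = len(xs)
    return k * math.log(pi) + (n - k) * math.log(1.0 - pi)

def log_beta_prior(pi, a, b):
    """Log of the beta(a, b) density at pi, up to an additive constant."""
    return (a - 1.0) * math.log(pi) + (b - 1.0) * math.log(1.0 - pi)

def grid_argmax(f, steps=10000):
    """Return the point on an evenly spaced grid inside (0, 1) that maximizes f."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=f)

# Hypothetical data and prior parameters, chosen only for this check.
xs = [1, 0, 1, 1, 0, 1, 1, 0]
a, b = 2.0, 2.0

pi_ml_numeric = grid_argmax(lambda p: log_likelihood(p, xs))
pi_map_numeric = grid_argmax(
    lambda p: log_likelihood(p, xs) + log_beta_prior(p, a, b)
)
print("numeric ML estimate: ", pi_ml_numeric)
print("numeric MAP estimate:", pi_map_numeric)
```

The normalizing constant of the prior is dropped because it does not depend on π and therefore does not move the maximizer. If a derived formula disagrees with the numeric value by more than the grid spacing, the derivation is worth rechecking.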
Problem 2 (coding) – 35 points
In this problem you will analyze data using the linear regression techniques we have discussed. The goal
of the problem is to predict the miles per gallon a car will get using six quantities (features) about that
car. The zip file containing the data can be found on Courseworks (see footnote 1). The data is broken into
training and testing sets. Each row in both “X” files contains six features for a single car (plus a 1 in the
7th dimension) and the same row in the corresponding “y” file contains the miles per gallon for that car.
Remember to submit all original source code with your homework. Put everything you are asked to show
below in the PDF file.
Part 1. Using the training data only, write code to solve the ridge regression problem
$\mathcal{L} = \lambda \|w\|^2 + \sum_{i=1}^{350} \|y_i - x_i^T w\|^2.$
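Setting the gradient of this objective to zero gives the standard closed-form solution $w_{RR} = (\lambda I + X^T X)^{-1} X^T y$, where the rows of X are the $x_i^T$. The following is a minimal sketch of that formula in Python with NumPy. The data is synthetic stand-in data with the same shape as the training set (it is not the homework data), and the function name `ridge_closed_form` is an illustrative choice.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve min_w lam*||w||^2 + ||y - X w||^2 via the normal equations."""
    d = X.shape[1]
    # Solve (lam*I + X^T X) w = X^T y instead of forming an explicit inverse.
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Synthetic stand-in data: 350 rows, 6 features plus a column of ones.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(350, 6)), np.ones((350, 1))])
y = X @ rng.normal(size=7) + 0.1 * rng.normal(size=350)

lam = 10.0
w_rr = ridge_closed_form(X, y, lam)

# Gradient of the objective at w_rr; it should be numerically zero at the minimizer.
grad = 2 * lam * w_rr - 2 * X.T @ (y - X @ w_rr)
print("largest gradient entry:", np.max(np.abs(grad)))
```

Note that, as the objective is written, the penalty applies to all 7 entries of w, including the weight on the constant dimension.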
(a) For λ = 0, 1, 2, 3, . . . , 5000, solve for $w_{RR}$. (Notice that when λ = 0, $w_{RR} = w_{LS}$.) In one figure,
plot the 7 values in $w_{RR}$ as a function of df(λ). You will need to call a built-in SVD function to do
this (all details are in the slides). Be sure to label your 7 curves by their dimension in x.
(b) The 4th dimension (car weight) and 6th dimension (car year) clearly stand out over the other
dimensions. What information can we get from this?
(c) For λ = 0, . . . , 50, predict all 42 test cases. Plot the root mean squared error (RMSE; see footnote 2)
on the test set as a function of λ, not as a function of df(λ). What does this figure tell you when
choosing λ for this problem (and when choosing between ridge regression and least squares)?
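The computations in parts (a) and (c) can be sketched with a single SVD of the training matrix. The sketch assumes the usual definition df(λ) = Σ_j s_j² / (s_j² + λ), where s_j are the singular values of X; check this against the slides. The data below is synthetic stand-in data with the same shapes as the homework files, and names such as `ridge_path_svd` are illustrative.

```python
import numpy as np

def ridge_path_svd(X, y, lambdas):
    """Return (W, df): W[j] is the ridge solution for lambdas[j], df[j] its df(lambda)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vt
    Uty = U.T @ y
    s2 = s[:, None] ** 2                               # shape (d, 1)
    lam = np.asarray(lambdas, dtype=float)[None, :]    # shape (1, L)
    shrink = s[:, None] / (s2 + lam)                   # s_j / (s_j^2 + lambda)
    W = (Vt.T @ (shrink * Uty[:, None])).T             # shape (L, d)
    df = np.sum(s2 / (s2 + lam), axis=0)               # shape (L,)
    return W, df

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Synthetic stand-in data (NOT the homework files): 350 training and 42 test rows.
rng = np.random.default_rng(1)
w_true = rng.normal(size=7)

def make_data(n):
    X = np.hstack([rng.normal(size=(n, 6)), np.ones((n, 1))])
    return X, X @ w_true + 0.5 * rng.normal(size=n)

X_train, y_train = make_data(350)
X_test, y_test = make_data(42)

# Part (a): the whole regularization path and its degrees of freedom.
lambdas = np.arange(0, 5001)
W, df = ridge_path_svd(X_train, y_train, lambdas)

# Part (c): test RMSE for lambda = 0, ..., 50.
test_rmse = np.array([rmse(y_test, X_test @ W[j]) for j in range(51)])
print("lambda with the lowest test RMSE:", lambdas[np.argmin(test_rmse)])
```

Plotting each column of `W` against `df` gives the seven curves for part (a), and plotting `test_rmse` against `lambdas[:51]` gives the figure for part (c). Reusing one SVD avoids solving 5001 separate linear systems.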
Part 2. Modify your code to learn a pth-order polynomial regression model for p = 1, 2, 3. (You’ve
already done p = 1 above.) For this implementation, do not include the cross terms; instead use the
method discussed in the slides.
(d) In one figure, plot the test RMSE as a function of λ = 0, . . . , 500 for p = 1, 2, 3. Based on this
plot, which value of p should you choose and why? How does your assessment of the ideal value
of λ change for this problem?
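A pth-order model without cross terms can be built by appending elementwise powers of each feature to the design matrix. The sketch below assumes this is what "the method discussed in the slides" refers to, and that the constant 1 sits in the last column, as described above. The function name `poly_expand` and the small example matrix are illustrative only.

```python
import numpy as np

def poly_expand(X, p):
    """Append elementwise powers 2..p of the feature columns, with no cross terms.

    Assumes the last column of X is the constant 1 and keeps a single copy of it.
    """
    feats, ones = X[:, :-1], X[:, -1:]
    powers = [feats ** k for k in range(1, p + 1)]   # x, x^2, ..., x^p
    return np.hstack(powers + [ones])

# Tiny made-up matrix: two rows, two features, plus the constant column.
X_small = np.array([[2.0, 3.0, 1.0],
                    [4.0, 5.0, 1.0]])
X_small_p3 = poly_expand(X_small, 3)
print(X_small_p3.shape)   # 2 features * 3 powers + 1 constant column
```

With six features this gives 6p + 1 columns, so the same ridge code can be reused on the expanded training and test matrices. If the slides also call for rescaling the added columns, that step would need to use statistics computed from the training set only.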
1. See https://archive.ics.uci.edu/ml/datasets/Auto+MPG for more details on this
dataset. Since I have done some preprocessing, you must use the data provided with this homework.
2. RMSE = $\sqrt{\frac{1}{42} \sum_{i=1}^{42} \left(y_i^{test} - y_i^{pred}\right)^2}$.

