## Description

## Problem 1: Linear Regression (100 pts)

Suppose that we are given data points $x^{(1)}, \dots, x^{(M)} \in \mathbb{R}^d$ with corresponding observed outcomes $y^{(1)}, \dots, y^{(M)} \in \mathbb{R}$. In least squares regression, the aim is to find a function of the form $f : \mathbb{R}^d \to \mathbb{R}$ with $f(x) = w^T x + b$ for some $w \in \mathbb{R}^d$, $b \in \mathbb{R}$ such that the difference between $f(x^{(m)})$ and $y^{(m)}$ is as small as possible on average.

Formally, we solve the following convex optimization problem:

$$\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \frac{1}{M} \sum_{m=1}^{M} \left( y^{(m)} - w^T x^{(m)} - b \right)^2$$

Consider a penalized regression problem where we choose some $\lambda > 0$ and some regularizer $h$ and solve the following optimization problem:

$$\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \left[ \frac{1}{M} \sum_{m=1}^{M} \left( y^{(m)} - w^T x^{(m)} - b \right)^2 + \lambda h(w) \right]$$

In this problem, we will consider the case $h(w) = \sum_{i=1}^{d} |w_i|$.

1. In Python, implement subgradient descent for the above optimization problem. That is, write a function that takes as input a matrix $X \in \mathbb{R}^{m \times n}$ whose columns are the data points, a vector $y$ of observed values, a $\lambda > 0$, an initial guess for $w$ and $b$, and a number of iterations `its`, and returns the result of performing subgradient descent for `its` iterations starting from the specified initial point.
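A minimal sketch of such a function is below. The function name, the fixed step size `step`, and the use of $\lambda\,\mathrm{sign}(w)$ as the particular subgradient of the $\ell_1$ term are choices not fixed by the problem statement; `X` follows the stated layout, with one data point per column.

```python
import numpy as np

def subgradient_descent(X, y, lam, w0, b0, its, step=1e-2):
    """Subgradient descent on (1/M) * sum_m (y_m - w^T x_m - b)^2 + lam * ||w||_1.

    X has one data point per column, so predictions are X.T @ w + b.
    `step` is a fixed step size (a diminishing schedule is also common).
    """
    w, b = np.asarray(w0, dtype=float).copy(), float(b0)
    M = X.shape[1]
    for _ in range(its):
        r = X.T @ w + b - y                 # residuals, shape (M,)
        grad_w = (2.0 / M) * (X @ r)        # gradient of the smooth loss in w
        grad_b = (2.0 / M) * r.sum()        # gradient of the smooth loss in b
        sub_w = grad_w + lam * np.sign(w)   # one valid subgradient of lam*||w||_1
        w -= step * sub_w
        b -= step * grad_b
    return w, b
```

With a fixed step size, subgradient descent only converges to a neighborhood of the optimum, which is worth keeping in mind when comparing methods in part 3.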

2. In Python, implement proximal gradient descent (with $h$ chosen as above) for the above optimization problem. That is, write a function that takes as input a matrix $X \in \mathbb{R}^{m \times n}$ whose columns are the data points, a vector $y$ of observed values, a $\lambda > 0$, an initial guess for $w$ and $b$, and a number of iterations `its`, and returns the result of performing proximal gradient descent for `its` iterations starting from the specified initial point.
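A sketch under the same assumptions as above (names and the fixed step size are illustrative choices). For the $\ell_1$ regularizer the proximal map is the soft-thresholding operator, and the unpenalized intercept $b$ gets a plain gradient step:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: shrink each entry toward zero by t, clipping at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_descent(X, y, lam, w0, b0, its, step=1e-2):
    """Proximal gradient descent (ISTA) on the lasso-penalized least squares problem.

    X has one data point per column; only w is penalized, not b.
    """
    w, b = np.asarray(w0, dtype=float).copy(), float(b0)
    M = X.shape[1]
    for _ in range(its):
        r = X.T @ w + b - y               # residuals
        grad_w = (2.0 / M) * (X @ r)      # gradient of the smooth loss in w
        grad_b = (2.0 / M) * r.sum()
        w = soft_threshold(w - step * grad_w, step * lam)  # prox step on w
        b -= step * grad_b                # ordinary gradient step on b
    return w, b
```

Unlike the subgradient iteration, the prox step produces exact zeros in $w$, which is why this method is usually preferred for $\ell_1$ penalties.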

3. Use the data set attached to this homework and different choices of $\lambda$ to explain which method you think performs the best. Note, in the attached data, each row is of the form $\left( {x^{(m)}}^T, y^{(m)} \right)$.

4. What is the proximal update if $h(w) = \frac{1}{2} w^T A w + b^T w$ for some positive semidefinite matrix $A$?
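For parts 2 and 4, the problem takes the standard definition of the proximal operator as given; stating it here for completeness, with step size $t > 0$:

$$\operatorname{prox}_{t h}(v) = \arg\min_{w \in \mathbb{R}^d} \left( h(w) + \frac{1}{2t} \lVert w - v \rVert_2^2 \right)$$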
