APM 598: Homework 1 solution


Ex 1. [Overfitting]
Some data $\{x_i, y_i\}_{i=1\dots N}$ has been generated from an unknown polynomial $P_*(x)$ (see figure 1 and the code at the end to load the data 'data_HW1_ex1.csv' in Python), i.e.
$$y_i = P_*(x_i) + \varepsilon_i,$$
where $\varepsilon_i$ is some 'white noise', or more precisely $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, where $\sigma^2$ is the intensity of the noise (also unknown). The goal is to estimate the polynomial $P_*$ and in particular its degree, denoted $k_*$ (i.e. $k_* = \deg(P_*)$).
Figure 1: Data points $\{x_i, y_i\}_{i=1\dots N}$ generated from a polynomial $P_*$.
a) For each $k = 0, \dots, 12$, estimate the polynomial $\hat{P}_k$ of order $k$ that minimizes the mean-square error (use polyfit):
$$\ell(P) = \frac{1}{N} \sum_{i=1}^{N} |y_i - P(x_i)|^2,$$
in other words, find the polynomial $\hat{P}_k$ satisfying:
$$\hat{P}_k = \operatorname*{argmin}_{P,\,\deg(P) \le k} \ell(P).$$
Plot the loss $\ell(\hat{P}_k)$ as a function of $k$ (i.e. plot the points $\{(k, \ell(\hat{P}_k))\}_{k=0\dots 12}$).
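A minimal sketch of part a), assuming x and y are the arrays loaded by the Exercise 1 code at the end:

import numpy as np
import matplotlib.pyplot as plt

losses = []
for k in range(13):
    coeffs = np.polyfit(x, y, k)              # least-squares fit of degree k
    y_hat = np.polyval(coeffs, x)             # evaluate the fitted polynomial at the data
    losses.append(np.mean((y - y_hat)**2))    # mean-square error l(P_k)
plt.plot(range(13), losses, 'o-')
plt.xlabel(r'$k$')
plt.ylabel(r'$\ell(\hat{P}_k)$')
plt.show()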
b) Split the data set $\{x_i, y_i\}_{i=1\dots N}$ into training and test sets using (resp.) 80% and 20% of the data. We define two loss functions:
$$\ell_{\mathrm{train}}(P) = \frac{1}{N_{\mathrm{train}}} \sum_{i \in \text{train set}} |y_i - P(x_i)|^2 \qquad (1)$$
$$\ell_{\mathrm{test}}(P) = \frac{1}{N_{\mathrm{test}}} \sum_{i \in \text{test set}} |y_i - P(x_i)|^2 \qquad (2)$$
where $N_{\mathrm{train}}$ and $N_{\mathrm{test}}$ denote the number of elements in (resp.) the train and test sets ($N_{\mathrm{train}} + N_{\mathrm{test}} = N$).
Similarly, for each $k = 0, \dots, 12$, estimate the polynomial $\tilde{P}_k$ of order $k$ that minimizes the mean-square error over the training set:
$$\tilde{P}_k = \operatorname*{argmin}_{P,\,\deg(P) \le k} \ell_{\mathrm{train}}(P),$$
and plot the values of the loss functions $\ell_{\mathrm{train}}$ and $\ell_{\mathrm{test}}$ at $\tilde{P}_k$ for all $k$.
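A minimal sketch of part b), assuming x and y as above; the random 80/20 split below is one possible choice:

import numpy as np
perm = np.random.permutation(len(x))          # shuffle the indices
n_train = int(0.8 * len(x))
train, test = perm[:n_train], perm[n_train:]  # 80% train, 20% test
train_losses, test_losses = [], []
for k in range(13):
    coeffs = np.polyfit(x[train], y[train], k)   # fit on the training set only
    train_losses.append(np.mean((y[train] - np.polyval(coeffs, x[train]))**2))
    test_losses.append(np.mean((y[test] - np.polyval(coeffs, x[test]))**2))

Typically the test loss decreases with $k$ and then increases again once the fit starts to capture the noise.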
c) Guess the order $k_*$ of the polynomial $P_*$ and give your estimate of its coefficients.
Ex 2. [Gradient descent]
The goal is to implement and test gradient descent methods. We use linear regression as a toy problem, using the data set $\{x_i, y_i\}_{i=1\dots N}$ from Ex 1, and we would like to minimize the following loss function:
$$\ell(a, b) = \frac{1}{N} \sum_{i=1}^{N} |y_i - (a + b x_i)|^2.$$
a) Compute the gradient of the loss function $\nabla \ell = (\partial_a \ell, \partial_b \ell)$.
Deduce (numerically) the minimum $(a_*, b_*)$ of $\ell$.
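For reference, differentiating under the sum gives
$$\partial_a \ell(a, b) = -\frac{2}{N} \sum_{i=1}^{N} \big(y_i - (a + b x_i)\big), \qquad \partial_b \ell(a, b) = -\frac{2}{N} \sum_{i=1}^{N} x_i \big(y_i - (a + b x_i)\big).$$
A minimal numpy sketch, assuming x and y from Ex 1 (grad_loss is our own helper name):

import numpy as np
def grad_loss(a, b):
    r = y - (a + b * x)                       # residuals
    return np.array([-2 * r.mean(), -2 * (x * r).mean()])
# Setting the gradient to zero gives the least-squares line; a degree-1
# polyfit solves the same problem numerically:
b_star, a_star = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]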
b) Starting from $(a_0, b_0) = (1.8, 1.4)$ and using the constant learning rate $\eta = 0.05$, implement the gradient descent algorithm:
$$\begin{pmatrix} a_{n+1} \\ b_{n+1} \end{pmatrix} = \begin{pmatrix} a_n \\ b_n \end{pmatrix} - \eta\, \nabla \ell(a_n, b_n).$$
Estimate the rate of convergence (i.e. how fast does $\|(a_n, b_n) - (a_*, b_*)\|$ converge to zero?).
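A minimal sketch of the descent loop, assuming grad_loss and (a_star, b_star) from part a); on a log scale a geometric rate shows up as a straight line:

import numpy as np
theta = np.array([1.8, 1.4])                  # (a_0, b_0)
theta_star = np.array([a_star, b_star])
eta = 0.05
errors = []
for n in range(200):
    theta = theta - eta * grad_loss(theta[0], theta[1])
    errors.append(np.linalg.norm(theta - theta_star))
# For a quadratic loss, errors[n] behaves like C * rho**n,
# so np.log(errors) should be roughly linear in n.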
c) Implement the momentum and Nesterov algorithms; denoting $\theta_n = (a_n, b_n)$:
momentum:
$$v_{n+1} = \gamma v_n + \eta \nabla \ell(\theta_n), \qquad \theta_{n+1} = \theta_n - v_{n+1};$$
Nesterov:
$$v_{n+1} = \gamma v_n + \eta \nabla \ell(\theta_n - \gamma v_n), \qquad \theta_{n+1} = \theta_n - v_{n+1}.$$
Estimate the convergence rate for $\gamma = 0.9$ and $\eta = 0.05$.
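A minimal sketch of the two updates, assuming grad_loss from part a):

import numpy as np
gamma, eta = 0.9, 0.05
theta_m = np.array([1.8, 1.4]); v_m = np.zeros(2)   # momentum iterate and velocity
theta_n = np.array([1.8, 1.4]); v_n = np.zeros(2)   # Nesterov iterate and velocity
for n in range(200):
    # momentum: gradient evaluated at the current point
    v_m = gamma * v_m + eta * grad_loss(theta_m[0], theta_m[1])
    theta_m = theta_m - v_m
    # Nesterov: gradient evaluated at the look-ahead point
    look = theta_n - gamma * v_n
    v_n = gamma * v_n + eta * grad_loss(look[0], look[1])
    theta_n = theta_n - v_n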
extra) Test other gradient descent methods (e.g. Adam, AdaGrad, RMSProp) using the optimizer classes defined in either PyTorch or TensorFlow.
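A sketch of the same problem with a PyTorch optimizer (Adam shown; swapping in torch.optim.Adagrad or torch.optim.RMSprop only changes one line); the tensor setup is our own choice:

import torch
xt = torch.tensor(x, dtype=torch.float32)     # data from Ex 1
yt = torch.tensor(y, dtype=torch.float32)
theta = torch.tensor([1.8, 1.4], requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.05)
for n in range(200):
    opt.zero_grad()
    loss = torch.mean((yt - (theta[0] + theta[1] * xt))**2)
    loss.backward()                           # autograd computes d(loss)/d(theta)
    opt.step()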
Ex 3. [Classification]
The goal is to implement a linear classifier for the Fashion-MNIST data set (see also the code at the end).
a) Adapt the linear classifier for the MNIST data set to the new data set.
b) Train the model using 40 epochs with learning rate $\eta = 10^{-4}$ and the gradient descent method of your choice (a sketch follows below).
Plot the evolution of the loss at each epoch.
c) Draw the 'template' of each class.
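A minimal sketch covering b) and c), assuming the FashionMNIST loading code at the end; the flattened nn.Linear model, the batching, and plain SGD are our own choices:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

data = datasets.FashionMNIST('data_fashionMNIST', train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=128, shuffle=True)
model = nn.Linear(28 * 28, 10)                # one weight vector per class
opt = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
epoch_losses = []
for epoch in range(40):
    total = 0.0
    for X, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(X.view(X.size(0), -1)), y)   # flatten 28x28 images
        loss.backward()
        opt.step()
        total += loss.item()
    epoch_losses.append(total / len(loader))  # average loss per epoch, for the plot
# For c), each row of model.weight reshaped to 28x28 can be drawn as the
# 'template' of the corresponding class.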
##############################################
## Exercise 1 (Python) ##
##############################################
import matplotlib.pyplot as plt
import numpy as np
# get the data
data = np.loadtxt('data_HW1_ex1.csv', delimiter=',')
x, y = data[:, 0], data[:, 1]
# plot
plt.figure(1)
plt.plot(x, y, 'o')
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.show()
##############################################
## Exercise 3 (Python) ##
##############################################
from torchvision import datasets
import matplotlib.pyplot as plt
# Download and load
data_collection = datasets.FashionMNIST('data_fashionMNIST', train=True, download=True)
# Illustration: class index -> label name
label_fashion = dict([(0, 'T-shirt'), (1, 'trouser'), (2, 'pullover'), (3, 'dress'), (4, 'coat'),
                      (5, 'sandal'), (6, 'shirt'), (7, 'sneaker'), (8, 'bag'), (9, 'boot')])
X, y = data_collection[123]                   # X is a PIL image, y an integer label
plt.figure(1); plt.clf()
plt.imshow(X)
plt.title("Example of image with label " + label_fashion[y])
plt.show()