# APM 598: Homework 1 solution

\$30.00

Category:

## Description

Ex 1. [Overfitting]
Some data {xi
, yi}i=1…N has been generated from an unknown polynomial P∗(x) (see
figure 1 and code page 3 to load the data ’data_HW1_ex1.csv’ in Python), i.e.
yi = P∗(xi) + εi
,
where εi
is some ’white noise’ or more precisely ε ∼ N (0, σ2
) with σ
2
is the intensity of
the noise (also unknown). The goal is to estimate the polynomial P∗ and in particular
its degree denoted k∗ (i.e. k∗ = deg(P∗)).
Figure 1: Data points {xi
, yi}i=1..N generated from a polynomial P∗.
a) For each k = 0 . . . 12, estimate the polynomial Pˆ
k of order k that minimizes the
mean-square-error (use polyfit):
`(P) = 1
N
X
N
i=1
|yi − P(xi)|
2
,
in other words find the polynomial Pˆ
k satisfying:

k = argmin
P, deg(P)≤k
`(P).
Plot the loss `(Pˆ
k) as a function of k (i.e. plot the points {k, `(Pˆ
k)}k=0…12).
1
b) Split the data-set {xi
, yi}i=1…N into training and test sets using (resp.) 80% and
20% of the data. We define two loss functions:
`train(P) = 1
Ntrain
X
i in train set
|yi − P(xi)|
2
(1)
`test(P) = 1
Ntest
X
i in test set
|yi − P(xi)|
2
(2)
where Ntrain and Ntext denote the number of elements in (resp.) the train and test
sets (Ntrain + Ntext = N).
Similarly, for each k = 0 . . . 12, estimate the polynomial ePk of order k that minimizes
the mean-square-error over the training set:
ePk = argmin
P, deg(P)≤k
`train(P).
and plot the values of the loss functions `train and `test at ePk for all k.
c) Guess what is the order k∗ of the polynomial P∗ and give your estimate of its
coefficients.
The goal is to implement and test gradient descent methods. We use linear regression
as a toy problem using the data set {xi
, yi}i=1..N from Ex 1 and we would like to minimize
the following loss function:
`(a, b) = 1
N
X
N
i=1
|yi − (a + bxi)|
2
.
a) Compute the gradient of the loss function ∇` = (∂a`, ∂b`).
Deduce (numerically) the minimum (a∗, b∗) of `.
b) Starting from (a0, b0) = (1.8, 1.4) and using the constant learning rate η = .05,

an+1
bn+1 !
=

an
bn
!
− η∇`(an, bn).
Estimate the rate of convergence (i.e. how fast does k(an, bn) − (a∗, b∗)k converge
to zero?).
c) Implement the momentum and Nesterov algorithm, denoting θn = (an, bn),
momentum (
vn+1 = γvn + η∇`(θn)
θn+1 = θn − vn+1
Nesterov (
vn+1 = γvn + η∇`(θn − γvn)
θn+1 = θn − vn+1
Estimate the convergence rate for γ = .9 and η = .05.
defined in either Pytorch or TensorFlow.
2
Ex 3. [Classification]
The goal is to implement a linear classifier for the data-set MNIST-fashion (see also
code page 3).
a) Adapt the linear classifier for the MNIST-data set to the new data-base.
b) Train the model using 40 epochs with learning rate η = 10−4 and the gradient
Plot the evolution of the loss at each epoch.
c) Draw the ’template’ of each class.
##############################################
## Exercise 1 (Python) ##
##############################################
import matplotlib.pyplot as plt
import numpy as np
# get the data
x,y = data[:,0],data[:,1]
# plot
plt.figure(1)
plt.plot(x,y,’o’)
plt.xlabel(r’\$x\$’)
plt.ylabel(r’\$y\$’)
plt.show()
##############################################
## Exercise 3 (Python) ##
##############################################
from torchvision import datasets
import matplotlib.pyplot as plt