## Description

Ex 1. [Overfitting]

Some data {xi

, yi}i=1…N has been generated from an unknown polynomial P∗(x) (see

figure 1 and code page 3 to load the data ’data_HW1_ex1.csv’ in Python), i.e.

yi = P∗(xi) + εi

,

where εi

is some ’white noise’ or more precisely ε ∼ N (0, σ2

) with σ

2

is the intensity of

the noise (also unknown). The goal is to estimate the polynomial P∗ and in particular

its degree denoted k∗ (i.e. k∗ = deg(P∗)).

Figure 1: Data points {xi

, yi}i=1..N generated from a polynomial P∗.

a) For each k = 0 . . . 12, estimate the polynomial Pˆ

k of order k that minimizes the

mean-square-error (use polyfit):

`(P) = 1

N

X

N

i=1

|yi − P(xi)|

2

,

in other words find the polynomial Pˆ

k satisfying:

Pˆ

k = argmin

P, deg(P)≤k

`(P).

Plot the loss `(Pˆ

k) as a function of k (i.e. plot the points {k, `(Pˆ

k)}k=0…12).

1

b) Split the data-set {xi

, yi}i=1…N into training and test sets using (resp.) 80% and

20% of the data. We define two loss functions:

`train(P) = 1

Ntrain

X

i in train set

|yi − P(xi)|

2

(1)

`test(P) = 1

Ntest

X

i in test set

|yi − P(xi)|

2

(2)

where Ntrain and Ntext denote the number of elements in (resp.) the train and test

sets (Ntrain + Ntext = N).

Similarly, for each k = 0 . . . 12, estimate the polynomial ePk of order k that minimizes

the mean-square-error over the training set:

ePk = argmin

P, deg(P)≤k

`train(P).

and plot the values of the loss functions `train and `test at ePk for all k.

c) Guess what is the order k∗ of the polynomial P∗ and give your estimate of its

coefficients.

Ex 2. [Gradient descent]

The goal is to implement and test gradient descent methods. We use linear regression

as a toy problem using the data set {xi

, yi}i=1..N from Ex 1 and we would like to minimize

the following loss function:

`(a, b) = 1

N

X

N

i=1

|yi − (a + bxi)|

2

.

a) Compute the gradient of the loss function ∇` = (∂a`, ∂b`).

Deduce (numerically) the minimum (a∗, b∗) of `.

b) Starting from (a0, b0) = (1.8, 1.4) and using the constant learning rate η = .05,

implement the gradient descent algorithm:

an+1

bn+1 !

=

an

bn

!

− η∇`(an, bn).

Estimate the rate of convergence (i.e. how fast does k(an, bn) − (a∗, b∗)k converge

to zero?).

c) Implement the momentum and Nesterov algorithm, denoting θn = (an, bn),

momentum (

vn+1 = γvn + η∇`(θn)

θn+1 = θn − vn+1

Nesterov (

vn+1 = γvn + η∇`(θn − γvn)

θn+1 = θn − vn+1

Estimate the convergence rate for γ = .9 and η = .05.

extra) Test other gradient descent methods (e.g. Adam, AdaGrad, RMSProp) using class

defined in either Pytorch or TensorFlow.

2

Ex 3. [Classification]

The goal is to implement a linear classifier for the data-set MNIST-fashion (see also

code page 3).

a) Adapt the linear classifier for the MNIST-data set to the new data-base.

b) Train the model using 40 epochs with learning rate η = 10−4 and the gradient

descent method of your choices.

Plot the evolution of the loss at each epoch.

c) Draw the ’template’ of each class.

##############################################

## Exercise 1 (Python) ##

##############################################

import matplotlib.pyplot as plt

import numpy as np

# get the data

data = np.loadtxt(‘data_HW1_ex1.csv’,delimiter=’,’)

x,y = data[:,0],data[:,1]

# plot

plt.figure(1)

plt.plot(x,y,’o’)

plt.xlabel(r’$x$’)

plt.ylabel(r’$y$’)

plt.show()

##############################################

## Exercise 3 (Python) ##

##############################################

from torchvision import datasets

import matplotlib.pyplot as plt

# Download and load

data_collection = datasets.FashionMNIST(‘data_fashionMNIST’, train=True, download=True)

# Illustration

label_fashion = dict([(0,’T-shirt’),(1,’trouser’),(2,’pullover’),(3,’dress’),(4,’coat’),

(5,’sandal’),(6,’shirt’),(7,’sneaker’),(8,’bag’),(9,’boot’)])

X,y = data_collection.__getitem__(123)

plt.figure(1);plt.clf()

plt.imshow(X)

plt.title(“Example of image with label “+label_fashion[y.item()])

plt.show()

3