## Description

1 Two-layer neural networks

Ex 1.

Suppose x ∈ R². We consider two-layer neural networks (n.n.) of the form (see figure 1):

f(x) = b2 + W2 σ(b1 + W1 · x),    (1)

where b1, b2 ∈ R² are 'bias' vectors, W1, W2 ∈ M2×2(R) are 2 × 2 matrices, and the activation function σ is the ReLU function (i.e. σ(x) = max(x, 0), applied component-wise). We denote by s = f(x) the score predicted by the model, with s = (s1, s2), where s1 is the score for class 1 and s2 the score for class 2.

Figure 1: Illustration of a two-layer neural network using the ReLU activation function.

a) Consider the points given in figure 2 (left), where each color corresponds to a different class:

class 1: x1 = (1, 0) and x2 = (−1, 0),
class 2: x3 = (0, 1) and x4 = (0, −1).

Find parameters b1, b2, W1 and W2 such that the scores s satisfy:

s1 > s2 for x1 and x2,    s1 < s2 for x3 and x4.
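A quick numerical check can help here. The sketch below (illustrative, not part of the exercise) implements the forward map (1) in numpy; the identity/zero parameters are placeholders to be replaced by a candidate answer.

import numpy as np

def f(x, W1, b1, W2, b2):
    # forward pass of (1): s = b2 + W2 sigma(b1 + W1 x), with sigma = ReLU
    return b2 + W2 @ np.maximum(b1 + W1 @ x, 0.0)

# placeholder parameters -- replace with your candidate answer to a)
W1, b1 = np.eye(2), np.zeros(2)
W2, b2 = np.eye(2), np.zeros(2)

for x, cls in [((1, 0), 1), ((-1, 0), 1), ((0, 1), 2), ((0, -1), 2)]:
    s1, s2 = f(np.array(x, float), W1, b1, W2, b2)
    print(f"x = {x} (class {cls}): s1 {'>' if s1 > s2 else '<='} s2")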


Figure 2: Data points to classify.

b) Consider now the dataset given in figure 2 (right) (see the code below to load the data). Train a two-layer neural network of the form (1) to classify the points. Provide the accuracy of the model (percentage of correctly predicted labels).

##################################
##          Exercise 1          ##
##################################
import numpy as np
import pandas as pd

# Load the data: columns 'x1', 'x2' are the coordinates, 'class' the label.
df = pd.read_csv('data_HW2_ex1.csv')
X = np.column_stack((df['x1'].values, df['x2'].values))
y = df['class'].values
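A minimal training sketch for b), under stated assumptions: the two class labels in y are mapped to {0, 1} (their exact encoding in the CSV is not specified here), and softmax cross-entropy with plain gradient descent is one possible choice of loss and optimizer, not prescribed by the exercise; the learning rate and epoch count may need tuning.

rng = np.random.default_rng(0)
labels = (y == y.max()).astype(int)            # map the two classes to {0, 1}
Y = np.eye(2)[labels]                          # one-hot targets

W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=(2, 2)); b2 = np.zeros(2)
lr = 0.1

for epoch in range(5000):
    # forward pass of (1): s = b2 + W2 sigma(b1 + W1 x)
    Z = np.maximum(X @ W1.T + b1, 0.0)         # hidden layer (ReLU)
    S = Z @ W2.T + b2                          # scores, one row per point
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)          # softmax probabilities

    # backward pass (gradient of the cross-entropy loss)
    dS = (P - Y) / len(X)
    dW2 = dS.T @ Z; db2 = dS.sum(axis=0)
    dZ = dS @ W2
    dZ[Z <= 0.0] = 0.0                         # ReLU derivative
    dW1 = dZ.T @ X; db1 = dZ.sum(axis=0)

    # gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

Z = np.maximum(X @ W1.T + b1, 0.0)
pred = (Z @ W2.T + b2).argmax(axis=1)
print("accuracy:", (pred == labels).mean())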

Ex 2.

The goal of this exercise is to show that two-layer neural networks with ReLU activation can approximate any continuous function. To simplify, we restrict our attention to the one-dimensional case:

g : [0, 1] −→ R (continuous).

We claim that for any ε > 0, there exists a two-layer n.n. f such that:

max_{x ∈ [0,1]} |g(x) − f(x)| < ε.    (2)

In contrast to Ex 1, f will be taken with a large hidden layer, i.e. z ∈ Rᵐ with m ≫ 1 (see figure 3, left). To prove this result, we are going to show that f can be used to perform piece-wise linear interpolation (see figure 3, right).

a) Denote y0 = g(0) and y1 = g(1). Find a two-layer n.n. such that f(0) = y0 and f(1) = y1.


b) Consider now three points: y0 = g(0), y1 = g(1/2), y2 = g(1). Find f such that:

f(0) = y0, f(1/2) = y1 and f(1) = y2.

c) Generalize: write a program that takes as input {(xi, yi)}0≤i≤N with xi < xi+1 and returns a two-layer n.n. such that f(xi) = yi for all i = 0, . . . , N. (One possible construction is sketched below.)
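The following is a minimal sketch of such a program, assuming f is only evaluated on [x0, xN]; it uses one hidden unit zi = σ(x − xi) per node, with output weights given by the slope increments of the piece-wise linear interpolant. The function name interpolating_nn is illustrative.

import numpy as np

def interpolating_nn(xs, ys):
    # Build (W1, b1, W2, b2) such that f(x) = b2 + W2 sigma(b1 + W1 x)
    # satisfies f(xs[i]) = ys[i] for sorted nodes xs[0] < ... < xs[N].
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    slopes = np.diff(ys) / np.diff(xs)       # slope on each interval
    coeffs = np.diff(slopes, prepend=0.0)    # slope increments c_i
    W1 = np.ones((len(xs) - 1, 1))           # each hidden unit reads x
    b1 = -xs[:-1]                            # so that z_i = sigma(x - x_i)
    W2 = coeffs.reshape(1, -1)               # f(x) = y_0 + sum_i c_i z_i
    b2 = np.array([ys[0]])
    return W1, b1, W2, b2

def f(x, params):
    W1, b1, W2, b2 = params
    z = np.maximum(W1 @ np.atleast_1d(float(x)) + b1, 0.0)
    return (W2 @ z + b2)[0]

# sanity check on g(x) = sin(2 pi x) sampled at 6 nodes
xs = np.linspace(0.0, 1.0, 6)
ys = np.sin(2 * np.pi * xs)
params = interpolating_nn(xs, ys)
assert all(abs(f(x, params) - y) < 1e-12 for x, y in zip(xs, ys))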

Extra) Prove (2).

Hint: use that g is uniformly continuous on [0, 1].

Figure 3: Left: two-layer neural network used to approximate a continuous function. The hidden layer (i.e. z = (z1, . . . , zm)) is in general quite large. Right: to approximate the continuous function g, we interpolate some of its values (xi, yi) by a piece-wise linear function.

2 Convolution

Ex 3.

Using convolutional layers, max pooling and ReLU activation functions, build a classifier for the Fashion-MNIST database (see a schematic example in figure 4). The accuracy of your network on the test set will be your score on this exercise (+5 points for the group with the highest accuracy).
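A minimal starting-point sketch, assuming PyTorch and torchvision are available (neither is prescribed by the exercise); the architecture below, two conv + ReLU blocks with max pooling followed by one fully connected layer, loosely follows figure 4 and is only a baseline.

import torch
import torch.nn as nn
from torchvision import datasets, transforms

train_set = datasets.FashionMNIST("data", train=True, download=True,
                                  transform=transforms.ToTensor())
test_set = datasets.FashionMNIST("data", train=False, download=True,
                                 transform=transforms.ToTensor())

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),             # 10 Fashion-MNIST classes
)

loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):                     # a few epochs as a baseline
    for images, labels in loader:
        opt.zero_grad()
        loss_fn(model(images), labels).backward()
        opt.step()

# accuracy on the test set
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)
with torch.no_grad():
    correct = sum((model(im).argmax(1) == lb).sum().item()
                  for im, lb in test_loader)
print("test accuracy:", correct / len(test_set))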

Figure 4: Schematic representation of a neural network for image classification: the input image passes through conv + ReLU blocks (with several channels), pooling, a flatten step, and a fully connected layer producing the output scores.
