LING572 Hw9: Neural Networks solution


Q1 (10 points): Let $f'(x)$ denote the derivative of a function $f(x)$ w.r.t. the variable $x$.
(a) 2 pts: What does $f'(x)$ intend to measure?
(b) 2 pts: Let $h(x) = f(g(x))$. What is $h'(x)$?
(c) 2 pts: Let $h(x) = f(x)g(x)$. What is $h'(x)$?
(d) 2 pts: Let $f(x) = a^x$, where $a > 0$. What is $f'(x)$?
(e) 2 pts: Let $f(x) = x^{10} - 2x^8 + \frac{4}{x^2} + 10$. What is $f'(x)$?
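
A hand-derived derivative can be sanity-checked against a finite-difference estimate. The sketch below is only an illustration: the helper name `numeric_derivative`, the step size `h`, and the example function (sine, which is not one of the functions in Q1) are assumptions and not part of the assignment.

```python
import math

def numeric_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example with a function that is not in the assignment: d/dx sin(x) = cos(x).
approx = numeric_derivative(math.sin, 0.5)
exact = math.cos(0.5)
print(abs(approx - exact) < 1e-8)
```

To check an answer to Q1, pass the function from the question as `f` and compare the estimate with the value of the derived formula at a few points.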

Q2 (15 points): The logistic function is $f(x) = \frac{1}{1+e^{-x}}$. The tanh function is $g(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$.
(a) 5 pts: Prove that $f'(x) = f(x)(1 - f(x))$.
(b) 5 pts: Prove that $g'(x) = 1 - g^2(x)$.
(c) 5 pts: Prove that $g(x) = 2f(2x) - 1$.
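
Before writing the proofs, it can help to confirm numerically that the three identities hold at a few sample points. The following is a rough sketch, not a proof; the function names, the sample points, and the finite-difference step are all assumptions made for illustration.

```python
import math

def logistic(x):
    """f(x) = 1 / (1 + e^{-x})"""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """g(x) = (e^x - e^{-x}) / (e^x + e^{-x})"""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def numeric_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def max_identity_error(points):
    """Largest absolute gap between both sides of identities (a), (b), (c)."""
    worst = 0.0
    for x in points:
        err_a = abs(numeric_derivative(logistic, x) - logistic(x) * (1 - logistic(x)))
        err_b = abs(numeric_derivative(tanh, x) - (1 - tanh(x) ** 2))
        err_c = abs(tanh(x) - (2 * logistic(2 * x) - 1))
        worst = max(worst, err_a, err_b, err_c)
    return worst

print(max_identity_error([-2.0, -0.5, 0.0, 0.5, 2.0]))
```

A result close to zero is consistent with the identities; the remaining gap comes from the finite-difference approximation and floating-point rounding.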

Q3 (15 points): Let us denote the partial derivative of a multi-variate function $f$ w.r.t. one of its
variables $x$ by $f'_x$ or $\frac{df}{dx}$.
(a) 2 pts: What is $f'_x$ trying to measure?
(b) 2 pts: Let $f(x, y) = x^3 + 3x^2 y + y^3 + 2x$. What is $f'_x$? What is $f'_y$?
(c) 2 pts: Let $z = \sum_{i=1}^{n} w_i x_i$. What is $\frac{dz}{dw_i}$?
(d) 4 pts: Let $f(z) = \frac{1}{1+e^{-z}}$ and $z = \sum_{i=1}^{n} w_i x_i$.
What is $\frac{df}{dz}$?
What is $\frac{df}{dw_i}$?
Hint: Use the answers that contain $f(z)$.
(e) 5 pts: Let $E(z) = \frac{1}{2}(t - f(z))^2$, $f(z) = \frac{1}{1+e^{-z}}$ and $z = \sum_{i=1}^{n} w_i x_i$. What is $\frac{dE}{dw_i}$? Hint: the answer
should contain $f(z)$.
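
A derived formula for the partial derivative of E with respect to each weight can be compared against a numerical estimate. The sketch below computes only the numerical side. The weights, inputs, and target are made-up values, and the helper names are assumptions chosen for this illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(w, x, t):
    """E = 1/2 (t - f(z))^2 with z = sum_i w_i x_i, as defined in part (e)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (t - logistic(z)) ** 2

def numeric_gradient(w, x, t, h=1e-6):
    """Central-difference estimate of dE/dw_i for every i (hypothetical helper)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        grad.append((error(w_plus, x, t) - error(w_minus, x, t)) / (2 * h))
    return grad

# Made-up values, used only to exercise the helper.
w = [0.5, 0.3, -0.1]
x = [1.0, 0.0, -2.0]
t = 1.0
print(numeric_gradient(w, x, t))
```

Evaluating the hand-derived expression at the same `w`, `x`, and `t` should give values that agree with this output to several decimal places.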

Q4 (10 points): The softmax function:

(a) 5 pts: In general where in NNs is the softmax function used and why?
(b) 5 pts: If a vector x is [1, 2, 3, -1, -4, 0], what is the value of softmax(x)?
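
The text does not spell out the softmax definition, so the sketch below assumes the standard one, in which component i of softmax(x) is exp(x_i) divided by the sum of exp(x_j) over all j. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result; the function name is an assumption.

```python
import math

def softmax(xs):
    """Standard softmax, assumed here: exp(x_i) / sum_j exp(x_j)."""
    m = max(xs)                              # shift for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1, 2, 3, -1, -4, 0])
print([round(p, 4) for p in probs])
print(sum(probs))
```

The outputs are all positive and sum to 1 (up to rounding), which is why they can be read as a probability distribution over the input positions.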

Q5 (15 points): Suppose a feedforward neural network has $m$ layers: the input layer is the 1st layer,
the output layer is the last layer, and there are $m-2$ hidden layers in between. The number of neurons
in the $i$-th layer is $n_i$. Each neuron in one layer is connected to every neuron in the next layer and
there is no other connection.

(a) 5 pts: How many connections (i.e., weights) are there in this network?
(b) 10 pts: Let $x$ be a column vector that denotes the values of the input layer. Let $M_k$ denote the
weight matrix between layer $k$ and $k+1$; that is, the cell $a_{i,j}$ in $M_k$ stores the weight on the arc
from the $j$-th neuron in layer $k$ to the $i$-th neuron in layer $k+1$. Let $g$ be the activation function
used in each layer.

• Given the input x, what is the formula for calculating the output of the first hidden layer?
• Given the input x, what is the formula for calculating the output of the output layer?

• Hint: In class, we showed the formula for calculating the $z$ and $y$ value for a neuron, where
$z = b + \sum_j w_j x_j$ and $y = g(z)$. Now there are $n_2$ neurons in the 2nd layer. The output of
this layer, $y$, is going to be a column vector, not a real number. The weights between the
two layers are no longer a vector, but an $n_2 \times n_1$ matrix denoted by $M_1$. So the answer to
the 1st question should be a simple formula that uses matrix operations. For the sake of
simplicity, let’s assume the bias $b$ is always zero.

• Terminology: A row vector is a $1 \times n$ matrix (e.g., $[a_1, a_2, \ldots, a_n]$); a column vector is an $n \times 1$
matrix. If you transpose a row vector, you get a column vector.
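
The layer-by-layer computation described in the hint can be sketched in a few lines. This is an illustrative sketch under the stated zero-bias assumption, not the expected written answer: the helper names, the choice of the logistic function for g, and the small example weights are all assumptions.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x, g=logistic):
    """Return the output vector of every layer after the input layer.

    weights[k] plays the role of M_{k+1}: row i holds the weights on the arcs
    into neuron i of the next layer. The bias is assumed to be zero.
    """
    y = x
    outputs = []
    for M in weights:
        z = matvec(M, y)           # weighted sums for the next layer
        y = [g(v) for v in z]      # element-wise activation
        outputs.append(y)
    return outputs

# Hypothetical network with n1 = 3, n2 = 2, n3 = 1.
M1 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.5]]            # 2 x 3, i.e. n2 x n1
M2 = [[1.0, -1.0]]                 # 1 x 2, i.e. n3 x n2
layers = forward([M1, M2], [1.0, 2.0, 3.0])
print(layers[0])   # output of the first hidden layer
print(layers[-1])  # output of the output layer
```

Each pass through the loop applies one weight matrix followed by the activation, so the shapes of the matrices determine the size of each layer's output vector.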

Q6 (40 points): Read Chapter 1 of the NN book, and answer the following based on that chapter:
(a) 5 pts: What’s the loss function used in the digit recognition task? Why do they choose to minimize this function instead of maximizing classification accuracy?

(b) 10 pts: In gradient descent, what’s the formula for updating the weight matrix (or vector)? And
why is that a good formula?

(c) 15 pts: What are the main idea and benefit of stochastic gradient descent?
What is a training epoch?
Let T be the size of the training data, m be the size of a mini-batch, and suppose your training process
contains E training epochs. How many times is each weight in the NN updated?

(d) 10 pts: How can one choose the learning rate? What’s the risk if the rate is too big? What’s
the risk if the rate is too small?
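
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional cost C(w) = w^2, whose gradient is 2w and whose minimum is at w = 0. The cost function, the three rates, and the step count are assumptions chosen for illustration and have nothing to do with the digit recognition network.

```python
def gradient_descent(eta, steps=50, w0=1.0):
    """Minimize C(w) = w^2 starting from w0 with learning rate eta."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w          # dC/dw for C(w) = w^2
        w = w - eta * grad      # move against the gradient
    return w

for eta in (0.001, 0.1, 1.1):
    # Distance from the minimum at w = 0 after 50 steps.
    print(eta, abs(gradient_descent(eta)))
```

On this toy cost, the smallest rate barely moves from the starting point in 50 steps, the middle rate ends up very close to the minimum, and the largest rate overshoots on every step so the distance from the minimum keeps growing.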

Q7 (25 points): Go over the source code in Nielsen’s package stored under /dropbox/18-19/572/hw9/nielsennn/ on patas and understand the part explained in Chapter 1 of the NN book.
• Run the code (following the instructions in chapter 1) and fill out Table 1. For this exercise, use
only one hidden layer.

• It seems that the code works with python 2.*, not with python 3.*. If you run the default python
version on patas, which is 2.7.5, the code should work.
• Note that as the package uses random functions a few times, your results will not be the same
when running it multiple times.

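Table 1: Results on digit recognition
Expt id | # of hidden neurons | epoch # | mini-batch size | learning rate | accuracy
--------+---------------------+---------+-----------------+---------------+---------
1       | 30                  | 30      | 10              | 3.0           |
2       | 10                  | 30      | 10              | 3.0           |
3       | 30                  | 30      | 10              | 0.5           |
4       | 30                  | 30      | 10              | 10            |
5       | 30                  | 30      | 100             | 3.0           |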

Submission: Submit the following to Canvas:
• Since hw9 has no coding part, you only need to submit your readme.pdf, which includes answers
to all the questions, plus anything you want the TA to know. No need to submit anything else.