Homework 3 – Deep Neural Networks (CSDS/541)

1. Feed-forward neural network [60 points]: In this problem you will train a multi-layer neural network
to classify images of fashion items (10 different classes) from the Fashion MNIST dataset. As in the
previous homework, the input to the network is a 28 × 28-pixel image; the output, however, is a vector
of 10 class probabilities (one per fashion class).
Specifically, the network you create should implement a function f : R^784 → R^10, where:

    z^(1) = W^(1) x + b^(1)
    h^(1) = relu(z^(1))
    z^(2) = W^(2) h^(1) + b^(2)
    ...
    z^(l) = W^(l) h^(l−1) + b^(l)
    ŷ = softmax(z^(l))
The network specified above is shown in the figure below.

[Figure: the multi-layer network, with input x, pre-activations z^(1), …, z^(l), hidden activations h^(1), …, h^(l−1), weights W^(1), …, W^(l), biases b^(1), …, b^(l), and output ŷ.]
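For reference, a minimal sketch of this forward pass is shown below. It is not the starter code: the names relu, softmax, forward, Ws, and bs are illustrative, and the weights/biases are assumed to be stored as Python lists with one entry per layer.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def forward(x, Ws, bs):
    """Forward pass for an arbitrary number of layers.

    x:  flattened 784-dimensional input image
    Ws: list of weight matrices W^(1), ..., W^(l)
    bs: list of bias vectors  b^(1), ..., b^(l)
    Returns the predicted class probabilities yhat, plus the intermediate
    z's and h's (which back-propagation will need later).
    """
    h = x
    zs, hs = [], [x]
    for W, b in zip(Ws[:-1], bs[:-1]):   # hidden layers
        z = W @ h + b
        h = relu(z)
        zs.append(z)
        hs.append(h)
    z_out = Ws[-1] @ h + bs[-1]          # output layer
    zs.append(z_out)
    return softmax(z_out), zs, hs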

As usual, the (unregularized) cross-entropy cost function should be

    f_CE(W^(1), b^(1), …, W^(l), b^(l)) = −(1/n) Σ_{i=1}^{n} Σ_{k=1}^{10} y_k^(i) log ŷ_k^(i)

where n is the number of examples.
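For concreteness, one way to compute this cost over a whole set of predictions is sketched below; Y and Yhat are illustrative names for an n × 10 one-hot label matrix and the corresponding n × 10 matrix of predicted probabilities.

import numpy as np

def cross_entropy(Y, Yhat, eps=1e-12):
    # Mean over the n examples of -sum_k y_k^(i) log(yhat_k^(i));
    # eps guards against taking log(0).
    return -np.mean(np.sum(Y * np.log(Yhat + eps), axis=1))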
Hyperparameter tuning: In this problem, there are several different hyperparameters and architectural design decisions that will impact the network’s performance:
• Number of hidden layers (suggestions: {3, 4, 5})
• Number of units in each hidden layer (suggestions: {30, 40, 50})
• Learning rate (suggestions: {0.001, 0.005, 0.01, 0.05, 0.1, 0.5})
• Minibatch size (suggestions: {16, 32, 64, 128, 256})
• Number of epochs
• L2 regularization strength applied to the weight matrices (but not the bias terms)
• Frequency & rate of learning rate decay
• Variance & type of random noise added to training examples for data augmentation
• . . .
These can all have a big impact on the test accuracy. In contrast to previous assignments, there is no
specific requirement for how to optimize them. However, in practice it will be necessary to do so in
order to get good results.
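One simple way to explore these settings is a random search over the suggested values, scored on a held-out validation split. The sketch below assumes a hypothetical helper train_and_evaluate(config) that trains a network with the given settings and returns its validation accuracy; it is not part of the starter code.

import random

search_space = {
    "num_hidden_layers": [3, 4, 5],
    "hidden_units": [30, 40, 50],
    "learning_rate": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
    "batch_size": [16, 32, 64, 128, 256],
    "l2_strength": [0.0, 1e-4, 1e-3],    # example values; tune as needed
}

best_config, best_acc = None, -1.0
for _ in range(20):                       # try 20 random configurations
    config = {k: random.choice(v) for k, v in search_space.items()}
    acc = train_and_evaluate(config)      # hypothetical helper
    if acc > best_acc:
        best_config, best_acc = config, acc
print("best configuration:", best_config, "validation accuracy:", best_acc)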
Numerical gradient check: To make sure that you are implementing your gradient expressions
correctly, you should use scipy.optimize.check_grad (and possibly its sister function, approx_fprime).
These methods take a function f (i.e., a Python method that computes a function; in practice, this will be
the regularized cross-entropy function you code) as well as a point at which to compute f's
derivative (some particular values for the weights and biases). approx_fprime returns a
numerical estimate of the gradient of f, evaluated at the point you provide. check_grad also
takes another argument, ∇f, which is the Python function that you claim returns the gradient
of the function f you passed in. check_grad then computes the discrepancy (the norm of the
difference) between the numerical and analytical gradients at the point you specified. Both of these methods require
that all the parameters of the function (in practice: the weights and biases of your neural network)
be “packed” into a single vector (even though the parameters actually constitute both matrices and
vectors). For this reason, the starter code we provide includes a method called unpack that takes a
vector of numbers and extracts W^(1), W^(2), …, as well as b^(1), b^(2), …. Note that the training data
and training labels are not parameters of the function f whose gradient you are computing/estimating,
even though they are obviously needed by the cross-entropy function to do its job. For this reason, we
“wrap” the call to fCE with a Python lambda expression in the starter code.
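Put together, a typical call looks roughly like the sketch below. Here fCE and gradCE stand for your cost and gradient functions over the packed parameter vector, while pack, X_train, and Y_train are illustrative names rather than part of the starter code.

import numpy as np
from scipy.optimize import check_grad, approx_fprime

# Flatten every weight matrix and bias vector into one long parameter vector.
w0 = pack(Ws, bs)   # hypothetical inverse of the provided unpack helper

# The lambdas "bake in" the training data, so the only argument that
# check_grad varies is the packed parameter vector w.
discrepancy = check_grad(
    lambda w: fCE(X_train, Y_train, w),
    lambda w: gradCE(X_train, Y_train, w),
    w0,
)
print(discrepancy)   # should be small (e.g., below 1e-4) if gradCE is correct

# approx_fprime gives the raw numerical gradient if you want to inspect
# it component-by-component against your analytical gradient.
numerical_grad = approx_fprime(w0, lambda w: fCE(X_train, Y_train, w), 1e-6)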
Your tasks:
(a) Implement stochastic gradient descent (SGD; see Section 5.9 and Algorithm 6.4 in the Deep
Learning textbook, https://www.deeplearningbook.org/) for the multi-layer neural network
shown above. Important: your backprop algorithm must work for any number of hidden layers.
(b) Verify that your gradient function is correct using the check_grad method. In particular, include
in your PDF the real-valued output of the call to check_grad for a neural network with 3 hidden
layers, each with 64 neurons (the discrepancy between the numerical and analytical derivative for
this case should be less than 1e-4).
(c) Include a screenshot in your submitted PDF file showing multiple iterations of SGD (just to show
that you actually ran your code successfully). For each iteration, report both the test accuracy
and test unregularized cross-entropy. For full credit, the accuracy (percentage correctly classified
test images) should be at least 88%.
(d) Visualize the first layer of weights W^(1) that are learned after training your best neural network.
In particular, reshape each row of the weight matrix into a 28 × 28 matrix, and then create a
“grid” of such images. Include this figure in your PDF. Here is an example (for W^(1) ∈ R^{64×784});
one way to build such a grid is sketched below.
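A possible way to produce the grid with matplotlib, assuming a 64 × 784 first-layer weight matrix (the names W1 and plot_weight_grid are illustrative):

import matplotlib.pyplot as plt

def plot_weight_grid(W1, rows=8, cols=8):
    # Each of the 64 rows of W1 holds one hidden unit's incoming weights;
    # reshape it back to 28 x 28 and place it in an 8 x 8 grid of tiles.
    fig, axes = plt.subplots(rows, cols, figsize=(8, 8))
    for ax, row in zip(axes.flat, W1):
        ax.imshow(row.reshape(28, 28), cmap="gray")
        ax.axis("off")
    fig.tight_layout()
    fig.savefig("first_layer_weights.png")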
Recommended strategy: First, (1) implement the forward- and back-propagation phases so
that you can perfectly (as verified with check_grad) compute the gradient for just a single training
example. You will likely (unless you do everything perfectly from the get-go) need to “break” the
gradient vector, as returned by your gradCE and the approx_fprime methods, into its individual
components (for the individual weight matrices and bias terms) so that you can compare each
of them one-by-one and see where any problems lie. Next, (2) implement minibatches
of size ñ > 1 by simply iterating over the minibatch in a for-loop. Finally – and only after you
have correctly implemented step (2) – replace the for-loop (which is relatively slow) with matrix
operations that compute the same result.
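For orientation, a bare-bones version of the training loop might look like the sketch below; gradCE is assumed to return the packed gradient for a given minibatch, and the other names are illustrative rather than taken from the starter code.

import numpy as np

def train_sgd(X, Y, w, epochs=30, batch_size=64, learning_rate=0.05):
    # X: n x 784 training images, Y: n x 10 one-hot labels,
    # w: packed vector of all weights and biases.
    n = X.shape[0]
    for epoch in range(epochs):
        order = np.random.permutation(n)            # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # one minibatch
            grad = gradCE(X[idx], Y[idx], w)        # packed minibatch gradient
            w = w - learning_rate * grad            # SGD update
    return w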
In addition to your Python code (homework3_WPIUSERNAME1.py
or homework3_WPIUSERNAME1_WPIUSERNAME2.py for teams), create a PDF file (homework3_WPIUSERNAME1.pdf
or homework3_WPIUSERNAME1_WPIUSERNAME2.pdf for teams) containing the screenshots described above.