Description
Question 1 [40 + 5 (Optional) Points] (Classifying MNIST via Basic Shallow RNN) It is not only the
XOR computation that becomes much easier via an RNN. In fact, using RNNs we can also perform conventional
classification efficiently with simple NNs. In this question, we aim to classify MNIST via a shallow
vanilla RNN. Let's first load the MNIST dataset and split it into training and test sets.
1. Write code that loads the MNIST training and test sets and splits them into mini-batches of size
batch_size . You may complete the reference code.
To this end, we first need to interpret the MNIST dataset as a set of sequence data. Read the following text
to find out how we do it.
MNIST Images as Sequences We can see an image as a sequence, where each sub-part of the image
describes an entry of the sequence. For MNIST, we represent each image as a sequence of pixel vectors,
each vector being 28-dimensional, i.e., x[t] ∈ R^28. We label each sequence with a single label, which
is the label of the image. With this interpretation, an MNIST image is given by a sequence x[1], . . . , x[T]
and has only one label at the last time instant, i.e., y[T], that represents its class.
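A small check of this interpretation: each 28×28 MNIST image becomes a sequence of T = 28 row vectors x[t] ∈ R^28. The tensor names below are illustrative, not from the reference code.

```python
import torch

image = torch.rand(1, 28, 28)    # a dummy image in MNIST's (channels, height, width) layout
sequence = image.view(28, 28)    # row t of the image is the sequence entry x[t]

T = sequence.shape[0]            # sequence length
entry_dim = sequence[0].shape    # dimension of one entry x[t]
print(T, entry_dim)              # 28 entries, each 28-dimensional
```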
Considering the above interpretation, answer the following items:
2. What is the length of each sequence in the MNIST dataset?
3. What type of sequence-to-sequence problem, e.g., many-to-many, one-to-many, etc., is this problem?
We aim to use a shallow RNN, i.e., one input layer, one recurrent unit, and one output layer. The time
diagram of this RNN is shown in Figure 2.1. We assume that the size of the hidden state is 150, i.e.,
h[t] ∈ R^150.
Given this architecture, answer the following items:
4. Specify the input and output dimension of the hidden and output layers of this RNN.
5. Explain how each layer is activated in this RNN.
6. Assume that the RNN is trained. Explain how we can use the trained RNN to classify a new image.
We now implement the RNN, but not from scratch. We use some built-in modules. Read the following
text to learn how you could do it.
Figure 2.1: Architecture of our target RNN used for MNIST classification.
Implementing Basic RNN A basic Elman-like recurrent unit is available in the nn module of PyTorch by calling
nn.RNN() . This class gets the dimensions of the input and hidden state as well as other hyperparameters to
make an Elman-like unit for the RNN. Note that this is only the recurrent unit, and we still need to implement the
output layer. We could make it deep by setting the number of hidden layers to more than one;
however, we will not do that in this assignment. An important thing to consider is to set
batch_first = True . This makes sure that our recurrent unit considers the first dimension of
the input to be the batch size. This option is False by default, so we should make sure to set it. For
instance, if we want to make a single-layer recurrent unit with 2-dimensional input and 3-dimensional
hidden states, we write:
RNN_layer = nn.RNN(2, 3, batch_first = True)
Now, if we pass a sequence x forward through this unit, we get two outputs:
1. The sequence of hidden states, which has the same time length as x.
2. The last hidden state, i.e., h[T], whose dimension is the hidden-state size; with batch_first = True and a single layer, it has shape (1, batch_size, hidden_size).
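The two outputs can be inspected directly, using the RNN_layer from the snippet above (2-dimensional input, 3-dimensional hidden state); the batch size 5 and sequence length 7 are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

RNN_layer = nn.RNN(2, 3, batch_first=True)

x = torch.rand(5, 7, 2)          # (batch_size, T, input_size)
hidden_seq, h_T = RNN_layer(x)

print(hidden_seq.shape)          # (5, 7, 3): one hidden state per time step
print(h_T.shape)                 # (1, 5, 3): last hidden state, (num_layers, batch_size, hidden_size)
```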
We now start with implementing our RNN via this basic recurrent unit.
Answer the following items:
7. Write the class myRNN that realizes the components of the RNN. You can do it by completing the
reference code.
8. Add a function to this class that generates an initial zero hidden state. The hidden state should be a
tensor of size (1, batch_size, size_of_hidden_state) .
9. Add a forward pass to the class.
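A possible completion of items 7–9. The layer sizes follow the text (28-dimensional input, 150-dimensional hidden state, 10 classes), but the attribute names, the fixed batch_size = 64, and the exact structure of the reference code are assumptions.

```python
import torch
import torch.nn as nn

batch_size = 64  # assumed mini-batch size

class myRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(28, 150, batch_first=True)  # recurrent unit
        self.output = nn.Linear(150, 10)              # output layer: one logit per digit

    def init_hidden(self):
        # item 8: initial zero hidden state of size (1, batch_size, size_of_hidden_state)
        return torch.zeros(1, batch_size, 150)

    def forward(self, x, h0):
        # item 9: x has shape (batch_size, 28, 28), one image row per time step
        hidden_seq, h_T = self.rnn(x, h0)
        # classify from the hidden state at the last time instant
        return self.output(hidden_seq[:, -1, :])
```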
We next need a function that takes the output of the RNN on an input mini-batch and computes the
classification error by comparing the output with the true labels. The implementation is slightly
different from FNNs, since in this case we classify the image based on the last entry in the output sequence.
10. Write a function that gets the output of the last time step for a complete mini-batch along with the
list of true labels and returns the average error over the mini-batch. You can do it by completing
the reference code.
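One way to realize item 10, assuming the last-time-step output is a (batch_size, 10) tensor of class scores; the function name is a placeholder, and the average error is simply one minus the accuracy.

```python
import torch

def batch_error(last_output, labels):
    # predicted class = index of the largest score in each row
    predictions = last_output.argmax(dim=1)
    # average error over the mini-batch = fraction of misclassified samples
    return (predictions != labels).float().mean().item()
```

For instance, a mini-batch of two samples with one correct and one wrong prediction gives an average error of 0.5.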
11. Confirm that your implementation returns correct dimensions.
We now complete the training loop for this RNN and train the model.
12. Write the function train() that gets an instantiated model, a loss function, and a number of epochs,
and trains the model using the given loss function for the specified number of epochs. You can do
this by completing the reference code.
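A hedged sketch of item 12. The optimizer choice (Adam, lr = 1e-3) and the explicit train_loader parameter are assumptions; the reference code may instead read the loader from the enclosing scope. The model interface follows the earlier items, and the initial hidden state is sized from the actual batch since the last mini-batch may be smaller.

```python
import torch

def train(model, loss_function, num_epochs, train_loader):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
    for epoch in range(num_epochs):
        for images, labels in train_loader:
            x = images.view(-1, 28, 28)           # (batch_size, T = 28, 28)
            h0 = torch.zeros(1, x.shape[0], 150)  # zero initial hidden state
            scores = model(x, h0)                 # scores at the last time instant
            loss = loss_function(scores, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```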
13. Instantiate model = myRNN() and pass it to the function train() to train it for 10 epochs with
the cross-entropy loss function.
14. Do you think that the same shallow FNN without any recurrence could achieve such a result? How
do you think the RNN understands the class of an image?
In case you are interested, you can further complete the following optional task.
15⋆. (Optional) Implement the recurrent unit, i.e., nn.RNN() , from scratch. Replace it in your implementation and compare the result with what you observed via nn.RNN() .
Question 2 [15 Points] (Classifying MNIST via Shallow GRU) We now modify our implementation
by replacing the basic RNN with a gated recurrent unit (GRU). We can access a GRU directly in the
torch.nn module as
nn.GRU(… , … , batch_first = True)
The input and output of this unit are read similarly to the basic recurrent unit.
1. Write the class myGatedRNN that realizes the RNN in Figure 2.1 with its recurrent unit being a
single layer (shallow) GRU. You can do it by completing the reference code.
2. Add a function to this class that generates an initial zero hidden state.
3. Add a forward pass to the class.
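A possible completion for the GRU variant, under the same assumptions as the myRNN sketch earlier (28-dimensional input, 150-dimensional hidden state, 10 classes, assumed batch_size = 64); names are guesses, not the reference code.

```python
import torch
import torch.nn as nn

batch_size = 64  # assumed mini-batch size

class myGatedRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(28, 150, batch_first=True)  # single-layer (shallow) GRU unit
        self.output = nn.Linear(150, 10)              # output layer

    def init_hidden(self):
        # initial zero hidden state of size (1, batch_size, hidden_size)
        return torch.zeros(1, batch_size, 150)

    def forward(self, x, h0):
        hidden_seq, h_T = self.gru(x, h0)
        # classify from the hidden state at the last time instant
        return self.output(hidden_seq[:, -1, :])
```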
We now train this gated RNN. For this, we could use our train() function from Question 1.
4. Instantiate model = myGatedRNN() and pass it to the function train() to train it for 10 epochs
with cross-entropy loss function.
5. Compare your result with the previous implementation via basic recurrent unit. Explain your
observation.
Question 3 [15 Points] (Classifying MNIST via Shallow LSTM) As the last exercise, we want to modify
our implementation by replacing the basic recurrent unit with an LSTM. We can access an LSTM directly
in the torch.nn module by
nn.LSTM(… , … , batch_first = True)
The input and output of this unit are read similarly to the basic recurrent unit and GRU. However, we
further need to input the cell state. As mentioned in the course, this is the state that remains inside the LSTM and
does not go to the upper layers.
1. Write the class myLSTM that realizes the RNN in Figure 2.1 with its recurrent unit being a single
layer (shallow) LSTM. You can do it by completing the reference code.
2. Add a function to this class that generates an initial zero hidden state and cell state.
3. Add a forward pass to the class.
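A sketch for the LSTM variant. Unlike the basic unit, nn.LSTM takes and returns the pair (hidden state, cell state), matching the description above; the class layout and batch_size = 64 are assumptions.

```python
import torch
import torch.nn as nn

batch_size = 64  # assumed mini-batch size

class myLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(28, 150, batch_first=True)  # single-layer (shallow) LSTM unit
        self.output = nn.Linear(150, 10)                # output layer

    def init_hidden(self):
        # item 2: zero hidden state AND zero cell state
        h0 = torch.zeros(1, batch_size, 150)
        c0 = torch.zeros(1, batch_size, 150)
        return h0, c0

    def forward(self, x, state):
        # state is the (h0, c0) pair; the LSTM also returns such a pair
        hidden_seq, (h_T, c_T) = self.lstm(x, state)
        # classify from the hidden state at the last time instant
        return self.output(hidden_seq[:, -1, :])
```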
We now train this LSTM-based RNN. For this, we could use our train() function from Question 1.
4. Instantiate model = myLSTM() and pass it to the function train() to train it for 10 epochs with
cross-entropy loss function.
5. Compare your result with the previous two implementations. Explain your observation.

