## Description

1. (10 points) Consider the convolutional neural network architecture given in Figure 1 for classifying MNIST digits. Assume that the input images are reduced to size 10 × 10 with only 1 channel (represented as a matrix in R^{10×10}). In this architecture, the convolutional layer uses a 3 × 3 filter, Wconv, with stride 3 and zero padding of size 1. The dimensions of the outputs of each layer are shown below.

Figure 1: A toy CNN

(a) (5 points) What are the values of n, m, k in the graph?

(b) (5 points) What are the sizes of Wconv, Wfc, and b?
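For part (a), the standard output-size formula for a convolution may help. The helper below is a generic sketch (the function name and the example values are illustrative and deliberately do not correspond to Figure 1):

```python
# Output width of a convolution: for input width n, kernel size k,
# stride s, and zero padding p (applied on both sides),
#     out = floor((n + 2p - k) / s) + 1
def conv_out(n: int, k: int, s: int, p: int) -> int:
    return (n + 2 * p - k) // s + 1

# Illustrative example (not the network in Figure 1): a 28-wide input,
# a 3x3 kernel, stride 1, and no padding give a 26-wide output.
print(conv_out(28, 3, 1, 0))  # 26
```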

Programming Problems: The next two problems are programming problems and will focus on implementing neural networks for handwritten digit classification. We will use the MNIST dataset, where each sample is an image of a handwritten digit and has a corresponding label indicating the value of the digit written (0, 1, . . . , 9). This makes it a multi-class classification problem.

You must use TensorFlow 2 to implement your neural networks. The implementation must be for the CPU version only (no GPUs or MPI parallel programming is required for this assignment). Follow the installation instructions at https://www.tensorflow.org/install in case you want to use your local machine (we recommend using Anaconda); you may also use Colab for this assignment.

You can load the MNIST dataset directly in TensorFlow with the following code:

```python
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```
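The arrays returned by load_data() hold uint8 pixels in [0, 255]; scaling them to [0, 1] before training is a common (optional) preprocessing step. The snippet below sketches it on a synthetic batch so that it runs without downloading the dataset:

```python
import numpy as np

# Stand-in for x_train from mnist.load_data(): a small uint8 batch.
x = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0, 1] as float32.
x_scaled = x.astype("float32") / 255.0
```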

2. (40 points) Implement a multi-layer fully connected neural network:

• Input: 1-channel input, size 28×28

• Fully connected layer 1: input with bias; output – 128 nodes

• ReLU activation function

• Fully connected layer 2: input – 128 nodes; output – 10 nodes

• Softmax activation function

• Use cross entropy as the loss function

• Use SGD as optimizer

• Set mini batch size as 32

Train using mini batches of the given batch size. Plot the cumulative training loss and accuracy for every epoch. Once training is complete, apply the learned model to the test set and report the testing accuracy.

Epoch: An epoch is a single pass through all the training data. Typically many epochs will be run when training a neural network before it converges.
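One possible Keras sketch of the architecture above (the 0.01 learning rate is an illustrative choice; the problem statement only fixes the optimizer and mini batch size):

```python
import tensorflow as tf

# A possible sketch of the specified fully connected network.
def build_fc_model() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),                   # 1-channel 28x28 input
        tf.keras.layers.Flatten(),                        # 784-dim vector
        tf.keras.layers.Dense(128, activation="relu"),    # FC layer 1 (bias on by default)
        tf.keras.layers.Dense(10, activation="softmax"),  # FC layer 2 + softmax
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # assumed learning rate
        loss="sparse_categorical_crossentropy",  # cross entropy on integer labels
        metrics=["accuracy"],
    )
    return model
```

Training then reduces to model.fit(x_train, y_train, batch_size=32, epochs=...), whose History object carries the per-epoch loss and accuracy needed for the required plots.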

Please submit (a) summary of methods and results report and (b) code:

(a) Summary of methods and results: Briefly describe the approaches used above, along with relevant equations. Report the cumulative training loss and accuracy for every epoch as plots. Also report the testing accuracy (a single number).

(b) Code: Submit the file neural_net.py, which contains the function def neural_net() -> None:. The function does not have any inputs and does not return anything, but must print to the terminal (stdout) the cumulative training loss and accuracy per epoch as well as the testing accuracy.

3. (50 points) Implement a convolutional neural network with the following specifications.

• Input: 1-channel input, size 28×28

• Convolution layer: convolution kernel size is (3, 3) with stride 1. Input channels – 1; output channels – 20

• ReLU activation function

• Max-pool: 2×2 max pool

• Dropout layer with probability p = 0.50

• Flatten the input to feed into the fully connected layers

• Fully connected layer 1: flattened input with bias; output – 128 nodes

• ReLU activation function

• Dropout layer with probability p = 0.50

• Fully connected layer 2: input – 128 nodes; output – 10 nodes

• Softmax activation function

• Use cross entropy as loss function

For this problem, we will be experimenting with a variety of parameters.

First, train using SGD as the optimizer and mini batches of size 32. Plot the cumulative training loss and accuracy for every epoch. Once training is complete, apply the learned model to the test set and report the testing accuracy.
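A possible Keras sketch of the CNN above (padding is not specified in the problem statement; "valid", i.e. no padding, is assumed here):

```python
import tensorflow as tf

# A possible sketch of the specified CNN.
def build_cnn() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),                # 1-channel input
        tf.keras.layers.Conv2D(20, (3, 3), strides=1,
                               activation="relu"),        # 1 -> 20 channels, ReLU
        tf.keras.layers.MaxPool2D((2, 2)),                # 2x2 max pool
        tf.keras.layers.Dropout(0.5),                     # dropout, p = 0.50
        tf.keras.layers.Flatten(),                        # flatten for FC layers
        tf.keras.layers.Dense(128, activation="relu"),    # FC layer 1 (with bias), ReLU
        tf.keras.layers.Dropout(0.5),                     # dropout, p = 0.50
        tf.keras.layers.Dense(10, activation="softmax"),  # FC layer 2 + softmax
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",  # cross entropy loss
                  metrics=["accuracy"])
    return model
```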

Second, train your network using mini batch sizes of [32, 64, 96, 128] and plot the convergence run time vs. mini batch size for each of the following optimizers: SGD, Adagrad, and Adam. You should report 3 figures, one for each optimizer, where each figure has mini batch size on the x-axis and the convergence run time on the y-axis.
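One way to structure this experiment is sketched below; the helper name is made up, and the fixed epoch count stands in for whatever convergence criterion you adopt:

```python
import time
import tensorflow as tf

# Time training for every (optimizer, mini batch size) pair. The
# returned dict maps (optimizer, batch) to wall-clock seconds, ready
# to plot per optimizer with batch size on the x-axis.
def time_training(build_model, x, y, epochs: int = 5) -> dict:
    results = {}
    for opt in ["sgd", "adagrad", "adam"]:
        for batch in [32, 64, 96, 128]:
            model = build_model()  # fresh weights for each run
            model.compile(optimizer=opt,
                          loss="sparse_categorical_crossentropy")
            start = time.perf_counter()
            model.fit(x, y, batch_size=batch, epochs=epochs, verbose=0)
            results[(opt, batch)] = time.perf_counter() - start
    return results
```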

Please submit (a) summary of methods and results report and (b) code:

(a) Summary of methods and results: Briefly describe the approaches used above, along with relevant equations. Report the cumulative training loss and accuracy for every epoch as plots. Also report the testing accuracy (a single number). Finally, report the convergence run time vs. mini batch size for each optimizer above (3 plots).

(b) Code: Submit the file cnn.py, which contains the function def cnn() -> None:. The function does not have any inputs and does not return anything, but must print to the terminal (stdout) the cumulative training loss and accuracy as well as the testing accuracy (for the first part above). It must also print the batch size and convergence run time for each mini batch size and optimizer (for the second part above).

Additional instructions: Code can only be written in Python 3.6+; no other programming languages will be accepted. One should be able to execute all programs from the Python command prompt or terminal. Please provide instructions on how to run your program in the README file. Each function must take the inputs in the order specified in the problem and display the textual output via the terminal; plots/figures should be included in the report.

For each part, you can submit additional files/functions (as needed) which will be used by the main file. In your code, you cannot use machine learning libraries such as those available from scikit-learn for learning the models – the exception being that you must use TensorFlow 2 for your neural network implementations. You may also use libraries for basic matrix computations and plotting, such as numpy, pandas, and matplotlib. Put comments in your code so that one can follow the key parts and steps.

Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu). One option is to SSH into a machine. Learn about SSH at these links: https://cseit.umn.edu/knowledge-help/learn-about-ssh, https://cseit.umn.edu/knowledge-help/choose-ssh-tool, and https://cseit.umn.edu/knowledge-help/remote-linux-applications-over-ssh.

## Instructions

Follow the rules strictly. If we cannot run your code, you will not get any credit.

• Things to submit

1. hw3.pdf: The report that contains the solutions to Problems 1, 2, and 3, including the summary of methods and results.

2. neural_net.py: Code for Problem 2.

3. cnn.py: Code for Problem 3.

4. README.txt: README file that contains your name, student ID, email, instructions on how to run your code, any assumptions you are making, and any other necessary details.

5. Any other files, except the data, which are necessary for your code.

Homework Policy. (1) You are encouraged to collaborate with your classmates on homework problems, but each person must write up the final solutions individually. You need to list in the README.txt which problems were a collaborative effort and with whom. (2) Regarding online resources, you should not:

• Google around for solutions to homework problems,

• Ask for help online,

• Look up things/post on sites like Quora, StackExchange, etc.