## Description

## 1 Purpose

The main purpose of this lab is for you to get familiar with some of the key ingredients of deep neural network (DNN) architectures. The focus of this assignment is on restricted Boltzmann machines (RBMs) and autoencoders. After completing this assignment, you should be able to

• explain the key ideas underlying the learning process of RBMs and autoencoders,

• apply basic algorithms for greedy training of RBM and autoencoder layers using commonly available deep learning libraries,

• design multi-layer neural network architectures based on RBM and autoencoder layers for classification problems,

• study the functionality (including generative aspects) and test the performance of small-scale deep architectures developed using RBMs (deep belief networks, DBNs) and autoencoders.

## 2 Scope and resources

In this lab assignment you are encouraged to rely on existing deep learning libraries (available in Matlab, Python, Java, among others; examples are scikit-learn, deeplearning4j and the Deep Neural Network Matlab toolbox by M. Tanaka). Still, I would recommend that you explore the nuances of setting up RBMs and their implementation; Hinton's guide [1] could be helpful in this regard. Consequently, while reusing library implementations, you are strongly advised not to simply copy ready-made examples from the libraries that address problems very similar to those in the assignment (it is possible to find such example scripts on the web, since the lab makes use of one of the classical benchmark tests).

Unlike in typical benchmark applications of DBNs, here the data have already been normalized and converted to pseudo-binary distributions, so you can directly employ Bernoulli-type RBMs rather than Gaussian ones. You can read more on the comparative analysis of different versions of RBMs in [2].

Finally, please be aware that you are given a fair deal of freedom in many of the choices that have to be made in this lab. Please motivate your decisions even if you find them somewhat arbitrary. Also, since this lab is reported only in writing, please invest extra effort into your report.

## 3 Tasks and questions

The lab consists of two main tasks, each covering two approaches: one based on RBMs and one based on autoencoders. The data for the assignment can be downloaded from the course website. In particular, the MNIST dataset consists of four CSV files: two with data for training, bindigit_trn and targetdigit_trn, and the other two for testing, bindigit_tst and targetdigit_tst.

Data in the bindigit files represent 28-by-28 matrices organized into 784-dimensional binary vectors (strings of 0s and 1s). There are 8000 such vectors in bindigit_trn and 2000 in bindigit_tst. Analogously, the targetdigit_trn file contains a vector of 8000 integer values, and targetdigit_tst has 2000 integers between 0 and 9, which describe the corresponding labels for the 28-by-28 images of handwritten digits (data adapted from the MNIST database, normalized with grey levels converted to simpler binary representations).

Data in both the training and test sets are relatively balanced with respect to the 10 classes. You can verify this by examining a histogram of the available labels. Furthermore, in Matlab you can plot the digit image represented by the k-th vector in the bindata matrix using imshow(reshape(bindata(k,:),28,28)).
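If you work in Python, the same inspection can be sketched with NumPy. The file names and the exact CSV layout below are assumptions (check the files from the course website); random stand-in data of the same shapes is used here so the snippet runs on its own:

```python
import numpy as np

# Assumed loading calls (adjust filenames/delimiters to the actual files):
# X_trn = np.loadtxt("bindigit_trn.csv", delimiter=",")    # 8000 x 784 binary
# y_trn = np.loadtxt("targetdigit_trn.csv", dtype=int)     # 8000 labels, 0-9
# Stand-in data with the same shapes, so this sketch is runnable as-is:
rng = np.random.default_rng(0)
X_trn = (rng.random((8000, 784)) > 0.5).astype(float)
y_trn = rng.integers(0, 10, size=8000)

# Class balance: histogram of the labels (counts per digit 0-9).
counts = np.bincount(y_trn, minlength=10)
print(counts)

# The k-th image is row k reshaped to 28 x 28; with matplotlib you would use
# plt.imshow(X_trn[k].reshape(28, 28), cmap="gray"), mirroring the Matlab call.
img = X_trn[0].reshape(28, 28)
print(img.shape)
```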

### 3.1 RBM and autoencoder features for binary-type MNIST images

Your task here is to train (a) an RBM and (b) an autoencoder (i.e. the weights connecting the visible and hidden unit layers). More specifically, please first initialize the weight matrix with small (normally distributed) random values, with the hidden and visible biases initialized to 0. Then, iterate the training process, contrastive divergence for the RBM and gradient-descent-based error minimization for the autoencoder, for a number of epochs (i.e. full sweeps through the training data) equal to 10 or 20, or until convergence (you can experiment a bit, also adjusting the learning rate).
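The training loop above can be sketched in NumPy. This is a minimal CD-1 sketch for a Bernoulli RBM, not a tuned implementation; the hidden-layer size, learning rate, batch size and toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 784, 100, 0.05
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # small random init
b_v = np.zeros(n_visible)                          # visible biases, init 0
b_h = np.zeros(n_hidden)                           # hidden biases, init 0

def cd1_step(v0):
    """One contrastive-divergence (CD-1) update on a mini-batch v0."""
    global W, b_v, b_h
    h0_prob = sigmoid(v0 @ W + b_h)                          # up
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float) # sample hidden
    v1_prob = sigmoid(h0 @ W.T + b_v)                        # down (reconstruction)
    h1_prob = sigmoid(v1_prob @ W + b_h)                     # up again
    n = v0.shape[0]
    W   += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n   # positive - negative phase
    b_v += lr * (v0 - v1_prob).mean(axis=0)
    b_h += lr * (h0_prob - h1_prob).mean(axis=0)
    return np.mean((v0 - v1_prob) ** 2)                      # reconstruction error

# Toy run on random binary "images", just to show the epoch/mini-batch loop.
X = (rng.random((256, n_visible)) > 0.5).astype(float)
for epoch in range(5):
    err = np.mean([cd1_step(X[i:i + 32]) for i in range(0, len(X), 32)])
```

For the autoencoder branch, the same loop structure applies, with the CD-1 update replaced by a gradient-descent step on the reconstruction error.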

• For each image, compute the mean error between the original input and the reconstructed input. Then use it to compute the total error on the dataset for the current epoch. Once training is completed, plot the total error as a function of the epochs. Finally, sample one image from each digit class and obtain its reconstruction, then plot both the original and reconstructed images. Try different numbers of hidden nodes, say 50, 75, 100, 150, and compare the errors.

• Next, plot the 784 bottom-up final weights for each hidden unit, using a separate figure for each hidden unit (reshape the weight vector into a matrix and plot it as an image). Do this part for the configurations with 50 and 100 nodes.
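The error bookkeeping and weight plots can be sketched as follows. The `reconstruct` function here is a placeholder for your trained model's v → h → v pass (RBM) or forward pass (autoencoder), and the weights are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden = 50
W = rng.normal(0.0, 0.01, (784, n_hidden))   # stand-in for trained weights

def reconstruct(X):
    # Placeholder reconstruction (deterministic up-down pass); replace with
    # your trained RBM or autoencoder.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sigmoid(sigmoid(X @ W) @ W.T)

X = (rng.random((100, 784)) > 0.5).astype(float)
per_image_err = np.mean((X - reconstruct(X)) ** 2, axis=1)  # mean error per image
total_err = per_image_err.sum()                             # total for this epoch

# Each hidden unit's 784 incoming weights, reshaped to 28 x 28, give its
# "receptive field". With matplotlib, one figure per hidden unit:
# for j in range(n_hidden):
#     plt.figure(); plt.imshow(W[:, j].reshape(28, 28), cmap="gray")
fields = W.T.reshape(n_hidden, 28, 28)
```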

Discuss your observations and illustrate your findings with plots as well as both quantitative and qualitative arguments. Choose the most interesting comparisons and effects to demonstrate.

### 3.2 DBN and stacked autoencoders for MNIST digit classification

Taking advantage of the developments in the previous task, here you are requested to extend the single-hidden-layer network to a deeper architecture by following the idea of greedy layer-wise pretraining (without labels, as in the previous task for a single layer). This time, however, you will add on top of the network's hidden layers an output layer, i.e. a layer with output nodes corresponding to the classification output. Please train the weights of the connections from the top-most hidden layer to the output layer with a generalized delta rule or conjugate gradient optimization. Next, perform a test and quantify the classification performance on the test set. In particular, please address the following tasks and questions.
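The greedy layer-wise scheme can be sketched as follows: each new layer is trained on the hidden representation produced by the layers below it. Here `train_rbm` is a placeholder for the CD-based training from task 3.1 (or its autoencoder counterpart), and the layer sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(X, n_hidden):
    # Placeholder: returns random weights/biases instead of actually running
    # contrastive divergence on X.
    W = rng.normal(0.0, 0.01, (X.shape[1], n_hidden))
    return W, np.zeros(n_hidden)

X = (rng.random((200, 784)) > 0.5).astype(float)  # stand-in binary images
layer_sizes = [150, 100, 50]                      # decreasing sizes, as suggested
weights, H = [], X
for n_hidden in layer_sizes:
    W, b_h = train_rbm(H, n_hidden)   # train this layer greedily, no labels
    weights.append((W, b_h))
    H = sigmoid(H @ W + b_h)          # propagate data up to feed the next layer

# H is now the top-most hidden representation; the supervised output layer
# (delta rule / conjugate gradients) is then trained on H with the labels.
```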

### 3.2.1 Classification with deeper architectures

Compare the classification performance obtained with different numbers of hidden layers (1, 2 and 3). Please add to this analysis a network configuration with a simple classification layer operating directly on the raw inputs, as the no-hidden-layer option. As for the sizes of the hidden layers, first choose the optimal number of nodes in the first hidden layer based on your experience in the previous task (3.1), and then decide on the sizes of the other layers within a similar range, with a tendency towards fewer and fewer units.

Run these comparisons/analyses independently for stacked autoencoders and DBNs (stacked RBMs). Examine the hidden-layer representations (beyond the first hidden layer, already studied in the previous task). Observe the effect of images representing different digits on the units in the hidden layers.

Finally, compare the deep network configurations of your choice, DBN and stacked autoencoders (two- or three-layer networks with a selected number of hidden units), pre-trained in a greedy layer-wise manner and containing a classification layer trained in supervised mode, with an analogous MLP architecture trained from scratch using backprop. An intermediate option would be a comparison with deep networks pretrained as before but with backprop-type fine-tuning of the weights in the hidden layers (not only in the supervised output layer).

Additional remarks

• Instead of using an extra supervised network layer on top of the hidden-layer representations, trained via gradient descent or conjugate gradient optimization (some form of gradient-based supervised neural learning), you can perform the tests using a commonly employed logistic regression.

• In addition, if the simulations take a heavy computational toll on your PC/laptop, please feel free to subsample your training set while maintaining the class balance.
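The logistic-regression alternative amounts to fitting a standard classifier on the top hidden-layer features. A sketch with scikit-learn, using random stand-ins for the hidden features and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Stand-ins for the top hidden-layer representations of the train/test sets;
# in the lab these come from propagating the data through the pretrained stack.
H_trn = rng.random((400, 100))
y_trn = rng.integers(0, 10, size=400)   # digit labels 0-9
H_tst = rng.random((100, 100))

clf = LogisticRegression(max_iter=500)  # handles the 10-class problem directly
clf.fit(H_trn, y_trn)
pred = clf.predict(H_tst)
```

On the real data you would report accuracy via `clf.score(H_tst, y_tst)` and compare it with the neural output layer.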

### 3.2.2 Generative mode of DBNs (optional part)

After training, the DBN can be used to generate sample digit images for a selected class (digit). To this end, the desired output class configuration should first be clamped to particular values (0s and 1s for the one-out-of-n output coding). Next, the remaining visible units of the top-level RBM should be sampled randomly according to their bias terms, which initializes the visible data vector of the top-level RBM to a reasonably unbiased starting point. Then, alternating Gibbs sampling, which was used in the learning process as part of contrastive divergence, should run for many steps. It is expected that after many steps the network settles close to its equilibrium distribution given the clamped labels. Then, a single top-down pass converts the binary feature activations into an image consistent with the sample from the top-level RBM. In this task, you could qualitatively examine the DBN's "perception" of digit images. This capability to sample data from a trained network constitutes a valuable and interesting property of generative models like DBNs.
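The clamped sampling procedure can be sketched as follows. The weights and biases here are random stand-ins for the trained top-level RBM, whose visible layer is assumed to consist of the penultimate-layer features concatenated with the 10 label units; the sizes and number of Gibbs steps are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_feat, n_labels, n_hidden = 100, 10, 50
n_visible = n_feat + n_labels
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # stand-in trained weights
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

digit = 3
label = np.zeros(n_labels); label[digit] = 1.0    # clamped one-out-of-n code
# Initialize the free visible units by sampling from their bias terms.
v = np.concatenate([(rng.random(n_feat) < sigmoid(b_v[:n_feat])).astype(float),
                    label])

for step in range(200):                           # alternating Gibbs sampling
    h = (rng.random(n_hidden) < sigmoid(v @ W + b_h)).astype(float)
    v = (rng.random(n_visible) < sigmoid(h @ W.T + b_v)).astype(float)
    v[n_feat:] = label                            # re-clamp the label units

features = v[:n_feat]   # a single top-down pass through the lower layers
                        # then maps these features to a 28 x 28 image
```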

As before, please share your key observations and illustrate your findings. Choose the most interesting comparisons and effects to demonstrate, and be selective (with different hyperparameter configurations of your choice; comment on the sensitivity if you decide to examine a selected hyperparameter more systematically). Mention compute-time aspects, convergence, and reconstruction and classification errors across layers. Briefly discuss/interpret the hidden-layer representations and features.

Good luck!

## References

[1] Hinton, G. E. (2012). A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (pp. 599-619). Springer, Berlin, Heidelberg.

[2] Yamashita, T., Tanaka, M., Yoshida, E., Yamauchi, Y., and Fujiyoshi, H. (2014). To be Bernoulli or to be Gaussian, for a restricted Boltzmann machine. In 22nd International Conference on Pattern Recognition (ICPR) (pp. 1520-1525).