Description
1) Understanding of feedforward-designed convolutional neural networks (FF-CNNs) (15%)
An FF-CNN consists of two modules in cascade: 1) the construction of the conv layers using the Saab
(Subspace approximation with adjusted bias) transform; and 2) the construction of the fully-connected (FC)
layers using the multi-stage linear least-squares regressor (LSR).
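As a reference for the second module, the sketch below builds one FC layer by solving a linear least-squares problem from input features to one-hot targets (a sketch of the Saab module appears under Problem 2). Note that [1] forms pseudo-labels for the intermediate FC layers by clustering within each class; the plain one-hot targets, the optional ReLU, and all names below are simplifying assumptions for illustration only.

```python
import numpy as np

def fit_fc_layer(features, labels, num_classes):
    """Solve one FC layer as a linear least-squares regressor (LSR).

    features : (num_samples, in_dim) inputs to this FC layer.
    labels   : (num_samples,) integer class (or pseudo-class) indices.
    Returns a weight matrix of shape (in_dim + 1, num_classes); the extra
    row plays the role of the bias.
    """
    # One-hot target matrix (a simplification; [1] uses clustered pseudo-labels
    # for the intermediate FC layers).
    targets = np.eye(num_classes)[labels]
    # Append a constant 1 to each feature vector so a bias is learned as well.
    augmented = np.hstack([features, np.ones((features.shape[0], 1))])
    # Closed-form least-squares solution; no backpropagation is involved.
    weights, *_ = np.linalg.lstsq(augmented, targets, rcond=None)
    return weights

def apply_fc_layer(features, weights):
    augmented = np.hstack([features, np.ones((features.shape[0], 1))])
    # Optional ReLU between stages (an assumption; see [1]/[3] for the exact cascade).
    return np.maximum(augmented @ weights, 0.0)
```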
• Summarize FF-CNNs with a flow chart and explain it in your own words.
• Explain the similarities and differences between FF-CNNs and backpropagation-designed CNNs
(BP-CNNs).
Do not copy any sentences directly from [1] or other papers; doing so is plagiarism. Your score will depend
on your degree of understanding.
2) Image reconstructions from Saab coefficients (35%)
Apply Saab transforms to images in the MNIST dataset [2].
• Compute the Saab coefficients (you may use the online source code [3] or implement it yourself) of the
four handwritten digit images shown in Figure 1, and implement the reconstruction algorithm
(write your own code) that transforms the Saab coefficients back to images.
• To evaluate the reconstruction results, show the reconstructed images and compute the PSNR between
each original image and its reconstruction (a PSNR sketch follows this list).
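As a hint, PSNR can be computed as 10·log10(peak²/MSE). A minimal sketch, assuming 8-bit images with a peak value of 255 (function and variable names are illustrative):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two images of equal shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```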
Architecture setting:
In this problem, you should use a two-stage Saab transform in which the spatial size of the transform
kernels is 4×4. The stride of each stage is 4 (non-overlapping). Thus, at the output, the dimension of
the Saab coefficients of an image should be 2×2×N, where N is the number of transform kernels in the
second stage. You need to evaluate four different settings (different kernel numbers in each stage)
and discuss your results.
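As a starting point, the sketch below implements a simplified two-stage, non-overlapping 4×4 Saab pipeline and its inverse from scratch (it is not the reference code [3]). It omits the Saab bias term, which cancels in reconstruction, uses the patch mean as the DC response, assumes the 28×28 digits are zero-padded to 32×32 so that two stride-4 stages yield the 2×2×N output described above, and treats the kernel numbers n1 and n2 as example values only.

```python
import numpy as np
from sklearn.decomposition import PCA

def to_patches(maps, patch=4):
    """Cut (n, H, W, C) maps into non-overlapping patch x patch blocks,
    flattening each block into a vector of length patch*patch*C."""
    n, h, w, c = maps.shape
    x = maps.reshape(n, h // patch, patch, w // patch, patch, c)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(n, h // patch, w // patch, -1)

def from_patches(patches, patch=4, channels=1):
    """Inverse of to_patches."""
    n, gh, gw, _ = patches.shape
    x = patches.reshape(n, gh, gw, patch, patch, channels)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(n, gh * patch, gw * patch, channels)

class SaabStage:
    """One simplified Saab stage: a DC (patch-mean) response plus PCA-derived
    AC kernels. The constant bias of the full Saab transform is omitted, since
    it cancels out in reconstruction; see [1] and [3] for the full version."""
    def __init__(self, num_kernels, patch=4):
        self.num_kernels, self.patch = num_kernels, patch

    def fit(self, maps):
        self.in_channels = maps.shape[-1]
        vecs = to_patches(maps, self.patch).reshape(-1, self.patch ** 2 * self.in_channels)
        dc = vecs.mean(axis=1, keepdims=True)            # DC response of each patch
        self.pca = PCA(n_components=self.num_kernels - 1).fit(vecs - dc)
        return self

    def transform(self, maps):
        p = to_patches(maps, self.patch)
        vecs = p.reshape(-1, p.shape[-1])
        dc = vecs.mean(axis=1, keepdims=True)
        coeffs = np.concatenate([dc, self.pca.transform(vecs - dc)], axis=1)
        return coeffs.reshape(*p.shape[:3], self.num_kernels)

    def inverse(self, coeffs):
        flat = coeffs.reshape(-1, self.num_kernels)
        vecs = self.pca.inverse_transform(flat[:, 1:]) + flat[:, :1]   # add DC back
        p = vecs.reshape(*coeffs.shape[:3], -1)
        return from_patches(p, self.patch, self.in_channels)

def reconstruct(train_digits, test_digits, n1=8, n2=60):
    """Fit two Saab stages on training digits, then reconstruct test digits.
    n1 and n2 are example values; the handout asks for four different settings."""
    pad = lambda x: np.pad(x, ((0, 0), (2, 2), (2, 2)))[..., None].astype(np.float64)
    train, test = pad(train_digits), pad(test_digits)     # assumed 32x32 zero-padding
    stage1 = SaabStage(n1).fit(train)                     # 32x32x1 -> 8x8xn1
    stage2 = SaabStage(n2).fit(stage1.transform(train))   # 8x8xn1  -> 2x2xn2
    coeffs = stage2.transform(stage1.transform(test))
    rec = stage1.inverse(stage2.inverse(coeffs))          # invert both stages
    return rec[:, 2:30, 2:30, 0]                          # crop back to 28x28
```

With all AC kernels kept in both stages the reconstruction is lossless; truncating kernels makes it lossy, which is what the PSNR comparison across the four settings is meant to expose.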
Figure 1: four handwritten digit images from the MNIST dataset.
3) Handwritten digits recognition using ensembles of feedforward design (50%)
In this problem, you will apply an FF-CNN to handwritten digit recognition. Train an FF-CNN
using the 60,000 training images of the MNIST dataset. Adopt the LeNet-5-like architecture, in which
the filter numbers of the first and second conv layers and the first and second FC layers are 6,
16, 120 and 80, respectively. The spatial size of the transform kernels is 5×5 and the stride is 1 for each
conv layer. To reduce the spatial dimension, max-pooling layers are adopted.
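For orientation, the shapes implied by this setting are walked through below, assuming the MNIST images are zero-padded to 32×32 as in LeNet-5 and that each conv stage is followed by 2×2 max pooling; the pooling helper and the shape comments are an illustrative sketch, not the reference implementation [3].

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (n, H, W, C) feature map."""
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

# Shape walk-through for the LeNet-5-like FF-CNN (input assumed padded to 32x32):
#   input                                  (n, 32, 32, 1)
#   Saab conv 5x5, 6 kernels, stride 1  -> (n, 28, 28, 6)
#   2x2 max pooling                     -> (n, 14, 14, 6)
#   Saab conv 5x5, 16 kernels, stride 1 -> (n, 10, 10, 16)
#   2x2 max pooling                     -> (n, 5, 5, 16)
#   flatten                             -> (n, 400)
#   LSR FC layer                        -> (n, 120)
#   LSR FC layer                        -> (n, 80)
#   LSR output layer                    -> (n, 10)
```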
• Report the training and testing classification accuracy of an individual FF-CNN on the MNIST
dataset.
• One way to improve the performance is to build an ensemble of FF-CNNs. Train ten
different FF-CNNs and ensemble their results following the method in [4]. Diversity is the key to a
successful ensemble, and [4] introduces three strategies to increase diversity in an
ensemble of FF-CNNs that you can refer to. Explain and justify your strategies for generating
varied FF-CNNs in the ensemble, and report the training and testing classification accuracy of
your ensemble system (a simple fusion baseline is sketched after this list).
• Error analysis: Please compare the classification error cases arising from BP-CNNs (use the best result
from your HW#5) and FF-CNNs. What percentage of the errors is the same, and what percentage is
different? Please explain your observations. Also, please propose ideas to improve
BP-CNNs, FF-CNNs, or both, and justify your proposal. There is no need to implement your
proposed ideas.
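As a simple baseline for fusing the ten FF-CNN decisions (not necessarily the exact fusion scheme of [4]), one can average the per-model class-score vectors and take the arg-max, as sketched below; all names are illustrative.

```python
import numpy as np

def soft_vote(score_list):
    """Fuse per-model class-score matrices (each of shape (n, 10)) by averaging,
    then pick the highest-scoring class per sample. A simple baseline; [4]
    describes its own fusion scheme, which you may adopt instead."""
    return np.mean(np.stack(score_list, axis=0), axis=0).argmax(axis=1)

# Example usage with hypothetical outputs of three FF-CNNs:
# predictions = soft_vote([scores_model1, scores_model2, scores_model3])
```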
References
[1] Kuo, C. C. J., Zhang, M., Li, S., Duan, J., & Chen, Y. (2019). Interpretable convolutional neural networks via
feedforward design. Journal of Visual Communication and Image Representation.
[2] MNIST dataset. https://yann.lecun.com/exdb/mnist/
[3] https://github.com/davidsonic/Interpretable_CNN
[4] Chen, Y., Yang, Y., Wang, W., & Kuo, C. C. J. (2019). Ensembles of feedforward-designed convolutional
neural networks. arXiv preprint arXiv:1901.02154.