Homework 3, AMATH 482, Winter 2025

Problem Description: MNIST Digit Classification

Your goal in this assignment is to train classifiers to distinguish images of handwritten digits from the famous
MNIST data set. This is a classic problem in machine learning and often one of the first benchmarks
on which new algorithms are tried. The data set is split into training and test sets. You will train your classifiers
using the training set, while the test set is used only for validation/evaluation of your classifiers. The data
(both train and test sets) is from Yann LeCun (http://yann.lecun.com/) and has been placed in Google Drive for you
to download. Alternatively, you can use sklearn.datasets to load the dataset. For Python, parse it into
matrices using the HW3Helper notebook provided. For MATLAB, code is available online that will help
you do this (e.g. https://github.com/sunsided/mnist-matlab).
Figure 1: First 64 digits in the MNIST dataset
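As a rough illustration of the loading and reshaping described above, the following sketch flattens a stack of 28 × 28 images into a matrix with one image per row. The helper names `flatten_images` and `load_mnist_openml` are hypothetical and are not part of the HW3Helper notebook. The OpenML route is only one possible way to use sklearn.datasets, and the 60000/10000 split is the customary one, assumed here rather than stated in the handout. Random arrays stand in for real digits so that the example runs offline.

```python
import numpy as np


def flatten_images(images):
    """Reshape a stack of (n, 28, 28) images into an (n, 784) matrix, one row per image."""
    images = np.asarray(images, dtype=np.float64)
    return images.reshape(images.shape[0], -1)


def load_mnist_openml():
    """Hypothetical helper: fetch MNIST from OpenML through sklearn (needs network access)."""
    from sklearn.datasets import fetch_openml

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    y = y.astype(int)  # labels arrive as strings
    # Assumption: the first 60000 rows form the customary training split.
    return X[:60000], y[:60000], X[60000:], y[60000:]


# Synthetic stand-in images so the sketch runs without downloading anything
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28))
X_demo = flatten_images(fake_images)
print(X_demo.shape)
```

Reshaping any row back with `X_demo[i].reshape(28, 28)` recovers the original image, which is handy when plotting digits or PC modes in a grid.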
Some comments and hints
Here are some useful comments and facts to guide you along the way.
1. In this assignment, it is advised to use sklearn for many of the analyses that you would like to perform,
since the MNIST dataset is larger than the toy datasets that we previously considered. In particular,
you will find that direct application of the SVD to the training set is time-consuming, while the
PCA implementation in sklearn.decomposition is better optimized. Read the documentation for each
function you will be using at https://scikit-learn.org/stable/user_guide.html and check that
your dimensions and formatting are compatible with the sklearn convention.
2. You are welcome to use the code samples from class for the various classifiers that we applied to the Iris
dataset. (These will be presented in class in the coming week.)
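To make the sklearn convention in hint 1 concrete, here is a minimal sketch on synthetic data (random numbers, not MNIST). It assumes samples are stored as rows, which is what `PCA.fit` expects, and it cross-checks the result against a direct SVD of the centered data. The choice `svd_solver="full"` is made here only so that the comparison is exact on this small example; it is not a recommendation from the handout.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 784))  # synthetic stand-in: 200 "images" stored as rows

# sklearn expects shape (n_samples, n_features) and centers the data internally
pca = PCA(n_components=16, svd_solver="full")
pca.fit(X_train)

# components_ has shape (n_components, n_features); each row is one PC mode
modes = pca.components_.reshape(16, 28, 28)

# Cross-check against a direct SVD of the centered data.
# Rows of Vt should match the PC modes up to a sign flip.
Xc = X_train - X_train.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
alignment = np.abs(np.sum(Vt[:16] * pca.components_, axis=1))

print(modes.shape, alignment.min())
```

Each slice `modes[i]` is a 28 × 28 array that can be passed to an image-plotting routine when displaying the PC modes in a grid.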
Tasks
Below is a list of tasks to complete in this assignment and discuss in your report.
1. You will need to reshape each image into a vector and stack the vectors into matrices 𝑋train and 𝑋test
respectively. Perform PCA on the digit images in the train set. Plot the first 16 PC modes as
28 × 28 images (see Figure 1 for an example of how multiple images can be displayed in a
grid).
2. Inspect the cumulative energy of the singular values and determine 𝑘: the number of PC modes needed
to approximate 85% of the energy. You may also want to inspect several approximated digit images
reconstructed from 𝑘 truncated PC modes and plot them to make sure that the image reconstruction
using truncated modes is reasonable.
3. Write a function that selects a subset of particular digits (all samples of them) from 𝑋train, 𝑦train, 𝑋test
and 𝑦test and returns the subset as new matrices 𝑋subtrain, 𝑦subtrain, 𝑋subtest and 𝑦subtest.
4. Select the digits 1 and 8 using the function from step 3, project the data onto the 𝑘 PC modes computed in steps 1-2, and apply
the Ridge classifier (linear) to distinguish between these two digits. Perform cross-validation and
testing and discuss your results.
5. Repeat the same classification procedure for the digit pairs 3,8 and 2,7. Report your results and
compare them with the results in step 4. If there is any difference, can you explain it?
6. Use all the digits and perform multi-class classification with Ridge and KNN classifiers. Report your
results and discuss how they compare between the methods. Which method performs the best?
7. Bonus (+2 points): Implement an alternative classifier that we did not cover in class (e.g. SVM), and
compare its results with those of the classifiers in the previous step.
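The workflow in tasks 2 to 6 can be sketched end to end as follows. This is only an illustration under stated assumptions, not a reference solution. It uses sklearn's small bundled 8 × 8 digits set (`load_digits`) as an offline stand-in for MNIST. It takes "energy" to mean the cumulative explained-variance ratio, i.e. squared singular values of the centered data. The helper names `modes_for_energy` and `select_digits`, the 5-fold cross-validation, and the default classifier settings are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def modes_for_energy(X, threshold=0.85):
    """Smallest k whose cumulative explained-variance ratio reaches the threshold."""
    ratios = PCA(svd_solver="full").fit(X).explained_variance_ratio_
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))


def select_digits(X_train, y_train, X_test, y_test, digits):
    """Keep only the samples whose label is one of the requested digits."""
    train_mask = np.isin(y_train, digits)
    test_mask = np.isin(y_test, digits)
    return X_train[train_mask], y_train[train_mask], X_test[test_mask], y_test[test_mask]


# Assumption: the bundled 8x8 digits replace MNIST so the sketch runs offline
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# PC modes are computed once from the full training set
k = modes_for_energy(X_train, threshold=0.85)
pca = PCA(n_components=k, svd_solver="full").fit(X_train)

# Binary problem: digits 1 and 8, projected onto the k PC modes
Xs_train, ys_train, Xs_test, ys_test = select_digits(X_train, y_train, X_test, y_test, [1, 8])
Z_train, Z_test = pca.transform(Xs_train), pca.transform(Xs_test)

ridge = RidgeClassifier()
cv_scores = cross_val_score(ridge, Z_train, ys_train, cv=5)
ridge.fit(Z_train, ys_train)
test_acc = ridge.score(Z_test, ys_test)
print(f"k={k}, CV accuracy={cv_scores.mean():.3f}, test accuracy={test_acc:.3f}")

# Multi-class problem: all digits, Ridge versus KNN on the same projection
Z_all_train, Z_all_test = pca.transform(X_train), pca.transform(X_test)
multi_scores = {}
for name, clf in [("Ridge", RidgeClassifier()), ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(Z_all_train, y_train)
    multi_scores[name] = clf.score(Z_all_test, y_test)
print(multi_scores)
```

Calling `select_digits` with `[3, 8]` or `[2, 7]` repeats the binary experiment for the other digit pairs. Numbers obtained on this stand-in data set should not be expected to match those on MNIST.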