COMP8043 Assignment 2: Supervised Learning solution

$24.99

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (4 votes)

Objective
The Excel file “product_images.csv” on Canvas contains processed product images of
sneakers and ankle boots from Zalando.com. Every row consists of a label, being either 0 for
a sneaker or 1 for an ankle boot, and 28×28 8-bit grayscale pixel values of the product image
(see Figure 1).
The goal of this assignment is to evaluate and optimise the performance of different
classifiers for their suitability to classify this dataset.
Machine Learning
Assignment 2: Supervised Learning
Figure 1 28×28 feature vector for an example sneaker (left) and an example ankle boot
(right).
Task 1 (pre-processing and visualisation, 8 points)
Load the product images [1 point] and separate the labels [1 point] from the feature vectors
[1 point]. How many samples are images of sneakers [1 point], how many samples are
images of ankle boots [1 point]? Display at least one image for each class [2 points].
Parameterise the number of samples to use from the dataset in order to be able to control
the runtime of the algorithm evaluation [1 point]. Start developing using a small number of
samples, and increase for the final evaluation.
Task 2 (Perceptron, 11 points)
Create a k-fold cross validation procedure to split the data into training and evaluation
subsets [1 point]. Train a perceptron classifier on the training subsets [1 point] and predict
labels for the evaluation subsets [1 point]. Measure the processing time required for
training [1 point], the processing time required for prediction [1 point], and determine the
accuracy score of the classification [1 point] and the confusion matrix [1 point] for each split.
Calculate the minimum, the maximum, and the average of the training time per training
sample [1 point], the prediction time per evaluation sample [1 point] and the prediction
accuracy [1 point]. Use a sufficient number of splits and vary the number of samples to
observe the effect on runtime and accuracy [1 point].
Task 3 (Support Vector Machine, 14 points)
Create a k-fold cross validation procedure to split the data into training and evaluation
subsets [1 point]. Train a support vector machine classifier on the training subsets. Try a
linear kernel [1 point] and a radial basis function kernel for different choices of the
parameter 𝛾 [2 points]. Predict the labels for the evaluation subsets [1 point]. Measure the
time required for training [1 point], the time required for prediction [1 point], and
determine the accuracy score of the classification [1 point] and the confusion matrix [1
point] for each split. Calculate the minimum, the maximum, and the average of the training
time per training sample [1 point], the prediction time per evaluation sample [1 point] and
the prediction accuracy [1 point]. Determine a good value for 𝛾 based on the mean
accuracies you calculated [1 point]. Use a sufficient number of splits and vary the number of
samples to observe the effect on runtime and accuracy [1 point].
Task 4 (comparison, 2 points)
Compare the runtime and accuracy of the classifiers [1 point]. Which one would you choose
and why? [1 point].