Description
Question 1 (60%)
Train and test Support Vector Machine (SVM) and Multi-layer Perceptron (MLP) classifiers
that aim for minimum probability of classification error (i.e. we are using 0-1 loss; all error instances are equally bad). You may use a trusted implementation of training, validation, and testing
in your choice of programming language. The SVM will use a Gaussian (sometimes called radial-basis) kernel. The MLP will be a single-hidden-layer model (i.e., two fully connected layers) with
your choice of activation functions for all perceptrons in the first layer and a softmax function for
the last layer. Use appropriate K-fold cross-validation methodologies during the training phase to select the relevant hyperparameters of both models (kernel width and box constraint for the SVM, number of perceptrons for the MLP), and apply your final trained models with optimized hyperparameters to the test set for performance assessment (a probability-of-error estimate from the test data).
Generate 1000 independent and identically distributed (iid) samples for training and 10000 iid
samples for testing for class l ∈ {−1,+1} as follows:
    x = r_l [cos(θ), sin(θ)]ᵀ + n    (1)

where θ ∼ Uniform[−π, π] and n ∼ N(0, σ²I). Use r_{−1} = 2, r_{+1} = 4, σ = 1.
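For concreteness, here is a minimal sketch of this sampling procedure in Python/NumPy. It assumes the two classes are equally likely, since the assignment does not state the class priors; adjust if your setup differs.

    import numpy as np

    def generate_data(n_samples, rng):
        """Draw iid samples from Eq. (1): x = r_l [cos(theta), sin(theta)]^T + n."""
        radius = {-1: 2.0, +1: 4.0}
        sigma = 1.0
        # Assumption: equal class priors P(l = -1) = P(l = +1) = 0.5.
        labels = rng.choice([-1, +1], size=n_samples)
        theta = rng.uniform(-np.pi, np.pi, size=n_samples)
        r = np.where(labels == -1, radius[-1], radius[+1])
        x = r[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])
        x += rng.normal(scale=sigma, size=(n_samples, 2))  # n ~ N(0, sigma^2 I)
        return x, labels

    rng = np.random.default_rng(0)
    x_train, y_train = generate_data(1000, rng)
    x_test, y_test = generate_data(10000, rng)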
Report (1) visual and numerical demonstrations of the K-fold cross-validation process indicating how the hyperparameters for SVM and MLP classifiers are set; (2) visual and numerical
demonstrations of the performance of your SVM and MLP classifiers on the test data set.
Score split: 30% for SVM and 30% for MLP.
Hint: For hyperparameter selection, you may show the performance estimates for various
choices and indicate where the best result is achieved. For test performance, you may show the
data and classification boundary superimposed, along with an estimated probability of error from
the samples. Modify and supplement these ideas as you see appropriate.
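As one way to organize this search, the sketch below uses scikit-learn's GridSearchCV with 10-fold cross-validation; the grid values, fold count, and the x_train/x_test arrays from the sampling sketch above are assumptions, not requirements.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    # Grid values below are illustrative assumptions; widen or refine as needed.
    # SVM: search over box constraint C and Gaussian-kernel width (via gamma).
    svm_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-3, 1, 5)}
    svm_cv = GridSearchCV(SVC(kernel="rbf"), svm_grid, cv=10, scoring="accuracy")
    svm_cv.fit(x_train, y_train)

    # MLP: search over the number of perceptrons in the single hidden layer.
    mlp_grid = {"hidden_layer_sizes": [(p,) for p in (2, 4, 8, 16, 32, 64)]}
    mlp_cv = GridSearchCV(MLPClassifier(max_iter=2000), mlp_grid, cv=10,
                          scoring="accuracy")
    mlp_cv.fit(x_train, y_train)

    # Apply the final trained models to the test set.
    for name, cv in [("SVM", svm_cv), ("MLP", mlp_cv)]:
        p_error = 1.0 - cv.best_estimator_.score(x_test, y_test)
        print(name, cv.best_params_, "estimated P(error):", p_error)

Note that scikit-learn's MLPClassifier uses a logistic output for two-class problems, which is equivalent to a two-class softmax; its hidden activations are limited to relu/tanh/logistic, so a quadratic activation would require a different framework (see the sketch after the next paragraph).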
Note that the two class sample sets will form two highly overlapping concentric disks, and due
to angular symmetry, we anticipate the best classification boundary to be a circle between the
two disks. Your SVM and MLP models will try to approximate it. Since the optimal boundary is
expected to be a quadratic curve (circle), quadratic polynomial activation functions in the hidden
layer of the MLP may be an appropriate choice. (OPTIONAL, not required for the assignment: if you have time, experiment with different activation function selections to see the effect of this choice.)
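If you do try the optional experiment, one option is a small custom module, since quadratic activations are not built into scikit-learn. The following is a sketch in PyTorch; the layer sizes and the training loop are left to you.

    import torch.nn as nn

    class QuadraticMLP(nn.Module):
        """Sketch: single-hidden-layer MLP with elementwise quadratic activation."""
        def __init__(self, n_hidden):
            super().__init__()
            self.hidden = nn.Linear(2, n_hidden)
            # Softmax is applied implicitly by nn.CrossEntropyLoss during training.
            self.out = nn.Linear(n_hidden, 2)

        def forward(self, x):
            z = self.hidden(x)
            return self.out(z * z)  # quadratic activation: phi(z) = z^2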
Question 2 (40%)
In this question, you will use GMM-based clustering to segment a color image. Pick your color
image from this dataset: https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/BSDS300/html/dataset/images.html.
As preprocessing, for each pixel, generate a 5-dimensional feature vector as follows: (1) collect the row index, column index, red value, green value, and blue value of the pixel into a raw feature
vector; (2) normalize each feature entry individually to the interval [0,1], so that all of the feature
vectors representing every pixel in an image fit into the 5-dimensional unit-hypercube.
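A minimal sketch of this preprocessing step follows; the image path is a placeholder, and PIL/Pillow plus NumPy is one possible toolchain, not a requirement.

    import numpy as np
    from PIL import Image

    # Placeholder path; assumes an RGB image of shape (H, W, 3).
    img = np.asarray(Image.open("your_image.jpg"), dtype=float)
    rows, cols = np.indices(img.shape[:2])
    # (1) raw 5-d feature per pixel: row index, column index, R, G, B
    features = np.column_stack([rows.ravel(), cols.ravel(), img.reshape(-1, 3)])
    # (2) min-max normalize each feature entry individually to [0, 1]
    mins, maxs = features.min(axis=0), features.max(axis=0)
    features = (features - mins) / (maxs - mins)  # assumes no constant column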
Fit a Gaussian Mixture Model to these normalized feature vectors representing the pixels of the
image. To fit the GMM, use maximum likelihood parameter estimation and K-fold cross-validation
(with maximum average validation-log-likelihood as the objective) for model order selection.
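One way to implement this model order selection is sketched below with scikit-learn, continuing from the features array above; the candidate orders and fold count are assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.model_selection import KFold

    def avg_val_loglik(features, n_components, n_folds=10, seed=0):
        """Average per-sample validation log-likelihood for one model order."""
        scores = []
        for tr, va in KFold(n_folds, shuffle=True, random_state=seed).split(features):
            gmm = GaussianMixture(n_components, random_state=seed).fit(features[tr])
            scores.append(gmm.score(features[va]))  # mean validation log-likelihood
        return np.mean(scores)

    candidate_orders = range(2, 11)  # assumed search range for the model order
    best_k = max(candidate_orders, key=lambda k: avg_val_loglik(features, k))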
Once you have identified the best GMM for your feature vectors, assign the most likely component label to each pixel by evaluating the component label posterior probabilities for each feature
vector according to your GMM. Present the original image and your GMM-based segmentation labels assigned to each pixel side by side for easy visual assessment of your segmentation outcome.
If using grayscale values as segment/component labels, please uniformly distribute them between
min/max grayscale values to have good contrast in the label image.
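A sketch of the labeling and side-by-side display, continuing from the variables above; matplotlib's default linear mapping spreads the integer labels uniformly over the grayscale range.

    import matplotlib.pyplot as plt
    from sklearn.mixture import GaussianMixture

    gmm = GaussianMixture(best_k, random_state=0).fit(features)
    labels = gmm.predict(features)          # most likely (MAP) component per pixel
    label_img = labels.reshape(img.shape[:2])

    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(img.astype("uint8"))
    axes[0].set_title("Original")
    axes[1].imshow(label_img, cmap="gray")  # labels 0..K-1 spread over min/max gray
    axes[1].set_title("GMM segmentation")
    for ax in axes:
        ax.axis("off")
    plt.show()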
Hint: If the image has too many pixels for your available computational power, you may downsample the image to reduce overall computational needs.



