EECE5644 – Take Home Exam 4 solution


Question 1 (30%)
The data generation script for this question is called exam4q1 generateData.m. Generate two-dimensional x = [x1, x2]^T samples with this Matlab script; specifically, 1000 samples for training and 10000 samples for testing.
Train and test a single hidden layer MLP function approximator to estimate the value of X2 from the value of X1 by minimizing the mean-squared-error (MSE) on the training set. (The first coordinate of each data vector will go into the MLP, and the output will try to approximate the second coordinate of the data vector.)
Use a softplus (SmoothReLU) activation function as the nonlinearity for the perceptrons in the hidden layer. Using 10-fold cross-validation, select the best number of perceptrons that your training set can justify using. Leave the output layer of the MLP as a linear unit (no nonlinearity). Once the best model structure is identified using cross-validation, train the selected model with the entire training set. Apply the trained MLP to the test set. Estimate the test performance.
You may use existing software packages for all aspects of this solution. Make sure to clearly
demonstrate that you are using the packages properly.
Hint: logistic(z) = 1/(1 + e^(-z)) and softplus(z) = ln(1 + e^z)
Note: The theoretical minimum-MSE estimator is the conditional expectation of X2 given X1, which could be derived analytically from the joint pdf if it were known; in practice we do not know the true pdf, so in this exercise we try to approximate the theoretically optimal solution with a neural network model.
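A minimal sketch of one possible workflow is given below, using PyTorch for the single-hidden-layer softplus MLP and scikit-learn's KFold for the 10-fold cross-validation. The file names q1_train.csv and q1_test.csv (exported from the Matlab script), the candidate range of hidden-layer sizes, and the training settings (Adam, 2000 epochs) are all illustrative assumptions, not part of the assignment.

```python
# Sketch only: assumes the generated data have been exported to CSV files
# (hypothetical names q1_train.csv, q1_test.csv), column 0 = x1, column 1 = x2.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def make_mlp(n_hidden):
    # Single hidden layer with softplus activation, linear output unit.
    return nn.Sequential(nn.Linear(1, n_hidden), nn.Softplus(), nn.Linear(n_hidden, 1))

def train_mlp(model, x, y, epochs=2000, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model

train = np.loadtxt("q1_train.csv", delimiter=",")   # shape (1000, 2), assumed
test = np.loadtxt("q1_test.csv", delimiter=",")     # shape (10000, 2), assumed
x_tr = torch.tensor(train[:, :1], dtype=torch.float32)
y_tr = torch.tensor(train[:, 1:], dtype=torch.float32)

# 10-fold cross-validation over the number of hidden perceptrons.
candidates = range(1, 21)
cv_mse = []
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for n_hidden in candidates:
    fold_mse = []
    for tr_idx, va_idx in kf.split(x_tr):
        model = train_mlp(make_mlp(n_hidden), x_tr[tr_idx], y_tr[tr_idx])
        with torch.no_grad():
            fold_mse.append(nn.functional.mse_loss(model(x_tr[va_idx]), y_tr[va_idx]).item())
    cv_mse.append(np.mean(fold_mse))
best_n = list(candidates)[int(np.argmin(cv_mse))]

# Retrain the selected model on the full training set and estimate test MSE.
final_model = train_mlp(make_mlp(best_n), x_tr, y_tr)
x_te = torch.tensor(test[:, :1], dtype=torch.float32)
y_te = torch.tensor(test[:, 1:], dtype=torch.float32)
with torch.no_grad():
    test_mse = nn.functional.mse_loss(final_model(x_te), y_te).item()
print(f"Best hidden-layer size: {best_n}, test MSE: {test_mse:.4f}")
```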
Question 2 (35%)
For this question use the same multiring data generation script from the third assignment. Generate a two-class training set with 1000 samples and a testing set with 10000 samples. The class priors should be equal. Train and evaluate a support vector machine classifier with a Gaussian kernel (radial basis function (RBF) kernel) on these datasets.
Specifically, use a spherically symmetric Gaussian/RBF kernel. Using 10-fold cross-validation, select the best box constraint hyperparameter C and the Gaussian kernel width parameter σ (notation based on previously covered math in class).
Train a final SVM using the best combination of hyperparameters with the entire training set.
Classify the testing dataset samples with this trained SVM to assess performance.
You may use existing software packages for all aspects of this solution. Make sure to clearly
demonstrate that you are using the packages properly.
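One possible outline is sketched below with scikit-learn, assuming the generated data are already loaded as NumPy arrays X_train, y_train (1000 samples) and X_test, y_test (10000 samples). Note that scikit-learn parameterizes the RBF kernel as exp(-gamma * ||x - x'||^2), so the candidate σ grid is converted via gamma = 1/(2σ^2); the grids themselves are illustrative choices.

```python
# Sketch only: X_train, y_train, X_test, y_test are assumed to be NumPy arrays
# produced by the multiring data generation script.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate grids for the box constraint C and Gaussian kernel width sigma.
C_grid = np.logspace(-2, 3, 10)
sigma_grid = np.logspace(-2, 1, 10)
param_grid = {"C": C_grid, "gamma": 1.0 / (2.0 * sigma_grid**2)}

# 10-fold cross-validation over (C, sigma) using classification accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

# GridSearchCV refits on the entire training set by default, so best_estimator_
# is the final SVM trained with the best hyperparameter combination.
best_svm = search.best_estimator_
test_error = 1.0 - best_svm.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test-set probability of error estimate: {test_error:.4f}")
```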
Question 3 (35%)
In this question, you will use GMM-based clustering to segment the color images 3096 color.jpg
and 42049 color.jpg from the Berkeley Image Segmentation Dataset. We will refer to these images
as the airplane and bird images, respectively.
As preprocessing, for each pixel, generate a 5-dimensional feature vector as follows: (1) append the row index, column index, red value, green value, and blue value of each pixel into a raw feature vector; (2) normalize each feature entry individually to the interval [0,1], so that all of the feature vectors representing every pixel in an image fit into the 5-dimensional unit hypercube. All segmentation algorithms should operate on these normalized feature vectors.
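A short sketch of this preprocessing step is given below; the image file name is an assumption, and the use of Pillow/NumPy is just one convenient choice.

```python
# Sketch only: builds the 5-dimensional normalized feature vectors for one image.
import numpy as np
from PIL import Image

def image_to_features(path):
    img = np.array(Image.open(path), dtype=np.float64)   # shape (rows, cols, 3)
    rows, cols, _ = img.shape
    r_idx, c_idx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    # Raw feature vector per pixel: [row, col, R, G, B].
    feats = np.stack([r_idx.ravel(), c_idx.ravel(),
                      img[:, :, 0].ravel(), img[:, :, 1].ravel(),
                      img[:, :, 2].ravel()], axis=1)
    # Normalize each feature entry individually to [0, 1].
    mins, maxs = feats.min(axis=0), feats.max(axis=0)
    feats = (feats - mins) / np.maximum(maxs - mins, np.finfo(float).eps)
    return feats, (rows, cols)

features, shape = image_to_features("3096_color.jpg")  # assumed file name
```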
For each image do the following: (1) Using maximum likelihood parameter estimation, fit a GMM with 2 components, and use this GMM to segment the image into two parts; (2) Using 10-fold cross-validation with maximum average validation log-likelihood as the objective, identify the best number of clusters, then fit a new GMM with this best number of components and use this GMM to segment the image into as many parts as there are Gaussian components.
For GMM-based clustering, use the GMM components as class/cluster-conditional pdfs and
assign cluster labels using the MAP-classification rule.
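The sketch below, which reuses the features and shape produced by the preprocessing sketch above, shows one way to carry out both parts with scikit-learn's GaussianMixture: predict() assigns each pixel to the component with the highest posterior probability, which is the MAP classification rule described here. The cap of 10 candidate components and the use of full covariance matrices are assumptions.

```python
# Sketch only: GMM-based segmentation with cross-validated model-order selection.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def select_num_components(X, max_components=10, n_folds=10):
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    avg_ll = []
    for k in range(1, max_components + 1):
        fold_ll = []
        for tr_idx, va_idx in kf.split(X):
            gmm = GaussianMixture(n_components=k, covariance_type="full",
                                  reg_covar=1e-6, random_state=0).fit(X[tr_idx])
            fold_ll.append(gmm.score(X[va_idx]))  # mean per-sample log-likelihood
        avg_ll.append(np.mean(fold_ll))
    return int(np.argmax(avg_ll)) + 1

# Part (1): 2-component GMM, MAP assignment of each pixel to a component.
gmm2 = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(features)
labels2 = gmm2.predict(features).reshape(shape)   # predict() implements the MAP rule

# Part (2): best number of components by cross-validated average log-likelihood.
best_k = select_num_components(features)
gmm_best = GaussianMixture(n_components=best_k, covariance_type="full",
                           random_state=0).fit(features)
labels_best = gmm_best.predict(features).reshape(shape)
print("Best number of components:", best_k)
```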