EECE5644 – Exam 2 solution


Question 1 (50%)
You will train a multilayer perceptron neural network model using the maximum likelihood parameter estimation procedure to approximate the maximum a posteriori (MAP) classification rule,
which is the theoretically optimal solution for minimizing probability of error.
Part 1: Select a data distribution.
• Select a data distribution for x ∈ R^3 that is a mixture of 4 Gaussians. Each Gaussian component represents a class label.
• Select distinct (non-uniform) class priors (i.e. distinct Gaussian component weights in the
convex linear combination).
• Select distinct mean vectors and covariance matrices for each Gaussian class-conditional
probability distribution.
• Do not choose covariance matrices to be diagonal; select each covariance matrix to have
distinct eigenvalues (so that your Gaussians are tilted with respect to the coordinate axes and
have elongated ellipsoidal equilevel contours).
• Choose the mean vectors and covariance matrices of the Gaussians to allow for a moderate
level of overlap between class distributions so that the theoretical minimum probability of
error is neither too small (e.g. < 3%) nor too large (e.g. > 25%). This will make the problem
more interesting.
• Clearly specify your data distribution and demonstrate it visually with a scatter plot, using
an appropriate number of samples drawn in an iid fashion from your distribution (e.g., draw
1000 samples and plot them in a 3-dimensional scatter plot with class labels color coded). A sampling sketch follows this list.
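For concreteness, the following Python/NumPy sketch shows one way to specify such a distribution and draw labeled iid samples from it. The priors, means, and covariance construction below are hypothetical placeholders, not required values; you should tune them until the class overlap satisfies the error-probability range above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical distinct, non-uniform class priors (must sum to 1).
priors = np.array([0.15, 0.20, 0.30, 0.35])

# Hypothetical mean vectors in R^3, one per class.
means = np.array([[0.0, 0.0, 0.0],
                  [2.5, 0.5, 1.0],
                  [0.5, 2.5, -1.0],
                  [2.0, 2.0, 2.0]])

# Random non-diagonal covariances: A @ A.T plus a ridge is positive
# definite and generically has distinct eigenvalues (tilted ellipsoids).
covs = []
for _ in range(4):
    A = rng.normal(size=(3, 3))
    covs.append(A @ A.T + 0.5 * np.eye(3))
covs = np.array(covs)

def sample_gmm(n):
    """Draw n iid labeled samples from the 4-component Gaussian mixture."""
    labels = rng.choice(4, size=n, p=priors)
    x = np.empty((n, 3))
    for k in range(4):
        idx = labels == k
        x[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return x, labels

X, y = sample_gmm(1000)  # e.g., for the color-coded 3-D scatter plot
```

The returned X and y can then be passed to a 3-D scatter plot (e.g., matplotlib's scatter with c=y) to produce the color-coded visualization.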
Part 2: Determine and evaluate the theoretically optimal MAP classifier.
• For your selected data distribution, determine and specify the MAP-classification rule.
• Generate (at least) 10000 iid (test) samples (for which you know the class labels of course
since you generate them); apply the MAP classifier to these samples to get MAP-decision
labels.
• Counting the number of misclassified samples and dividing by the total number of samples,
estimate the theoretical minimum probability of error achievable for your data distribution.
• Save this test dataset for use in the next step.
• Present appropriate math and visual/numerical results to convince the reader that you have
implemented the theoretical MAP classifier correctly and that your error estimate approximates the theoretically achievable smallest error probability. A sketch follows this list.
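The MAP rule decides D(x) = argmax_k P(L = k) p(x | L = k); the evidence p(x) is common to all classes, so it can be dropped from the argmax. A minimal sketch, assuming the priors, means, covs, and sample_gmm objects from the Part 1 sketch and SciPy's multivariate_normal, is:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Reuses priors, means, covs, and sample_gmm from the Part 1 sketch.
X_test, y_test = sample_gmm(10000)

# The posterior is proportional to prior * class-conditional pdf; the
# evidence p(x) is shared by all classes, so dropping it leaves the
# argmax unchanged.
scores = np.column_stack([
    priors[k] * multivariate_normal.pdf(X_test, mean=means[k], cov=covs[k])
    for k in range(4)
])
map_decisions = scores.argmax(axis=1)

# Empirical estimate of the minimum achievable probability of error.
p_error_min = np.mean(map_decisions != y_test)
print(f"Estimated minimum P(error): {p_error_min:.4f}")
```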
Part 3: Train and evaluate a classifier based on neural-network approximations of
the class label posteriors; for a given input data vector, the class label decision will be made by
selecting the neural network output index that has the largest value.
• Generate three separate training datasets from your data distribution; these datasets will
respectively have 100, 1000, and 10000 iid samples and their true labels.
• For each dataset, you will train a multilayer perceptron to approximate the class posterior probabilities given the data vector, using the maximum likelihood parameter estimation principle. These neural networks will each have a single hidden layer (i.e., two layers of weight
matrices) with sigmoid nonlinearities (choose your favorite from a suitable list, including the logistic, hyperbolic tangent, and softplus functions). For the output layer nonlinearity, use the
normalized exponential (softmax) function, in order to ensure that your vector-valued output approximating the class posteriors for each class label lies in the probability simplex. Determine and
present the necessary mathematical expression of the optimization problem that needs to be
solved to achieve this goal. Implement it appropriately using your preferred software package.
Describe your implementation and explain how it matches the mathematical description of
the problem you specified.
• For each multilayer perceptron neural network you train on a given dataset, determine the
most appropriate number of perceptrons (units/nodes) in the hidden layer using 10-fold cross-validation, with probability of correct decisions as your performance measure (since it is 1
minus the probability of error, and is consistent with our overarching objective of designing a classifier with the smallest possible error probability). Present appropriate math, descriptions,
and visual and numerical results to convince the reader that you have done model order selection
appropriately using cross-validation.*
• Once you determine the best number of perceptrons that the training data justifies, train
your final neural network with the appropriate model order to maximize the data likelihood using
all training data in the given set.
• Apply your trained neural network classifiers to the test dataset that you generated and used
in the previous part, where the theoretically optimal MAP classifier was analyzed numerically.
Report the test dataset probability-of-error estimates for your neural network classifiers trained
with different training dataset sizes. Discuss the effect of training set size on test dataset
performance for these neural network models. (A cross-validation and training sketch appears after the footnote below.)
* Note that if we had not committed to a minimum-probability-of-error classifier design up front,
we could use the log-likelihood of validation data as an appropriate model order selection objective in
the cross-validation process.
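For reference, the ML estimate here is θ_ML = argmax_θ Σ_n ln p(y_n | x_n; θ), which for a softmax output is equivalent to minimizing the average cross-entropy −(1/N) Σ_n ln [softmax(f(x_n; θ))]_{y_n}. One possible (not mandated) realization is scikit-learn's MLPClassifier, which minimizes exactly this log-loss with a softmax output layer; the sketch below pairs it with cross_val_score for the 10-fold model order search. The names X_train, y_train, X_test, y_test and the candidate width grid are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# X_train, y_train: one of the 100/1000/10000-sample training sets;
# X_test, y_test: the 10000-sample test set from Part 2 (assumed names).

candidate_widths = [1, 2, 4, 8, 16, 32]   # hypothetical search grid
cv_accuracy = []
for p in candidate_widths:
    mlp = MLPClassifier(hidden_layer_sizes=(p,), activation='logistic',
                        max_iter=2000, random_state=0)
    # 10-fold CV scored by probability of correct decisions (accuracy).
    scores = cross_val_score(mlp, X_train, y_train, cv=10, scoring='accuracy')
    cv_accuracy.append(scores.mean())

best_p = candidate_widths[int(np.argmax(cv_accuracy))]

# Retrain with the selected model order on ALL training data.
final_mlp = MLPClassifier(hidden_layer_sizes=(best_p,), activation='logistic',
                          max_iter=2000, random_state=0).fit(X_train, y_train)

# Test-set estimate of the probability of error.
p_error = np.mean(final_mlp.predict(X_test) != y_test)
print(f"best width = {best_p}, test P(error) = {p_error:.4f}")
```

Repeating this for the 100-, 1000-, and 10000-sample training sets yields the three test error estimates to compare against the MAP baseline.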
Question 2 (50%)
Generate two-dimensional x = [x1, x2]^T samples with the attached Matlab script (the data is
generated through iid sampling from a mixture of three Gaussians). Specifically, generate 1000
samples for training and 10000 samples for testing.
Train and test a single-hidden-layer MLP function approximator to estimate the value of x2
from the value of x1 by minimizing the mean-squared error (MSE) on the training set.
Use 10-fold cross-validation to select between logistic (sigmoid) and softplus (smooth ReLU)
nonlinearities for the perceptrons in the hidden layer, as well as the number of perceptrons. Leave
the output layer linear (no nonlinearity). Once the best model architecture is identified using cross-validation, train the selected model with the entire training set. Apply the trained MLP to the test
dataset. Estimate the test performance.
Explain your work clearly to inform the readers of all relevant details for reproducibility of
your results, and to convince them that you have done everything correctly/properly, including a
report of the following: (1) visual and numerical demonstrations of the cross-validation process
indicating how the model selection was carried out; (2) visual and numerical demonstration of the
performance of the trained model on the test data set.
Hint: logistic(z) = 1/(1 + e^(−z)) and softplus(z) = ln(1 + e^z).
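As one concrete realization (an assumption, not a requirement), the sketch below uses PyTorch, which provides both nn.Sigmoid and nn.Softplus, together with scikit-learn's KFold for the 10-fold split. The tensors x1_train, x2_train, x1_test, x2_test are hypothetical names for (N, 1) float tensors built from the data produced by the attached Matlab script.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def train_mlp(x, y, width, act_cls, epochs=2000, lr=1e-2):
    """Fit a single-hidden-layer MLP with a linear output by minimizing MSE."""
    model = nn.Sequential(nn.Linear(1, width), act_cls(), nn.Linear(width, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return model

def cv_mse(x, y, width, act_cls, folds=10):
    """Average validation MSE over a 10-fold split."""
    kf = KFold(n_splits=folds, shuffle=True, random_state=0)
    errs = []
    for tr, va in kf.split(np.arange(len(x))):
        model = train_mlp(x[tr], y[tr], width, act_cls)
        with torch.no_grad():
            errs.append(nn.functional.mse_loss(model(x[va]), y[va]).item())
    return float(np.mean(errs))

# Hypothetical search grid over hidden widths and the two nonlinearities.
candidates = [(w, a) for w in (2, 4, 8, 16, 32)
              for a in (nn.Sigmoid, nn.Softplus)]
best_width, best_act = min(candidates,
                           key=lambda c: cv_mse(x1_train, x2_train, *c))

# Retrain the selected architecture on the full training set, then test.
final = train_mlp(x1_train, x2_train, best_width, best_act)
with torch.no_grad():
    test_mse = nn.functional.mse_loss(final(x1_test), x2_test).item()
print(best_width, best_act.__name__, test_mse)
```

The per-candidate CV MSE values computed inside cv_mse are what you would tabulate or plot to document the model selection process.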