Description
1) This question pertains to the variables X and y_int in the midterm data file data.mat. Apologies to those of you who have already done this or a similar analysis during the midterm. Consider the entire set of N=10000 input datapoints (rows of X), as well as the two classes of datapoints (rows of X corresponding to the different values of y_int), hereafter denoted X1 and X2. Use principal component analysis to obtain an eigenvector decomposition of each of these three sets. Create three scatterplots, each in 3D and each showing all N datapoints, color-coded by class. The first scatterplot uses the coordinate system defined by the top three principal components of X. The second and third scatterplots use the coordinate systems defined by the top three principal components of X1 and X2, respectively. Submit three MATLAB figure files for the three plots (scatterX.fig, scatterX1.fig, scatterX2.fig). [20pt]

2) Consider the exponential distribution defined in Assignment 7 Problem 2. A mixture of exponentials is a random variable whose distribution has a parameter λ chosen at random from among {λ_i}, i=1,...,K, with respective prior probabilities {π_i}. This would model, e.g., your pile of pistachios being selected at random from among K varieties, each with its own rate of spontaneous combustion. Write a function SimMixExps to simulate data from this distribution per the attached prototype. Assume, w.l.o.g., that the λ_i are in increasing order. [20pt]

3) Develop EM for inferring {λ_i} and {π_i} from data.
a) Define the hidden variables, mixture proportions, and responsibilities. Write down the log likelihood and the expected log likelihood. Develop the update equations for the E-step and the M-step. [15pt]
b) Implement (a) in EMExps per the attached prototype. [15pt]
c) Choose particular {λ_i} and {π_i} values for which you will benchmark the performance of EM as a (plotted) function of N. Measure performance in two ways: root-sum-of-squared-differences for {λ_i} and root-sum-of-squared-differences for {π_i}. Choose a range for N that takes you from poor to great performance; this range will depend on the values you choose. Submit the MATLAB figures for both (PlotRMSDLambda.fig, PlotRMSDPi.fig) and the code that produces them: a script MakePlotsRMSD.m. [15pt]
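
For reference, the mixture in Problem 2 and the two error measures in Problem 3(c) can be written out as below. This is only a notational sketch under stated assumptions, not part of the required answer. It writes λ_i for the component parameters and π_i for the prior probabilities, as the figure file names PlotRMSDLambda.fig and PlotRMSDPi.fig suggest. It assumes the exponential distribution of Assignment 7 Problem 2 is parameterized by its rate; if that problem uses the mean instead, replace λ_i by 1/λ_i in the density. The hatted symbols (the EM estimates) and the names E_λ and E_π are introduced here for illustration. Both the true and the estimated parameters are assumed to be sorted by increasing λ_i, so that component i of one is compared with component i of the other.

```latex
% Mixture-of-exponentials density (assumes the rate parameterization
% p(x | \lambda) = \lambda e^{-\lambda x} for x >= 0)
p(x) = \sum_{i=1}^{K} \pi_i \, \lambda_i e^{-\lambda_i x},
\qquad x \ge 0, \quad \pi_i \ge 0, \quad \sum_{i=1}^{K} \pi_i = 1

% Root-sum-of-squared-differences between estimates (hatted) and true values,
% with both sets ordered by increasing \lambda_i
E_\lambda = \sqrt{\sum_{i=1}^{K} \bigl(\hat{\lambda}_i - \lambda_i\bigr)^2},
\qquad
E_\pi = \sqrt{\sum_{i=1}^{K} \bigl(\hat{\pi}_i - \pi_i\bigr)^2}
```

Each of E_λ and E_π is computed once per value of N and plotted against N.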