Description
1 Classification with Gaussian Models
In the first part of the lab, we use linear discriminant analysis (LDA) and quadratic discriminant analysis
(QDA) on the 2D data in ldaqda.zip, and visualize the classification results for class 1 and class 2 based on
the features in the data.
Suppose that the dataset contains N samples. Let xn = [hn, wn] be the feature vector, where hn denotes
feature 1 and wn denotes feature 2 of the n-th data point. Let yn denote the class label where yn = 1 or
yn = 2. We model the class prior as p(yn = 1) = π and p(yn = 2) = 1 − π. For this problem, let π = 0.5.
For the class conditional distributions, let µ1 be the mean of xn if class label yn = 1, and let µ2 be the
mean of xn if class label yn = 2. For LDA, a common covariance matrix is shared by both classes, which is
denoted by Σ; for QDA, different covariance matrices are used for class 1 and class 2, which are denoted by
Σ1 and Σ2, respectively.
Download ldaqda.zip from Quercus and unzip the file. The dataset for training is in file trainData.txt, whereas
the dataset for testing is in file testData.txt. Each file uses the same format to represent the data: the first
column corresponds to the class labels, the second column corresponds to feature 1 values, and the third
column corresponds to feature 2 values.
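The file format described above can be read with a few lines of NumPy. This is a minimal sketch, not part of the provided starter code: the helper name load_labeled_data is hypothetical, and it assumes the columns are separated by whitespace, which the handout does not state.

```python
import numpy as np

def load_labeled_data(path):
    """Load a text file whose columns are: class label, feature 1, feature 2."""
    # Assumption: columns are whitespace-separated (the np.loadtxt default).
    data = np.loadtxt(path, ndmin=2)
    y = data[:, 0].astype(int)   # class labels (1 or 2)
    x = data[:, 1:3]             # feature vectors [h_n, w_n], shape (N, 2)
    return x, y
```

For example, load_labeled_data("trainData.txt") would return the training features and labels as two arrays, provided the file matches the assumed delimiter.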
Please answer the questions below and complete the two functions in ldaqda.py. File util.py contains a few
functions/classes that will be useful in writing the code.
Questions
1. Training and visualization. We estimate the parameters in LDA and QDA from the training data in
trainData.txt and visualize the LDA/QDA model.
(a) Please write down the maximum likelihood estimates of the parameters µ1, µ2, Σ, Σ1, and Σ2 as
functions of the training data {xn, yn}, n = 1, 2, . . . , N. The indicator function I(·) may be useful
in your expressions.
(b) Once the above parameters are obtained, you can design a classifier to make a decision on the
class label y of the new data x. The decision boundary can be written as a linear equation of x
in the case of LDA, and a quadratic equation of x in the case of QDA. Please write down the
expressions of these two boundaries.
(c) Complete function discrimAnalysis in file ldaqda.py to visualize LDA and QDA. Please plot one
figure for LDA and one figure for QDA. In both plots, the horizontal axis is Feature 1 with
range [−4, 6] and the vertical axis is Feature 2 with range [−5, 5]. Each figure should contain:
1) N colored data points {xn, n = 1, 2, . . . , N} with the color indicating the corresponding class
labels (e.g., blue represents class 1 and red represents class 2); 2) the contours of the conditional
Gaussian distribution for each class (to create a contour plot, first build a
two-dimensional grid for the range [−4, 6] × [−5, 5] by using the function np.meshgrid, then
compute the conditional Gaussian density at each point in the grid for each class, and finally use the
function plt.contour, which takes the two-dimensional grid and the conditional Gaussian density
on the grid as inputs, to produce the contours automatically); 3) the decision boundary, which
can also be created by using plt.contour with an appropriate contour level.
2. Testing. We test the obtained LDA/QDA model on the testing data in testData.txt. Complete function
misRate in file ldaqda.py to compute the misclassification rates for LDA and QDA, defined as the total
percentage of the misclassified samples (both classes) over all samples.
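The grid evaluation and the misclassification rate described in the questions above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the required solution: the function names gaussian_density and misclassification_rate are hypothetical, the parameters mu_demo and cov_demo are placeholders rather than estimates from the lab data, and the rate is returned as a fraction (multiply by 100 for a percentage).

```python
import numpy as np

def gaussian_density(points, mean, cov):
    """Evaluate a 2D Gaussian density at each row of `points` (shape (M, 2))."""
    diff = points - mean
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) for every row.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def misclassification_rate(y_true, y_pred):
    """Fraction of samples (both classes together) whose predicted label is wrong."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Grid over [-4, 6] x [-5, 5], the plotting range given in the handout.
h = np.linspace(-4.0, 6.0, 101)
w = np.linspace(-5.0, 5.0, 101)
H, W = np.meshgrid(h, w)
grid = np.column_stack([H.ravel(), W.ravel()])

# Placeholder parameters for illustration only (not estimated from the lab data).
mu_demo = np.array([1.0, 0.0])
cov_demo = np.array([[1.0, 0.3], [0.3, 2.0]])
density = gaussian_density(grid, mu_demo, cov_demo).reshape(H.shape)
```

Because density has the same shape as H and W, it can be passed directly to plt.contour(H, W, density). One possible way to draw a decision boundary with the same call is to contour the difference between the two class scores at a single level, for example plt.contour(H, W, score_diff, levels=[0]), where score_diff is a hypothetical array computed on the same grid.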
2 Bayesian Linear Regression
In this part of the lab, we use Bayesian regression to fit a linear model. Consider a linear model of the form
z = a1x + a0 + w, (1)
where x is the scalar input variable, a = (a0, a1)ᵀ is the vector-valued parameter with unknown entries
a0 and a1, and w is the additive Gaussian noise:
w ∼ N(0, σ²), (2)
where σ² is a known parameter.
Suppose that we have access to a training dataset containing N samples {x1, z1}, {x2, z2}, . . . , {xN, zN}. We
aim to estimate the parameter a by finding its posterior distribution. When the training finishes, we make
predictions based on new inputs. We consider a Bayesian approach, which models the parameter a as a
zero-mean isotropic Gaussian random vector whose probability distribution is expressed as
p(a) = \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \beta & 0 \\ 0 & \beta \end{bmatrix} \right), (3)
where β is a known hyperparameter.
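To make the generative model concrete, the sketch below draws one parameter vector from the prior in (3) and then produces noisy targets according to (1) and (2). It is only an illustration: the seed, the number of samples, and the uniform choice of inputs on [−4, 4] are assumptions made here, and the values β = 1 and σ² = 0.1 are borrowed from the questions in this part. It is not the data generator shipped with the lab.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability

beta = 1.0     # prior variance of each entry of a
sigma2 = 0.1   # noise variance
n_samples = 5  # illustrative sample count

# Draw a = (a0, a1)^T from the zero-mean isotropic Gaussian prior in (3).
a = rng.multivariate_normal(mean=np.zeros(2), cov=beta * np.eye(2))

# Inputs are picked uniformly on [-4, 4] purely for illustration.
x = rng.uniform(-4.0, 4.0, size=n_samples)

# Targets follow model (1) with noise distributed as in (2).
w = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n_samples)
z = a[1] * x + a[0] + w
```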
Download reg.zip from Quercus and unzip the file. First, open the file generate_data.py and replace the
student numbers in the code with your actual student numbers. Then, run generate_data.py to create your
personalized training data (training.txt) and make sure to record the ground truth values for a0 and a1 that
are printed after running generate_data.py.
File training.txt contains the training data: the first column is the inputs; the second column is the targets.
The training data is generated from z = a1x + a0 + w where the actual values of a1 and a0 are available by
running generate_data.py. Please answer the questions below and complete regression.py. File util.py contains
a few useful functions.
Questions
1. Express the posterior distribution p(a|z1, . . . , zN; x1, . . . , xN) using σ², β, x1, z1, x2, z2, . . . , xN, zN.
This notation should be read as “the conditional distribution of a given z1, . . . , zN under the parameters x1, . . . , xN”. In this problem z1, . . . , zN and a are random variables while x1, . . . , xN are
unknown constant parameters.
2. Let σ² = 0.1 and β = 1. Based on the posterior distribution obtained in the last question, draw four
contour plots corresponding to p(a), p(a|z1; x1), p(a|z1, . . . , z5; x1, . . . , x5), and p(a|z1, . . . , z100; x1, . . . , x100).
In all contour plots, the x-axis represents a0, and the y-axis represents a1. The range is set as
[−1, 1] × [−1, 1]. In each figure, also draw the true value of a.
3. Suppose that there is a new input x, for which we want to predict the target value z. Write down the
distribution of the prediction z, i.e., p(z|z1, . . . , zN; x, x1, . . . , xN).
4. Let σ² = 0.1 and β = 1. Suppose that the set of the new inputs is {−4, −3.8, −3.6, . . . , 0, . . . , 3.6, 3.8, 4}.
Plot three figures corresponding to the following three cases:
(a) The predictions are based on one training sample, i.e., based on p(z|z1; x, x1).
(b) The predictions are based on 5 training samples, i.e., based on p(z|z1, . . . , z5; x, x1, . . . , x5).
(c) The predictions are based on 100 training samples, i.e., based on p(z|z1, . . . , z100; x, x1, . . . , x100).
In all figures, the x-axis is the input, the y-axis is the target, and the range is set as [−4, 4] × [−4, 4].
Each figure should contain three components: 1) the new inputs and the predicted targets; 2) a vertical
interval at each predicted target, indicating the range within one standard deviation; 3) the training
sample(s) that are used for the prediction. Use plt.errorbar for 1) and 2); use plt.scatter for 3).
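The posterior and predictive computations behind the questions above can be sketched numerically using the standard closed-form result for Bayesian linear regression with a Gaussian prior and Gaussian noise. This is a sketch under stated assumptions rather than the expected contents of regression.py: the function names posterior and predictive are hypothetical, and the design matrix with rows [1, xn] is a choice made here to match the ordering a = (a0, a1)ᵀ.

```python
import numpy as np

def posterior(x, z, beta, sigma2):
    """Posterior mean and covariance of a = (a0, a1)^T given training pairs (x_n, z_n)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    Phi = np.column_stack([np.ones_like(x), x])          # rows [1, x_n]
    precision = np.eye(2) / beta + Phi.T @ Phi / sigma2  # inverse posterior covariance
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ z / sigma2
    return mean, cov

def predictive(x_new, mean, cov, sigma2):
    """Mean and variance of the predicted target z at each new input."""
    x_new = np.asarray(x_new, dtype=float)
    Phi_new = np.column_stack([np.ones_like(x_new), x_new])
    pred_mean = Phi_new @ mean
    # Parameter uncertainty phi^T S phi plus the noise variance sigma^2.
    pred_var = np.einsum("ij,jk,ik->i", Phi_new, cov, Phi_new) + sigma2
    return pred_mean, pred_var
```

With no training samples, posterior returns the prior (zero mean, covariance βI), which is a quick sanity check. For plotting, the new inputs can be built with np.linspace(-4, 4, 41), and the one-standard-deviation intervals can be drawn with plt.errorbar(x_new, pred_mean, yerr=np.sqrt(pred_var)).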


