EECE5644 Assignment 2

Question 1 (20%)
The probability density function (pdf) for a 2-dimensional real-valued random vector X is as
follows: p(x) = P(L = 0)p(x|L = 0) + P(L = 1)p(x|L = 1). Here L is the true class label that
indicates which class-label-conditioned pdf generates the data.
The class priors are P(L = 0) = 0.6 and P(L = 1) = 0.4. The class-conditional pdfs
are p(x|L = 0) = w01 g(x|m01,C01) + w02 g(x|m02,C02) and p(x|L = 1) = w11 g(x|m11,C11) +
w12 g(x|m12,C12), where g(x|m,C) is a multivariate Gaussian probability density function with
mean vector m and covariance matrix C. The parameters of the class-conditional Gaussian pdfs
are: wi1 = wi2 = 1/2 for i ∈ {0,1}, and
m01 = [−0.9, −1.1]^T   m02 = [0.8, 0.75]^T   m11 = [−1.1, 0.9]^T   m12 = [0.9, −0.75]^T
Cij = [0.75, 0; 0, 1.25] for all (i, j) pairs.
For the numerical results requested below, generate the following independent datasets, each consisting of iid samples from the specified data distribution, and in each dataset make sure to include the true class label for each sample (a sampling sketch follows the list).
• D^20_train consists of 20 samples and their labels for training;
• D^200_train consists of 200 samples and their labels for training;
• D^2000_train consists of 2000 samples and their labels for training;
• D^10K_validate consists of 10000 samples and their labels for validation.
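One possible way to generate these datasets is sketched below in Python/NumPy (the assignment itself mentions MATLAB, so treat this only as an illustration; the function name generate_dataset and the seed are my own choices, not part of the problem statement).

```python
import numpy as np

rng = np.random.default_rng(seed=1)                # fixed seed for reproducibility

priors = np.array([0.6, 0.4])                      # P(L=0), P(L=1)
means = {0: [np.array([-0.9, -1.1]), np.array([0.8, 0.75])],
         1: [np.array([-1.1, 0.9]),  np.array([0.9, -0.75])]}
C = np.array([[0.75, 0.0], [0.0, 1.25]])           # shared covariance Cij

def generate_dataset(n):
    """Draw n iid samples: pick a class label from the priors, then one of the
    two equally weighted Gaussian components within that class."""
    labels = rng.choice(2, size=n, p=priors)
    x = np.empty((n, 2))
    for i, L in enumerate(labels):
        comp = rng.integers(2)                     # w_i1 = w_i2 = 1/2
        x[i] = rng.multivariate_normal(means[L][comp], C)
    return x, labels

x_train20, l_train20 = generate_dataset(20)        # D^20_train; likewise for 200, 2000
x_val, l_val = generate_dataset(10000)             # D^10K_validate
```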
Part 1: (6%) Determine the theoretically optimal classifier that achieves minimum probability
of error using the knowledge of the true pdf. Specify the classifier mathematically and implement
it; then apply it to all samples in D^10K_validate. From the decision results and true labels for this
validation set, estimate and plot the ROC curve for a corresponding discriminant score for this
classifier, and on the ROC curve indicate, with a special marker, the location of the min-P(error)
classifier. Also report an estimate of the min-P(error) achievable, based on counts of decision-truth
label pairs on D^10K_validate. Optional: As supplementary visualization, generate a plot of the
decision boundary of this classification rule overlaid on the validation dataset. This establishes an
aspirational performance level on this data for the following approximations.
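Continuing the data-generation sketch above, one way to realize this classifier is a likelihood-ratio test: decide L = 1 when p(x|L=1)/p(x|L=0) > P(L=0)/P(L=1) = 1.5. The sketch below assumes SciPy and reuses means, C, priors, x_val, and l_val from the earlier block; it is one possible implementation, not the required one.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def class_conditional(x, L):
    """p(x|L): equal-weight two-component Gaussian mixture from the problem statement."""
    return 0.5 * (mvn.pdf(x, means[L][0], C) + mvn.pdf(x, means[L][1], C))

def discriminant_score(x):
    """Log-likelihood ratio ln p(x|L=1) - ln p(x|L=0); thresholding it at ln(1.5)
    gives the min-P(error) rule, and sweeping the threshold traces the ROC curve."""
    return np.log(class_conditional(x, 1)) - np.log(class_conditional(x, 0))

scores = discriminant_score(x_val)
decisions = (scores > np.log(priors[0] / priors[1])).astype(int)
p_error = np.mean(decisions != l_val)   # count-based min-P(error) estimate on D^10K_validate
```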
Part 2: (12%) (a) Using the maximum likelihood parameter estimation technique, train three
separate logistic-linear-function-based approximations of class label posterior functions given a
sample. For each approximation use one of the three training datasets D^20_train, D^200_train,
D^2000_train. When optimizing the parameters, specify the optimization problem as minimization
of the negative log-likelihood of the training dataset, and use your favorite numerical optimization
approach, such as gradient descent or Matlab's fminsearch. Determine how to use these class-label-posterior
approximations to classify a sample in order to approximate the minimum-P(error) classification rule;
apply these three approximations of the class label posterior function on samples in D^10K_validate, and
estimate the probability of error that these three classification rules will attain (using counts of
decisions on the validation set). Optional: As supplementary visualization, generate plots of the
decision boundaries of these trained classifiers superimposed on their respective training datasets
and the validation dataset. (b) Repeat the process described in Part (2a) using a logistic-quadratic-function-based approximation of class label posterior functions given a sample.
Discussion: (2%) How does the performance of the classifiers trained in this part compare with
one another, considering the differences in training set size and function form? How do they
compare to the theoretically optimal classifier from Part 1? Briefly discuss your results and insights.
Note 1: With x representing the input sample vector and w denoting the model parameter vector,
logistic-linear-function refers to h(x,w) = 1/(1 + e^(−w^T z(x))), where z(x) = [1, x^T]^T; and
logistic-quadratic-function refers to h(x,w) = 1/(1 + e^(−w^T z(x))), where
z(x) = [1, x1, x2, x1², x1x2, x2²]^T.
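A minimal sketch of the training loop for Part 2, assuming SciPy and the datasets from the earlier sketches (x_train20, l_train20, x_val, l_val); Nelder-Mead plays the role of MATLAB's fminsearch here, and the quadratic case only swaps in z_quadratic.

```python
import numpy as np
from scipy.optimize import minimize

def z_linear(x):                       # z(x) = [1, x^T]^T, stacked row-wise
    return np.hstack([np.ones((x.shape[0], 1)), x])

def z_quadratic(x):                    # z(x) = [1, x1, x2, x1^2, x1*x2, x2^2]^T
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1*x2, x2**2])

def nll(w, Z, labels):
    """Negative log-likelihood of the labels under h(x,w) = 1/(1+exp(-w^T z(x))),
    written with logaddexp for numerical stability."""
    s = Z @ w
    return np.sum(labels * np.logaddexp(0, -s) + (1 - labels) * np.logaddexp(0, s))

Z = z_linear(x_train20)                # repeat with the 200- and 2000-sample sets
w_hat = minimize(nll, np.zeros(Z.shape[1]), args=(Z, l_train20),
                 method='Nelder-Mead').x
# h(x,w) approximates P(L=1|x); the min-P(error) rule decides L=1 when h > 1/2,
# i.e. when w^T z(x) > 0:
val_decisions = (z_linear(x_val) @ w_hat > 0).astype(int)
p_error_lin = np.mean(val_decisions != l_val)
```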
Question 2 (20%)
Assume that scalar-real y and two-dimensional real vector x are related to each other according
to y = c(x,w) + v, where c(·,w) is a cubic polynomial in x with coefficients w, and v is a
Gaussian random scalar with zero mean and variance σ².
Given a dataset D = {(x1, y1),…,(xN, yN)} with N samples of (x, y) pairs, with the assumption
that these samples are independent and identically distributed according to the model, derive two
estimators for w using maximum-likelihood (ML) and maximum-a-posteriori (MAP) parameter
estimation approaches as a function of these data samples. For the MAP estimator, assume that w
has a zero-mean Gaussian prior with covariance matrix γI.
Having derived the estimator expressions, implement them in code and apply them to the dataset
generated by the attached Matlab script. Using the training dataset, obtain the ML estimator and
the MAP estimator for a variety of γ values ranging from 10^−m to 10^n. Evaluate each trained
model by calculating the average squared error between the y values in the validation samples and
the model estimates of these using c(·, w_trained). How does your MAP-trained model perform on the
validation set as γ is varied? How is the MAP estimate related to the ML estimate? Describe your
experiments, and visualize and quantify your analyses (e.g. average squared error on the validation
dataset as a function of the hyperparameter γ) with data from these experiments.
Note: Point split will be 20% for ML and 20% for MAP estimator results and discussion.
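Under the stated Gaussian noise and prior, both estimators have well-known closed forms, which the sketch below implements (NumPy; the feature ordering in phi and the placeholder names x_tr, y_tr, x_va, y_va are my own, standing in for the data from the attached script).

```python
import numpy as np

def phi(x):
    """All 10 monomials of a cubic polynomial in 2-D x, for x of shape (n, 2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2,
                            x1**2, x1*x2, x2**2,
                            x1**3, x1**2*x2, x1*x2**2, x2**3])

def w_ml(x, y):
    """Least squares, which is the ML estimate under zero-mean Gaussian noise."""
    Z = phi(x)
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def w_map(x, y, sigma2, gamma):
    """A zero-mean Gaussian prior N(0, gamma*I) on w yields ridge regression with
    regularizer sigma2/gamma; as gamma grows, this recovers the ML estimate."""
    Z = phi(x)
    return np.linalg.solve(Z.T @ Z + (sigma2 / gamma) * np.eye(Z.shape[1]), Z.T @ y)

# Sweep gamma over decades and record validation average squared error, e.g.:
# for gamma in 10.0**np.arange(-7, 8):
#     mse = np.mean((phi(x_va) @ w_map(x_tr, y_tr, sigma2, gamma) - y_va)**2)
```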
Question 3 (20%)
A vehicle at true position [xT, yT]^T in 2-dimensional space is to be localized using distance
(range) measurements to K reference (landmark) coordinates {[x1, y1]^T, …, [xi, yi]^T, …, [xK, yK]^T}.
These range measurements are ri = dTi + ni for i ∈ {1,…,K}, where dTi = ∥[xT, yT]^T − [xi, yi]^T∥
is the true distance between the vehicle and the i-th reference point, and ni is zero-mean
Gaussian distributed measurement noise with known variance σi². The noise in each measurement is
independent of the others.
Assume that we have the following prior knowledge regarding the position of the vehicle:

p([x, y]^T) = (2π σx σy)^−1 exp{ −(1/2) [x, y] [σx², 0; 0, σy²]^−1 [x, y]^T }    (1)

where [x, y]^T indicates a candidate position under consideration.
Express the optimization problem that needs to be solved to determine the MAP estimate of
the vehicle position. Simplify the objective function so that the exponentials and additive/multiplicative
terms that do not impact the determination of the MAP estimate [xMAP, yMAP]^T are removed
appropriately from the objective function for computational savings when evaluating the objective.
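For reference, collecting the negative log-posterior from the models above and dropping all terms constant in (x, y) gives one consistent form of the simplified objective (a sketch of the expected result, not a prescribed notation):

[xMAP, yMAP]^T = argmin over (x, y) of  Σ_{i=1}^{K} (ri − ∥[x,y]^T − [xi,yi]^T∥)² / (2σi²)  +  x²/(2σx²) + y²/(2σy²).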
Implement the following as computer code: Set the true vehicle location to be inside the
circle with unit radius centered at the origin. For each K ∈ {1,2,3,4} repeat the following.
Place K evenly spaced landmarks on a circle with unit radius centered at the origin. Set the
measurement noise standard deviation to 0.3 for all range measurements. Generate K range measurements
according to the model specified above (if a range measurement turns out to be negative,
reject it and resample; all range measurements need to be nonnegative).
Plot the equilevel contours of the MAP estimation objective for the range of horizontal and
vertical coordinates from −2 to 2; superimpose the true location of the vehicle on these equilevel
contours (e.g. use a + mark), as well as the landmark locations (e.g. use a o mark for each one).
Provide plots of the MAP objective function contours for each value of K. When preparing
your final contour plots for different K values, make sure to plot contours at the same function
value across each of the different contour plots for easy visual comparison of the MAP objective
landscapes. Suggestion: For values of σx and σy, you could use values around 0.25 and perhaps
make them equal to each other. Note that your choice of these indicates how confident the prior is
about the origin as the location.
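A sketch of this procedure for a single K, assuming NumPy/Matplotlib, the simplified objective shown earlier, σx = σy = 0.25 per the suggestion, and an arbitrarily chosen true position; the fixed level set makes contour plots comparable across K.

```python
import numpy as np
import matplotlib.pyplot as plt

sigma_r, sigma_x, sigma_y = 0.3, 0.25, 0.25
K = 3
landmarks = np.array([[np.cos(2*np.pi*k/K), np.sin(2*np.pi*k/K)] for k in range(K)])
true_pos = np.array([0.4, 0.3])                      # any point inside the unit circle

rng = np.random.default_rng()
d_true = np.linalg.norm(true_pos - landmarks, axis=1)
r = d_true + sigma_r * rng.standard_normal(K)
while np.any(r < 0):                                 # reject-and-resample negatives
    bad = r < 0
    r[bad] = d_true[bad] + sigma_r * rng.standard_normal(bad.sum())

xs = np.linspace(-2, 2, 401)
X, Y = np.meshgrid(xs, xs)
obj = X**2 / (2*sigma_x**2) + Y**2 / (2*sigma_y**2)  # prior term
for k in range(K):                                   # range-measurement terms
    dk = np.sqrt((X - landmarks[k, 0])**2 + (Y - landmarks[k, 1])**2)
    obj += (r[k] - dk)**2 / (2*sigma_r**2)

levels = np.arange(1, 40, 3)                         # same levels for every K
plt.contour(X, Y, obj, levels=levels)
plt.plot(*true_pos, '+', markersize=12)              # true vehicle position
plt.plot(landmarks[:, 0], landmarks[:, 1], 'o')      # landmark locations
plt.gca().set_aspect('equal'); plt.show()
```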
Supplement your plots with a brief description of how your code works. Comment on the
behavior of the MAP estimate of position (visually assessed from the contour plots; roughly center
of the innermost contour) relative to the true position. Does the MAP estimate get closer to the
true position as K increases? Does it get more certain? Explain how your contours justify your
conclusions.
Note: The additive Gaussian distributed noise used in this question is likely not appropriate for a proper distance sensor, since it could lead to negative measurements. However, in this
question, we will ignore this issue and proceed with this noise model for illustration. In practice,
a multiplicative log-normal distributed noise may be more appropriate than an additive normal
distributed noise depending on the measurement mechanism.
Question 4 (20%)
Problem 2.13 from Duda-Hart-Stork textbook:
Question 5 (20%)
Let Z be drawn from a categorical distribution (which takes discrete values) with K possible
outcomes/states and parameter vector Θ, denoted Cat(Θ). Describe the value/state using a 1-of-K
scheme, z = [z1,…,zK]^T, where zk = 1 if the variable is in state k and zk = 0 otherwise. Let the
parameter vector for the pdf be Θ = [θ1,…,θK]^T, where P(zk = 1) = θk for k ∈ {1,…,K}.
Given D = {z1,…,zN} with iid samples zn ∼ Cat(Θ) for n ∈ {1,…,N}:
• What is the ML estimator for Θ?
• Assuming that the prior p(Θ) for the parameters is a Dirichlet distribution with hyperparameter α, what is the MAP estimator for Θ?
Hint: The Dirichlet distribution with parameter α is

p(Θ|α) = (1/B(α)) ∏_{k=1}^{K} θk^(αk−1), where the normalization constant is
B(α) = ∏_{k=1}^{K} Γ(αk) / Γ(∑_{k=1}^{K} αk).
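Once the estimators are derived, they can be checked numerically against sampled data; the sketch below uses the standard closed forms (which your derivation should reproduce) and assumes every αk ≥ 1 so the posterior mode is well defined.

```python
import numpy as np

def theta_ml(Z):
    """Z is (N, K) with 1-of-K rows; the ML estimate is theta_k = N_k / N."""
    Nk = Z.sum(axis=0)
    return Nk / Nk.sum()

def theta_map(Z, alpha):
    """Mode of the Dirichlet(alpha) posterior:
    theta_k = (N_k + alpha_k - 1) / (N + sum(alpha) - K)."""
    Nk = Z.sum(axis=0)
    return (Nk + alpha - 1) / (Nk.sum() + alpha.sum() - len(alpha))

rng = np.random.default_rng(0)
Z = rng.multinomial(1, [0.2, 0.3, 0.5], size=100)    # 100 iid 1-of-K draws
print(theta_ml(Z), theta_map(Z, 2.0 * np.ones(3)))
```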