Description
Question 1
Show that maximization of the class separation criterion given by $m_2 - m_1 = \mathbf{w}^\mathrm{T}(\mathbf{m}_2 - \mathbf{m}_1)$ with respect to $\mathbf{w}$, using a Lagrange multiplier to enforce the constraint $\mathbf{w}^\mathrm{T}\mathbf{w} = 1$, leads to the result that $\mathbf{w} \propto (\mathbf{m}_2 - \mathbf{m}_1)$.
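The claimed result can be sanity-checked numerically: for random mean vectors (chosen arbitrarily here for illustration), the unit vector proportional to $\mathbf{m}_2 - \mathbf{m}_1$ should attain the maximum of the criterion over all unit vectors, which is the content of the Cauchy-Schwarz inequality. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2 = rng.normal(size=3), rng.normal(size=3)  # arbitrary class means
d = m2 - m1

# Candidate from the claimed result: w proportional to (m2 - m1), unit norm.
w_star = d / np.linalg.norm(d)
best = w_star @ d  # objective value wT(m2 - m1)

# No random unit vector should beat it (Cauchy-Schwarz: wT d <= ||d||).
for _ in range(1000):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    assert w @ d <= best + 1e-12
```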
Question 2
Show that the Fisher criterion
$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$$
can be written in the form
$$J(\mathbf{w}) = \frac{\mathbf{w}^\mathrm{T}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^\mathrm{T}\mathbf{S}_W\mathbf{w}}.$$
Hint.
$$y = \mathbf{w}^\mathrm{T}\mathbf{x}, \qquad m_k = \mathbf{w}^\mathrm{T}\mathbf{m}_k, \qquad s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2$$
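The identity can be verified numerically before proving it: for synthetic two-class data (invented here for illustration), the projected form of $J(\mathbf{w})$ from the hint and the matrix form with $\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\mathrm{T}$ and the within-class scatter $\mathbf{S}_W$ agree for any $\mathbf{w}$. A sketch, assuming those standard scatter-matrix definitions:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(20, 2))  # class C1 samples (synthetic)
X2 = rng.normal(loc=2.0, size=(30, 2))  # class C2 samples (synthetic)
w = rng.normal(size=2)

# Projected means and within-class variances (the hint's quantities).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
mu1, mu2 = w @ m1, w @ m2
s1sq = np.sum(((X1 - m1) @ w) ** 2)
s2sq = np.sum(((X2 - m2) @ w) ** 2)
J_proj = (mu2 - mu1) ** 2 / (s1sq + s2sq)

# Matrix form with between-class and within-class scatter matrices.
d = (m2 - m1)[:, None]
SB = d @ d.T
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
J_mat = (w @ SB @ w) / (w @ SW @ w)

assert np.isclose(J_proj, J_mat)
```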
Question 3
Consider a generative classification model for $K$ classes defined by prior class probabilities $p(C_k) = \pi_k$ and general class-conditional densities $p(\phi \mid C_k)$, where $\phi$ is the input feature vector. Suppose we are given a training data set $\{\phi_n, t_n\}$ where $n = 1, \ldots, N$, and $t_n$ is a binary target vector of length $K$ that uses the 1-of-$K$ coding scheme, so that it has components $t_{nj} = I_{jk}$ if pattern $n$ is from class $C_k$. Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by
$$\pi_k = \frac{N_k}{N},$$
where $N_k$ is the number of data points assigned to class $C_k$.
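Since the prior-dependent part of the log-likelihood reduces to $\sum_k N_k \ln \pi_k$, the claim can be checked numerically: the empirical frequencies $N_k/N$ should score at least as high as any other valid prior. A sketch with hypothetical class counts:

```python
import numpy as np

rng = np.random.default_rng(2)
Nk = np.array([12, 30, 58])  # hypothetical class counts, N = 100
N = Nk.sum()
pi_ml = Nk / N               # claimed maximum-likelihood solution

def loglik(pi):
    # Prior-dependent part of the log-likelihood: sum_k Nk * ln(pi_k).
    return np.sum(Nk * np.log(pi))

best = loglik(pi_ml)
# Any other valid prior (positive components summing to 1) does no better.
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))
    assert loglik(p) <= best + 1e-9
```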
Machine Learning (CS405) – Homework #4
Question 4
Verify the relation
$$\frac{d\sigma}{da} = \sigma(1 - \sigma)$$
for the derivative of the logistic sigmoid function defined by
$$\sigma(a) = \frac{1}{1 + \exp(-a)}.$$
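A quick numerical check of the relation, comparing the claimed derivative against a central finite difference at a spread of points:

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

# Compare sigma*(1 - sigma) with a central finite difference.
a = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigma(a + h) - sigma(a - h)) / (2 * h)
analytic = sigma(a) * (1 - sigma(a))
assert np.allclose(numeric, analytic, atol=1e-8)
```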
Question 5
By making use of the result
$$\frac{d\sigma}{da} = \sigma(1 - \sigma)$$
for the derivative of the logistic sigmoid, show that the derivative of the error function for the logistic regression model is given by
$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\phi_n.$$
Hint. The error function for the logistic regression model is given by
$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \{t_n \ln y_n + (1 - t_n)\ln(1 - y_n)\}.$$
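The claimed gradient can be verified against finite differences of the error function on synthetic data (features and targets invented here for illustration), with $y_n = \sigma(\mathbf{w}^\mathrm{T}\phi_n)$:

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(3)
Phi = rng.normal(size=(50, 4))                  # feature vectors phi_n
t = rng.integers(0, 2, size=50).astype(float)   # binary targets t_n
w = rng.normal(size=4)

def E(w):
    # Cross-entropy error from the hint.
    y = sigma(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Claimed gradient: sum_n (y_n - t_n) * phi_n.
grad = Phi.T @ (sigma(Phi @ w) - t)

# Finite-difference check, coordinate by coordinate.
h = 1e-6
for i in range(4):
    e = np.zeros(4); e[i] = h
    fd = (E(w + e) - E(w - e)) / (2 * h)
    assert abs(fd - grad[i]) < 1e-4
```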
Question 6
There are several possible ways in which to generalize the concept of linear discriminant functions from two classes to $c$ classes. One possibility would be to use $(c - 1)$ linear discriminant functions, such that $y_k(\mathbf{x}) > 0$ for inputs $\mathbf{x}$ in class $C_k$ and $y_k(\mathbf{x}) < 0$ for inputs not in class $C_k$. By drawing a simple example in two dimensions for $c = 3$, show that this approach can lead to regions of $\mathbf{x}$-space for which the classification is ambiguous. Another approach would be to use one discriminant function $y_{jk}(\mathbf{x})$ for each possible pair of classes $C_j$ and $C_k$, such that $y_{jk}(\mathbf{x}) > 0$ for patterns in class $C_j$ and $y_{jk}(\mathbf{x}) < 0$ for patterns in class $C_k$. For $c$ classes, we would need $c(c - 1)/2$ discriminant functions. Again, by drawing a specific example in two dimensions for $c = 3$, show that this approach can also lead to ambiguous regions.
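Both failure modes can be exhibited with concrete (hypothetical, hand-picked) linear discriminants in two dimensions; the specific functions below are one possible choice, not the only one:

```python
import numpy as np

# One-versus-rest with (c - 1) = 2 discriminants for c = 3 classes:
# y1(x) = x[0] for C1, y2(x) = x[1] for C2; C3 is "neither".
def y1(x): return x[0]
def y2(x): return x[1]

x = np.array([1.0, 1.0])
# Both discriminants claim this point: an ambiguous region.
assert y1(x) > 0 and y2(x) > 0

# One-versus-one with c(c-1)/2 = 3 pairwise discriminants:
# y12 > 0 favours C1 over C2, y23 > 0 favours C2 over C3,
# y13 > 0 favours C1 over C3.
def y12(x): return x[0]
def y23(x): return x[1]
def y13(x): return -x[0] - x[1]

# At the same point the pairwise votes cycle: C1 > C2 > C3 > C1.
assert y12(x) > 0 and y23(x) > 0 and y13(x) < 0
```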
Question 7
Given a set of data points $\{\mathbf{x}_n\}$, we can define the convex hull to be the set of points $\mathbf{x}$ given by
$$\mathbf{x} = \sum_n \alpha_n \mathbf{x}_n$$
where $\alpha_n \geq 0$ and $\sum_n \alpha_n = 1$. Consider a second set of points $\{\mathbf{z}_m\}$ and its corresponding convex hull. The two sets of points will be linearly separable if there exists a vector $\hat{\mathbf{w}}$ and a scalar $w_0$ such that $\hat{\mathbf{w}}^\mathrm{T}\mathbf{x}_n + w_0 > 0$ for all $\mathbf{x}_n$, and $\hat{\mathbf{w}}^\mathrm{T}\mathbf{z}_m + w_0 < 0$ for all $\mathbf{z}_m$. Show that, if their convex hulls intersect, the two sets of points cannot be linearly separable, and conversely that, if they are linearly separable, their convex hulls do not intersect.
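The statement can be explored numerically with a linear-programming feasibility test, assuming `scipy` is available. Linear separability of two finite point sets is posed as an LP in $(\hat{\mathbf{w}}, w_0)$; scaling the strict inequalities to a margin of $\pm 1$ loses no generality for finite sets. The example sets below are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def separable(X, Z):
    """Feasibility LP for w, w0 with w.x_n + w0 >= 1 and w.z_m + w0 <= -1."""
    d = X.shape[1]
    # Variables v = (w, w0); constraints written as A_ub @ v <= b_ub.
    A = np.vstack([-np.hstack([X, np.ones((len(X), 1))]),
                   np.hstack([Z, np.ones((len(Z), 1))])])
    b = -np.ones(len(X) + len(Z))
    res = linprog(np.zeros(d + 1), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * (d + 1))
    return res.status == 0  # 0 = feasible (separable), 2 = infeasible

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Z = X + np.array([3.0, 3.0])   # disjoint convex hulls: separable
assert separable(X, Z)

Z2 = X + np.array([0.2, 0.2])  # intersecting hulls: not separable
assert not separable(X, Z2)
```

The second assertion illustrates the forward direction of the exercise: the shifted triangle's hull contains points of the original hull, and the LP is accordingly infeasible.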