Description
Question 1
Show that maximization of the class separation criterion given by $m_2 - m_1 = \mathbf{w}^\mathrm{T}(\mathbf{m}_2 - \mathbf{m}_1)$ with respect to $\mathbf{w}$, using a Lagrange multiplier to enforce the constraint $\mathbf{w}^\mathrm{T}\mathbf{w} = 1$, leads to the result that $\mathbf{w} \propto (\mathbf{m}_2 - \mathbf{m}_1)$.
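The claimed result can be sanity-checked numerically: for random mean vectors (chosen arbitrarily here for illustration), the unit vector proportional to $\mathbf{m}_2 - \mathbf{m}_1$ should attain the maximum of the criterion over all unit vectors, which is the content of the Cauchy-Schwarz inequality. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2 = rng.normal(size=3), rng.normal(size=3)  # arbitrary class means
d = m2 - m1

# Candidate from the claimed result: w proportional to (m2 - m1), unit norm.
w_star = d / np.linalg.norm(d)
best = w_star @ d  # objective value wT(m2 - m1)

# No random unit vector should beat it (Cauchy-Schwarz: wT d <= ||d||).
for _ in range(1000):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    assert w @ d <= best + 1e-12
```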
Question 2
Show that the Fisher criterion
$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$$
can be written in the form
$$J(\mathbf{w}) = \frac{\mathbf{w}^\mathrm{T}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^\mathrm{T}\mathbf{S}_W\mathbf{w}}.$$
Hint.
$$y = \mathbf{w}^\mathrm{T}\mathbf{x}, \qquad m_k = \mathbf{w}^\mathrm{T}\mathbf{m}_k, \qquad s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2$$
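The identity can be verified numerically before proving it: for synthetic two-class data (invented here for illustration), the projected form of $J(\mathbf{w})$ from the hint and the matrix form with $\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\mathrm{T}$ and the within-class scatter $\mathbf{S}_W$ agree for any $\mathbf{w}$. A sketch, assuming those standard scatter-matrix definitions:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(20, 2))  # class C1 samples (synthetic)
X2 = rng.normal(loc=2.0, size=(30, 2))  # class C2 samples (synthetic)
w = rng.normal(size=2)

# Projected means and within-class variances (the hint's quantities).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
mu1, mu2 = w @ m1, w @ m2
s1sq = np.sum(((X1 - m1) @ w) ** 2)
s2sq = np.sum(((X2 - m2) @ w) ** 2)
J_proj = (mu2 - mu1) ** 2 / (s1sq + s2sq)

# Matrix form with between-class and within-class scatter matrices.
d = (m2 - m1)[:, None]
SB = d @ d.T
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
J_mat = (w @ SB @ w) / (w @ SW @ w)

assert np.isclose(J_proj, J_mat)
```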
Question 3
Consider a generative classification model for $K$ classes defined by prior class probabilities $p(C_k) = \pi_k$ and general class-conditional densities $p(\phi \mid C_k)$, where $\phi$ is the input feature vector. Suppose we are given a training data set $\{\phi_n, t_n\}$ where $n = 1, \ldots, N$, and $t_n$ is a binary target vector of length $K$ that uses the 1-of-$K$ coding scheme, so that it has components $t_{nj} = I_{jk}$ if pattern $n$ is from class $C_k$. Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by
$$\pi_k = \frac{N_k}{N},$$
where $N_k$ is the number of data points assigned to class $C_k$.
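Since the prior-dependent part of the log-likelihood reduces to $\sum_k N_k \ln \pi_k$, the claim can be checked numerically: the empirical frequencies $N_k/N$ should score at least as high as any other valid prior. A sketch with hypothetical class counts:

```python
import numpy as np

rng = np.random.default_rng(2)
Nk = np.array([12, 30, 58])  # hypothetical class counts, N = 100
N = Nk.sum()
pi_ml = Nk / N               # claimed maximum-likelihood solution

def loglik(pi):
    # Prior-dependent part of the log-likelihood: sum_k Nk * ln(pi_k).
    return np.sum(Nk * np.log(pi))

best = loglik(pi_ml)
# Any other valid prior (positive components summing to 1) does no better.
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))
    assert loglik(p) <= best + 1e-9
```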
Machine Learning (CS405) – Homework #4
Question 4
Verify the relation
$$\frac{d\sigma}{da} = \sigma(1 - \sigma)$$
for the derivative of the logistic sigmoid function defined by
$$\sigma(a) = \frac{1}{1 + \exp(-a)}.$$
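A quick numerical check of the relation, comparing the claimed derivative against a central finite difference at a spread of points:

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

# Compare sigma*(1 - sigma) with a central finite difference.
a = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigma(a + h) - sigma(a - h)) / (2 * h)
analytic = sigma(a) * (1 - sigma(a))
assert np.allclose(numeric, analytic, atol=1e-8)
```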
Question 5
By making use of the result
$$\frac{d\sigma}{da} = \sigma(1 - \sigma)$$
for the derivative of the logistic sigmoid, show that the derivative of the error function for the logistic regression model is given by
$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\phi_n.$$
Hint. The error function for the logistic regression model is given by
$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \{t_n \ln y_n + (1 - t_n)\ln(1 - y_n)\}.$$
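The claimed gradient can be verified against finite differences of the error function on synthetic data (features and targets invented here for illustration), with $y_n = \sigma(\mathbf{w}^\mathrm{T}\phi_n)$:

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(3)
Phi = rng.normal(size=(50, 4))                  # feature vectors phi_n
t = rng.integers(0, 2, size=50).astype(float)   # binary targets t_n
w = rng.normal(size=4)

def E(w):
    # Cross-entropy error from the hint.
    y = sigma(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Claimed gradient: sum_n (y_n - t_n) * phi_n.
grad = Phi.T @ (sigma(Phi @ w) - t)

# Finite-difference check, coordinate by coordinate.
h = 1e-6
for i in range(4):
    e = np.zeros(4); e[i] = h
    fd = (E(w + e) - E(w - e)) / (2 * h)
    assert abs(fd - grad[i]) < 1e-4
```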
Question 6
There are several possible ways in which to generalize the concept of linear discriminant functions from two classes to $c$ classes. One possibility would be to use $(c - 1)$ linear discriminant functions, such that $y_k(\mathbf{x}) > 0$ for inputs $\mathbf{x}$ in class $C_k$ and $y_k(\mathbf{x}) < 0$ for inputs not in class $C_k$. By drawing a simple example in two dimensions for $c = 3$, show that this approach can lead to regions of $\mathbf{x}$-space for which the classification is ambiguous. Another approach would be to use one discriminant function $y_{jk}(\mathbf{x})$ for each possible pair of classes $C_j$ and $C_k$, such that $y_{jk}(\mathbf{x}) > 0$ for patterns in class $C_j$ and $y_{jk}(\mathbf{x}) < 0$ for patterns in class $C_k$. For $c$ classes, we would need $c(c - 1)/2$ discriminant functions. Again, by drawing a specific example in two dimensions for $c = 3$, show that this approach can also lead to ambiguous regions.
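Both failure modes can be exhibited with concrete (hypothetical, hand-picked) linear discriminants in two dimensions; the specific functions below are one possible choice, not the only one:

```python
import numpy as np

# One-versus-rest with (c - 1) = 2 discriminants for c = 3 classes:
# y1(x) = x[0] for C1, y2(x) = x[1] for C2; C3 is "neither".
def y1(x): return x[0]
def y2(x): return x[1]

x = np.array([1.0, 1.0])
# Both discriminants claim this point: an ambiguous region.
assert y1(x) > 0 and y2(x) > 0

# One-versus-one with c(c-1)/2 = 3 pairwise discriminants:
# y12 > 0 favours C1 over C2, y23 > 0 favours C2 over C3,
# y13 > 0 favours C1 over C3.
def y12(x): return x[0]
def y23(x): return x[1]
def y13(x): return -x[0] - x[1]

# At the same point the pairwise votes cycle: C1 > C2 > C3 > C1.
assert y12(x) > 0 and y23(x) > 0 and y13(x) < 0
```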
Question 7
Given a set of data points $\{\mathbf{x}_n\}$, we can define the convex hull to be the set of points $\mathbf{x}$ given by
$$\mathbf{x} = \sum_n \alpha_n \mathbf{x}_n$$
where $\alpha_n \geq 0$ and $\sum_n \alpha_n = 1$. Consider a second set of points $\{\mathbf{z}_m\}$ and its corresponding convex hull. The two sets of points will be linearly separable if there exists a vector $\hat{\mathbf{w}}$ and a scalar $w_0$ such that $\hat{\mathbf{w}}^\mathrm{T}\mathbf{x}_n + w_0 > 0$ for all $\mathbf{x}_n$, and $\hat{\mathbf{w}}^\mathrm{T}\mathbf{z}_m + w_0 < 0$ for all $\mathbf{z}_m$. Show that, if their convex hulls intersect, the two sets of points cannot be linearly separable, and conversely that, if they are linearly separable, their convex hulls do not intersect.
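The statement can be explored numerically with a linear-programming feasibility test, assuming `scipy` is available. Linear separability of two finite point sets is posed as an LP in $(\hat{\mathbf{w}}, w_0)$; scaling the strict inequalities to a margin of $\pm 1$ loses no generality for finite sets. The example sets below are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def separable(X, Z):
    """Feasibility LP for w, w0 with w.x_n + w0 >= 1 and w.z_m + w0 <= -1."""
    d = X.shape[1]
    # Variables v = (w, w0); constraints written as A_ub @ v <= b_ub.
    A = np.vstack([-np.hstack([X, np.ones((len(X), 1))]),
                   np.hstack([Z, np.ones((len(Z), 1))])])
    b = -np.ones(len(X) + len(Z))
    res = linprog(np.zeros(d + 1), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * (d + 1))
    return res.status == 0  # 0 = feasible (separable), 2 = infeasible

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Z = X + np.array([3.0, 3.0])   # disjoint convex hulls: separable
assert separable(X, Z)

Z2 = X + np.array([0.2, 0.2])  # intersecting hulls: not separable
assert not separable(X, Z2)
```

The second assertion illustrates the forward direction of the exercise: the shifted triangle's hull contains points of the original hull, and the LP is accordingly infeasible.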