Description
1 Kernels [20 points]
(a) Identify which of the followings is a valid kernel. If it is a kernel, please write your answer
explicitly as ‘True’ and give mathematical proofs. If it is not a kernel, please write your answer
explicitly as ‘False’ and give explanations. [8 pts]
Suppose K1 and K2 are valid kernels (symmetric and positive definite) defined on Rm × Rm.
1. K(u, v) = αK1(u, v) + βK2(u, v), α, β ∈ R.
2. K(u, v) = K1(f(u), f(v)) where f : Rm → Rm. coefficients.
1https://matlab.mathworks.com/
2You can also use the environment for developing by registering online with your university email.
1
3.
K(u, v) = (
1 if ku − vk2 6 1
0 otherwise
(1)
4. Suppose K0
is a valid kernel.
K(u, v) = K0
(u, v)
p
K0(u, u)K0(v, v)
. (2)
(b) Write down kernelized version of Fisher’s Linear Discriminant Analysis using kernel trick.
Please provide full steps and all details of the method. [Hint: Use kernel to replace inner
products.] [12 pts]
2 Markov Random Field, Conditional Random Field [20 pts]
[a-b] A probability distribution on 3 discrete variables a,b,c is defined by P(a, b, c) = 1
Z
ψ(a, b, c) = 1
Z
φ1(a, b)φ2(b, c),
where the table for the two factors are given below.
a b φ1(a, b)
0 0 4
0 1 3
1 0 3
1 1 1
b c φ2(b, c)
0 0 3
0 1 2
0 2 1
1 0 4
1 1 1
1 2 3
(a) Compute the slice of the joint factor ψ(a, b, c) corresponding to b = 1. This is the table
ψ(a, b = 1, c). [5 pts]
(b) Compute P(a = 1, b = 1). [5 pts]
(c) Explain the difference between Conditional Random Fields and Hidden Markov Models
with respect to the following factors. Please give only a one-line explanation. [10 pts]
• Type of model – generative/discriminative
• Objective function optimized
• Require a normalization constant
3 Hidden Markov Model [50 pts]
This problem will let you get familiar with HMM algorithms by doing the calculations by hand.
[a-c] There are three coins (1, 2, 3), to throw them randomly, and record the result. S = 1, 2, 3; V = H, T
(Head or Tail); A, B, π is given as
2
A:
1 2 3
1 0.9 0.05 0.05
2 0.45 0.1 0.45
3 0.45 0.45 0.1
B:
1 2 3
H 0.5 0.75 0.25
T 0.5 0.25 0.75
π : π 1/3 1/3 1/3
(a) Given the model above, what’s the probability of observation O = H, T, H. [10 pts]
(b) Describe how to get the A, B, and π, when they are unknown. [10 pts]
(c) In class, we studied discrete HMMs with discrete hidden states and observations. The
following problem considers a continuous density HMM, which has discrete hidden states but
continuous observations. Let St ∈ 1, 2, …, n denote the hidden state of the HMM at time t, and
let Xt ∈ R denote the real-valued scalar observation of the HMM at time t. In a continuous
density HMM, the emission probability must be parameterized since the random variable Xt
is no longer discrete. It is defined as P(Xt = x|St = i) = N (µi
, σ2
i
). Given m sequences of
observations (each of length T), derive the EM algorithm for HMM with Gaussian observation
model. [14 pts]
(d) For each of the following sentences, say whether it is true or false and provide a short
explanation (one sentence or so). [16 pts]
• The weights of all incoming edges to a state of an HMM must sum to 1.
• An edge from state s to state t in an HMM denotes the conditional probability of going to state s given
that we are currently at state t.
• The ”Markov” property of an HMM implies that we cannot use an HMM to model a process that
depends on several time-steps in the past.
• The Baum-Welch algorithm is a type of an Expectation Maximization algorithm and as such it is
guaranteed to converge to the (globally) optimal solution.
4 Programming [30 pts]
In this problem, you will implement algorithm to analyze the behavior of SP500 index over a period of time.
For each week, we measure the price movement relative to the previous week and denote it using a binary
variable (+1 indicates up and 1 indicates down). The price movements from week 1 (the week of January
5) to week 39 (the week of September 28) are plotted below.
Consider a Hidden Markov Model in which xt denotes the economic state (good or bad) of week t
and yt denotes the price movement (up or down) of the SP500 index. We assume that x(t+1) = xt with
probability 0.8, and P(Yt|Xt)(yt = +1|xt = good) = P(Yt|Xt)(yt = −1|xt = bad) = q. In addition, assume that
P(X1)(x1 = bad) = 0.8. Load the sp500.mat, implement the algorithm, briefly describe how you implement
this and report the following :
(a) Assuming q = 0.7, plot P(Xt|Y )(xt = good|y) for t = 1, 2, …, 39. What is the probability that
the economy is in a good state in the week of week 39. [15 pts]
(b) Repeat (a) for q = 0.9, and compare the result to that of (a). Explain your comparison in
one or two sentences. [15 pts]
3