Problem 1 Hidden Markov Models
In this problem, you are given the parameters of a small hidden Markov model and you will implement three inference procedures discussed in class. In hmm_model.json, you will find the following model parameters (note the slight notation difference from class):
• π: the initial probabilities, πi = P(Z1 = si);
• A: the transition probabilities, with aij = P(Zt = sj | Zt−1 = si);
• B: the observation probabilities, with bik = P(Xt = ok | Zt = si).
Now we observe a sequence O of length N. Your task is to write code that computes probabilities and infers the possible hidden states given this observation. In particular, in 1.1 you need to implement the forward and backward procedures; in 1.2, you need to compute P(X1:N = O), the probability of observing the sequence, using the forward or backward messages; finally, in 1.3, you need to infer the most likely state path using the Viterbi algorithm. Note that your code might be tested against different parameter sets/observation sequences when graded.
1.1 Please finish the implementation of the functions forward() and backward() in hmm.py:
• forward(π, A, B, O) takes the model parameters π, A, B and the observation sequence O as input and outputs a numpy array α, where α[j, t − 1] = P(Zt = sj, X1:t = x1:t). Note that this is α[j, t − 1] instead of α[j, t] only because array indices start from 0.
• backward(π, A, B, O) takes the model parameters π, A, B and the observation sequence O as input and outputs a numpy array β, where β[j, t − 1] = P(Xt+1:N = xt+1:N | Zt = sj). Note the same index difference from the lecture slides.
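A minimal sketch of both procedures under the conventions above, assuming O is given as a sequence of integer observation indices (column indices into B); the exact signatures expected in hmm.py may differ:

```python
import numpy as np

def forward(pi, A, B, O):
    """Forward messages: alpha[j, t-1] = P(Z_t = s_j, X_{1:t} = x_{1:t}).
    pi: (S,), A: (S, S), B: (S, K), O: length-N list of observation indices."""
    S, N = len(pi), len(O)
    alpha = np.zeros((S, N))
    alpha[:, 0] = pi * B[:, O[0]]                 # base case: P(Z_1 = s_j, X_1 = x_1)
    for t in range(1, N):
        # sum over the previous state, then emit the t-th observation
        alpha[:, t] = B[:, O[t]] * (A.T @ alpha[:, t - 1])
    return alpha

def backward(pi, A, B, O):
    """Backward messages: beta[j, t-1] = P(X_{t+1:N} = x_{t+1:N} | Z_t = s_j)."""
    S, N = len(pi), len(O)
    beta = np.zeros((S, N))
    beta[:, N - 1] = 1.0                          # base case: empty future has probability 1
    for t in range(N - 2, -1, -1):
        beta[:, t] = A @ (B[:, O[t + 1]] * beta[:, t + 1])
    return beta
```

A quick sanity check is that Σ_j α[j, N−1] and Σ_j π_j b_j(x1) β[j, 0] agree, since both equal P(X1:N = O).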
1.2 Now we can calculate P(X1:N = O) from the output of your forward and backward algorithms. Please finish the implementation of the functions seqprob_forward() and seqprob_backward() in hmm.py. Both should return P(X1:N = O). Note that in seqprob_forward() the input is only the forward messages, while in seqprob_backward() you also have access to the model parameters. Please refer to Slide 29/60 of Lec 10 for the concrete formulas.
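Both quantities follow directly from the messages; as a sketch, using the array convention above (the exact argument lists in hmm.py, especially for seqprob_backward, are assumptions here):

```python
import numpy as np

def seqprob_forward(alpha):
    # P(X_{1:N} = O) = sum_j alpha[j, N-1]: marginalize the last hidden state
    return alpha[:, -1].sum()

def seqprob_backward(beta, pi, B, O):
    # P(X_{1:N} = O) = sum_j pi_j * b_j(x_1) * beta[j, 0]
    return (pi * B[:, O[0]] * beta[:, 0]).sum()
```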
1.3 Next you need to find the most likely state path given the observation O, namely,
z*1:N = argmax over z1:N of P(Z1:N = z1:N | X1:N = O).
Please implement the Viterbi algorithm as viterbi() in hmm.py. The function viterbi(π, A, B, O) takes the model parameters π, A, B and the observation sequence O as inputs and outputs a list path that contains the most likely state path z* (in terms of the state indices) you have found.
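A possible sketch of the Viterbi recursion under the same conventions (integer observation indices; ties broken by argmax order, which is an assumption):

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Most likely state path (list of state indices) for observation sequence O."""
    S, N = len(pi), len(O)
    delta = np.zeros((S, N))            # delta[j, t]: max prob of a path ending in s_j at step t
    psi = np.zeros((S, N), dtype=int)   # backpointers to the best predecessor state
    delta[:, 0] = pi * B[:, O[0]]
    for t in range(1, N):
        # trans[i, j]: prob of the best path ending in s_i, then transitioning to s_j
        trans = delta[:, t - 1][:, None] * A
        psi[:, t] = trans.argmax(axis=0)
        delta[:, t] = trans.max(axis=0) * B[:, O[t]]
    # trace back from the best final state
    path = [int(delta[:, -1].argmax())]
    for t in range(N - 1, 0, -1):
        path.append(int(psi[path[-1], t]))
    return path[::-1]
```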
What to do and submit: After you finish each task in 1.1, 1.2 and 1.3 in hmm.py, run the script hmm.sh.
It will generate hmm.txt. Add, commit, and push both hmm.py and hmm.txt before the due date.
Problem 2 Principal Component Analysis
In this question we will implement Principal Component Analysis (PCA) and use it for image compression. We use black-and-white images in this example for illustrative purposes. (However, PCA can also be used to compress color images: perform the compression on each channel separately and then combine the results.)
Figure 1: Eigenvectors together with their eigenvalues
Figure 2: Mean image
Let X be an N × D data matrix, where N is the number of images and D is the number of pixels of each image. For a given parameter K < D, you need to use PCA to compress X into an N × K matrix. As discussed in class, this is done by first centering the data, then finding the top K eigenvectors of the covariance matrix X^T X, and finally projecting the original data onto these eigenvectors.
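A minimal sketch of these steps, assuming X is already centered and using numpy's symmetric eigensolver (the function name and return convention match the description above, but the exact interface expected in pca.py is an assumption):

```python
import numpy as np

def pca(X, K):
    """Compress centered data X (N x D) to Y (N x K).
    Returns Y and the D x K matrix M of top-K eigenvectors as columns."""
    cov = X.T @ X                            # D x D, symmetric
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    M = eigvecs[:, ::-1][:, :K]              # reorder descending, keep top K columns
    Y = X @ M                                # project data onto the top-K directions
    return Y, M
```

Since eigh returns orthonormal eigenvectors, M has orthonormal columns, and with K = D the projection is lossless.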
Since each of these eigenvectors has dimensionality D, it can also be represented as an image of the same size as those in the dataset. For example, in Figure 1 we show the top eigenvectors, together with their eigenvalues, computed from all images of the digit 6 in MNIST. Figure 2 shows the mean of the MNIST images of the digit 6.
Figure 3: Original image (on the right) and decompressed versions with different compression rates.
Please implement PCA in the function pca of the file pca.py, which returns the compressed representation Y ∈ R^{N×K} and a matrix M ∈ R^{D×K} that consists of the top K eigenvectors, represented as column vectors. Note that we have already subtracted the mean values from the data before passing it to pca, so you do not need to do this yourself.
With the eigenvector matrix M, we can approximately reconstruct the original dataset as X̂ = Y M^T. Implement this step in the function decompress of pca.py.
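The reconstruction is a single matrix product; a sketch:

```python
import numpy as np

def decompress(Y, M):
    """Approximate reconstruction X_hat = Y M^T (N x K times K x D -> N x D)."""
    return Y @ M.T
```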
In Figure 3 we show several decompressed images with different compression rates (K) together with the original image. It is easy to see that larger K leads to better reconstruction. To quantify the reconstruction error between the original image x and the reconstruction x̂, we calculate the pixel-wise mean-square-error ‖x − x̂‖². Implement this step in the function reconstruction_error of pca.py.
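A sketch of the error computation, assuming "pixel-wise mean-square-error" means averaging the squared differences over the D pixels, i.e. ‖x − x̂‖²/D (check the assignment's exact normalization against the generated output):

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # average squared difference over pixels: ||x - x_hat||^2 / D
    return np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)
```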
What to do and submit: After implementation, run python3 pca.py to generate pca_output.txt. Submit both pca.py and pca_output.txt.