Description
Problem 1: ExpectationMaximization Algorithm for Clustering [30
pt.]
Implement expectationmaximization algorithm for Gaussian mixture models (see the EM algorithm below) in Python and call this program Gk. As you present your code explain your protocol
for
1. initializing each Gaussian
2. deciding ties
3. stopping criteria
Problem 2: Analysis of the EM over Realworld Data Sets [20 pt.]
Run your EM program, Gk, against the Ringnorm and Ionosphere data sets. Discuss your
results.
• Ringnorm Data Set
• Ionosphere Data Set
Run Gk with k = 2, . . . , 5 (20 runs each for each k). Report error rates and iteration counts
for each k using whisker plots. An example of whisker can be found in the appendix.
A simple error rate can be calculated as follows:
• After Gk converges, combine the clusters to ended up with two clusters for any k as follows:
Since the true clusters are known for a given arbitrary blocks number, final clusters are
determined by measuring the Euclidean (this is the easiest choice) distances between true
cluster centers and predicted cluster centers.
In other words, you will always calculate the error for k = 2 since there are only 2 clusters in
the given data sets. Below is an example of error calculation for Ionosphere data set. You can
similarly calculate an error rate for Ringnorm data set. After Gk converges, partition the data
points to kclusters, C1, . . . , Ck using likelihoods (hard assignment). A data point in a cluster is
classified as good if P(gi) > P(bi) and bad otherwise.
Assignment № 6 Page 2
For each cluster Ci
form two counts (over Ionosphere Data Set):
gi ←
X
δj∈Ci
[δj = g], good
bi ←
X
δj∈ci.B
[δj == b], bad
where [x = y] returns 1 if True, 0 otherwise. For example, [2 = 3] + [0 = 0] + [34 = 34] = 2
We can now calculate a simple error rate. Assume Gi
is good. Then the error is:
error(Ci) = bi
bi + gi
We can find the total error rate easily:
Error({C1, C2}) = X
2
i=1
error(Ci)
Problem 3: Algorithm Design [30pt]
Problem 3.1
Given a text D and a pattern P, describe an Ω(d+p) time method for finding the longest prefix
of P that is a substring of D. The lengths of D and P are d and p, respectively.
Problem 3.2
X, Y, and Z are three arrays and each has m elements. For an arbitrary integer t, describe
O(m2
logm)time algorithm to determine if there exist numbers, x in X, y in Y, and z in Z, such
that t = x+y+z.
Problem 3.3
Describe an efficient algorithm for deleting a string from a compressed trie and analyze its running time.
Directions
Please follow the syllabus guidelines in turning in your homework. While testing your programs,
run them with a variety of inputs and discuss your findings. This homework is due Sunday, Nov
23, 2021 10:00pm. OBSERVE THE TIME. Absolutely no homework will be accepted after that
time. All the work should be your own.
Assignment № 6 Page 3
Appendix
The EM algorithm
This part is provided to help you implement the EM algorithm.
Let D = {xj
 j = 1, . . . , n} be the data set where each xj ∈ ℜd
(ℜ: Reals) and D is a mixture of a Gaussian. Given D, the number of blocks k, and convergence threshold ϵ, the EMT
algorithm partition data into k clusters, C = {C1, C2, . . . , Ck}, where each Ci ∈ C can be characterized as a Gaussian distribution. If each cluster Ci ∈ C can be represented by a multivariate
normal distribution (MVN):
f(xj
µi
, Σi) = 1
(2π)
d/2Σi

1/2
∗ expn
−
1
2
(xj − µi)
T Σ
−1
i
(xj − µi)
o
where µi ∈ ℜd and Σi ∈ ℜd×d denote cluster mean and covariance matrix for cluster Ci
,
respectively, and they are unknown parameters. f(xj
µi
, Σi) represents the probability density
at xj
for cluster Ci
. Lastly, D is a mixture of C1, C2, . . . , Ck.
The algorithm iteratively alternates between (1) computing loglikelihood of each data point
being from each Gaussian (Estep) (2) recalculating the parameters (Mstep). Iteration continues until a set of means is stable.
• Initialization:
– µi
is randomly selected from D for each cluster.
– Σi ← I. For each cluster, the covariance matrix is a d × d identity matrix.
– P(Ci) = 1
k
. The priors are uniformly initialized.
Assignment № 6 Page 4
Figure 1: An example of whisker plot
Assignment № 6 Page 5