Description
1 Clustering. [100 points total. Each part is 25 points.]
[a-b] Given $N$ data points $x_n$ ($n = 1, \dots, N$), the K-means clustering algorithm groups them into $K$ clusters by minimizing the distortion function over $\{r_{nk}, \mu_k\}$:
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2,$$
where $r_{nk} = 1$ if $x_n$ belongs to the $k$-th cluster and $r_{nk} = 0$ otherwise.
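The two steps of the algorithm are coordinate-wise minimizations of $J$: fixing the centers $\mu_k$ and minimizing over $r_{nk}$ assigns each point to its nearest center, and fixing $r_{nk}$ and minimizing over $\mu_k$ recomputes the centers. A minimal NumPy sketch of this alternation (an editorial illustration, not part of the assignment; it assumes every cluster stays non-empty):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternately minimize J over assignments r_nk
    (nearest-center rule) and over centers mu_k (cluster means)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init centers from data
    for _ in range(n_iters):
        # Assignment step: r_nk = 1 for the center minimizing ||x_n - mu_k||^2.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
        labels = d2.argmin(axis=1)
        # Update step: mu_k = mean of the points assigned to cluster k
        # (assumes each cluster is non-empty, for brevity).
        new_mu = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_mu, mu):
            break  # assignments, and hence J, can no longer change
        mu = new_mu
    return labels, mu
```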
(a) Prove that using the squared Euclidean distance $\|x_n - \mu_k\|^2$ as the dissimilarity function and minimizing the distortion function, we will have
$$\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}.$$
That is, $\mu_k$ is the center of the $k$-th cluster. [25 pts]
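A hint toward part (a) (a sketch, not a full solution): with the assignments $r_{nk}$ held fixed, $J$ is a convex quadratic in each $\mu_k$, so it suffices to set the gradient to zero:
$$\frac{\partial J}{\partial \mu_k} = -2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0 \quad\Longrightarrow\quad \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}.$$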
(b) Prove that the K-means algorithm converges to a local optimum in a finite number of steps. [25 pts]
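A common skeleton for part (b) (again a hint, not a solution): both steps are exact minimizations of $J$, so for iterates $(r^{(t)}, \mu^{(t)})$,
$$J\big(r^{(t+1)}, \mu^{(t)}\big) \le J\big(r^{(t)}, \mu^{(t)}\big) \quad\text{and}\quad J\big(r^{(t+1)}, \mu^{(t+1)}\big) \le J\big(r^{(t+1)}, \mu^{(t)}\big).$$
Since there are at most $K^N$ distinct assignments $\{r_{nk}\}$ and $J$ never increases, the algorithm must stop changing after finitely many iterations.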
[c-d] In class, we discussed bottom-up hierarchical clustering. In each iteration, we need to find the two clusters $\{x_1, x_2, \dots, x_m\}$ and $\{y_1, y_2, \dots, y_p\}$ with the minimum distance and merge them. Some of the most commonly used distance metrics between two clusters are listed below (a short code sketch of all three follows the list):
• Single linkage: the minimum distance between any pair of points from the two clusters, i.e.,
$$\min_{\substack{i=1,\dots,m \\ j=1,\dots,p}} \|x_i - y_j\|$$
• Complete linkage: the maximum distance between any pair of points from the two clusters, i.e.,
$$\max_{\substack{i=1,\dots,m \\ j=1,\dots,p}} \|x_i - y_j\|$$
• Average linkage: the average distance over all pairs of points from the two clusters, i.e.,
$$\frac{1}{mp} \sum_{i=1}^{m} \sum_{j=1}^{p} \|x_i - y_j\|$$
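The promised sketch: all three cluster distances can be read off a single pairwise-distance matrix (`linkage_distances` is a hypothetical helper for illustration, not part of the assignment):

```python
import numpy as np

def linkage_distances(X, Y):
    """Single, complete, and average linkage between clusters
    X (shape (m, d)) and Y (shape (p, d))."""
    # Pairwise Euclidean distances ||x_i - y_j||, shape (m, p).
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # min over pairs = single, max = complete, mean over all m*p pairs = average.
    return D.min(), D.max(), D.mean()
```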
(c) When we use bottom-up hierarchical clustering to partition the data, which of the three cluster distance metrics described above would most likely result in clusters most similar to those given by K-means? (Suppose $K$ is a power of 2 in this case.) [25 pts]
(d) For the following data (two moons), which of these three distance metrics (if any) would successfully separate the two moons? [25 pts]
[Figure: scatter plot of the two-moons dataset; x-axis from about $-1.5$ to $2.5$, y-axis from about $-2.5$ to $2$.]
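To experiment with part (d), the figure's data can be approximated with scikit-learn's `make_moons` (an assumption: the original points are not available, so generated moons stand in for them), trying each linkage in turn:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering

# Stand-in for the plotted data: two interleaving half-circles.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

for linkage in ("single", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    # Agreement with the true moon labels, up to swapping the two cluster labels.
    agree = max((labels == y_true).mean(), (labels != y_true).mean())
    print(f"{linkage:>8} linkage: {agree:.2f}")
```

With low noise, single linkage typically traces each connected, elongated moon, while complete and average linkage tend to cut across the moons.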