# Homework Set 2 CPSC 8420 solution

\$30.00

Original Work ?

## Description

5/5 - (1 vote)

Problem 1
For PCA, from the perspective of maximizing variance, please show that the solution of φ to
maximize kXφk
2
2
, s.t. kφk2 = 1 is exactly the first column of U, where [U, S] = svd(XT X).

(Note:
you need prove why it is optimal than any other reasonable combinations of Ui
, say φˆ = 0.8 ∗ U(:
, 1) + 0.6 ∗ U(:, 2) which also satisfies kφˆk2 = 1.)

Problem 2
Why might we prefer to minimize the sum of absolute residuals instead of the residual sum of
squares for some data sets? Recall clustering method K-means when calculating the centroid, it is
to take the mean value of the data-points belonging to the same cluster, so what about K-medians?
What is its advantage over of K-means? Please use a synthetic (toy) experiment to illustrate your
conclusion.

Problem 3
Let’s revisit Least Squares Problem: minimize
β
1
2
ky − Aβk
2
2
, where A ∈ R
n×p
.

1. Please show that if p > n, then vanilla solution (AT A)
−1AT y is not applicable any more.

2. Let’s assume A = [1, 2, 4; 1, 3, 5; 1, 7, 7; 1, 8, 9], y = [1; 2; 3; 4]. Please show via experiment results that Gradient Descent method will obtain the optimal solution with Linear Convergence
rate if the learning rate is fixed to be 1
σmax(AT A)
, and β0 = [0; 0; 0].

3. Now let’s consider ridge regression: minimize
β
1
2
ky − Aβk
2
2 +
λ
2
kβk
2
2
, where A, y, β0 remains the same as above while learning rate is fixed to be 1
λ+σmax(AT A)
where λ varies from
0.1, 1, 10, 100, 200, please show that Gradient Descent method with larger λ converges faster.

Problem 4