Description
1. Consider a signal x which is sparse in the canonical basis and contains n elements, and which is compressively
sensed in the form y = Φx + η, where y, the measurement vector, has m elements and Φ is the m × n sensing
matrix. Here η is a vector of noise values distributed as N(0, σ²). One way to recover x from y and Φ
is to solve the LASSO problem, based on minimizing J(x) ≜ ‖y − Φx‖² + λ‖x‖₁. A crucial issue is how
to choose λ. One purely data-driven technique is called cross-validation. In this technique, out of the m
measurements, a random subset of (say) 90 percent of the measurements is called the reconstruction set R,
and the remaining measurements constitute the validation set V. Thus V and R are always disjoint sets.
The signal x is reconstructed using measurements only from R (and thus only the corresponding rows of Φ),
using one out of many different values of λ chosen from a set Λ. Let the estimate using the g-th value from
Λ be denoted x_g. The corresponding validation error is computed as VE(g) ≜ Σ_{i∈V} (y_i − Φ_i x_g)² / |V|.
The value of λ for which the validation error is the least is chosen as the optimal value of λ. Your job
is to implement this technique for the case where n = 500, m = 200, ‖x‖₀ = 18, and σ = 0.05 × Σ_{i=1}^{m} |Φ_i x| / m.
Choose Λ = {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 15, 20, 30, 50, 100}. Place the non-zero
elements of x at randomly chosen locations, and let their values be drawn randomly from Uniform(0, 1000).
The entries of the sensing matrix Φ should be drawn from a ±1/√m Bernoulli distribution, with the probability
of +1/√m being 0.5. Now do as follows. Use the L1-LS solver from https://web.stanford.edu/~boyd/l1_ls/ for implementing the
LASSO.
(a) Plot a graph of VE versus the logarithm of the values in Λ. Also plot a graph of the RMSE versus the
logarithm of the values in Λ, where RMSE is given by ‖x_g − x‖₂ / ‖x‖₂. Comment on the plots. Do the
optimal values of λ from the two plots agree?
(b) What would happen if V and R were not disjoint but coincident sets?
(c) The validation error is in fact a proxy for the actual mean squared error. Note that you can never
compute the mean squared error directly, since the ground truth x is unknown in an actual application.
Which theorem/lemma from the paper https://ieeexplore.ieee.org/document/6854225 (On the
theoretical analysis of cross-validation in compressed sensing) refers to this proxying ability? Explain
how.
(d) In your previous assignment, there was a theorem from the book by Tibshirani and others which gave
you a certain value of λ. What is the advantage of this cross-validation method compared to the choice
of λ using that theorem? Explain. [10+5+5+5=25 points]
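The cross-validation procedure described above can be sketched in a few lines. The sketch below is illustrative only: it uses a plain ISTA iteration as a stand-in for the L1-LS solver the assignment requires (l1_ls is a MATLAB package), and sweeps only a subset of Λ for speed. Function and variable names (lasso_ista, Rset, Vset, best_lam) are my own, not from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 200, 18

# Sparse ground-truth signal: k non-zero entries at random locations, Uniform(0, 1000)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.uniform(0, 1000, k)

# +/- 1/sqrt(m) Bernoulli sensing matrix, each sign with probability 0.5
Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

sigma = 0.05 * np.mean(np.abs(Phi @ x))
y = Phi @ x + rng.normal(0, sigma, m)

def lasso_ista(Phi, y, lam, n_iter=1000):
    """Minimise ||y - Phi w||^2 + lam*||w||_1 by ISTA (stand-in for the l1_ls solver)."""
    L = 2 * np.linalg.norm(Phi, 2) ** 2                        # Lipschitz constant of the gradient
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = w - (2.0 / L) * (Phi.T @ (Phi @ w - y))            # gradient step
        w = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold (prox of the L1 term)
    return w

# Random 90/10 split into reconstruction set R and validation set V (disjoint)
perm = rng.permutation(m)
Rset, Vset = perm[: int(0.9 * m)], perm[int(0.9 * m):]

Lam = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]   # a subset of the assignment's Lambda, for speed
ve = [np.sum((y[Vset] - Phi[Vset] @ lasso_ista(Phi[Rset], y[Rset], lam)) ** 2) / len(Vset)
      for lam in Lam]
best_lam = Lam[int(np.argmin(ve))]             # lambda with the least validation error
```

Plotting VE and RMSE against log λ, and running the full Λ with the required solver, is left exactly as the assignment states.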
2. Consider that you learned a dictionary D to sparsely represent a certain class S of images – say handwritten
alphabet or digit images. How will you convert D to another dictionary which will sparsely represent the
following classes of images? Note that you are not allowed to learn the dictionary all over again, as it is
time-consuming.
(a) Class S1 which consists of images obtained by applying a known derivative filter to the images in S.
(b) Class S2 which consists of images obtained by rotating a subset of the images in class S by a known
fixed angle α, and the other subset by another known fixed angle β.
(c) Class S3 which consists of images obtained by applying an intensity transformation I_new^i(x, y) =
α(I_old^i(x, y))² + β(I_old^i(x, y)) + γ to the images in S, where α, β, γ are known.
(d) Class S4 which consists of images obtained by applying a known blur kernel to the images in S.
(e) Class S5 which consists of images obtained by applying a blur kernel which is known to be a linear
combination of blur kernels belonging to a known set B, to the images in S.
(f) Class S6 which consists of 1D signals obtained by applying the Radon transform at a known angle θ to
the images in S.
(g) Class S7 which consists of images obtained by translating a subset of the images in class S by a known
fixed offset (x1, y1), and the other subset by another known fixed offset (x2, y2). Assume appropriate
zero-padding and increase in the size of the image canvas owing to the translation. [4+4+4+4+4+6+4=30
points]
3. How will you solve for the minimum of the following objective functions? (1) J(A_r) = ‖A − A_r‖²_F, where
A is a known m × n matrix of rank greater than r, and A_r is a rank-r matrix, with r < m, r < n.
(2) J(R) = ‖A − RB‖²_F, where A ∈ ℝ^{n×m}, B ∈ ℝ^{n×m}, R ∈ ℝ^{n×n}, m > n, and R is constrained to be
orthonormal. Note that A and B are both known.
In both cases, explain briefly any one situation in image processing where the solution to such an optimization
problem is required. [5+5+5+5=20 points]
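For objective (1), a quick numerical sanity check is possible once you have a candidate solution: the sketch below builds a rank-r matrix by truncating the SVD of A (the well-known Eckart–Young construction) and compares its Frobenius error against random rank-r matrices. This is an illustration, not the derivation the question asks for; verify the claim against your own argument.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 8, 6, 2
A = rng.normal(size=(m, n))          # a generic matrix; rank > r almost surely

# Rank-r candidate from SVD truncation: keep the r largest singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ar = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

err_svd = np.linalg.norm(A - Ar, "fro") ** 2

# Frobenius errors of many random rank-r matrices of the same shape, for comparison
errs = [np.linalg.norm(A - rng.normal(size=(m, r)) @ rng.normal(size=(r, n)), "fro") ** 2
        for _ in range(200)]
```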
4. We have studied the non-negative matrix factorization (NMF) technique in our course and examined applications in face recognition. I also described the application to hyperspectral unmixing. Your job is to find
a research paper which explores an application of NMF in any task apart from these. You may look up the
Wikipedia article on this topic. Other interesting applications include stain normalization in pathology. Your
job is to answer the following: (1) Mention the title, author list, venue and year of publication of the paper
and include a link to it. (2) Which task does the paper apply NMF to? (3) How exactly does the paper
solve the problem using NMF? What is the significance of the dictionary and the dictionary coefficients in
solving the problem at hand? [15 points]
5. In parallel beam computed tomography, the projection measurements are represented as a single vector
y ∼ Poisson(I₀ exp(−Rf)), where y ∈ ℝ^m with m = (number of projection angles) × (number of bins per
angle); I₀ is the power of the incident X-ray beam; R represents the Radon operator (effectively an m × n
matrix) that computes the projections at the pre-specified known projection angles; and f represents the
unknown signal (actually tissue density values) in ℝ^n. If m < n, write down a suitable objective function
whose minimum would be a good estimate of f given y and R, and which accounts for the Poisson noise
in y. State the motivation for each term in the objective function. Recall that if z ∼ Poisson(λ), then
P(z = k) = λ^k e^{−λ}/k!, where k is a non-negative integer. Now suppose that apart from Poisson noise, there
was also i.i.d. additive Gaussian noise with mean 0 and known standard deviation σ in y. How would you
solve this problem (e.g., appropriate preprocessing or a suitable change of objective function)? [6+4=10
points]
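To get a feel for the measurement model y ∼ Poisson(I₀ exp(−Rf)), the sketch below simulates it and evaluates the Poisson negative log-likelihood of a candidate f (dropping the f-independent log k! constant). The sizes and the random non-negative matrix standing in for the Radon operator are my own toy choices, not part of the assignment; the objective function the question asks for is still yours to write down and justify.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 32                              # toy sizes with m < n, as in the problem
R = np.abs(rng.normal(size=(m, n))) / n    # stand-in for the (non-negative) Radon matrix
f_true = rng.uniform(0.0, 1.0, n)          # unknown tissue densities
Io = 1e4                                   # incident beam power

mu = Io * np.exp(-R @ f_true)              # Poisson means I_o * exp(-(Rf)_i)
y = rng.poisson(mu).astype(float)          # simulated projection measurements

def poisson_nll(f, R, y, Io):
    """-log P(y | f) for y_i ~ Poisson(Io * exp(-(Rf)_i)), up to the log(k!) constant."""
    mu = Io * np.exp(-R @ f)
    return np.sum(mu - y * np.log(mu))
```

A data-fidelity term built this way is minimized (in the means) where the predicted counts match the observed ones, which is why the likelihood near the true f scores better than, say, f = 0.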

