Description
1. In this part, we will apply the PCA technique to the task of image denoising. Consider the images barbara256.png and stream.png present in the corresponding data/ subfolder – these images have gray-levels in the range from 0 to 255. For the latter image, you should extract the top-left portion of size 256 by 256. Add zero-mean Gaussian noise of σ = 20 to one of these images using the MATLAB code im1 = im + randn(size(im))*20. Note that this noise is image-independent.
(If, during the course of your implementation, your program takes too long, you can instead work with the file barbara256-part.png, which has size 128 by 128 instead of 256 by 256. You can likewise extract the top-left 128 by 128 part of the stream.png image. You will not be penalized for working on these image parts.)
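The assignment specifies MATLAB; for readers working in Python, the same noise model can be sketched in NumPy (the random image below is a hypothetical stand-in for the actual file, which you would load from data/):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the experiment is repeatable

# Hypothetical stand-in for barbara256.png; in practice, load the file
# (e.g. with imageio.imread) and cast to float64.
im = rng.uniform(0.0, 255.0, size=(256, 256))

sigma = 20.0
# Equivalent of the MATLAB line im1 = im + randn(size(im))*20:
im1 = im + rng.standard_normal(im.shape) * sigma

# Note: im1 intentionally retains values outside [0, 255]; see part (d).
```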
(a) In the first part, you will divide the entire noisy image ‘im1’ into overlapping patches of size 7 by 7, and create a matrix P of size 49 × N, where N is the total number of image patches. Each column of P is a single patch reshaped to form a vector. Compute the eigenvectors of the matrix PPᵀ, and the eigen-coefficients of each noisy patch. Let us denote the j-th eigen-coefficient of the i-th (noisy) patch (i.e., P_i) by α_ij. Define

ᾱ_j² = max(0, (1/N) Σ_{i=1}^{N} α_ij² − σ²),

which is basically an estimate of the average squared eigen-coefficients of the ‘original (clean) patches’.
Now, your task is to manipulate the noisy coefficients {α_ij} using the following rule, which is along the lines of the Wiener filter update that we studied in class:

α_ij^denoised = α_ij / (1 + σ²/ᾱ_j²).

Here, α_ij^denoised stands for the j-th eigen-coefficient of the i-th denoised patch. Note that σ²/ᾱ_j² is an estimate of the inverse SNR (ISNR), which we absolutely need for any practical implementation of a Wiener filter update.
After updating the coefficients by the Wiener filter rule, you should reconstruct the denoised patches and re-assemble them to produce the final denoised image, which we will call ‘im2’. Since you chose overlapping patches, multiple values will appear at any given pixel; take care of this situation using simple averaging. Write a function myPCADenoising1.m to implement this. Display the final image ‘im2’ in your report and state its RMSE, computed as ∥im2 − im∥₂ / ∥im∥₂, where im is the original clean image.
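For orientation, the whole of part (a) can be sketched in a few lines of NumPy (an illustrative sketch, not the required MATLAB function; the loop-based patch extraction is kept deliberately simple rather than fast):

```python
import numpy as np

def pca_denoise_global(im1, sigma=20.0, p=7):
    """Part (a) sketch: global PCA denoising with the Wiener-like shrinkage."""
    H, W = im1.shape
    # All overlapping p x p patches as columns of P (size p*p x N).
    P = np.array([im1[r:r + p, c:c + p].ravel()
                  for r in range(H - p + 1)
                  for c in range(W - p + 1)]).T
    # Eigenvectors of P P^T (a p^2 x p^2 symmetric matrix).
    _, V = np.linalg.eigh(P @ P.T)
    A = V.T @ P                                   # eigen-coefficients alpha_ij
    # alpha_bar_j^2 = max(0, (1/N) sum_i alpha_ij^2 - sigma^2)
    abar2 = np.maximum(0.0, (A ** 2).mean(axis=1) - sigma ** 2)
    # alpha / (1 + sigma^2/abar2), rewritten as alpha * abar2/(abar2 + sigma^2)
    # so that coefficients with abar2 == 0 are cleanly sent to zero.
    A *= (abar2 / (abar2 + sigma ** 2))[:, None]
    patches = V @ A
    # Re-assemble, averaging the overlapping contributions at each pixel.
    acc = np.zeros((H, W)); cnt = np.zeros((H, W))
    k = 0
    for r in range(H - p + 1):
        for c in range(W - p + 1):
            acc[r:r + p, c:c + p] += patches[:, k].reshape(p, p)
            cnt[r:r + p, c:c + p] += 1.0
            k += 1
    return acc / cnt
```

On the actual images, the reported RMSE would then be np.linalg.norm(im2 - im) / np.linalg.norm(im).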
(b) In the second part, you will modify this technique. Given any patch P_i in the noisy image, you should collect the K = 200 most similar patches (in a mean-squared-error sense) from within a 31 × 31 neighborhood centered at the top-left corner of P_i. We will call this set of similar patches Q_i (this set will of course include P_i itself). Build an eigen-space from Q_i and denoise the eigen-coefficients corresponding to only P_i using the Wiener update mentioned earlier. The only change is that ᾱ_j² will now be defined using only the patches from Q_i (as opposed to patches from all over the image). Reconstruct the denoised version of P_i. Repeat this for every patch in the noisy image (i.e., create a fresh eigen-space each time). At any pixel, there will be multiple values due to overlapping patches – simply average them. Write a function myPCADenoising2.m to implement this. Reconstruct the final denoised image, display it in your report and state the RMSE value. Do so for both barbara as well as stream.
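The per-patch core of part (b) might look like this in NumPy (a sketch under the same conventions as before; the outer loop over all patches and the averaging step are omitted, and the clipping of the search window at image borders is one possible choice):

```python
import numpy as np

def denoise_one_patch(im1, r0, c0, sigma=20.0, p=7, K=200, win=31):
    """Part (b) sketch for one patch: build a local eigen-space from the
    K patches most similar to im1[r0:r0+p, c0:c0+p] (MSE sense), drawn
    from a win x win neighborhood centered at the patch's top-left corner."""
    H, W = im1.shape
    ref = im1[r0:r0 + p, c0:c0 + p].ravel()
    half = win // 2
    rs = range(max(0, r0 - half), min(H - p, r0 + half) + 1)
    cs = range(max(0, c0 - half), min(W - p, c0 + half) + 1)
    cand = np.array([im1[r:r + p, c:c + p].ravel()
                     for r in rs for c in cs]).T
    # Keep the K most similar candidates; the reference patch itself has
    # distance 0, so Q_i always contains P_i.
    d = ((cand - ref[:, None]) ** 2).mean(axis=0)
    Q = cand[:, np.argsort(d)[:K]]
    # Local eigen-space; alpha_bar_j^2 now uses only the patches in Q_i.
    _, V = np.linalg.eigh(Q @ Q.T)
    Aq = V.T @ Q
    abar2 = np.maximum(0.0, (Aq ** 2).mean(axis=1) - sigma ** 2)
    alpha = V.T @ ref
    alpha *= abar2 / (abar2 + sigma ** 2)         # Wiener-like shrinkage
    return (V @ alpha).reshape(p, p)
```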
(c) Now run your bilateral filter code from Homework 2 on the noisy version of the barbara image. Compare the denoised result with the results of the previous two steps for both images. What differences do you observe? What are the differences between this PCA-based approach and the bilateral filter?
(d) Consider that a student clamps the values in the noisy image ‘im1’ to the [0, 255] range, and then denoises it using the aforementioned PCA-based filtering technique, which assumes Gaussian noise. Is this approach correct? Why (not)? [10 + 20 + 5 + 5 = 40 points]
2. Read Section 1 of the paper ‘An FFT-Based Technique for Translation, Rotation, and Scale-Invariant Image
Registration’ published in the IEEE Transactions on Image Processing in August 1996. A copy of this paper is
available in the homework folder.
(a) Describe the procedure in the paper to determine the translation between two given images. What is the time complexity of this procedure for predicting the translation if the images are of size N × N? How does it compare with the time complexity of a pixel-wise image comparison procedure for predicting the translation?
(b) Also, briefly explain the approach for correcting for rotation between two images, as proposed in this paper
in Section II. Write down an equation or two to illustrate your point.
[10+10=20 points]
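As background for part (a), the paper's translation step (phase correlation) can be sketched in NumPy; this assumes a purely circular shift and is not a substitute for your own description of the procedure:

```python
import numpy as np

def estimate_translation(f, g):
    """Phase correlation. If g is f circularly shifted by (dy, dx), the
    normalized cross-power spectrum conj(F)*G / |conj(F)*G| is a pure
    phase ramp, and its inverse FFT is an impulse at (dy, dx).
    Cost: O(N^2 log N) for N x N images, versus O(N^4) for exhaustively
    comparing pixel values over all N^2 candidate shifts."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    R = np.conj(F) * G
    R /= np.maximum(np.abs(R), 1e-12)      # keep only the phase
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)
```

For example, with g = np.roll(f, (5, 12), axis=(0, 1)) this returns (5, 12).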
3. Consider a matrix A of size m × n, m ≤ n. Define P = AᵀA and Q = AAᵀ. (Note: all matrices, vectors and scalars involved in this question are real-valued.)
(a) Prove that for any vector y with the appropriate number of elements, we have yᵀPy ≥ 0. Similarly, show that zᵀQz ≥ 0 for a vector z with the appropriate number of elements. Why are the eigenvalues of P and Q non-negative?
(b) If u is an eigenvector of P with eigenvalue λ, show that Au is an eigenvector of Q with eigenvalue λ. If v is an eigenvector of Q with eigenvalue µ, show that Aᵀv is an eigenvector of P with eigenvalue µ. What will be the number of elements in u and v?
(c) If v_i is an eigenvector of Q and we define u_i ≜ Aᵀv_i / ∥Aᵀv_i∥₂, then prove that there will exist some real, non-negative γ_i such that Au_i = γ_i v_i.
(d) It can be shown that u_iᵀu_j = 0 for i ≠ j, and likewise v_iᵀv_j = 0 for i ≠ j, for correspondingly distinct eigenvalues. (You did this in HW4, where you showed that the eigenvectors of symmetric matrices are orthonormal.) Now, define U = [v_1|v_2|v_3|…|v_m] and V = [u_1|u_2|u_3|…|u_m]. Show that A = UΓVᵀ, where Γ is a diagonal matrix containing the non-negative values γ_1, γ_2, …, γ_m. With this, you have just established the existence of the singular value decomposition of any matrix A.
This is a key result in linear
algebra and it is widely used in image processing, computer vision, computer graphics, statistics, machine
learning, numerical analysis, natural language processing and data mining. [7.5 + 7.5 + 7.5 + 7.5 = 30
points]
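The construction in parts (b)–(d) is easy to check numerically; the following NumPy sketch (with an arbitrary random A) mirrors the problem's notation, including its convention that U stacks the v_i and V stacks the u_i:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))              # m <= n, as in the problem

Q = A @ A.T                                  # m x m, symmetric PSD
mu, Vq = np.linalg.eigh(Q)                   # columns of Vq are the v_i
# Part (c): u_i = A^T v_i / ||A^T v_i||_2
Ucols = A.T @ Vq
Ucols /= np.linalg.norm(Ucols, axis=0)
# A u_i = gamma_i v_i holds with gamma_i = sqrt(mu_i)
Gamma = np.diag(np.sqrt(np.maximum(mu, 0.0)))

# Part (d): A = U Gamma V^T, with U = [v_1|...|v_m], V = [u_1|...|u_m]
A_rebuilt = Vq @ Gamma @ Ucols.T
print(np.allclose(A, A_rebuilt))             # True
```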
4. Suppose you are standing in a well-illuminated room with a large window, and you take a picture of the scene
outside.
The window undesirably acts as a semi-reflecting surface, and hence the picture will contain a reflection
of the scene inside the room, besides the scene outside. While solutions exist for separating the two components
from a single picture, here you will look at a simpler-to-solve version of this problem where you would take two
pictures.
The first picture g1 is taken by adjusting your camera lens so that the scene outside (f1) is in focus
(we will assume that the scene outside has negligible depth variation when compared to the distance from the
camera, and so it makes sense to say that the entire scene outside is in focus), and the reflection off the window
surface (f2) will now be defocussed or blurred.
This can be written as g1 = f1 + h2 ∗ f2 where h2 stands for
the blur kernel that acted on f2. The second picture g2 is taken by focusing the camera onto the surface of the
window, with the scene outside being defocussed.
This can be written as g2 = h1 ∗ f1 + f2 where h1 is the blur
kernel acting on f1. Given g1 and g2, and assuming h1 and h2 are known, your task is to derive a formula to
determine f1 and f2.
Note that we are making the simplifying assumption that there was no relative motion
between the camera and the scene outside while the two pictures were being acquired, and that there were no
changes whatsoever to the scene outside or inside. Even with all these assumptions, you will notice something
inherently problematic about the formula you will derive. What is it? [5+5 = 10 points]
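To experiment with this problem numerically, the two-picture forward model can be simulated as below (a sketch: the Gaussian kernels are assumed stand-ins for h1 and h2, convolution is taken to be circular so that it corresponds exactly to pointwise multiplication in the Fourier domain, and kernel centering is ignored for brevity). The recovery formula itself is left for you to derive:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_kernel(size=9, s=2.0):
    """Normalized Gaussian blur kernel (a hypothetical stand-in for h1/h2)."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * s ** 2))
    return k / k.sum()

def conv2_circ(f, h):
    """Circular 2-D convolution via the FFT."""
    Hf = np.fft.fft2(h, s=f.shape)           # zero-pad kernel to image size
    return np.fft.ifft2(np.fft.fft2(f) * Hf).real

f1 = rng.uniform(0.0, 255.0, (64, 64))       # scene outside (in focus in g1)
f2 = rng.uniform(0.0, 255.0, (64, 64))       # reflection off the window
h1, h2 = gaussian_kernel(9, 2.0), gaussian_kernel(9, 3.0)

g1 = f1 + conv2_circ(f2, h2)                 # g1 = f1 + h2 * f2
g2 = conv2_circ(f1, h1) + f2                 # g2 = h1 * f1 + f2
```

In the Fourier domain these read G1 = F1 + H2·F2 and G2 = H1·F1 + F2, which is the system of equations you are asked to solve for F1 and F2.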