EE6132 Programming Assignment 1 to 4 solutions


EE6132 Programming Assignment-1: Canny Edge Detection

1. You are required to implement the Canny edge detection algorithm as described in class. Please
follow the steps mentioned below.
(a) Read the image ‘clown.jpeg’ and display it.
(b) Convert it into a grayscale image.
(c) To suppress noise, smooth the image with a Gaussian kernel. Keep the kernel size as 5 × 5
and sigma = 1.5.
(d) Apply the standard Sobel operators Gx and Gy discussed in class. Display the filtered
outputs. Also show the gradient magnitude and angle images.
(e) Apply non-maximum suppression as discussed in class. The exact method requires computing the imaginary pixels shown in gray in Figure 1 using interpolation methods
such as bilinear or bicubic interpolation. As a simplification, just locate the image pixels
which are spatially closest to the imaginary gray pixels and use those pixels as the neighboring pixels for non-maxima suppression. This way you can bypass the interpolation
step and have a computationally lighter method if you so desire. Show the non-maximum
suppression output.
[Figure 1: The exact method requires you to do interpolation. As an alternative, adopt the simplified approach described in Step (e).]
(f) Instead of the double-thresholding hysteresis used in the exact algorithm, simply
do a single thresholding operation. Pixels with values less than the threshold should be
suppressed. Use the median value of the magnitude image computed in Step (d) as the
threshold value. Show the final output.
2. Repeat the above question but increase the sigma of the Gaussian kernel to 3. Choose your
filter size appropriately. Just show the final output for this question. How does the final output
differ from that of the previous question?
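A minimal sketch of the simplified pipeline above in Python, assuming OpenCV and NumPy are available and that library routines are acceptable for the smoothing and Sobel steps (write your own kernels if they are not); all names and the neighbour-pairing convention in the NMS step are illustrative.

```python
import cv2
import numpy as np

# (a)-(b) Read the image and convert it to grayscale.
img = cv2.imread("clown.jpeg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)

# (c) Smooth with a 5x5 Gaussian, sigma = 1.5.
smooth = cv2.GaussianBlur(gray, (5, 5), 1.5)

# (d) Sobel gradients, magnitude and angle.
gx = cv2.Sobel(smooth, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(smooth, cv2.CV_64F, 0, 1, ksize=3)
mag = np.hypot(gx, gy)
ang = np.rad2deg(np.arctan2(gy, gx)) % 180          # fold angles into [0, 180)

# (e) Simplified non-maximum suppression: quantize the gradient angle to
# 0/45/90/135 degrees and compare against the two nearest neighbours.
nms = np.zeros_like(mag)
for r in range(1, mag.shape[0] - 1):
    for c in range(1, mag.shape[1] - 1):
        a = ang[r, c]
        if a < 22.5 or a >= 157.5:
            n1, n2 = mag[r, c - 1], mag[r, c + 1]
        elif a < 67.5:
            n1, n2 = mag[r - 1, c + 1], mag[r + 1, c - 1]
        elif a < 112.5:
            n1, n2 = mag[r - 1, c], mag[r + 1, c]
        else:
            n1, n2 = mag[r - 1, c - 1], mag[r + 1, c + 1]
        if mag[r, c] >= n1 and mag[r, c] >= n2:
            nms[r, c] = mag[r, c]

# (f) Single threshold at the median of the gradient magnitude image.
edges = (nms >= np.median(mag)).astype(np.uint8) * 255
cv2.imwrite("canny_simplified.png", edges)
```

For Question 2, only the Gaussian kernel size and sigma arguments need to change.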

EE6132 Programming Assignment-2 Filtering and Hybrid Images

1 Filtering
In signal processing, filtering is a process of removing unwanted features or components from
a signal. Most often, this means removing or suppressing some frequencies or frequency bands.
Filtering operations can be linear or non-linear, space-variant or space-invariant.
Below, we will look at some operations that can be done on a signal. For the operations below,
consider a discrete-time, 1D signal X of length 16, where X = {x_0, x_1, x_2, . . . , x_15}. The output of
the filtering process is a signal Y = {y_0, y_1, y_2, . . . , y_15}, the same length as X. Implement the following filtering operations on the signal X defined as x_k = 3 + sin(2πk/15) for k ∈ {0, 1, . . . , 15}
and 0 otherwise.
For the filters that are linear and space-invariant, verify that the convolutional implementation
of the filter gives the same output as the direct implementation.
a) y_k = x_{k+1} − x_k
b) y_k = x_k − X̄, where X̄ = (1/(L+1)) Σ_{i=0}^{L} x_i
c) y_k = median({x_l : l ∈ [k−2, k+2]})
d) y_k = x_{k+0.5} − x_{k−0.5}
e) y_k = |x_{k+0.5} − x_{k−0.5}|
f) y_k = (1/5) Σ_{i=k−2}^{k+2} x_i
(Note: Linearly interpolate the neighboring samples to obtain signal values such as x_{k+0.5}.)
Tasks
1. Implement each of the filtering operations to obtain the desired output. Each of the outputs
has to be the same size as the input signal.
2. For each operation, determine whether it is linear and space-invariant.
3. For those operations that are linear and space-invariant, propose an equivalent convolution
operation to implement the filtering process and also implement it (a sketch for filter (a) is given after this list).
4. For filters that are implemented via convolution, verify visually that the results are the same.
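As an illustration of Tasks 1 and 3, here is a minimal NumPy sketch for filter (a) only, assuming zero samples outside k = 0, ..., 15 as defined above; the variable names are illustrative and the other filters follow the same pattern.

```python
import numpy as np

k = np.arange(16)
x = 3.0 + np.sin(2 * np.pi * k / 15)        # x_k for k = 0..15, zero outside

# Filter (a), direct implementation: y_k = x_{k+1} - x_k with zero padding at the boundary.
x_pad = np.concatenate([x, [0.0]])
y_direct = x_pad[1:] - x_pad[:-1]

# Filter (a) via convolution: convolving with h = [1, -1] produces x_n - x_{n-1};
# shifting the full convolution output by one sample aligns it with the forward difference.
h = np.array([1.0, -1.0])
y_conv = np.convolve(x, h, mode="full")[1:17]

print(np.allclose(y_direct, y_conv))         # expected: True
```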
2 Filtering in Fourier space
Filtering operations that are linear and space-invariant can be represented as a convolution operation. Once such a representation is obtained, we can use the Fourier property of convolutions
to efficiently implement the filter in the Fourier domain.
Tasks
1. For those filters above that are linear and space-invariant, implement them in the Fourier
domain (a sketch for filter (f) is given after this list).
2. Verify that the output from the Fourier implementation is the same as that of the spatial-domain
implementation. If there is any difference, explain why.
3. If, for any of the cases, the outputs from the spatial and Fourier domain implementations are
different, then suggest a modification to make the outputs the same. Implement the modification
and re-verify that the results are the same.
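A minimal sketch of the Fourier-domain route for filter (f), the 5-tap moving average, assuming NumPy; note that multiplying DFTs implements circular convolution, which is precisely the boundary-handling discrepancy that Task 3 above asks you to explain and repair.

```python
import numpy as np

k = np.arange(16)
x = 3.0 + np.sin(2 * np.pi * k / 15)

# Filter (f) as a length-16 circular kernel: taps at offsets -2..+2, wrapped around.
h = np.zeros(16)
h[[0, 1, 2, -2, -1]] = 1.0 / 5.0

# Fourier implementation: pointwise product of DFTs, then inverse DFT.
y_fourier = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

# Spatial reference with the same periodic (circular) boundary handling.
y_spatial = np.array([np.mean(x[np.arange(i - 2, i + 3) % 16]) for i in range(16)])

print(np.allclose(y_fourier, y_spatial))     # expected: True
```

If the spatial implementation from Section 1 zero-pads the signal instead of wrapping it, the two outputs will differ near the ends of the signal; matching the boundary convention (for example by zero-padding both signal and kernel before taking the FFT) removes the discrepancy.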
3 Hybrid Images
We will write an image convolution function (image filtering) and use it to create hybrid images!¹
The technique was invented by Oliva, Torralba, and Schyns in 2006, and published in a paper at
SIGGRAPH. High frequency image content tends to dominate perception but, at a distance, only
low frequency (smooth) content is perceived. By blending high and low frequency content, we
can create a hybrid image that is perceived differently at different distances.
A hybrid image is the sum of a low-pass filtered version of a first image and a high-pass
filtered version of a second image. We must tune a free parameter for each image pair to control
how much high frequency to remove from the first image and how much low frequency to leave
in the second image. This is called the cut-off frequency. The paper suggests using two cut-off
frequencies, one tuned for each image, and you are free to try this too. Using a single cut-off
frequency for both images should be sufficient. We will use a symmetric, zero-mean Gaussian
filter for our filtering (low-pass and high-pass) operations. In our case, the cut-off frequency will
represent the standard deviation of the Gaussian filter that will be used.
¹ Parts of this assignment are borrowed from this page.
Process to generate a hybrid image:
a. Implement a my_filter.py function to implement the 2-D image filtering operation. Your
function should:
– pad the input image with zeros before filtering
– accept any arbitrary filter kernel with which to convolve the image; if the filter has
even dimension, then raise an exception
– return a filtered image of the same spatial resolution as the input
– support filtering of both grayscale and colour images.
b. Implement a function to generate a Gaussian kernel of a given standard deviation. The
standard deviation represents the cut-off frequency of the filter.
c. Remove high frequencies from image1 by blurring image1 with the Gaussian filter.
d. Remove low frequencies from image2 using a two-step process. First, remove high frequencies from image2 using the process described in (c). Then subtract the low-pass filtered
image2 from the original image to leave only the high frequency components.
e. Each of the filtered images can have values that are smaller than 0.0 or larger than
1.0. In such cases, clip the values smaller than 0.0 to 0.0 and the values larger than 1.0 to 1.0.
For the high-pass filtered image, add a constant value of 0.5 to the whole image before clipping
the values to be between 0.0 and 1.0.
f. Combine the two images to generate the hybrid image.
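The steps above might be sketched as follows in Python with NumPy; the kernel radius of 3 sigma is an assumption (the assignment does not fix the kernel size), images are assumed to be floats in [0, 1], and the triple loop is written for clarity rather than speed.

```python
import numpy as np

def my_filter(image, kernel):
    """Zero-padded 2-D convolution for grayscale (H, W) or colour (H, W, C) images."""
    if kernel.shape[0] % 2 == 0 or kernel.shape[1] % 2 == 0:
        raise ValueError("Kernel must have odd dimensions")
    kern = kernel[::-1, ::-1]              # flip so the sliding window is a true convolution
    squeeze = image.ndim == 2
    if squeeze:
        image = image[..., None]
    kh, kw = kern.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw), (0, 0)), mode="constant")
    out = np.zeros_like(image, dtype=np.float64)
    for c in range(image.shape[2]):
        for i in range(image.shape[0]):
            for j in range(image.shape[1]):
                out[i, j, c] = np.sum(padded[i:i + kh, j:j + kw, c] * kern)
    return out[..., 0] if squeeze else out

def gaussian_kernel(sigma):
    """Symmetric zero-mean Gaussian; sigma plays the role of the cut-off frequency."""
    half = int(np.ceil(3 * sigma))         # 3-sigma radius (an assumption)
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

def hybrid_image(image1, image2, sigma):
    """Low frequencies of image1 plus high frequencies of image2 (steps c-f)."""
    g = gaussian_kernel(sigma)
    low = np.clip(my_filter(image1, g), 0.0, 1.0)       # (c), (e)
    high = image2 - my_filter(image2, g)                # (d)
    high_vis = np.clip(high + 0.5, 0.0, 1.0)            # (e) high-pass image for display
    hybrid = np.clip(low + high, 0.0, 1.0)              # (f)
    return hybrid, low, high_vis
```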
Once the hybrid images are successfully created, you can view them from different distances
to perceive different images. A useful way to visualize a hybrid image is by progressively
downsampling it. By progressive downsampling, we remove a part of the frequency
content of the signal. Which part of the frequency content is removed and why? How does this
affect the visualization of the hybrid images at different resolutions?
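One way to build the multi-resolution view (the provided helpers.py presumably does something similar, though its exact interface is not shown here), assuming OpenCV and a float image in [0, 1]:

```python
import cv2

def progressive_downsample(hybrid, levels=5):
    """Return progressively downsampled copies of the hybrid image.
    Each pyrDown halves the resolution and discards the upper half of the
    remaining frequency band, so the smallest copies are dominated by the
    low-pass component of the hybrid image."""
    scales = [hybrid]
    for _ in range(levels - 1):
        scales.append(cv2.pyrDown(scales[-1]))
    return scales
```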
Tasks
1. There are 7 different pairs of images in the data directory provided. For each pair, you
have to generate 2 different hybrid images by taking each image of the pair in turn as
image1 (with the other as image2). These pairs of images can be color or grayscale and also
of different resolutions. Your code should be able to handle all the different cases.
2. Use the function provided in the helpers.py file to visualize all the hybrid images at
different resolutions.
3. Don’t forget to tune the cut-off frequency for each pair of hybrid images to get the most
visually pleasing results.

EE6132 Programming Assignment – 3 Panoramic Stitching

1 Panoramic Stitching
In this problem we will develop an algorithm for stitching a panorama from overlapping photos (Figure
1), which amounts to estimating a transformation that aligns one image to another. To do this, we will
compute SURF/SIFT/ORB features in both images and match them to obtain correspondences. We will
then estimate a homography from these correspondences, and we’ll use it to stitch the two images together
in a common coordinate system. In order to get an accurate transformation, we will need many accurate
feature matches. Unfortunately, feature matching is a noisy process: even if two image patches (and their
SURF/SIFT/ORB descriptors) look alike, they may not be an actual match. To make our algorithm robust
to matching errors, we will use RANSAC, a method for estimating a parametric model from noisy observations. We will use the obtained homography transformation to do panoramic stitching. We have provided
you with two input images and also starter code.
[Figure 1: Panorama produced using our implementation. (a) Image pair; (b) stitched panorama.]
Tasks
1. Implement get_features(img) to compute SURF/SIFT/ORB features for both of the given images.
Implement match_keypoints(desc1, desc2) to compute key-point correspondences between the two
source images using the ratio test. Run the plotting code to visualize the detected features and resulting
correspondences. (Hint: You can use existing libraries.)
2. Write a function find_homography(pts1, pts2) that takes in two N×2 matrices with the x and
y coordinates of matching 2D points in the two images and computes the 3×3 homography H that
maps pts1 to pts2. You can implement this function using the direct linear transform (DLT). Report the
homography matrix. (Hint: You should implement this function on your own.)
3. Your homography-fitting function from (2) will only work well if there are no mismatched features. To
make it more robust, implement a function transform_ransac(pts1, pts2) that fits a homography
using RANSAC. Run the plotting code to visualize the point correspondences after applying RANSAC.
(Hint: You should implement this function on your own. A sketch of the DLT and RANSAC steps is given after the note at the end of this section.)
4. Write a function panoramic_stitching(img1, img2) that produces a panorama from a pair of overlapping images using your functions from the previous parts. Run the algorithm on the two images
provided. Report the stitched panorama image. (Hint: You should implement this function on your
own. Use inverse mapping while stitching.)
5. Extend the algorithm to handle n=3 images, using the given images. I have also provided a reference
panoramic stitched image to compare your result with. Report all the visualizations as mentioned
for the n=2 case, along with the two homography matrices with respect to the middle image. (Hint: You should
implement this function on your own. Take the middle image as the reference and find the homography
transformations with respect to the middle image; the middle image then needs no further transformation,
and this produces the most aesthetically pleasing panorama.)
Note: You should report your results for each task. For every task except task 2, provide visualizations;
for task 2, report the homography matrix. For task 5 you should also report all the visualizations
along with the two homography matrices with respect to the middle image.
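A minimal NumPy sketch of the DLT and RANSAC steps referred to in tasks 2 and 3; the function names mirror the ones in the tasks, but the starter code's actual signatures, the inlier threshold and the iteration count used here are assumptions.

```python
import numpy as np

def find_homography(pts1, pts2):
    """DLT: estimate the 3x3 homography H mapping pts1 to pts2 (both Nx2 arrays, N >= 4)."""
    rows = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        rows.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        rows.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)                 # null vector of the stacked constraints
    return H / H[2, 2]

def transform_ransac(pts1, pts2, n_iters=2000, thresh=3.0):
    """Fit a homography robustly with RANSAC; returns H and the inlier mask."""
    pts1, pts2 = np.asarray(pts1, float), np.asarray(pts2, float)
    n = len(pts1)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(n, 4, replace=False)
        H = find_homography(pts1[idx], pts2[idx])
        proj = (H @ np.column_stack([pts1, np.ones(n)]).T).T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - pts2, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-estimate from all inliers of the best sample for a tighter fit.
    return find_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```

Normalizing the point coordinates before building the DLT system (Hartley normalization) generally improves the conditioning of the SVD and is worth adding in practice.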

EE6132 Programming Assignment – 4 Structure from Motion and Multi-view Stereo

1 Problem Statement
In this assignment, you are required to implement a pipeline for dense depth reconstruction from a sequence
of images captured using a calibrated smartphone camera.
2 Tasks
1. Structure from Motion. In the shared drive folder, there is a .mat file named ‘matchedPoints.mat’.
The file contains the matched keypoints (in image coordinates) across the sequence of 25 frames. Use
these points and the camera intrinsic matrix (also provided in the shared drive) to perform bundle
adjustment and obtain the 3D points and the camera poses for the different views. To make the
optimization process simpler, use the following two assumptions:
• You can make the small-angle approximation, i.e. sin θ ≈ θ and cos θ ≈ 1. As a result of this
assumption, your 3D rotation matrix for each view can be written as follows:
R_i ≈ [   1        −θ_i^z     θ_i^y
          θ_i^z     1        −θ_i^x
         −θ_i^y     θ_i^x     1    ]        (1)

Here Θ_i = [θ_i^x, θ_i^y, θ_i^z] is the angular displacement of the camera in the i-th view.
• Instead of solving for all the coordinates (X, Y and Z) of the 3-D points, you can parametrize
the points by their inverse depth, reducing the number of unknowns and making the optimization
easier. Specifically, if (x_j, y_j) is the projection of the j-th 3-D point in the reference frame¹, then the
same j-th 3-D point can be represented as P_j = (1/w_j) [x_j, y_j, 1]^T. Here, w_j = 1/z_j is the inverse depth
of the j-th 3-D point.
¹ (x_j, y_j) is in normalized camera coordinates, i.e. after subtraction of the principal point from the
image coordinates and division of the difference by the focal length in each direction.
The projection of the 3-D point P_j in the i-th image is p_ij = [p_ij^x, p_ij^y]^T. π : R³ → R² is the projection
function, i.e. π([x, y, z]^T) = [x/z, y/z]^T. Correspondingly, the cost function for bundle adjustment
becomes

F = Σ_{i=1}^{Nc} Σ_{j=1}^{Np} || p_ij − π(R_i P_j + T_i) ||²        (2)

Here Np is the number of 3-D points to reconstruct and Nc is the number of views (or frames). You can
use your favorite optimization routine (like lsqnonlin() in MATLAB or scipy.optimize.least_squares()
in Python) to minimize F in Equation 2. F in Equation 2 is minimized with respect to the pose for each view
and the inverse depth w_j for each point. The steps involved in SfM are as follows:
(a) The matched key-points provided in the .mat file are in image coordinates. Write a function that
converts the matched points given in image coordinates to the normalized camera coordinates
compatible with Equation 2. To do this, subtract the principal point from the matched points
provided to you and divide the difference by the focal length. For example, if one of the matched
points is m_ij = [m_ij^x, m_ij^y] in image coordinates, then to obtain p_ij = [p_ij^x, p_ij^y] in normalized
camera coordinates, perform the following operations: p_ij^x = (m_ij^x − c_x)/f_x and p_ij^y = (m_ij^y − c_y)/f_y.
Here [c_x, c_y] is the principal point and f_x and f_y are the focal lengths. These can be obtained
from the intrinsic matrix provided to you.
(b) Write a projection function that takes a 3D point in world coordinates (P_j), rotation angles (Θ_i),
and translation vector (T_i) and projects it to normalized camera coordinates, i.e. performs
π(R_i P_j + T_i).
(c) Using the projection function just defined, write a function that calculates the reprojection error
given the rotation angles and translation vectors of all the views, and the inverse depths of all the 3D
points.
(d) Using an optimization routine (e.g. lsqnonlin in MATLAB or scipy.optimize.least_squares in
Python), minimize the reprojection error function defined above with respect to the rotation
angles (Θ_i) and translation (T_i) for each view and the inverse depth (w_j) of each 3D point. You
can initialize Θ_i and T_i as zero vectors and w_j as a vector of ones. You can choose the first
frame as the reference frame.
Once you obtain the 3-D points, plot them as a 3-D point cloud. Perform this experiment for 5, 15
and 25 frames and report the point cloud obtained in each case.
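A possible skeleton for steps (b)-(d), assuming the matched points have already been converted to normalized camera coordinates (step (a)) and packed into an array pts of shape (Nc, Np, 2) with the reference frame first; the packing of the parameter vector and all names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def rotation_small_angle(theta):
    """Small-angle rotation matrix of Equation (1) from theta = [theta_x, theta_y, theta_z]."""
    tx, ty, tz = theta
    return np.array([[1.0, -tz,  ty],
                     [ tz, 1.0, -tx],
                     [-ty,  tx, 1.0]])

def project(P, theta, T):
    """pi(R P + T): project 3-D points P of shape (3, Np) to normalized coordinates (2, Np)."""
    X = rotation_small_angle(theta) @ P + T[:, None]
    return X[:2] / X[2]

def reprojection_residuals(params, pts, ref_xy, Nc, Np):
    """Residuals of Equation (2); params = [theta_1..theta_Nc, T_1..T_Nc, w_1..w_Np]."""
    thetas = params[:3 * Nc].reshape(Nc, 3)
    Ts = params[3 * Nc:6 * Nc].reshape(Nc, 3)
    w = params[6 * Nc:]
    # Inverse-depth parametrization: P_j = (1 / w_j) [x_j, y_j, 1]^T in the reference frame.
    P = np.vstack([ref_xy.T, np.ones(Np)]) / w
    res = [(project(P, thetas[i], Ts[i]).T - pts[i]).ravel() for i in range(Nc)]
    return np.concatenate(res)

def run_bundle_adjustment(pts):
    """pts: (Nc, Np, 2) normalized keypoints; frame 0 is the reference frame."""
    Nc, Np, _ = pts.shape
    x0 = np.concatenate([np.zeros(6 * Nc), np.ones(Np)])   # zero poses, unit inverse depths
    sol = least_squares(reprojection_residuals, x0, args=(pts, pts[0], Nc, Np))
    return sol.x
```

Strictly, the reference frame's pose should be held fixed (or removed from the parameter vector) so that the problem has no gauge freedom; it is kept in the parameter vector here only for brevity.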
2. Plane Sweep. Minimizing F in Equation 2 provided us with the depths of a sparse set of 3-D points with
respect to the reference frame. To obtain a dense depth reconstruction, use the camera rotation matrices
and translation vectors obtained from the bundle adjustment problem of Equation 2, along with the
sequence provided in the shared folder, in a plane-sweeping framework. For the plane-sweeping algorithm,
one needs to find the plane-induced homography. For a plane at depth d and with normal vector n,
the plane-induced homography mapping the reference frame to the i-th frame is given as

H_{d,i} = K [R_i − (T_i n^T)/d] K^{−1}        (3)
As we are only concerned with the planes parallel to the reference frame, n = [0, 0, −1]^T. K is the
intrinsic matrix for the camera. You can use a set of 10 candidate depth planes within the minimum
and the maximum depth obtained from the above bundle adjustment problem. The steps involved in
plane sweeping are as follows:
(a) Map the i-th frame to the reference frame through the inverse plane-induced homography H_{d,i}^{−1}. Perform
this warping for each frame and each candidate depth plane. Stack the warped frames into a
tensor. This tensor will be of size H × W × D × Nc, where H and W are the height and width of
the frames, D is the number of candidate depth planes and Nc is the number of views or frames.
(b) Find the variance across all the warped frames for each candidate depth. This will reduce the
tensor to a cost volume of dimension H × W × D. Find the depth for each pixel as the one that
gives the minimum variance.
Plot the obtained depth map alongside the reference frame. Perform this experiment for 5, 15 and 25
frames and report the depth map obtained in each case. How does increasing the number of frames
change the quality of the depth map reconstruction?
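A compact sketch of steps (a) and (b), assuming the frames are grayscale NumPy arrays with frame 0 as the reference, Rs and Ts come from the bundle adjustment above, K is the intrinsic matrix, and cv2.warpPerspective is acceptable for the homography warp; all names are illustrative.

```python
import numpy as np
import cv2

def plane_sweep(frames, Rs, Ts, K, depths):
    """frames: list of grayscale images (frame 0 is the reference);
    Rs, Ts: rotation matrices and translation vectors of each view;
    depths: candidate depth values (e.g. 10 values between d_min and d_max)."""
    H_img, W_img = frames[0].shape
    n = np.array([0.0, 0.0, -1.0])                  # planes parallel to the reference frame
    Kinv = np.linalg.inv(K)
    cost = np.zeros((H_img, W_img, len(depths)))
    for di, d in enumerate(depths):
        warped = [frames[0].astype(np.float32)]
        for i in range(1, len(frames)):
            # Plane-induced homography of Equation (3), reference frame -> frame i;
            # its inverse warps frame i back onto the reference frame.
            Hdi = K @ (Rs[i] - np.outer(Ts[i], n) / d) @ Kinv
            warped.append(cv2.warpPerspective(frames[i].astype(np.float32),
                                              np.linalg.inv(Hdi), (W_img, H_img)))
        # Photo-consistency cost: variance across the warped views at this depth.
        cost[:, :, di] = np.var(np.stack(warped, axis=-1), axis=-1)
    # Depth of each pixel = candidate plane with minimum variance.
    return np.asarray(depths)[np.argmin(cost, axis=2)]
```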