ECSE 415 Assignment 1 to 4 solutions


ECSE 415 Assignment 1: Image Filtering

1 Thresholding (12 Points)
Thresholding is the simplest method of image segmentation. From a grayscale
image, thresholding can be used to create binary images. Here, each pixel in an
image is replaced with a foreground label (i.e. a white pixel with value 255) if the image intensity I_{i,j} satisfies some pre-defined condition (e.g. I_{i,j} > T), or with a background label (i.e. a black pixel with value 0) otherwise.
Simple Binary Thresholding:
    S_{i,j} = 255 if I_{i,j} > T, and 0 otherwise.    (1)
Inverse Binary Thresholding:
    S_{i,j} = 255 if I_{i,j} < T, and 0 otherwise.    (2)
Window Binary Thresholding:
    S_{i,j} = 255 if T1 < I_{i,j} < T2, and 0 otherwise.    (3)
You are given an image named "numbers.jpg" (Figure 1(a)) which contains multiple different multi-digit numbers. Your task is to threshold the image using the thresholding rules defined above.
Note that you are not allowed to use the OpenCV cv2.threshold function for this task. Instead, implement thresholding using basic Python or NumPy functions.
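A minimal NumPy sketch of the three rules is given below; the image filename and the example threshold are taken from the assignment, and the function names are placeholders.

import cv2
import numpy as np

# Load the input image as grayscale.
img = cv2.imread("numbers.jpg", cv2.IMREAD_GRAYSCALE)

def simple_threshold(image, T):
    # Foreground (255) where the intensity exceeds T, background (0) elsewhere.
    return np.where(image > T, 255, 0).astype(np.uint8)

def inverse_threshold(image, T):
    # Foreground where the intensity is below T.
    return np.where(image < T, 255, 0).astype(np.uint8)

def window_threshold(image, T1, T2):
    # Foreground where the intensity lies strictly between T1 and T2.
    return np.where((image > T1) & (image < T2), 255, 0).astype(np.uint8)

# Example: simple binary thresholding at T = 90.
binary_90 = simple_threshold(img, 90)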
1. Threshold the image at three different thresholds, 1) 55, 2) 90 and 3) 150, using simple binary thresholding and inverse binary thresholding as defined above. (3 points)
2. Write your observations about the thresholded images at the different thresholds. How many and which numbers are segmented at each threshold? (A number is considered segmented if all digits of that number appear as foreground in the thresholded image.) What else do you observe at each threshold? (3 points)
3. Threshold the image using window binary thresholding with three different threshold ranges: 1) T1=55 and T2=90, 2) T1=90 and T2=150, 3) T1=55 and T2=150. Write your observations. How many and which numbers are segmented at each threshold? (3 points)
Figure 1: (a) Input image for thresholding, (b) example output image of thresholding. Note that only the numbers "123" and "549" are segmented (foreground pixels).
4. In a practical application, we vary the values of the hyperparameters (here, the threshold values) of any of the above-mentioned thresholding methods until we get the desired output. Find a threshold value such that only the numbers "123" and "549" are segmented (i.e. considered as foreground, white pixels with value 255); see Figure 1(b). Report your findings for at least three different threshold values, and explain how they helped you narrow down the desired hyperparameter value. (3 points)
2 Denoising (18 Points)
You are given a clean image named ‘lighthouse’ (Figure 2(a)) and an image
corrupted by additive white Gaussian noise (Figure 2(b)). You are allowed to
use OpenCV/Scikit-learn functions for this section.
Apply the following filtering operations:
1. Filter the noisy image using a 5 × 5 Gaussian filter with variance equal to 2. (3 points)
2. Filter the noisy image using a box filter of the same size. (3 points)
3. Compare the Peak Signal-to-Noise Ratio (PSNR) of both denoised images with respect to the clean image and state which method gives the superior result. (Use the PSNR function provided by OpenCV.) (3 points)
You are also given an image corrupted by salt and pepper noise (Figure 2(c)).
Apply the following filtering operations:
4. Filter the noisy image using the same Gaussian filter as used in the previous question. (3 points)
5. Filter the noisy image using a median filter of the same size. (3 points)
Figure 2: Input images for denoising. (a) Clean image, (b) image corrupted with Gaussian noise, (c) image corrupted with salt-and-pepper noise.
6. Compare the PSNR of both denoised images with respect to the clean image and state which method gives the better result (a sketch covering this whole section follows below). (3 points)
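A possible sketch for this section, assuming placeholder filenames for the three provided images and the OpenCV functions cv2.GaussianBlur, cv2.blur, cv2.medianBlur and cv2.PSNR:

import cv2

# Filenames are assumptions; use the images provided with the assignment.
clean = cv2.imread("lighthouse_clean.png", cv2.IMREAD_GRAYSCALE)
gauss_noisy = cv2.imread("lighthouse_gaussian.png", cv2.IMREAD_GRAYSCALE)
sp_noisy = cv2.imread("lighthouse_saltpepper.png", cv2.IMREAD_GRAYSCALE)

# 5x5 Gaussian filter with variance 2 (sigma = sqrt(2)) and a 5x5 box filter.
sigma = 2 ** 0.5
gauss_denoised = cv2.GaussianBlur(gauss_noisy, (5, 5), sigma)
box_denoised = cv2.blur(gauss_noisy, (5, 5))
print("Gaussian filter PSNR:", cv2.PSNR(clean, gauss_denoised))
print("Box filter PSNR:", cv2.PSNR(clean, box_denoised))

# Salt-and-pepper noise: same Gaussian filter versus a 5x5 median filter.
sp_gauss = cv2.GaussianBlur(sp_noisy, (5, 5), sigma)
sp_median = cv2.medianBlur(sp_noisy, 5)
print("Gaussian filter PSNR:", cv2.PSNR(clean, sp_gauss))
print("Median filter PSNR:", cv2.PSNR(clean, sp_median))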
3 Sobel edge detector (16 Points)
In this question, you will assess the effect of varying the kernel size on the
results of an edge detection algorithm. You will detect edges in a clean image
named 'cameraman' (Figure 3(a)). You are allowed to use OpenCV/Scikit-learn functions for this section.

Figure 3: Input images for edge detection. (a) Clean image, (b) image corrupted with Gaussian noise.
• Apply a Sobel edge detector with kernel sizes of 3×3, 5×5 and 7×7 to the image. Threshold the filtered image to detect edges. Use two threshold values: 10% and 20% of the maximum pixel value in the filtered image (a sketch follows after these bullets). (4 points)
• Comment on the effect of the filter size on the output. (2 points)
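One way to set this up, assuming the filename and using cv2.Sobel for the derivatives; the gradient magnitude is thresholded at a fraction of its maximum:

import cv2
import numpy as np

img = cv2.imread("cameraman.png", cv2.IMREAD_GRAYSCALE)  # filename assumed

def sobel_edges(image, ksize, fraction):
    # Horizontal and vertical derivatives, combined into a gradient magnitude.
    gx = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=ksize)
    gy = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=ksize)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # Keep pixels whose response exceeds the given fraction of the maximum.
    return (magnitude > fraction * magnitude.max()).astype(np.uint8) * 255

edge_maps = {(k, f): sobel_edges(img, k, f)
             for k in (3, 5, 7) for f in (0.1, 0.2)}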
Next, you will evaluate the effect of denoising prior to edge detection. For the following questions, use the noisy image shown in Figure 3(b).
• Apply a Sobel edge detector with a kernel size of 3 × 3. Threshold the filtered image to detect edges. Use two threshold values: 10% and 20% of the maximum pixel value in the filtered image. (4 points)
• Denoise the image with a 3 × 3 box filter and then apply the same Sobel
edge detector, with the same values of the thresholds, from the previous
question. (4 points)
• Comment on the effectiveness of using denoising prior to edge detection.
(2 points)
Figure 4: (a) Input image for Canny edge detection, (b) expected output.
4 Canny Edge Detection (12 Points)
For this section, experiments will be performed on the 'dolphin.jpg' image (Figure 4(a)).
1. Briefly describe the 4 main steps of Canny edge detection. (2 points)
2. As you saw in Tutorial 2, the 3 main hyperparameters of Canny edge detection are the Gaussian smoothing kernel size (K) and the lower (L) and higher (H) thresholds used for hysteresis. In this section, we will observe the effect of changing these hyperparameters. You will experiment with 3 different values for each of the 3 parameters (K = 5, 9, 13; L = 10, 30, 50; H = 100, 150, 200). Vary the value of each hyperparameter while keeping the other hyperparameters constant, and do this for all combinations of the hyperparameter values mentioned above. This should result in a total of 27 triplets of hyperparameters, e.g. (K, L, H) = (5, 10, 100), (5, 10, 150), (9, 10, 200), and so on. Run Canny edge detection (cv2.GaussianBlur and cv2.Canny) for each of these triplets (a sketch follows after this list). (4 points)
3. Comment on how changing the value of each hyperparameter (K, L, H) affects the overall edge detection. Is there any relationship between the hyperparameters? (3 points)
4. Find values of the hyperparameters such that only the dolphin's edges are detected (Figure 4(b)). (3 points)
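A sketch of the sweep over the 27 triplets, assuming the filename; the sigma passed to cv2.GaussianBlur is left at 0 so OpenCV derives it from the kernel size:

import cv2
from itertools import product

img = cv2.imread("dolphin.jpg", cv2.IMREAD_GRAYSCALE)

edges = {}
# All 27 combinations of kernel size K and hysteresis thresholds L and H.
for K, L, H in product((5, 9, 13), (10, 30, 50), (100, 150, 200)):
    blurred = cv2.GaussianBlur(img, (K, K), 0)
    edges[(K, L, H)] = cv2.Canny(blurred, L, H)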
Figure 5: (a) Checkerboard input image and the expected Harris corner output (red dots represent detected Harris corners), (b) shapes image.
5 Harris Corner Detection (12 points)
Implement the Harris corner detector as described in class (Lecture 5, Slide 48 and Tutorial 2) using NumPy (5 points). This has the following steps (a sketch follows after the list):
1. Compute Image derivatives (optionally, blur first)
2. Compute Square of derivatives
3. Apply Gaussian Filtering on the output of step-2
4. Get the cornerness function response R = det(H) - k * trace(H)^2, where k = 0.05. (You can vary the value of k for your application.)
5. Perform non-maxima suppression (as in the Canny edge detector)
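A NumPy-only sketch of these five steps is given below. The convolution helper, kernel sizes and threshold are assumptions, and the non-maxima suppression shown simply keeps local maxima of the response within a small window:

import numpy as np

def convolve2d(image, kernel):
    # Naive 'same'-size cross-correlation with zero padding, NumPy only.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=np.float64)
    for r in range(kh):
        for c in range(kw):
            out += kernel[r, c] * padded[r:r + image.shape[0], c:c + image.shape[1]]
    return out

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def harris_corners(gray, k=0.05, threshold=0.01, nms_size=5):
    gray = gray.astype(np.float64)
    # Step 1: image derivatives with 3x3 Sobel kernels.
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    Ix, Iy = convolve2d(gray, sobel_x), convolve2d(gray, sobel_x.T)
    # Step 2: squares (and product) of the derivatives.
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # Step 3: Gaussian filtering of the squared derivatives.
    g = gaussian_kernel(5, 1.0)
    Sxx, Syy, Sxy = convolve2d(Ixx, g), convolve2d(Iyy, g), convolve2d(Ixy, g)
    # Step 4: cornerness response R = det(H) - k * trace(H)^2.
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Step 5: non-maxima suppression within an nms_size window, then threshold.
    half = nms_size // 2
    padded = np.pad(R, half, constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (nms_size, nms_size))
    local_max = R == windows.max(axis=(2, 3))
    return np.argwhere(local_max & (R > threshold * R.max()))  # (row, col) corners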
You will apply the Harris corner detector to three different images:
1. Checkerboard image, Figure 5(a) (input image). Change the value of the threshold to obtain detected corners similar to those in Figure 5(a) (Harris corner output). Observe and report the effect of changing the threshold value. (3 points)
2. Shapes image, Figure 5(b). Try different threshold values and report your observations. (2 points)
3. Take any one face image from the Google Face thumbnail collection dataset. Apply the Harris corner detector to this image. Try different threshold values and report your observations. (2 points)

ECSE 415 Assignment 2: Image Matching and Face Detection

1 Invariance of SIFT Features (34 Points)
You are given a reference image of a book as shown in Figure 1. Verify the
invariance of SIFT features under changes in image scale and rotation.
1.1 Invariance Under Changes in Scale
1. Compute SIFT keypoints for the reference image. (2 points)
2. Scale the reference image using scaling factors of 0.2, 0.5, 0.8, 1.25, 2 and 5. (2 points)
3. Compute SIFT keypoints for the transformed images. (2 points)
4. Match all keypoints of the reference image to the transformed images using a brute-force method (see the sketch after this list). (2 points)
5. Sort the matching keypoints according to the matching distance. (2 points)
6. Display the top ten matched keypoints for each pair of the reference image and a transformed image. (2 points)
7. Plot the matching distance for the top 100 matched keypoints, with keypoint indices on the x-axis and the corresponding matching distance on the y-axis. (2 points)
8. Discuss the trend in the plotted results. What is the effect of increasing the scale on the matching distance? Explain the cause. (3 points)
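A possible sketch of steps 1 to 6 for the scale experiment, assuming the reference image filename and OpenCV's SIFT implementation (cv2.SIFT_create) with a brute-force matcher:

import cv2

ref = cv2.imread("book_reference.jpg", cv2.IMREAD_GRAYSCALE)  # filename assumed
sift = cv2.SIFT_create()
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)

kp_ref, des_ref = sift.detectAndCompute(ref, None)

for s in (0.2, 0.5, 0.8, 1.25, 2, 5):
    scaled = cv2.resize(ref, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
    kp_s, des_s = sift.detectAndCompute(scaled, None)
    # Brute-force matching, sorted by matching distance.
    matches = sorted(bf.match(des_ref, des_s), key=lambda m: m.distance)
    top10 = cv2.drawMatches(ref, kp_ref, scaled, kp_s, matches[:10], None)
    distances = [m.distance for m in matches[:100]]  # values for the plot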
1.2 Invariance Under Rotation
1. Compute SIFT keypoints for the reference image. (2 points)
2. Rotate the reference image by angles of 10, 30, 90, 150, 170 and 180 degrees (see the sketch after this list). (2 points)
3. Compute SIFT keypoints for the transformed images. (2 points)
4. Match all keypoints of the reference image to the transformed images using a brute-force method. (2 points)
5. Sort the matching keypoints according to the matching distance. (2 points)
6. Display the top ten matched keypoints for each pair of the reference image and a transformed image. (2 points)
7. Plot the matching distance for the top 100 matched keypoints, with keypoint indices on the x-axis and the corresponding matching distance on the y-axis. (2 points)
8. Discuss the trend in the plotted results. What is the effect of increasing the angle of rotation on the matching distance? Explain the cause. (3 points)
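The rotation experiment can reuse the reference image, SIFT object and matcher from the previous sketch; here the image is rotated about its centre with cv2.warpAffine (keeping the canvas size fixed is an assumption):

h, w = ref.shape
for angle in (10, 30, 90, 150, 170, 180):
    # Rotation matrix about the image centre, then warp into the same canvas.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(ref, M, (w, h))
    kp_r, des_r = sift.detectAndCompute(rotated, None)
    matches = sorted(bf.match(des_ref, des_r), key=lambda m: m.distance)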
2 Matching using SIFT – Book Reveal (16 Points)
You are given images of the reference book taken under different acquisition conditions: (a) under occlusion (Figure 2(a)) and (b) under different lighting conditions (Figure 2(b)). The task is to transform the reference image of the book in Figure 2(b), align it with the occluded view in Figure 2(a), and merge the two to generate an image with an unoccluded view of the book (see Figure 2(c)). To achieve this objective, perform the following steps:
1. Find SIFT keypoints in the given input images. (2 points)

Figure 2: Input and desired output for the image manipulation task: (a) image of an occluded book (book_occlusion.jpg), (b) reference image of the book under different lighting conditions (book_crop.jpg), (c) desired output.
Figure 3: Examples of CelebA dataset images.
2. Match keypoints of the reference image to the keypoints of the occluded image using the brute-force method. (2 points)
3. Sort the matching keypoints according to the matching distance. (2 points)
4. Display the top ten matching keypoints. (2 points)
5. Compute a homography to align the images using the RANSAC method and apply the transformation to the reference image. (6 points)
6. Paste the transformed reference image onto the occluded view to generate the unoccluded view shown in Figure 2(c) (a sketch of the full pipeline follows below). (2 points)
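A sketch of the whole pipeline, assuming the two filenames from the caption of Figure 2 and a simple non-black mask for the final paste:

import cv2
import numpy as np

occluded = cv2.imread("book_occlusion.jpg")   # Figure 2(a)
reference = cv2.imread("book_crop.jpg")       # Figure 2(b)
gray_occ = cv2.cvtColor(occluded, cv2.COLOR_BGR2GRAY)
gray_ref = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(gray_ref, None)
kp_occ, des_occ = sift.detectAndCompute(gray_occ, None)

bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(bf.match(des_ref, des_occ), key=lambda m: m.distance)

# Homography from the reference image to the occluded view, estimated with RANSAC.
src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_occ[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the reference book into the occluded view and paste it on top.
h, w = occluded.shape[:2]
warped = cv2.warpPerspective(reference, H, (w, h))
unoccluded = occluded.copy()
paste_mask = warped.sum(axis=2) > 0
unoccluded[paste_mask] = warped[paste_mask]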
3 Face Detection (35 Points)
In this question, you will work on the task of face detection. You will explore two different algorithms for this purpose: (1) Eigenfaces and (2) the Viola-Jones detector.
We will use the publicly available CelebA face dataset (Figure 3). A subset of 1000 images is given with the assignment. From these images, choose 100 random images as your training dataset.
3.1 Eigenface Representation
Produce an eigenface representation for your training data through PCA. Please note that you are not allowed to use the built-in PCA function in OpenCV/Scikit-Learn. You should implement the Snapshot method for PCA (covered in class, Lecture 8, Slide 55) from scratch using NumPy (15 points) [1]. Display the first 6 eigenfaces. (3 points)

Figure 4: Face detection: (a) group image and (b) example of detected faces with bounding boxes around the detected faces.
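A minimal NumPy sketch of the Snapshot method, assuming the training faces are stacked in an (N, H, W) array; numpy.linalg is used only for the small eigen-decomposition, as the footnote allows:

import numpy as np

def snapshot_pca(images, num_components=6):
    # images: (N, H, W) stack of training faces, with N much smaller than H*W.
    N = images.shape[0]
    X = images.reshape(N, -1).astype(np.float64)   # N x D data matrix
    mean_face = X.mean(axis=0)
    A = X - mean_face                               # mean-centred rows
    # Eigen-decompose the small N x N matrix instead of the huge D x D covariance.
    eigvals, eigvecs = np.linalg.eigh(A @ A.T / N)
    order = np.argsort(eigvals)[::-1][:num_components]
    # Map the small eigenvectors back to image space and normalise each eigenface.
    eigenfaces = A.T @ eigvecs[:, order]            # D x num_components
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0, keepdims=True)
    return mean_face, eigenfaces.T                  # each row is one eigenface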
3.2 Face Detection
You will now detect all the faces in a group image (Figure 4(a)). Use a sliding window to detect the faces. We will follow the PCA-based detection algorithm covered in class (Lecture 8, Slide 63). Set a threshold on the distance in eigenspace between the window contents and your training data. Try different threshold values and use the one that gives you good results. Display your image with bounding boxes around the detected faces for your best threshold (e.g. Figure 4(b)) [2] (10 points). How well does the method work? How many false positive face detections do you get? (2 points)
Use an existing implementation of the Viola-Jones face detector and compare the results with your detector (e.g. how many false positives do you obtain?). Under what conditions would you expect the Viola-Jones detector to work when PCA does not? (5 points) A sketch of both detectors is given after the footnotes below.
[1] You are allowed to use numpy.linalg for computing eigenvalues and eigenvectors.
[2] You can use any available online code for bounding box generation. Please cite the source for this in your report.
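A rough sketch of the sliding-window detector and the Viola-Jones baseline. It assumes the outputs of the Snapshot PCA sketch above (mean_face and eigenfaces), a train_proj array holding the projected training faces, and a window size, step and threshold that would need tuning:

import cv2
import numpy as np

def eigenspace_distance(patch, mean_face, eigenfaces, train_proj):
    # Project the patch into eigenspace and return its distance to the
    # closest projected training face.
    x = patch.reshape(-1).astype(np.float64) - mean_face
    proj = eigenfaces @ x
    return np.min(np.linalg.norm(train_proj - proj, axis=1))

group = cv2.imread("group.jpg", cv2.IMREAD_GRAYSCALE)   # filename assumed
TRAIN_SIZE = (64, 64)   # whatever size the training faces were resized to
win, step, threshold = 64, 8, 2500.0                    # hyperparameters to tune
boxes = []
for y in range(0, group.shape[0] - win, step):
    for x in range(0, group.shape[1] - win, step):
        patch = cv2.resize(group[y:y + win, x:x + win], TRAIN_SIZE)
        if eigenspace_distance(patch, mean_face, eigenfaces, train_proj) < threshold:
            boxes.append((x, y, win, win))

# Viola-Jones baseline using OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
vj_boxes = cascade.detectMultiScale(group, scaleFactor=1.1, minNeighbors=5)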

Introduction to Computer Vision (ECSE 415) Assignment 3

1 Image Classification using RF and SVM
For this task, you are given a dataset of flower images derived from the 102-Category Flower dataset [1]. The dataset contains images of 9 types of flowers. You can read the images and the corresponding labels as follows.
import numpy as np

train_images = np.load('flower_subset.npz')['train_images']
train_labels = np.load('flower_subset.npz')['train_labels']
test_images = np.load('flower_subset.npz')['test_images']
test_labels = np.load('flower_subset.npz')['test_labels']

The arrays train_images and test_images are stacks of 1556 and 90 grayscale images of size 128×128, respectively.
• Resize the train/test images to 64 × 64 and compute HoG features using cells of 8×8 pixels, blocks of 4×4 cells and 4 bins. This should yield a feature vector of size 1600 per image (a sketch is given after this list). (3 points)
(Suggestion: make a function which takes a list of images as its argument and returns a list of HoG features. The same function can be used for the train and test sets.)
• Fit a non-linear SVM classifier (use an RBF kernel with gamma='auto' and C=1) on the features and the class labels of the training images. (1 point)
• Predict labels of the test images by feeding the test features to the trained
classifier and calculate classification accuracy. (2 points)
• Tune values of hyperparameters ‘gamma’ and ‘C’ to achieve test accuracy
greater than 25%. (2 points)
• Fit a Random Forest (RF) classifier (set n_estimators=10, max_depth=5 and criterion='entropy') on the features and the class labels of the training images. (1 point)
• Predict labels of the test images by feeding the test features to the trained
classifier and calculate classification accuracy. (2 points)
• Tune the values of the hyperparameters 'n_estimators' and 'max_depth' to achieve test accuracy greater than 25%. (2 points)
• Compare the results of the SVM and RF classifiers. Which one provides better results? Experiment with training both classifiers over a range of random states and measure the classification accuracy on the test set. Which classifier is more stable or robust to a change in random state? (3 points)
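A sketch of the whole classification pipeline. It uses scikit-image's hog for the features (cv2.HOGDescriptor would also work) and scikit-learn for the classifiers; with 8x8-pixel cells, 4x4-cell blocks and 4 orientation bins on a 64x64 image this gives 5x5 blocks x 16 cells x 4 bins = 1600 features, as required:

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def hog_features(images):
    # Resize each image to 64x64 and compute its 1600-dimensional HoG vector.
    feats = []
    for im in images:
        im = resize(im, (64, 64))
        feats.append(hog(im, orientations=4, pixels_per_cell=(8, 8),
                         cells_per_block=(4, 4)))
    return np.array(feats)

data = np.load("flower_subset.npz")
X_train, X_test = hog_features(data["train_images"]), hog_features(data["test_images"])
y_train, y_test = data["train_labels"], data["test_labels"]

svm = SVC(kernel="rbf", gamma="auto", C=1).fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=10, max_depth=5,
                            criterion="entropy").fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))
print("RF accuracy:", accuracy_score(y_test, rf.predict(X_test)))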
2 Image Classification with a Convolutional Neural Network (CNN)
In this part, you will classify MNIST digits [2] into 10 categories using a CNN. You may choose to run the code on a GPU.
1. Use the PyTorch class torchvision.datasets.MNIST to (down)load the dataset. Use a batch size of 32. (3 points)
2. Implement a CNN with the layers mentioned below. (5 points)
• A convolution layer with 32 kernels of size 3×3.
• A ReLU activation.
• A convolution layer with 64 kernels of size 3×3.
• A ReLU activation.
• A maxpool layer with kernels of size 2×2.
• A convolution layer with 64 kernels of size 3×3.
• A ReLU activation.
• A convolution layer with 64 kernels of size 3×3.
• A ReLU activation.
• A flattening layer. (This layer reshapes the 2D feature maps into a feature vector; the length of this feature vector should be 4096.)
• A Linear layer with an output size of 10.
(Suggestion: you can start with the code from Tutorial 6 and adapt it for the current problem. A sketch is also given after this list.)
3. Create an instance of the SGD optimizer with a learning rate of 0.001. Use the default settings for the rest of the hyperparameters. Create an instance of the categorical cross-entropy criterion. (1 point)
4. Train the CNN for 10 epochs. (5 points)
5. Predict the labels of the test images using the trained CNN. Measure and display the classification accuracy. (3 points)
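A minimal PyTorch sketch of the whole pipeline (the download path and training-loop details are assumptions). With 28x28 MNIST inputs and no padding, the spatial size goes 28 -> 26 -> 24 -> 12 (maxpool) -> 10 -> 8, so flattening 64 maps of 8x8 gives the required 4096 features:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

model = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.Flatten(),                    # 64 x 8 x 8 = 4096 features
    nn.Linear(4096, 10),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

correct = 0
with torch.no_grad():
    for images, labels in test_loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
print("Test accuracy:", correct / len(test_set))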
References
[1] M.-E. Nilsback and A. Zisserman. "Automated flower classification over a large number of classes." 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. IEEE, 2008.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.

Introduction to Computer Vision (ECSE 415) Assignment 4

1 Image Segmentation using K-means
Implement the K-means algorithm using only the NumPy library. You may use the OpenCV and Matplotlib libraries only to read and display images. Apply K-means to the images 'home' and 'flower' shown in Figure 1. Try K=2 and K=3. Run the algorithm for 10 iterations and display the resulting segmented images in each case. (10 points)
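A NumPy-only sketch of K-means on the RGB pixel values; the filename, the random initialisation and the choice of colour features are assumptions:

import cv2
import numpy as np

def kmeans_segment(image, K, iterations=10, seed=0):
    pixels = image.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the centres with K randomly chosen pixels.
    centres = pixels[rng.choice(len(pixels), K, replace=False)]
    for _ in range(iterations):
        # Assign every pixel to its nearest centre.
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre as the mean of its assigned pixels.
        for k in range(K):
            if np.any(labels == k):
                centres[k] = pixels[labels == k].mean(axis=0)
    # Colour every pixel with its cluster centre to visualise the segmentation.
    return centres[labels].reshape(image.shape).astype(np.uint8)

img = cv2.imread("home.jpg")        # filename assumed
segmented_k2 = kmeans_segment(img, K=2)
segmented_k3 = kmeans_segment(img, K=3)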
Figure 1: Segment the above images using the K-means algorithm.

Figure 2: A pair of stereo images: (a) left image, (b) right image, (c) disparity map (expected output).
2 Disparity
In this section, we will compute a disparity map D from a pair of stereo images captured using parallel cameras. The images are shown in Figures 2(a) and 2(b). We will solve the correspondence problem with the window search algorithm (a sketch is given after the steps below); refer to Slides 58-59 in Lecture 18 (Stereo Vision). Instead of searching for a matching window over the entire scanline, we will restrict the search to a small region of the scanline.
1. Extract a 5 × 5 window centered at each pixel-location (i, j)_L in the left image. Let's call these windows reference windows. (2 points)
2. For each reference window in the left image, do the following.
(a) On the right scanline, create a search region bounded by the pixel-locations (i, j−47)_R and (i, j)_R. Extract 5×5 windows centered at every pixel-location in this search region. (2 points)
(b) For a few border pixel-locations, either the reference window or the search region lies outside the boundary of the image. Set the disparity D(i, j) = 48 for these pixel-locations. For the remaining locations, do the following. (2 points)

Figure 3: Input frames for optical flow computation. (a) frame 1, (b) frame 2.
(c) Compute the sum-of-squared-differences (SSD) between the windows in the search region and the reference window. (2 points)
(d) Find the location (i', j')_R with the minimum SSD and compute the disparity D(i, j) = j_L - j'_R. (Note that 0 ≤ D(i, j) ≤ 47, as the search region contains 48 pixel-locations.) (1 point)
3. Display the final disparity map D with the cmap argument of plt.imshow set to 'gray_r'. The expected output is shown in Figure 2(c). (1 point)
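A straightforward (and admittedly slow) sketch of the window search, assuming the stereo pair filenames; border locations keep the default disparity of 48 as specified above:

import cv2
import numpy as np
import matplotlib.pyplot as plt

def disparity_map(left, right, window=5, max_disp=47):
    half = window // 2
    H, W = left.shape
    D = np.full((H, W), max_disp + 1, dtype=np.int32)  # default disparity of 48
    left, right = left.astype(np.float64), right.astype(np.float64)
    for i in range(half, H - half):
        for j in range(half, W - half):
            if j - max_disp - half < 0:
                continue  # search region would leave the image: keep D = 48
            ref = left[i - half:i + half + 1, j - half:j + half + 1]
            # SSD against the 48 candidate windows centred at (i, j - d).
            ssd = [np.sum((ref - right[i - half:i + half + 1,
                                       j - d - half:j - d + half + 1]) ** 2)
                   for d in range(max_disp + 1)]
            D[i, j] = int(np.argmin(ssd))  # disparity j_L - j'_R
    return D

left_img = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # filenames assumed
right_img = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
plt.imshow(disparity_map(left_img, right_img), cmap="gray_r")
plt.show()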
3 Optical Flow
In this section, we will observe the effect of the window size on the prediction accuracy of optical flow. The input frames are shown in Figure 3(a-b) and the ground-truth flow is given in the 'flow10.npz' file. Read the ground-truth flow as follows:

gt = np.load('flow10.npz')['flow']
1. Use calcOpticalFlowFarneback from OpenCV to compute the optical flow between the input frames with the arguments set as follows (a sketch is given after this list). (2 points)
• flow=None, pyr_scale=0.5, levels=3, iterations=3, poly_n=5, poly_sigma=1.2 and flags=0.
• Vary winsize from 5 to 21 in steps of 2.
2. For each setting of winsize, measure the mean squared error (MSE) between the estimated optical flow and the ground-truth optical flow. Plot MSE (y-axis) vs winsize (x-axis). (2 points)
3. Do you observe any trend in the plot above? Does the error increase or decrease with increasing window size? Explain the effect of the window size on the prediction error. (3 points)
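A sketch of the sweep, assuming the two frame filenames; the ground-truth file and the Farneback arguments are taken from the instructions above:

import cv2
import numpy as np
import matplotlib.pyplot as plt

frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # filenames assumed
frame2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
gt = np.load("flow10.npz")["flow"]

winsizes = list(range(5, 22, 2))
errors = []
for winsize in winsizes:
    # Positional arguments: prev, next, flow=None, pyr_scale=0.5, levels=3,
    # winsize, iterations=3, poly_n=5, poly_sigma=1.2, flags=0.
    flow = cv2.calcOpticalFlowFarneback(frame1, frame2, None, 0.5, 3, winsize,
                                        3, 5, 1.2, 0)
    errors.append(np.mean((flow - gt) ** 2))  # MSE against the ground-truth flow

plt.plot(winsizes, errors, marker="o")
plt.xlabel("winsize")
plt.ylabel("MSE")
plt.show()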