CAP 5415 Programming Assignment-I Computer Vision (Fall 2021)
Question 1: Canny Edge Detection Implementation [5 pts]
In 1986, John Canny defined a set of goals for an edge detector and described an optimal method for achieving them. Canny specified three issues that an edge detector must address:
• Error rate: The edge detection filter should find all the edges, with no missing edges, and it should respond only to edge regions.
• Localization: The distance between detected edges and actual edges should be as small as possible.
• Response: The edge detector should not identify multiple edge pixels where only a single edge exists.
Remember from the lecture that in Canny edge detection we first smooth the image, then compute the gradients and the magnitude and orientation of the gradient. This procedure is followed by non-maximum suppression, and finally hysteresis thresholding is applied to produce the final edge map. Briefly, follow the steps below for a practical implementation of the Canny edge detector:
1. Read a grayscale image from the Berkeley Segmentation Dataset (Training Images) and store it as a matrix named I.
2. Create a one-dimensional Gaussian mask G to convolve with I. The standard deviation of this Gaussian is a parameter to the edge detector (call it σ > 0).
3. Create a one-dimensional mask for the first derivative of the Gaussian in the x and y directions; call these Gx and Gy. The same σ > 0 value is used as in step 2.
4. Convolve the image I with G along the rows to give the x component image (Ix), and down the columns to give the y component image (Iy).
5. Convolve Ix with Gx to give $I'_x$, the x component of I convolved with the derivative of the Gaussian, and convolve Iy with Gy to give $I'_y$, the y component of I convolved with the derivative of the Gaussian.
6. Compute the magnitude of the edge response by combining the x and y components. The magnitude of the result can be computed at each pixel (x, y) as: $M(x, y) = \sqrt{I'_x(x, y)^2 + I'_y(x, y)^2}$.
7. Implement the non-maximum suppression algorithm that we discussed in the lecture. Pixels that are not local maxima should be removed with this method. In other words, not every pixel with a strong gradient magnitude is actually an edge, so we need to remove false-positive edge locations from the image.
8. Apply hysteresis thresholding to obtain the final edge map. You may use any existing library function to compute connected components if you want.
Definition: Non-maximal suppression means that the center pixel, the one under consideration, must have a larger gradient magnitude than its neighbors in the gradient direction.
That is: from the center pixel, travel in the direction of the gradient until another pixel is encountered; this is the first neighbor. Now, again starting at the center pixel, travel in the direction opposite to that of the gradient until another pixel is encountered; this is the second neighbor. Moving from one of these to the other passes through the edge pixel in a direction that crosses the edge, so the gradient magnitude should be largest at the edge pixel.
Algorithmically, for each pixel p (at location x and y), you need to test whether the value M(p) is maximal in the direction θ(p). For instance, if θ(p) = π/2, i.e., the gradient direction at p = (x, y) is downward, then M(x, y) is compared against M(x, y − 1) and M(x, y + 1), the values above and below p. If M(p) is not larger than the values at both of those adjacent pixels, then M(p) becomes 0. To estimate the gradient orientation, you can simply use $\theta(p) = \operatorname{atan2}(I'_y, I'_x)$.
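The magnitude M and orientation θ used in this test come from steps 2 to 6. A minimal NumPy sketch of those steps is given here, following the step descriptions literally. The function names, the truncation of the masks at 3σ, and the zero padding at the image borders (a side effect of `np.convolve` with `mode="same"`) are assumptions for illustration, not part of the assignment.

```python
import numpy as np

def gaussian_1d(sigma, radius=None):
    """Step 2: sampled 1-D Gaussian mask G, normalized to sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumption: truncate at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def gaussian_derivative_1d(sigma, radius=None):
    """Step 3: sampled first derivative of the Gaussian, -x / sigma^2 * G(x)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    return -x / sigma**2 * gaussian_1d(sigma, radius)

def convolve_rows(img, kernel):
    """Convolve every row with a 1-D kernel (x direction), zero padded."""
    return np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, img)

def convolve_cols(img, kernel):
    """Convolve every column with a 1-D kernel (y direction), zero padded."""
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, img)

def gradient_images(I, sigma):
    """Steps 4 to 6 plus the orientation estimate."""
    I = np.asarray(I, dtype=float)
    G = gaussian_1d(sigma)
    Gd = gaussian_derivative_1d(sigma)   # used as both Gx and Gy
    Ix = convolve_rows(I, G)             # step 4, x component
    Iy = convolve_cols(I, G)             # step 4, y component
    Ix_prime = convolve_rows(Ix, Gd)     # step 5
    Iy_prime = convolve_cols(Iy, Gd)     # step 5
    M = np.hypot(Ix_prime, Iy_prime)     # step 6
    theta = np.arctan2(Iy_prime, Ix_prime)
    return Ix, Iy, Ix_prime, Iy_prime, M, theta
```

Because the borders are zero padded, responses within a few σ of the image boundary are not reliable; a different padding scheme may be preferable in a real submission.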
Hint: It is assumed that the gradient changes continuously as a function of position, and that the gradients at the pixel coordinates are simply samples of the continuous case. If it is further assumed that the change in the gradient between any two pixels is a linear function, then the gradient at any point between the pixels can be approximated by linear interpolation.
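One possible reading of this hint is sketched below: the magnitude is sampled one unit step ahead of and behind each pixel along the gradient direction, using bilinear interpolation between the four surrounding pixels. The function names are hypothetical, x is taken as the column index and y as the row index, and border pixels are simply left at zero. The strict comparison mirrors the rule stated above, so exact ties on a plateau are suppressed.

```python
import numpy as np

def bilinear(M, x, y):
    """Interpolate M at a real-valued position (x = column, y = row)."""
    H, W = M.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)  # clamp at the border
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * M[y0, x0] + ax * (1 - ay) * M[y0, x1]
            + (1 - ax) * ay * M[y1, x0] + ax * ay * M[y1, x1])

def non_max_suppression(M, theta):
    """Keep M(p) only where it exceeds both interpolated neighbors
    along the gradient direction theta(p); set it to 0 elsewhere."""
    H, W = M.shape
    out = np.zeros((H, W), dtype=float)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            m = M[y, x]
            if m == 0:
                continue
            dx, dy = np.cos(theta[y, x]), np.sin(theta[y, x])
            ahead = bilinear(M, x + dx, y + dy)    # first neighbor
            behind = bilinear(M, x - dx, y - dy)   # second neighbor
            if m > ahead and m > behind:
                out[y, x] = m
    return out
```

The double loop is kept for readability; a vectorized version would be faster on full-size images.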
For a sample output, please refer to the figure below, which shows the following for a chessboard image: (a) X component of the convolution with a Gaussian, (b) Y component of the convolution with a Gaussian, (c) X component of the image convolved with the derivative of a Gaussian, (d) Y component of the image convolved with the derivative of a Gaussian, (e) resulting magnitude image, and (f) Canny edge image after non-maximum suppression.
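Step 8 (hysteresis thresholding) takes the suppressed magnitude image and produces the final edge map. The sketch below is one way to do it under stated assumptions: `low` and `high` are hypothetical threshold parameters that the assignment leaves to you, connectivity is taken to be 8-neighbor, and a simple breadth-first search stands in for a library connected-components routine.

```python
import numpy as np
from collections import deque

def hysteresis(nms, low, high):
    """Keep pixels >= high, plus pixels >= low that are 8-connected
    (directly or through other kept pixels) to a pixel >= high."""
    strong = nms >= high
    weak = nms >= low
    H, W = nms.shape
    edges = np.zeros((H, W), dtype=bool)
    edges[strong] = True
    queue = deque(zip(*np.nonzero(strong)))  # start from all strong pixels
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < H and 0 <= nx < W
                if inside and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    queue.append((ny, nx))
    return edges
```

Weak pixels that never connect to a strong pixel are dropped, which is what removes isolated noise responses.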
Your tasks:
• Choose three example grayscale images from the Berkeley Segmentation Dataset (Training Images). When executed, your algorithm should plot the intermediate and final results of the Canny edge detection process, similar to the figure above.
• Show the effect of σ on edge detection by choosing three different σ values when smoothing. Indicate which σ works best as a comment in your assignment.
What to submit:
• Code
• A short write-up about your implementation with results: 1) a figure showing intermediate results as in the figure above, and 2) similar figures showing the effect of σ.
Question 2: Convolutional Neural Network (CNN) for Classification [5 pts]
Implement a ConvNet using PyTorch for digit classification. Sample code files (two files) are given in the attachment. Fill in the parts clearly indicated in the code. Output should be saved as output.txt. When you are asked to include a convolutional layer, do not forget to include max pooling or average pooling layer(s) as well. You are free to use any other framework, but no base code will be provided for other frameworks.
• STEP 1: Create a fully connected (FC) hidden layer (with 100 neurons) with a sigmoid activation function. Train it with SGD with a learning rate of 0.1 (a total of 60 epochs), a mini-batch size of 10, and no regularization.
• STEP 2: Now insert two convolutional layers into the network built in STEP 1, each followed by a pooling layer. Use 40 kernels of size 5×5 with stride 1, and pool over 2×2 regions.
• STEP 3: For the network from STEP 2, replace sigmoid with ReLU and re-train the model with a new learning rate of 0.03.
• STEP 4: Add another fully connected (FC) layer (with 100 neurons) to the network built in STEP 3. (Remember that the first FC layer was added in STEP 1; here you are adding a second one.)
• STEP 5: Change the number of neurons in the FC layers to 1000. For regularization, use Dropout (with a rate of 0.5). Train the whole system for 40 epochs.
The traces from running testCNN.py for each of the 5 steps should be saved in output.txt, as indicated above.
Each step is 1 point.
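When the convolutional layers of STEP 2 are placed in front of the FC layer, the FC layer's input size has to match the flattened feature map. The arithmetic can be sketched as below. It assumes a 28×28 single-channel input (the question does not state the image size), no padding, 40 output channels in each convolutional layer, and pooling with stride equal to the window size; the function names are hypothetical.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, window):
    """Spatial size after non-overlapping pooling (stride == window)."""
    return n // window

def flattened_size(input_size=28, n_conv=2, kernel=5, stride=1,
                   pool=2, channels=40):
    """Side length and flattened feature count after n_conv conv+pool blocks."""
    side = input_size  # assumption: 28x28 input
    for _ in range(n_conv):
        side = conv_out(side, kernel, stride)  # e.g. 28 -> 24, then 12 -> 8
        side = pool_out(side, pool)            # e.g. 24 -> 12, then 8 -> 4
    return side, channels * side * side

side, features = flattened_size()
print(side, features)  # 4 640
```

Under these assumptions the first FC layer would take 640 inputs; a different input size or padding choice changes this number.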
What to submit:
• Code
• A short write-up about your implementation with results and your observations from each training. Note that in each step you will train the corresponding architecture and report the accuracy on the test data. Also show, using plots, how the training/test loss and accuracy vary with each iteration during network training.
CAP 5415 Programming Assignment-II Computer Vision
Question 1: Nearest Neighbor Classification [5 pts]
In this question, the task is to implement a nearest neighbor classifier for digit classification. You will use the digits
dataset available from the sklearn library. There are around 1800 images in total with 10 digit classes, and each image
is 8×8 with a single channel. You will have to split the dataset into training and testing sets, keeping 500 images for
testing (choose them randomly, with 50 images per class).
Sample code to load the dataset from sklearn:
from sklearn.datasets import load_digits
digits = load_digits()
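The random split with 50 test images per class could be done along the following lines. This is a sketch, not a required approach: `split_per_class` and its `seed` parameter are hypothetical names, and the function only needs the label array.

```python
import numpy as np

def split_per_class(labels, per_class=50, seed=0):
    """Return (train_idx, test_idx) with `per_class` randomly chosen
    test samples from every class and all remaining samples for training."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    test_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)          # samples of class c
        test_idx.extend(rng.choice(idx, size=per_class, replace=False))
    test_idx = np.sort(np.array(test_idx))
    train_idx = np.setdiff1d(np.arange(labels.size), test_idx)
    return train_idx, test_idx
```

With the loaded dataset, the indices can be applied as `digits.data[test_idx]` and `digits.target[test_idx]`, where `digits.data` holds the flattened 64-pixel images.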
Your tasks:
• Implement a nearest neighbor classifier using pixels as features. Test the method for classification accuracy.
• Implement a k-nearest neighbor classifier using pixels as features. Test the method for k=3, 5, and 7 and
compute classification accuracy.
NOTE: You can use L2-norm for distance between two samples.
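A minimal k-nearest-neighbor sketch using squared L2 distances on raw pixel vectors is shown here; k=1 gives the plain nearest neighbor classifier. The function names are hypothetical, labels are assumed to be non-negative integers, and ties in the vote are broken in favor of the smallest label, which is only one of several reasonable choices.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1):
    """Predict a label for each test row by majority vote among the
    k training rows with the smallest L2 distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = ((X_test**2).sum(axis=1)[:, None]
          + (X_train**2).sum(axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = y_train[nearest]                       # shape (n_test, k)
    # bincount + argmax: ties go to the smallest label
    return np.array([np.bincount(v).argmax() for v in votes])

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Ranking by squared distance gives the same neighbors as ranking by the L2 norm itself, so the square root is skipped.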
What to submit:
• Code
• A short write-up about your implementation with results: 1) Accuracy scores for all the variations, 2) Compare
all the variations using accuracy scores. Comment on how the accuracy changes when you increase the value
of k.
Question 2: Autoencoder [5 pts]
Implement an autoencoder using the MNIST dataset. The input size of the images will be 28×28 with a single channel. You
will implement two different variations, one with fully connected layers (standard neural network), and the other
with convolutional neural network.
Your tasks:
• Implement an autoencoder using fully connected layers. The encoder will have two layers (with 256 and 128
neurons) and the decoder will also have two layers (with 256 and 784 neurons). Train this network using MSE
loss for 10 epochs. Compare the number of parameters in the encoder and the decoder. Show 20 sample
reconstructed images from the testing data in the report (2 images for each class) along with the original images.
• Implement a convolutional autoencoder for the MNIST dataset. The encoder will have two convolutional layers,
each followed by a max-pooling layer. Use a kernel size of 3×3, ReLU activation, and
padding of 1 to preserve the shape of the input feature map.
The decoder will have three convolutional layers
with kernel size 3×3 and padding of 1 to preserve the feature map shape. Each of the first two convolutional layers will
be followed by an upsampling layer, which doubles the resolution of the feature maps using linear interpolation.
Train this network for 10 epochs. Compare the number of parameters in the encoder and the decoder. Also,
compare the total parameters in this autoencoder with the autoencoder in the previous task. Show 20 sample
reconstructed images from the testing data in the report (2 images for each class) along with the original images.
Also compare the reconstructed results with those of the previous autoencoder.
NOTE: You are free to choose any optimizer, but use the same optimizer for both the variations. Feel free to use
the code shared in the first assignment for data loader and other base classes.
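For the parameter comparison in the fully connected variant, the counts can be worked out by hand before training. The sketch below assumes every layer is a standard fully connected layer with a bias term, that the 28×28 input is flattened to 784 values, and that the decoder takes the 128-dimensional code as input; the helper names are hypothetical.

```python
def linear_params(n_in, n_out, bias=True):
    """Weights plus optional biases of one fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def mlp_params(sizes):
    """Total parameters of a stack of FC layers with the given widths."""
    return sum(linear_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

encoder = mlp_params([784, 256, 128])   # 784 -> 256 -> 128
decoder = mlp_params([128, 256, 784])   # 128 -> 256 -> 784
print(encoder, decoder, encoder + decoder)  # 233856 234512 468368
```

Under these assumptions the weight matrices mirror each other, so the difference of 656 parameters comes entirely from the bias vectors (256 + 784 in the decoder versus 256 + 128 in the encoder).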
What to submit:
• Code
• A short write-up about your implementation with results (as indicated for each variation) and your observations
from each training.
CAP 5415 Programming Assignment-III Computer Vision
Question 1: Image Classification [5 pts]
This is an extension of Question 2 from Programming Assignment I. In this question your goal is to develop a CNN
classification network to recognize RGB color images. You will design your own variants of a CNN architecture, which
should have more than 2 convolutional layers and more than 1 fully connected layer. You will use the CIFAR-10
dataset, which is available from PyTorch (torchvision.datasets.CIFAR10).
Your tasks:
• Design a CNN architecture which has more than 2 conv layers and more than 1 fully connected layer. It
should make 10 predictions for the 10 classes of CIFAR-10. Train this network on CIFAR-10 for 30 epochs
using cross-entropy loss and the SGD optimizer. Report the training/testing loss for each epoch in the form of plots, and
the accuracy scores after 30 epochs. Remember you will need a softmax activation after the final fully connected
layer.
• Increase the number of conv layers in the above network and train again. Report the same numbers and plots
again comparing with the first network.
NOTE: You can use the code provided as a solution for programming assignment 1 and extend it.
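To make the relationship between the final-layer softmax and the cross-entropy loss concrete, here is a small NumPy sketch of both, computed from the raw outputs (logits) of the last fully connected layer. The function names are hypothetical. One point worth checking against the framework documentation: PyTorch's `torch.nn.CrossEntropyLoss` applies log-softmax internally and expects raw logits, so an explicit softmax layer is normally used only when reporting probabilities, not before that loss.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    """Mean negative log-probability of the correct class.
    logits: (n_samples, n_classes); targets: integer class indices."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())
```

For a network that outputs identical scores for all 10 classes, this loss equals ln(10), which is a useful sanity check for the first training iterations.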
What to submit:
• Code
• A short write-up about your implementation with results: 1) Accuracy scores for all the variations, 2) Compare
all the variations using accuracy scores. Comment on how the accuracy changes when you increase the number
of conv layers.
Question 2: Image segmentation [5 pts]
In this question your goal is to implement Otsu thresholding to perform image segmentation. The algorithm will be
discussed during a class lecture next week.
Your tasks:
• First implement a simple thresholding-based image binarization algorithm. Plot the histogram for three
different input images. Then, based on the plots, perform binarization at three different threshold levels.
• Implement Otsu thresholding. Use the determined threshold to perform segmentation on the three input
images.
NOTE: You are free to choose any 3 images. If the images are colored, you can convert them to greyscale by
averaging the RGB values at each pixel. You can also use any library function to convert it to greyscale.
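The two tasks can be sketched as follows, assuming 8-bit intensity levels (0 to 255). The function names are hypothetical, and the Otsu routine picks the threshold that maximizes the between-class variance of the histogram, with pixels above the threshold treated as foreground.

```python
import numpy as np

def to_grey(rgb):
    """Greyscale by averaging the RGB values at each pixel."""
    return np.asarray(rgb, dtype=float).mean(axis=2)

def binarize(grey, t):
    """Simple thresholding: True where the intensity is above t."""
    return np.asarray(grey) > t

def otsu_threshold(grey, levels=256):
    """Threshold t maximizing the between-class variance, where class 0
    holds intensities <= t and class 1 holds intensities > t."""
    g = np.clip(np.asarray(grey), 0, levels - 1).astype(np.int64)
    hist = np.bincount(g.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                      # normalized histogram
    omega = np.cumsum(p)                       # class-0 probability per t
    mu = np.cumsum(p * np.arange(levels))      # cumulative mean per t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                      # skip empty classes
    sigma_b2 = np.zeros(levels)
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))
```

The `hist` array computed inside `otsu_threshold` is the same histogram the first task asks you to plot, so it can be reused for the manual threshold choices.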
What to submit:
• Code
• A short write-up about your implementation with results (as indicated for each variation) and your observations
from each result. For each image, you will have to show the corresponding histogram and the resulting segmented
image.