ELEC/COMP 447/546 Assignments 1–6

ELEC/COMP 447/546 Assignment 1

1.0 Basic Image Operations (10 points)
In this problem, you will gain some experience working with NumPy and OpenCV to perform
basic image manipulations.
1.1 Combining Two Images
a. Read in two large (> 256 x 256) images, A and B, into your Colab notebook (see the
sample Colab notebook that was shared with the class earlier).
b. Resize A to 256×256 and crop B at the center to 256×256.
c. Create a new image C such that the left half of C is the left half of A and the right half of
C is the right half of B.
d. Using a loop, create a new image D such that every odd-numbered row is the
corresponding row from A and every even-numbered row is the corresponding row from B.
e. Accomplish the same task in part d without using a loop. Describe your process.
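As a rough guide (not the required solution), parts (c)-(e) can be done with NumPy slicing alone; the sketch below assumes A and B are already 256 x 256 x 3 uint8 arrays and treats rows as 0-indexed:

import numpy as np

C = np.concatenate([A[:, :128], B[:, 128:]], axis=1)   # left half of A, right half of B

D = np.zeros_like(A)
for r in range(256):                                    # part (d): loop version
    D[r] = A[r] if r % 2 == 1 else B[r]

D2 = B.copy()                                           # part (e): same result without a loop
D2[1::2] = A[1::2]                                      # odd-numbered rows come from A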
1.2 Color Spaces
a. Download the peppers image from this link. Return a binary image (only 0s and 1s), with
1s corresponding to only the yellow peppers. Do this by setting a minimum and
maximum threshold value on pixel values in the R,G,B channels. Note that you won’t be
able to perfectly capture the yellow peppers, but give your best shot!
b. While RGB is the most common color space for images, it is not the only one. For
example, one popular color space is HSV (Hue-Saturation-Value). Hue encodes color,
value encodes lightness/darkness, and saturation encodes the intensity of the color. For
a visual, see Fig. 1 of this wiki article. Convert the image to the HSV color space using
OpenCV’s cvtColor() function, and try to perform the same task by setting a threshold in
the Hue channel.
c. Add both binary images to your report. Which colorspace was easier to work with for this
task, and why?
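A minimal thresholding sketch, assuming the peppers image was read with OpenCV (so channels are in BGR order); the numeric bounds below are illustrative starting points, not tuned answers:

import cv2
import numpy as np

rgb_mask = cv2.inRange(img, (0, 150, 150), (120, 255, 255))   # BGR lower/upper bounds for yellow-ish pixels
rgb_mask = (rgb_mask > 0).astype(np.uint8)                    # binary 0/1 image

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hue_mask = cv2.inRange(hsv[:, :, 0], 20, 35)                  # OpenCV hue is in [0, 179]; roughly 20-35 is yellow
hue_mask = (hue_mask > 0).astype(np.uint8)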
2.0 2D Geometric Transforms (15 points)
Geometric transformations are fundamental tools used in a variety of computer vision and
computer graphics applications. In this problem, you will write your own code to warp images
using 2D geometric transforms.
2.1 Write functions to produce transformation matrices
Write separate functions that output the 3 x 3 transformation matrices for the following
transforms: translation, rotation, similarity (translation, rotation, and scale), and affine. The
functions should take as input the following arguments:
1. Translation: horizontal and vertical displacements
2. Rotation: angle
3. Similarity: angle, horizontal/vertical displacements, and scale factor (assume equal
scaling for horizontal and vertical dimensions)
4. Affine: 6 parameters
The output of each function will be a 3 x 3 matrix.
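For reference, here is a minimal sketch of 2.1, assuming angles are in radians and the matrices act on homogeneous coordinates (x, y, 1)^T; the composition order used for the similarity transform is one common choice, so check the lecture convention:

import numpy as np

def translation_matrix(tx, ty):
    return np.array([[1, 0, tx],
                     [0, 1, ty],
                     [0, 0, 1]], dtype=float)

def rotation_matrix(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]], dtype=float)

def similarity_matrix(theta, tx, ty, scale):
    S = np.diag([scale, scale, 1.0])
    return translation_matrix(tx, ty) @ rotation_matrix(theta) @ S

def affine_matrix(a, b, c, d, e, f):
    return np.array([[a, b, c],
                     [d, e, f],
                     [0, 0, 1]], dtype=float)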
2.2 Write a function that warps an image with a given transformation matrix
Next, write a function imwarp(I, T) that warps image I with transformation matrix T. The function
should produce an output image of the same size as I. See Fig. 1 for an example of a warp
induced by a rotation transformation matrix. Make the origin of the coordinate system
correspond to the CENTER of the image, not the top-left corner. This will result in more intuitive
results, such as how the image is rotated around its center in Fig. 1.
Fig. 1: Example of an input image (left) transformed by a rotation matrix, resulting in a β€˜warped’
image (right).
Hint 1: Consider the transformation matrix T to describe the mapping from each pixel in the
output image back to the original image. By defining T in this way, you can account for each
output pixel in the warp, resulting in no β€˜holes’ in the output image (see Lec. 03 slides).
Hint 2: What happens when the transformation matrix maps an output pixel to a non-integer
location in the input image? You will need to perform bilinear interpolation to handle this
correctly (see Lec. 03 slides).
Hint 3: You may find NumPy’s meshgrid function useful to generate all pixel coordinates at
once, without a loop.
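Putting the three hints together, a minimal imwarp sketch (assuming I is a float H x W x 3 array and T maps output-pixel coordinates, centered at the image center, back to input coordinates) might look like this:

import numpy as np

def imwarp(I, T):
    H, W = I.shape[:2]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    cx, cy = (W - 1) / 2.0, (H - 1) / 2.0

    # homogeneous OUTPUT coordinates with the origin at the image center
    coords = np.stack([xs.ravel() - cx, ys.ravel() - cy, np.ones(H * W)])
    src = T @ coords                              # map each output pixel back to the input
    sx = src[0] / src[2] + cx
    sy = src[1] / src[2] + cy

    # bilinear interpolation at the (generally non-integer) source locations
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = sx - x0, sy - y0
    valid = (x0 >= 0) & (y0 >= 0) & (x1 < W) & (y1 < H)
    x0, x1 = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0, y1 = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)

    out = ((1 - wx)[:, None] * (1 - wy)[:, None] * I[y0, x0]
           + wx[:, None] * (1 - wy)[:, None] * I[y0, x1]
           + (1 - wx)[:, None] * wy[:, None] * I[y1, x0]
           + wx[:, None] * wy[:, None] * I[y1, x1])
    out[~valid] = 0                               # pixels that map outside I stay black
    return out.reshape(I.shape)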
2.3 Demonstrate your warping code on two color images of your choice
For each of the two images, show 2-3 transformations of each type (translation, rotation,
similarity, affine) in your report.
3.0 Cameras (15 points)
3.1 Camera Matrix Computation
a. Calculate the camera intrinsic matrix K, extrinsic matrix E, and full rank 4 x 4 projection
matrix P = KE for the following scenario with a pinhole camera:
β—‹ The camera is rotated 90 degrees around the x-axis, and is located at (1, 0, 2)^T.
β—‹ The focal lengths f_x, f_y are 100.
β—‹ The principal point (c_x, c_y)^T is (25, 25).
b. For the projection defined above, find the world point in inhomogeneous coordinates x_W
which corresponds to the projected homogeneous point in image space x_I = (25, 50, 1, 0.25)^T.
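A minimal sketch of how such matrices can be assembled in NumPy (the exact extrinsic convention, e.g. whether t = -RC, and how K is padded to 4 x 4 should follow the lecture notes, so treat this only as a template):

import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    return np.array([[fx, 0, cx],
                     [0, fy, cy],
                     [0,  0,  1]], dtype=float)

def rotation_x(deg):
    t = np.deg2rad(deg)
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t),  np.cos(t)]], dtype=float)

R = rotation_x(90)                       # rotation about the x-axis
C = np.array([1.0, 0.0, 2.0])            # camera center in world coordinates
E = np.eye(4)
E[:3, :3] = R
E[:3, 3] = -R @ C                        # assumes X_cam = R (X_world - C)

K = np.eye(4)
K[:3, :3] = intrinsic_matrix(100, 100, 25, 25)   # K padded to 4 x 4 so P = K @ E is full rank

P = K @ E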
3.2 Field of view and focal length
You are given two cameras with the exact same sensor. The first camera has a wider field-of-view (FOV) than the second, with all other camera parameters being the same. Which camera
has a shorter focal length and why?
4.0 Relighting (10 points) (ELEC/COMP 546 ONLY)
In this problem, you will perform a simple version of image relighting, the task of changing the
lighting on a scene. To do this experiment, you will need two light sources (such as ceiling
lights, floor lamps, flashlights etc.) and a couple of scene objects. Set up a static scene similar
to the one shown in Fig. 2 (the light sources do not have to be seen in the frame, but try to have
them illuminating the scene at two different angles), and a camera such that it is stationary
throughout the experiment (cell phone leaning against heavy object or wall is fine). Let us label
the two lamps as LAMP1 and LAMP2.
a. Capture the image of the scene by turning on LAMP1 only (image I1). Now capture an
image by turning on LAMP2 only (image I2). Finally, capture the image with both LAMP1
and LAMP2 on (image I12). Load and display these images in your Colab notebook.
b. Now, you will create a synthetic photo (I12_synth) depicting the scene when both of the
lamps are turned on by simply summing I1 and I2 together: I12_synth = I1 + I2. Also
compute an image depicting the difference between the synthetic and real images: D =
I12_synth – I12.
c. In your report, show I1, I2, I12, I12_synth, and D side by side. When displaying D, make
sure to rescale D's values to fill the full available dynamic range ([0,1] for float, or [0,255]
for uint8). You can do this with the following operation: (D - min(D)) / (max(D) - min(D)).
Fig. 2: Example setup with two lamps.
d. How good is your synthetic image compared to the real one? Where do they differ the
most?
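A minimal sketch of parts (b) and (c), assuming I1, I2, and I12 were loaded as uint8 arrays of the same shape:

import numpy as np

I1f, I2f, I12f = [x.astype(np.float32) / 255.0 for x in (I1, I2, I12)]

I12_synth = np.clip(I1f + I2f, 0, 1)           # synthetic "both lamps on" image
D = I12_synth - I12f                           # difference image (values may be negative)
D_vis = (D - D.min()) / (D.max() - D.min())    # rescaled to [0, 1] for display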
Submission Instructions
All code must be written using Google Colab (see course website). Every student must submit a
zip file for this assignment in Canvas with 2 items:
1. An organized report submitted as a PDF document. The report should contain all image
results (intermediate and final), and answer any questions asked in this document. It
should also contain any issues (problems encountered, surprises) you may have found
as you solved the problems. The heading of the PDF file should contain:
a. Your name and Net ID.
b. Names of anyone you collaborated with on this assignment.
c. A link to your Colab notebook (remember to change permissions on your
notebook to allow viewers).
2. A pdf copy of your Colab notebook.
Collaboration Policy
I encourage collaboration both inside and outside class. You may talk to other students for
general ideas and concepts, but you should write your own code, answer questions
independently, and submit your own work.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources explicitly. If
you have any doubts regarding what is and is not plagiarism, talk to me.

ELEC/COMP 447/546 Assignment 2

1.0 Hybrid Images (10 points)
Recall the hybrid image of Albert Einstein and Marilyn Monroe introduced in [1] and
reproduced below in Fig. 1. Due to the way your brain processes spatial frequencies, you
will see the identity of the image change if you squint or move farther/closer to the image.
In this problem, you will create your own hybrid image.
Fig. 1: The hybrid image β€œMarilyn Einstein.” This was produced by combining the low-frequency
components of an image of Marilyn Monroe, and the high-frequency components of an image of
Albert Einstein [1].
1.1 Gaussian kernel
Implement function gaussian2D(sigma, kernel_size) that returns a 2D gaussian
blur kernel, with input argument sigma specifying a tuple of x,y scales (standard
deviations), and kernel_size specifying a tuple of x,y dimensions of the kernel. The
kernel should be large enough to include 3 standard deviations per dimension.
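A minimal gaussian2D sketch, assuming sigma = (sigma_x, sigma_y) and kernel_size = (width, height), with the kernel normalized to sum to 1:

import numpy as np

def gaussian2D(sigma, kernel_size):
    sx, sy = sigma
    w, h = kernel_size
    xs = np.arange(w) - (w - 1) / 2.0
    ys = np.arange(h) - (h - 1) / 2.0
    X, Y = np.meshgrid(xs, ys)
    K = np.exp(-(X**2 / (2 * sx**2) + Y**2 / (2 * sy**2)))
    return K / K.sum()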
1.2 Create Hybrid Images
Choose two images (A and B) of your choice that you will blend with one another and
convert them to grayscale (you can use opencv.cvtColor). These images can be of faces,
or any other objects. Try to make the objects in the two images occupy roughly the same
region in the image (if they don’t you can use the imwarp function you wrote in Homework
1 to manually align them!).
Construct a hybrid image C from A (to be seen close-up) and B (to be seen far away) as
follows: C = blur(B) + (A-blur(A)), where blur is a function that lowpass filters
the image (use the Gaussian kernel you coded in 1.1 for this). Try different values of
sigma for the Gaussian kernel. How does the amount of blurring affect your perception
of the results? In your report, please show your input images labeled clearly as A and B,
and attach the result C for a value of sigma which you feel demonstrates the illusion the
best, at both the original size and a downsampled size. As a sanity check, you should be
able to see the identity change when looking at the original and downsampled versions.
1.3 Fourier Spectra
For the sigma value you chose in 1.2, show images of the Fourier spectra magnitudes
of images A, B, blur(B), A-blur(A), and C. You can get the magnitude of the Fourier
spectrum coefficients of an image β€˜x’ by running:
X = numpy.abs(numpy.fft.fftshift(numpy.fft.fft2(x)))
By default, numpy.fft.fft2 will place the zero frequency (DC component) of the spectrum
at the top left of the image, and so numpy.fft.fftshift is used here to place the zero
frequency at the center of the image. When displaying the Fourier spectrum with
matplotlib.pyplot.imshow, the image will likely look black. This is because the DC
component typically has a much higher magnitude than all other frequencies, such that
after rescaling all values to lie in [0,1], most of the image is close to 0. To overcome this,
display the logarithm of the values instead.
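Putting the spectrum computation and log display together (assuming x is a 2D grayscale array):

import numpy as np
import matplotlib.pyplot as plt

X = np.abs(np.fft.fftshift(np.fft.fft2(x)))
plt.imshow(np.log(X + 1e-8), cmap='gray')      # small epsilon avoids log(0)
plt.show()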
2.0 Laplacian Blending (15 points)
The Laplacian pyramid is a useful tool for many computer vision and image processing
applications. One such application is blending sections of different images together, as
shown in Fig. 2. In this problem, you will write code that constructs a Laplacian pyramid,
and use it to blend two images of your choice together.
Fig. 2: Laplacian Blending Example. The halves of two images, an orange and an apple (left)
are blended together using Laplacian blending (right).
2.1 Gaussian Pyramid
Write a function gausspyr(I, n_levels, sigma) that returns a Gaussian pyramid
for image I with number of levels n_levels and Gaussian kernel scale sigma. The
function should return a list of images, with element i corresponding to level i of the
pyramid. Note that level 0 should correspond to the original image I, and level n_levels
– 1 should correspond to the coarsest (lowest frequency) image.
2.2 Laplacian Pyramid
Write a function lappyr(I, n_levels, sigma) that returns a Laplacian pyramid for
image I with number of levels n_levels and Gaussian kernel sigma. The function
should return a list of images, with element i corresponding to level i of the pyramid.
Note that level 0 corresponds to the details of the original image I, and level n_levels
– 1 corresponds to the low-frequency residual image.
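A minimal sketch of both pyramids, assuming each level is blurred with an isotropic Gaussian of scale sigma and then downsampled by 2 (you may of course structure your own code differently):

import cv2
import numpy as np

def gausspyr(I, n_levels, sigma):
    pyr = [I.astype(np.float32)]
    for _ in range(n_levels - 1):
        blurred = cv2.GaussianBlur(pyr[-1], ksize=(0, 0), sigmaX=sigma)
        pyr.append(blurred[::2, ::2])           # downsample by 2
    return pyr

def lappyr(I, n_levels, sigma):
    g = gausspyr(I, n_levels, sigma)
    pyr = []
    for i in range(n_levels - 1):
        up = cv2.resize(g[i + 1], (g[i].shape[1], g[i].shape[0]))
        pyr.append(g[i] - up)                   # band-pass detail at level i
    pyr.append(g[-1])                           # low-frequency residual
    return pyr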
2.3 Image Blending
Choose two images A and B depicting different objects and resize them to the same
shape. You may want to use your imwarp function from Homework 1 to align the
scales/orientations of the objects appropriately (as was done in the example in Fig. 2) so
that the resulting blend will be most convincing. Create a binary mask image mask which
will have 1s in its left half, and 0s in its right half (called a β€˜step’ function). Perform blending
with the following operations:
1. Build Laplacian pyramids for A and B.
2. Build a Gaussian pyramid for mask.
3. Build a blended Laplacian pyramid for the output image C using the pyramids of A, B, and
mask, where each level k is defined by l_k^C = l_k^A * m_k + l_k^B * (1 - m_k).
4. Invert the combined Laplacian pyramid back into an output image C.
Show the following in your report: (1) Images from all levels of the Laplacian pyramids for
A and B, (2) images from all levels of the Gaussian pyramid for mask, and (3) your final
blended image C.
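A minimal sketch of the four blending steps, assuming gausspyr/lappyr from 2.1-2.2 and that A, B, and mask are float arrays of the same shape (mask in [0, 1]); n_levels and sigma below are hypothetical settings:

import cv2

n_levels, sigma = 5, 2.0
LA, LB = lappyr(A, n_levels, sigma), lappyr(B, n_levels, sigma)
GM = gausspyr(mask, n_levels, sigma)

LC = [la * m + lb * (1 - m) for la, lb, m in zip(LA, LB, GM)]   # blended pyramid

C = LC[-1]                                     # collapse: start from the coarsest level
for level in reversed(LC[:-1]):
    C = cv2.resize(C, (level.shape[1], level.shape[0])) + level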
2.4 (ELEC/COMP 546 Only) Blending two images with a mask other than a step
Laplacian blending is not restricted to only combining halves of two images using a step
mask. You can set the mask to any arbitrary function and merge images, as shown in this
example. Demonstrate a Laplacian blend of two new images using a mask other than
step.
3.0 Pulse Estimation from Video (5 points)
You are convinced that your friend Alice is a robot. You don’t have much evidence to
prove this because she is quite a convincing human during conversations, except for the
fact that she does get very angry if water touches her. One day, you hit upon a plan to
figure out this mystery once and for all. You know that a human has a heart which pumps
blood, and a robot does not. Furthermore, you read a paper [2] showing that one can
estimate heart rate from a video of a human face using very simple computer vision
techniques. So the next day, you convince Alice to take this video of herself, linked here.
You will now need to implement a simple pulse estimation algorithm and run it on the
video. Follow these steps:
3.1 Read video into notebook and define regions of interest
Upload the video into your Colab environment. Note that it may take several minutes for
the upload to complete due to the size of the file. You can then read the video frames into
a numpy array using the read_video_into_numpy function provided here.
Using the first video frame, manually define rectangles (row and column boundaries) that
capture 1) one of the cheeks and 2) the forehead.
3.2 Compute signals
Now compute the average Green value of pixels for all frames for each facial region
(cheek, forehead). This gives a 1D signal in time called the Photoplethysmogram (PPG)
for each region.
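For example, if frames is a (T, H, W, 3) RGB array returned by read_video_into_numpy and (r0, r1, c0, c1) are the row/column bounds of a region you chose, that region's PPG is simply:

ppg = frames[:, r0:r1, c0:c1, 1].mean(axis=(1, 2))   # mean Green value per frame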
3.3 Bandpass filter
It is often useful to filter a signal to a particular band of frequencies of interest (β€˜pass
band’) if we know that other frequencies don’t matter. In this application, we know that a
normal resting heart rate for an adult ranges between 60-100 beats per minute (1-1.7 Hz).
Apply the bandpass_filter function provided here to your signals. You can set
low_cutoff = 0.8, high_cutoff = 3, fs = 30, order = 1. Plot the filtered signals.
3.4 Plot Fourier spectra
Plot the Fourier magnitudes of these two signals using the DFT, where the x-axis is
frequency (in Hertz) and y-axis is amplitude. DFT coefficients are ordered in terms of
integer indices, so you will have to convert the indices into Hertz. For each index n in
[-N/2, N/2], the corresponding frequency is Fs * n / N, where N is the length of your signal
and Fs is the sampling rate of the signal (30 Hz in this case). You can use numpy.fft.fftfreq
to do this conversion for you.
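A minimal sketch of the spectrum plot (and the peak-to-beats-per-minute conversion used in 3.5), assuming sig is one of your filtered 1D signals sampled at 30 Hz:

import numpy as np
import matplotlib.pyplot as plt

fs = 30.0
N = len(sig)
freqs = np.fft.fftfreq(N, d=1.0 / fs)          # DFT bin frequencies in Hz
mag = np.abs(np.fft.fft(sig))

pos = freqs > 0                                # plot the positive-frequency half
plt.plot(freqs[pos], mag[pos])
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude')
plt.show()

peak_hz = freqs[pos][np.argmax(mag[pos])]
print('Estimated pulse:', peak_hz * 60, 'beats per minute')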
3.5 Estimate Alice’s average pulse rate
A normal resting heart rate for adults ranges between 60-100 beats per minute. What rate
does the highest peak in Alice’s Fourier spectrum correspond to? Which facial region
provides the cleanest spectrum (the one which has the clearest single peak and low
energy elsewhere)? Is Alice likely a human or not?
3.6 (ELEC/COMP 546 Only) Find your own pulse
Take a 15-20 second video of yourself using a smartphone, webcam, or personal camera.
Your face should be as still as possible, and don’t change facial expressions. Do a similar
analysis above as you did with Alice’s video. Show some frames from your video. Was it
easier/harder to estimate heart rate compared to the sample video we provided? What
was challenging about it?
References
[1] Oliva, Aude, Antonio Torralba, and Philippe G. Schyns. “Hybrid images.” ACM Transactions
on Graphics (TOG) 25.3 (2006): 527-532.
[2] Poh, Ming-Zher, Daniel J. McDuff, and Rosalind W. Picard. “Non-contact, automated cardiac
pulse measurements using video imaging and blind source separation.” Optics express 18.10
(2010): 10762-10774.
Submission Instructions
All code must be written using Google Colab (see course website). Every student must submit a
zip file for this assignment in Canvas with 2 items:
1. An organized report submitted as a PDF document. The report should contain all image
results (intermediate and final), and answer any questions asked in this document. It
should also contain any issues (problems encountered, surprises) you may have found
as you solved the problems. The heading of the PDF file should contain:
a. Your name and Net ID.
b. Names of anyone you collaborated with on this assignment.
c. A link to your Colab notebook (remember to change permissions on your
notebook to allow viewers).
2. A pdf copy of your Colab notebook.
Collaboration Policy
I encourage collaboration both inside and outside class. You may talk to other students for
general ideas and concepts, but you should write your own code, answer questions
independently, and submit your own work.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources explicitly. If
you have any doubts regarding what is and is not plagiarism, talk to me.

ELEC/COMP 447/546 Assignment 3

1.0 Optical Flow
In this problem, you will implement both the Lucas-Kanade and Horn-Schunck
algorithms. Your implementations should use a Gaussian pyramid to properly account
for large displacements. You can use your pyramid code from Homework 2, or you may
simply use OpenCV’s pyrDown function to perform the blur + downsampling. You may
also use OpenCV’s Sobel filter to obtain spatial (x, y) gradients of an image.
1.1 Lucas-Kanade (5 points)
Implement the Lucas-Kanade algorithm, and demonstrate tracking points on this video.
1. Select corners from the first frame using the Harris corner detector. You can use
this command: corners = cv.cornerHarris(gray_img, 2, 3, 0.04). Note that cornerHarris
returns a per-pixel corner response map, so threshold it to obtain corner locations.
2. Track the points through the entire video by applying Lucas-Kanade between
each pair of successive frames. This will yield one β€˜trajectory’ per point, with
length equal to the number of video frames.
3. Create a gif showing the tracked points overlaid as circles on the original frames.
You can draw a circle on an image using cv.circle. You can save a gif with this
code:
import imageio
imageio.mimsave('tracking.gif', im_list, fps=10)
where im_list is a list of your output images. You can open this gif in your web
browser to play it as a video and visualize your results. Show a few frames of
the gif in your report, and save the gif in your Google Drive, and place the
link to it in your report. Make sure to allow view access to the file!
4. Answer the following questions:
a. Do you notice any inaccuracies in the point tracking? Where and why?
b. How does the tracking change when you change the local window size
used in Lucas-Kanade?
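For reference, a minimal single-level Lucas-Kanade step for one point (r, c) between consecutive grayscale float frames I1 and I2 might look like the sketch below; a full solution still needs the Gaussian pyramid and a loop over points and frames:

import cv2 as cv
import numpy as np

def lk_flow_at_point(I1, I2, r, c, win=7):
    Ix = cv.Sobel(I1, cv.CV_64F, 1, 0, ksize=3)
    Iy = cv.Sobel(I1, cv.CV_64F, 0, 1, ksize=3)
    It = I2 - I1

    sl = np.s_[r - win:r + win + 1, c - win:c + win + 1]   # local window around the point
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)         # least-squares flow estimate
    return u, v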
1.2 Horn-Schunck (5 points)
Implement the Horn-Schunck algorithm. Display the flow fields for the β€˜Army,’
β€˜Backyard,’ and β€˜Mequon’ test cases from the Middlebury dataset, located here.
Consider β€˜frame10.png’ as the first frame, and β€˜frame11.png’ as the second frame for all
cases.
Use this code to display each predicted flow field as a colored image:
hsv = np.zeros(im.shape, dtype=np.uint8)
hsv[..., 1] = 255
mag, ang = cv.cartToPolar(flow_x, flow_y)
hsv[..., 0] = ang * 180 / np.pi / 2
hsv[..., 2] = cv.normalize(mag, None, 0, 255, cv.NORM_MINMAX)
out = cv.cvtColor(hsv, cv.COLOR_HSV2RGB)
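For reference, a minimal sketch of the core Horn-Schunck iteration at a single pyramid level, assuming I1 and I2 are grayscale float images; the neighborhood average is approximated here with a box blur rather than the classic 4-neighbor kernel:

import cv2 as cv
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iters=100):
    Ix = cv.Sobel(I1, cv.CV_64F, 1, 0, ksize=3)
    Iy = cv.Sobel(I1, cv.CV_64F, 0, 1, ksize=3)
    It = I2 - I1

    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iters):
        u_avg = cv.blur(u, (3, 3))
        v_avg = cv.blur(v, (3, 3))
        t = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * t
        v = v_avg - Iy * t
    return u, v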
1.3 ELEC/COMP 546 Only: Improving Horn-Schunck with superpixels (5 points)
Recall superpixels discussed in lecture and described further in this paper. How might
you use superpixels to improve the performance of Horn-Schunck? Can you incorporate
your idea into the smoothness + brightness constancy objective function? Define any
notation you wish to use in the equation. You don’t have to implement your idea in code
for this question.
2.0 Image Compression with PCA
In this problem, you will use PCA to compress images, by encoding small patches in
low-dimensional subspaces. Download these two images:
Test Image 1
Test Image 2
Do the following steps for each image separately.
2.1 Use PCA to model patches (5 points)
Randomly sample at least 1,000 16 x 16 patches from the image. Flatten those patches
into vectors (each of length 16 x 16 x 3 = 768). Run PCA on these patches to obtain a set
of principal components. Please write your own code to perform PCA. You may use
numpy.linalg.eigh, or numpy.linalg.svd to obtain eigenvectors.
Display the first 36 principal components as 16 x 16 images, arranged in a 6 x 6 grid
(Note: remember to sort your eigenvalues and eigenvectors by decreasing eigenvalue
magnitude!). Also report the % of variance captured by all principal components (not
just the first 36) in a plot, with the x-axis being the component number, and y-axis being
the % of variance explained by that component.
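A minimal PCA sketch, assuming patches is an (N, 768) array of flattened patches stored in row-major (16, 16, 3) order:

import numpy as np

mean = patches.mean(axis=0)
Xc = patches - mean
cov = Xc.T @ Xc / (len(Xc) - 1)

eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]              # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

var_explained = eigvals / eigvals.sum()        # fraction of variance per component
pc_images = eigvecs[:, :36].T.reshape(36, 16, 16, 3)   # first 36 PCs viewed as patches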
2.2 Compress the image (5 points)
Show image reconstruction results using 1, 3, 10, 50, and 100 principal components. To
do this, divide the image into non-overlapping 16 x 16 patches, and reconstruct each
patch independently using the principal components. Answer the following questions:
1. Was one image easier to compress than another? If so, why do you think that is
the case?
2. What are some similarities and differences between the principal components for
the two images, and your interpretation for the reason behind them?
Submission Instructions
All code must be written using Google Colab (see course website). Every student must
submit a zip file for this assignment in Canvas with 2 items:
1. An organized report submitted as a PDF document. The report should contain all
image results (intermediate and final), and answer any questions asked in this
document. It should also contain any issues (problems encountered, surprises)
you may have found as you solved the problems. The heading of the PDF file
should contain:
a. Your name and Net ID.
b. Names of anyone you collaborated with on this assignment.
c. A link to your Colab notebook (remember to change permissions on your
notebook to allow viewers).
2. A pdf copy of your Colab notebook.
Collaboration Policy
I encourage collaboration both inside and outside class. You may talk to other students
for general ideas and concepts, but you should write your own code, answer questions
independently, and submit your own work.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources
explicitly. If you have any doubts regarding what is and is not plagiarism, talk to me.

ELEC/COMP 447/546 Assignment 4

Introduction
This assignment will introduce you to PyTorch and neural networks. We have provided a
Colab notebook located here with skeleton code to get you started. Colab comes with a
free GPU. To activate the GPU during your session, click Runtime on the top toolbar,
followed by Change runtime type, and select GPU under hardware accelerator. You will
find the GPU useful for quickly training your neural network in Problem 2.
1.0 PyTorch (10 points)
In this problem, you will perform some basic operations in PyTorch, including creating
tensors, moving arrays between PyTorch and Numpy, and using autograd. Before
starting, please read the following pages:
1. Tensor basics
2. Autograd basics, these two sections:
a. Differentiation in Autograd
b. Vector Calculus using autograd
1.1 Basics of Autograd (5 points)
a. In the provided notebook, fill in the function sin_taylor() with code to
approximate the value of the sine function using the Taylor approximation
(defined here). You can use math.factorial() from Python’s standard library to help you.
b. Create a tensor x with value Ο€/4. Create a new tensor y = sin_taylor(x).
Use y.backward() to evaluate the gradient of y at x. Is this value a close
approximation to the exact derivative of sine at x?
c. Now, create a NumPy array x_npy of 100 random numbers drawn uniformly
from [-Ο€, Ο€] (use np.random.uniform). Create a tensor x from that array and
place the tensor onto the GPU. Again, evaluate y = sin_taylor(x). This
time, y is a vector. If you run y.backward(), it will throw an error because
autograd is meant to evaluate the derivative of a scalar output with respect to
input vectors (see tutorial pages above). Instead, run either one of these two
lines (they do the same thing):
y.sum().backward()
y.backward(gradient=torch.ones(100))
What is happening here? We are creating a β€˜dummy’ scalar output (let’s call it z),
which contains the sum of values in y, and acts as the final scalar output of our
computation graph. Due to the chain rule of differentiation, dz/dx will yield the
same value as dy/dx.
d. Get the gradient tensor dz/dx and convert that tensor to a Numpy array. Plot
dz/dx vs. x_npy, overlaid on a cosine curve. Confirm that the points fall on the
curve and put this plot in your report.
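A minimal sketch of parts (a) and (b) (the notebook's sin_taylor skeleton may look slightly different, and the number of Taylor terms here is arbitrary):

import math
import torch

def sin_taylor(x, n_terms=10):
    y = torch.zeros_like(x)
    for k in range(n_terms):
        y = y + (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return y

x = torch.tensor(math.pi / 4, requires_grad=True)
y = sin_taylor(x)
y.backward()
print(x.grad.item(), math.cos(math.pi / 4))    # gradient should be close to cos(pi/4)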
1.2 Image Denoising (5 points)
In this problem, you will denoise this noisy parrot image, which we denote I. To do so,
you will create a denoising loss function, and use autograd to optimize the pixels of a
new image J, which will be a denoised version of I.
a. In your Colab notebook, implement denoising_loss() to compute the
following loss function:
π‘™π‘œπ‘ π‘  = ‖𝐼 βˆ’ 𝐽‖1 + 𝛼 (β€–
𝑑𝐽
𝑑π‘₯ β€–
1
+ β€–
𝑑𝐽
𝑑𝑦 β€–
1
)
The first component is a data term making sure that the predicted image J is not
too far from the original image I. The second term is a regularizer which will
reward J if it is smoother, quantified using J’s spatial derivatives. We have
provided you a function get_spatial_gradients() to compute the gradients.
b. Implement gradient descent to optimize the pixels of J using your loss function
and autograd. Initialize J to be a copy of I. Try different values for the learning
rate and Ξ±, and find a combination that does a good job. Put the smoothed image
J, along with the learning rate and Ξ± you used, in your report.
c. ELEC/COMP 546 Only: Change the loss function to use L2 norms instead of L1.
Does it work better or worse? Why?
Hints (you should use all of these in your solution):
β€’ torch.clone: performs a deep copy of a tensor
β€’ To require storing gradients for a tensor x, use: x.requires_grad_(True)
β€’ In the code, you will see the statement with torch.no_grad():. Any
statements written within that block will not update the computation graph. Put
your gradient descent step within that block, since that operation should not
update the graph.
β€’ Remember to zero out the gradient buffer of J after each step using
J.grad.zero_().
β€’ Remember to normalize the gradient to a unit vector before using it in your
gradient descent step.
β€’ To plot an image in tensor J using matplotlib, you will have to first detach it from
the computation graph (to not track its gradients), move it from the GPU to the
CPU, and convert to a NumPy array. You can do this in one line with:
J = J.detach().cpu().numpy().
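Putting the hints together, a minimal optimization loop might look like the sketch below (it assumes I is a float tensor already on the GPU and that the provided get_spatial_gradients(J) returns the pair (dJ/dx, dJ/dy); alpha, lr, and n_steps are hypothetical values to tune):

import torch

alpha, lr, n_steps = 0.1, 1e-2, 500

J = torch.clone(I)
J.requires_grad_(True)

for step in range(n_steps):
    dJdx, dJdy = get_spatial_gradients(J)
    loss = (I - J).abs().sum() + alpha * (dJdx.abs().sum() + dJdy.abs().sum())
    loss.backward()
    with torch.no_grad():
        g = J.grad / J.grad.norm()             # normalize the gradient to a unit vector
        J -= lr * g                            # gradient descent step (kept out of the graph)
    J.grad.zero_()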
2.0 Training an image classifier (10 points)
In this problem, you will create and train your first neural network image classifier!
Before starting this question, please read the following pages about training neural
networks in PyTorch:
1. Data loading
2. Models
3. Training loop
We will be using the CIFAR10 dataset, consisting of 60,000 images of 10 common
classes. Each image is of size 32 x 32 x 3. Download the full dataset as one .npz file
here, and add it to your Google Drive. This file contains three objects: X: array of
images, y: array of labels (specified as integers in [0,9]), and label_names: list of class
names. Please complete the following:
a. Finish implementing the CIFARDataset class. See comments in the code for
further instructions.
b. Add transforms: RandomHorizontalFlip, RandomAffine ([-5, 5] degree
range, [0.8, 1.2] scale range) and ColorJitter ([0.8, 1.2] brightness range,
[0.8, 1.2] saturation range). Don’t forget to apply the ToTensor transform first,
which converts an H x W x 3 image to a 3 x H x W tensor and normalizes the
pixel range to [0,1]. You will find the transform APIs on this page.
c. Implement a CNN classifier with the structure in the following table (a minimal
sketch appears after this list). You will find the APIs for Conv, Linear, ReLU, and
MaxPool on this page. The spatial dimensions of an image should NOT change after
a Conv operation (only after MaxPooling).
Layer Output channels for Conv/output neurons for Linear
3 x 3 Conv + ReLU 50
MaxPool (2 x 2) X
3 x 3 Conv + ReLU 100
MaxPool (2 x 2) X
3 x 3 Conv + ReLU 100
MaxPool (2 x 2) X
3 x 3 Conv + ReLU 100
Linear + ReLU 100
Linear 10
d. Implement the training loop.
e. Train your classifier for 15 epochs. The GPU, if accessible, will result in faster
training. Make sure to save a model checkpoint at the end of each epoch, as you
will use them in part f. Use the following training settings: batch size = 64,
optimizer = Adam, learning rate = 1e-4.
f. Compute validation loss per epoch and plot it. Which model will you choose and
why?
g. Run the best model on your test set and report:
i. Overall accuracy (# of examples correctly classified / # of examples)
ii. Accuracy per class
iii. Confusion matrix: A 10 x 10 table, where the cell at row i and column j
reports the fraction of times an example of class i was labeled by your
model as class j. Please label the rows/columns by the object class name,
not indices.
iv. For the class on which your model has the worst accuracy (part ii), what is
the other class it is most confused with? Show 5-10 test images that your
model confused between these classes and comment on what factors
may have caused the poor performance.
h. ELEC/COMP 546 Only: Change the last two Conv blocks in the architecture to
Residual blocks and report the overall accuracy of the best model. Recall that a
residual block has the form y = x + F(x), where F is the block's convolutional mapping.
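A minimal sketch of the classifier described in part (c), assuming 32 x 32 x 3 inputs; padding=1 keeps the spatial size unchanged across each 3 x 3 Conv, so the feature map entering the first Linear layer is 4 x 4 x 100:

import torch.nn as nn

class CIFARClassifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 50, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),     # 32 -> 16
            nn.Conv2d(50, 100, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 16 -> 8
            nn.Conv2d(100, 100, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 8 -> 4
            nn.Conv2d(100, 100, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(100 * 4 * 4, 100), nn.ReLU(),
            nn.Linear(100, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))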
Submission Instructions
All code must be written using Google Colab (see course website). Every student must
submit a zip file for this assignment in Canvas with 2 items:
1. An organized report submitted as a PDF document. The report should contain all
image results (intermediate and final), and answer any questions asked in this
document. It should also contain any issues (problems encountered, surprises)
you may have found as you solved the problems. Please add a caption for
every image specifying what problem number it is addressing and what it is
showing. The heading of the PDF file should contain:
a. Your name and Net ID.
b. Names of anyone you collaborated with on this assignment.
c. A link to your Colab notebook (remember to change permissions on your
notebook to allow viewers).
2. A pdf copy of your Colab notebook.
Collaboration Policy
I encourage collaboration both inside and outside class. You may talk to other students
for general ideas and concepts, but you should write your own code, answer questions
independently, and submit your own work.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources
explicitly. If you have any doubts regarding what is and is not plagiarism, talk to me.

ELEC/COMP 447/546 Assignment 5

Problem 1: Semantic Segmentation (7 points)
In this problem, you will train a simple semantic segmentation network. Recall that in
semantic segmentation, the algorithm must assign each pixel of an input image to one
of K object classes. We have provided you with a Colab notebook with skeleton code to
get you started.
We will use a portion of the CityScapes dataset for this problem, consisting of 2975
training images and 500 validation images. The second cell in the notebook will
automatically download the dataset into your local Colab environment.
Each image also comes with annotations for 34 object classes in the form of a
segmentation image (with suffix β€˜labelIds.png’). The segmentation image contains
integer ids in [0, 33] indicating the class of each pixel. This page provides the mappings
from id to label name.
a. Fill in the init and forward functions for the Segmenter class, which will
implement your segmentation network. The network will be a convolutional
encoder-decoder. The encoder will consist of the first several β€˜blocks’ of layers
extracted from the VGG16 network pretrained on ImageNet (the provided Colab
notebook extracts these layers for you). You must implement the decoder with
this form:
Layer Output channels for Conv
3 x 3 Conv + ReLU 64
Upsample (2 x 2) X
3 x 3 Conv + ReLU 64
Upsample (2 x 2) X
3 x 3 Conv + ReLU 64
Upsample (2 x 2) X
3 x 3 Conv n_classes (input to init)
Use PyTorch’s Upsample function. Remember that the size of the image should not
change after each Conv operation (add appropriate padding). A minimal sketch of such
a decoder appears after this list.
b. Train your model for 7 epochs using the nn.CrossEntropyLoss loss function. Using the
GPU, this should take about 30 minutes.
c. Using the final model, report the average intersection-over-union (IoU) per class
on the validation set in a table. For more on IoU, see this page. Which class has
the best IoU, and which has the worst? Comment on why you think certain
classes have better accuracies than others, and what factors may cause those
differences.
d. For each of the following validation images, show three images side-by-side: the
image, the ground truth segmentation, and your predicted segmentation. The
segmentation images should be in color, with each class represented by a
different color.
i. frankfurt_000000_015389_leftImg8bit.jpg
ii. frankfurt_000001_057954_leftImg8bit.jpg
iii. lindau_000037_000019_leftImg8bit.jpg
iv. munster_000173_000019_leftImg8bit.jpg
e. Look at the lines of code for resizing the images and masks to 256 x 256. We use
bilinear interpolation when resizing the image, but nearest neighbor interpolation when
resizing the mask. Why do we not use bilinear interpolation for the mask?
f. Look at the __getitem__ function for the CityScapesDataset class and notice that we
apply a horizontal flip augmentation to the image and mask using a random number
generator. Why do we apply the flip in this way instead of simply adding
T.RandomHorizontalFlip to the sequence of transforms in im_transform and
mask_transform (similar to what you did in Homework 4)?
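A minimal sketch of the decoder from part (a), assuming the VGG16 encoder output has enc_channels channels; padding=1 keeps the spatial size fixed across each 3 x 3 Conv:

import torch.nn as nn

def make_decoder(enc_channels, n_classes):
    return nn.Sequential(
        nn.Conv2d(enc_channels, 64, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        nn.Conv2d(64, n_classes, 3, padding=1),
    )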
Submission Instructions
All code must be written using Google Colab (see course website). Every student must
submit a zip file for this assignment in Canvas with 2 items:
1. An organized report submitted as a PDF document. The report should contain all
image results (intermediate and final), and answer any questions asked in this
document. It should also contain any issues (problems encountered, surprises)
you may have found as you solved the problems. Please add a caption for
every image specifying what problem number it is addressing and what it is
showing. The heading of the PDF file should contain:
1. Your name and Net ID.
2. Names of anyone you collaborated with on this assignment.
3. A link to your Colab notebook (remember to change permissions on your
notebook to allow viewers).
2. A pdf copy of your Colab notebook.
Collaboration Policy
I encourage collaboration both inside and outside class. You may talk to other students
for general ideas and concepts, but you should write your own code, answer questions
independently, and submit your own work.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources
explicitly. If you have any doubts regarding what is and is not plagiarism, talk to me.

ELEC/COMP 447/546 Assignment 6

Problem 1: StyleGAN (7 points)
In this problem, you will use StyleGAN2 for controlled image generation. Make sure to
run the first 3 code cells of the provided Colab notebook. The first cell installs
StyleGAN2 and its dependencies. The second cell loads a pre-trained StyleGAN2
model for faces. The third cell provides you with some useful utility functions. Some
preliminary code on how to generate synthetic faces using the utility functions is also
provided in cell 4.
StyleGAN2’s generator converts a vector z ∈ R^512 drawn from the standard Normal
distribution into a β€˜style’ vector w ∈ R^512. The generator then processes the style vector
to produce an image I ∈ R^(1024 Γ— 1024 Γ— 3). In this problem, you will find a direction in the
style space corresponding to perceived gender and use that direction to alter the
perceived gender of synthetic faces.
a. Interpolating between images: Choose two random noise vectors z0 and z1,
such that the two generated faces have different perceived genders based on the
face_is_female function (we use binary gender attributes in this assignment for
simplicity). This function uses a pre-trained face gender classifier to make its prediction.
i. Interpolate between the latent vectors z0 and z1 with 5 intermediate
points. Show a strip of 7 faces along with the classifier predictions in your
report.
ii. Interpolate between the style vectors w0 and w1 with 5 intermediate
points. Show a strip of 7 faces along with the classifier predictions in your
report.
iii. Question: What differences do you notice when interpolating in latent
space versus style space? Do the intermediate faces look realistic?
b. Image manipulation with latent space traversals
i. Sample 1000 random z vectors, convert them to style vectors w, and get
their corresponding perceived genders using the trained classifier. This
may take a few minutes.
ii. Train a linear classifier (use scikit-learn’s linear SVM) that predicts gender
from the style vector. The model’s coefficients (attribute coef_) specify
the normal vector to the hyperplane used to separate the perceived
genders in style space. Remember to convert your cuda tensors to numpy
arrays before sending to scikit-learn’s functions.
iii. Sample 2 random w vectors. For each w vector, display a strip of 5
images. The center image will be the image generated by w. The two
images to the left will correspond to moving toward the β€œmore male”
direction, and the two to the right will correspond to β€œmore female”. To
generate the latter 4 images, move along the SVM hyperplane’s normal
vector in both directions using some appropriate step size.
iv. Question: Do you notice any facial attributes that seem to commonly
change when moving between males and females? Why do you think that
occurs?
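A minimal sketch of steps (ii)-(iii), assuming ws is an (N, 512) NumPy array of style vectors and labels an (N,) array of 0/1 perceived-gender predictions; the step size is a hypothetical value to tune visually:

import numpy as np
from sklearn.svm import LinearSVC

svm = LinearSVC()
svm.fit(ws, labels)
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])   # unit normal of the separating hyperplane

step = 2.0
w = ws[0]                                                 # one style vector to edit
strip = [w + k * step * direction for k in (-2, -1, 0, 1, 2)]   # 5-image strip along the normal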
Problem 2: Using CLIP for Zero-Shot Classification (5 points)
In this problem, you will use Contrastive Language-Image Pre-Training (CLIP) to
perform zero-shot classification of images. You can read more about CLIP in this blog
post, and check out the example in the official GitHub repository. We will reuse the
CIFAR dataset introduced in Assignment 4. Download that dataset as one .npz file here
and place it in your Google Drive folder.
a. Perform classification of each test image (last 10,000 images of the dataset)
using CLIP. To do so, create 10 different captions (e.g., β€œAn image of a [class]”)
corresponding to each of the 10 object classes. Then, for each image, store the
label that provides the highest probability score. Report overall accuracy.
b. ELEC/COMP 546 ONLY (3 points). Engineer the caption prompts to try to obtain
better accuracy. To do so, give a set of possible captions per class instead of
just one. For example, β€œA bad photo of a [class]” or β€œA drawing of a [class]”.
Report your accuracy.
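A minimal zero-shot sketch for part (a), following the example in the official CLIP repository; X and label_names are assumed to have been loaded from the .npz file described above, and the caption template is just one possible choice:

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

captions = [f"An image of a {name}" for name in label_names]   # one caption per class
text_tokens = clip.tokenize(captions).to(device)

image = preprocess(Image.fromarray(X[50000])).unsqueeze(0).to(device)  # one test image
with torch.no_grad():
    logits_per_image, _ = model(image, text_tokens)
    pred = logits_per_image.softmax(dim=-1).argmax(dim=-1).item()      # predicted class index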
Submission Instructions
All code must be written using Google Colab (see course website). Every student must
submit a zip file for this assignment in Canvas with 2 items:
1. An organized report submitted as a PDF document. The report should contain all
image results (intermediate and final), and answer any questions asked in this
document. It should also contain any issues (problems encountered, surprises)
you may have found as you solved the problems. Please add a caption for
every image specifying what problem number it is addressing and what it is
showing. The heading of the PDF file should contain:
1. Your name and Net ID.
2. Names of anyone you collaborated with on this assignment.
3. A link to your Colab notebook (remember to change permissions on your
notebook to allow viewers).
2. A pdf copy of your Colab notebook.
Collaboration Policy
I encourage collaboration both inside and outside class. You may talk to other students
for general ideas and concepts, but you should write your own code, answer questions
independently, and submit your own work.
Plagiarism
Plagiarism of any form will not be tolerated. You are expected to credit all sources
explicitly. If you have any doubts regarding what is and is not plagiarism, talk to me.