CSE 803 Computer Vision: Homework 1 solution

1 Camera Projection Matrix [30 pts]
Study the Projection and Dolly Zoom notebook (dolly_zoom.py, or the Colab version linked in the footnote below) and finish the following tasks.
(a) Write a function rotY() which takes an angle theta (in radians) and outputs the 3D rotation matrix for rotating by theta about the y-axis (right-hand rule). You may refer to this Wikipedia entry: https://en.wikipedia.org/wiki/Rotation_matrix#Basic_rotations. After you are done, refer to the starter code to generate and submit cube.gif of a cube rotating around itself. (5 pts)
(b) Similarly, write a function rotX() which rotates about the x-axis. Let θ = π/4 and consider the following two transformations:
(i) rotX(theta), followed by rotY(theta)
(ii) rotY(theta), followed by rotX(theta)
Using renderCube() in the same way, plot the resulting view of the cube for both transformations.
Are 3D rotation matrices commutative? (5 pts)
(c) Combine rotX() and rotY(): choose an appropriate order and a pair of angles so that renderCube() draws a projection of the cube in which one diagonal of the cube is projected to a single point, as shown in Figure 1 (left). Report the order and parameters you choose. (10 pts)
Hint: The diagonal of the cube rotates together with the cube. When it projects to a single point in 2D, it is horizontal and perpendicular to the camera (image) plane in 3D. You can either work this out mathematically or perform a numerical search.
(d) Implement an orthographic camera, either by adding a branch to the function projectLines() or by writing a new function modeled on it. Then plot the same rotated cube from the previous part with this orthographic camera. It should look like Figure 1 (right). (10 pts) (A minimal sketch covering parts (a), (b), and (d) appears below Figure 1.)
Figure 1: The diagonal of a cube projected to a single point
1 https://colab.research.google.com/drive/1LlOwsp5zCV-dIfZRlmWMJfbeArUTclQo
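The following is a minimal sketch, not the starter code itself: renderCube() and projectLines() from the starter are not reproduced here, and project_points() below is a hypothetical helper introduced only to illustrate the perspective/orthographic distinction. It shows the rotation matrices for parts (a) and (b), a quick commutativity check, and what an orthographic branch might compute.

import numpy as np

def rotY(theta):
    """3x3 rotation by theta (radians) about the y-axis (right-hand rule)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0., s],
                     [0., 1., 0.],
                     [-s, 0., c]])

def rotX(theta):
    """3x3 rotation by theta (radians) about the x-axis (right-hand rule)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1., 0., 0.],
                     [0.,  c, -s],
                     [0.,  s,  c]])

theta = np.pi / 4
# Part (b): "rotX then rotY" applied to a point p is rotY(theta) @ rotX(theta) @ p.
A = rotY(theta) @ rotX(theta)   # rotX first, then rotY
B = rotX(theta) @ rotY(theta)   # rotY first, then rotX
print(np.allclose(A, B))        # prints False: 3D rotations generally do not commute

# Part (d): an orthographic camera drops the depth coordinate instead of
# dividing by it. Given points in camera coordinates as an (N, 3) array:
def project_points(pts_cam, mode="perspective"):
    pts_cam = np.asarray(pts_cam, dtype=float)
    if mode == "orthographic":
        return pts_cam[:, :2].copy()          # keep (x, y) unchanged
    return pts_cam[:, :2] / pts_cam[:, 2:3]   # perspective: (x/z, y/z)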
2 Prokudin-Gorskii: Color from grayscale photographs [50 pts]
In this part, you are tasked with implementing the dream of Russian photographer, Sergei Mikhailovich
Prokudin-Gorskii (1863-1944), via a project invented by Russian-American vision researcher, Alexei A.
Efros (1975-present). Sergei was a visionary who believed in a future with color photography (which we
now take for granted). During his lifetime, he traveled across the Russian Empire taking photographs
through custom color filters at the whim of the czar. To capture color for the photographers of the future,
he decided to take three separate black-and-white pictures of every scene, each with a red, green, or blue
color filter in front of the camera. His hope was that you, as a student in the future, would come along and
produce beautiful color images by combining his 3 separate, filtered images.
Task 1: Combine (5 pts) We will provide you with a folder of Prokudin-Gorskii’s black-and-white (grayscale) image composites (prokudin-gorskii/ in the assignment zip). Each composite (alternatively, triple-framed image or triptych) contains three grayscale photos preserved from the early 1900s. The composite looks like a three-panel vertical comic strip, with the grayscale photos positioned vertically above one another. These photos represent images captured with a blue, green, and red filter. Choose a single composite from this folder (your favorite) and write a program in Python that takes the three grayscale panels and simply stacks them along the third (color channel) dimension to produce a single color image. We expect this stacked photo to look wonky and unaligned; fixing this is what you will do in Task 2. Make sure to save your images as RGB instead of BGR and include them in your report.
Specifically: Write a function that loads a grayscale triple-framed image from prokudin-gorskii/ with something like plt.imread(), chops it vertically into thirds, then saves a color image with each third as the correct color channel. Include the output color image in your report. (One possible sketch follows.)
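A minimal sketch of this step is shown below. It assumes the panels are ordered blue, green, red from top to bottom and that any leftover rows are simply trimmed; the filename in the usage comment is illustrative.

import numpy as np
import matplotlib.pyplot as plt

def split_and_stack(path):
    """Load a Prokudin-Gorskii triptych and naively stack its panels as RGB."""
    img = plt.imread(path)
    if img.ndim == 3:                 # some loaders return an extra channel dim
        img = img[..., 0]
    h = img.shape[0] // 3             # height of one panel (trim any remainder)
    b, g, r = img[:h], img[h:2*h], img[2*h:3*h]   # assumed top-to-bottom: B, G, R
    rgb = np.dstack([r, g, b])        # stack along the third dimension as RGB, not BGR
    return rgb

# Example usage (filename is illustrative):
# rgb = split_and_stack("prokudin-gorskii/some_triptych.jpg")
# plt.imsave("naive_stack.jpg", rgb)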
Task 2: Alignment (25 pts) As you will have noticed, the photos are misaligned due to inadvertent jostling of the camera between shots. Your second task is to fix this. You need to search over possible pixel offsets in the range [-15, 15] to find the best alignment of the R, G, and B channels. The simplest way is to keep one channel fixed, say R, and align the G and B channels to it by searching over the offset range both horizontally and vertically. Pick the alignment that maximizes a similarity metric (of your choice) between the channels. One such measure is the dot product, i.e., R·G with each channel flattened into a vector. Another is normalized cross-correlation, which is simply the dot product between the L2-normalized R and G vectors. After writing this function, run it on all of the images in prokudin-gorskii/ and also on 'efros_tableau.jpg', so Professor Efros can have his photo restored to color. Include these aligned images and the offsets in your report. For full credit, your report needs to include properly aligned images, so find a similarity metric that will accomplish this.
Specifically: Write a function to align the 3 channels of the image produced in Task 1. This function should output the (x, y) offsets required to shift two of the color channels with respect to the third; the third (anchor) channel then has a (0, 0) offset. Save the newly aligned images from prokudin-gorskii/ and 'efros_tableau.jpg' in your report, along with the offsets for each color channel. Report the similarity metric you chose.
Hint: To offset the channels while keeping their dimensions the same, you can use either np.roll(), which wraps around the boundaries, or np.pad(). (A sketch using np.roll() and normalized cross-correlation follows.)
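Below is a minimal sketch of one possible approach, assuming normalized cross-correlation as the metric and np.roll() for shifting. Ignoring a border before scoring (the border parameter) is an extra design choice, not something the assignment requires; it keeps wrapped-around pixels from dominating the score.

import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two same-sized 2D arrays."""
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

def best_offset(channel, reference, search=15, border=20):
    """Return the (dy, dx) shift of `channel` that best matches `reference`."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Crop a border so pixels wrapped around by np.roll are not scored.
            score = ncc(shifted[border:-border, border:-border],
                        reference[border:-border, border:-border])
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def align(r, g, b):
    """Align G and B to R; return the color image and the offsets used."""
    g_off = best_offset(g, r)
    b_off = best_offset(b, r)
    g_aligned = np.roll(g, g_off, axis=(0, 1))
    b_aligned = np.roll(b, b_off, axis=(0, 1))
    return np.dstack([r, g_aligned, b_aligned]), g_off, b_off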
Task 3: Pyramid (20 pts) For very large offsets (and high-resolution images), comparing all the alignments over a broad range of displacements (e.g. [-30, 30]) can be computationally intensive. We will have you implement a recursive, coarse-to-fine version of your algorithm that starts by estimating the alignment on a low-resolution version of the image before refining it at higher resolution. To implement this, you will build a two-level image pyramid. First scale the triple-frame images down by a factor of 2 (both the width and height should end up halved). Starting with these shrunk, coarse images, run your alignment from Task 2 over the offset range [-15, 15]. Choose the best alignment based on your similarity metric and treat it as the current alignment. Then, in the full-resolution images, use this current alignment as a starting point and again run the alignment from Task 2 over a small range [-15, 15]. Run this Pyramid task on the 'seoul_tableau.jpg' and 'vancouver_tableau.jpg' images.
Specifically: Use cv2.resize() to shrink each image in the triptych. Use your code from Task 2 to align them and obtain the intermediate offset. Shift the original images accordingly (note that an offset found at half resolution corresponds to twice as many pixels at full resolution) and align again at full resolution. Report the intermediate offset (at the coarse resolution), the refining offset found at full resolution, and the overall total offset that combines the two. Save the aligned images in color in your report.
Hint: If you’re struggling, use a different color channel as your anchor!
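A minimal sketch of this two-level search, reusing the hypothetical best_offset() helper from the Task 2 sketch above, might look like the following:

import cv2
import numpy as np

def pyramid_offset(channel, reference, search=15):
    """Two-level coarse-to-fine offset search for one channel against a reference."""
    # Level 1: estimate on half-resolution copies of both channels.
    small_c = cv2.resize(channel, None, fx=0.5, fy=0.5)
    small_r = cv2.resize(reference, None, fx=0.5, fy=0.5)
    dy0, dx0 = best_offset(small_c, small_r, search=search)

    # An offset of d pixels at half resolution is 2*d pixels at full resolution.
    coarse = (2 * dy0, 2 * dx0)

    # Level 2: pre-shift by the scaled-up coarse estimate, then refine.
    pre_shifted = np.roll(channel, coarse, axis=(0, 1))
    dy1, dx1 = best_offset(pre_shifted, reference, search=search)

    total = (coarse[0] + dy1, coarse[1] + dx1)
    return coarse, (dy1, dx1), total   # report all three offsets in the write-up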
Report You must submit a report that includes the offsets, color output images, and descriptions required above. Your description should be detailed enough that a reader could implement what you have done after reading your report.
3 Color Spaces and Illuminance [20 pts]
The same color may look different under different lighting conditions. Images indoor.png and outdoor.png are two photos of the same Rubik’s cube under different illuminances (see the footnote below).
1. Load the images and plot their R, G, B channels separately as grayscale images using plt.imshow() (beware of normalization). Then convert them into the LAB color space using cv2.cvtColor() and plot the three channels again. Include the plots in your report. (A sketch appears after Figure 2.) (5 pts)
2. How can you tell whether the illuminance change is better separated in the LAB color space? (5 pts)
3. Choose two different lighting conditions and take two photos of a non-specular object. Try to make the same color look as different as possible (a large distance on the AB plane in LAB space). Below is an example of two photos of the same piece of paper, taken in the basement and by the window, respectively.
Submit the two images with names im1.jpg and im2.jpg, both cropped and scaled to 256×256. In the same folder, also submit a file info.txt that contains two lines. Line 1 contains four integers x1, y1, x2, y2; we will take a 32×32 patch around (x1, y1) in im1.jpg and around (x2, y2) in im2.jpg and compare colors. (You can use plt.imshow() and plt.show() to bring up a window in which you can select a pixel and read off its coordinates.) Line 2 is a description of the lighting conditions that you chose. An example of this file is provided for you in the folder. (10 pts)
2 The two images are taken from this blog post: https://www.learnopencv.com/color-spaces-in-opencv-cpp-python/
Figure 2: The same piece of paper in the basement and by the window
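The following is a minimal sketch relating to parts 1 and 3 above, not required code. show_channels() plots the RGB and then LAB channels with fixed display ranges so matplotlib's auto-normalization does not hide the differences, and ab_distance() is a hypothetical helper that compares the mean A/B values of 32x32 patches; the filenames and coordinates in the usage comments are placeholders.

import cv2
import numpy as np
import matplotlib.pyplot as plt

def show_channels(path):
    """Plot R, G, B and then L, A, B channels of an image as grayscale."""
    bgr = cv2.imread(path)                      # OpenCV loads images as BGR
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    fig, axes = plt.subplots(2, 3, figsize=(12, 6))
    for i, name in enumerate("RGB"):
        # Fix vmin/vmax so channels are comparable (beware of auto-normalization).
        axes[0, i].imshow(rgb[..., i], cmap="gray", vmin=0, vmax=255)
        axes[0, i].set_title(name)
    for i, name in enumerate(["L", "A", "B"]):
        axes[1, i].imshow(lab[..., i], cmap="gray", vmin=0, vmax=255)
        axes[1, i].set_title(name)
    plt.show()

def ab_distance(path1, path2, x1, y1, x2, y2, half=16):
    """Distance on the AB plane between the mean colors of 32x32 patches."""
    lab1 = cv2.cvtColor(cv2.imread(path1), cv2.COLOR_BGR2LAB).astype(float)
    lab2 = cv2.cvtColor(cv2.imread(path2), cv2.COLOR_BGR2LAB).astype(float)
    p1 = lab1[y1-half:y1+half, x1-half:x1+half, 1:3].reshape(-1, 2).mean(axis=0)
    p2 = lab2[y2-half:y2+half, x2-half:x2+half, 1:3].reshape(-1, 2).mean(axis=0)
    return np.linalg.norm(p1 - p2)

# Example usage (coordinates and filenames are placeholders):
# show_channels("indoor.png")
# print(ab_distance("im1.jpg", "im2.jpg", 128, 128, 128, 128))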
The problems were prepared by Dr. David Fouhey and Dr. Alexei Efros.
References
https://en.wikipedia.org/wiki/Camera_matrix
https://en.wikipedia.org/wiki/Rotation_matrix
http://inst.eecs.berkeley.edu/~cs194-26/fa18/hw/proj1/
https://sites.google.com/a/umich.edu/eecs442-winter2015/homework/color

Color spaces in OpenCV (C++ / Python): https://www.learnopencv.com/color-spaces-in-opencv-cpp-python/