Description
Written Problems
1. (15 points) Give an algebraic proof that a straight line in the world projects onto a straight
line in the image. In particular
(a) Write the parametric equation of a line in three-space.
(b) Use the simplest form of the perspective projection camera from the start of the Lecture 5
notes to project points on the line into the image coordinate system. This will give you
equations for the pixel locations x and y in terms of t. Note that t will be in the
denominator.
(c) Combine the two equations to remove t and rearrange the result to show that it is in
fact a line. You should get the implicit form of the line.
(d) Finally, under what circumstances is the line a point? Show this algebraically.
2. (15 points) Let A be an m × n matrix of real values, with m ≥ n. What is the relationship
between the SVD A and the eigen decomposition of A>A? Justify your answer. You will
need to know that the eigenvalues of a matrix are unique up to a reordering of eigenvalues
and vectors, so you may assume they are provided in any order you wish. By construction,
the singular values are non-increasing. (The eigenvectors / singular vectors are unique up
to a reordering only if the eigenvalues / singular values are unique.) Justify your answer
algebraically.
1
3. (10 points Grad Only) Problem 1 includes an important over-simplification: the perspective
projection of a line does not extend infinitely in both directions. Instead, the projection of
the line terminates at what is referred to as the “vanishing point”, which may or may not
appear within the bounds of the image. Using the parametric form of a line in three-space
and the simple perspective projection model, find the equation of the vanishing point of the
line. Then, show why this point is also the intersection of all lines that are parallel to the
original line. Under what conditions is this point non-finite? Give a geometric interpretation
of these conditions.
Programming Problems
1. (30 points) This problem is about constructing a camera matrix and applying it to project
points onto an image plane. The command line of your program should simply be
python p1_camera.py params.txt points.txt
Here params.txt contains parameters that can be used to form the 3 × 4 camera matrix.
Specifically, the following ten floating point values will appear in the file on three lines:
rx ry rz
tx ty tz
f d ic jc
Here’s what these mean: Relative to the world coordinate system, the camera is rotated first
by rz degrees about the z-axis, then ry degrees about the y-axis, then rx degrees about the
x-axis. Then it is translated by vector (tx,ty,tz) millimeters. The focal length of the lens is
f millimeters, and the pixels are square with d microns on each side. The image is 4000×6000
(rows and columns) and the optical axis pierces the image plane in row ic, column jc. Use
this to form the camera matrix M. In doing so, please explicitly form the three rotations
matrices (see Lecture 05 notes) and compose them. (Note: the rotation about the z axis
will be first and therefore it is the right-most of the rotation matrices.) Overall on
this problem, be very, very careful about the meaning of each parameter and its units. The
posted example results were obtained by converting length measurements to millimeters.
Please output the 12 terms in the resulting matrix M, with one row per line. All values
should be accurate to 2 decimal places. I have provided two examples and in my example
I’ve also printed R and K, but you should not do this in your final submission.
Next, apply the camera matrix M to determine the image positions of the points in points.txt.
Each line of this file contains three floating points numbers giving the x, y and z values of a
point in the world coordinate system. Compute the image locations of the points and determine if they are inside or outside the image coordinate system. Output six numerical values
on each line: the index of the point (the first point has index 0), the x, y and z values that you
input for the point, and the row, column values. Also output on each line, the decision about
whether the point is inside or outside. (Anything with row value in the interval [0, 4000] and
column value in the interval [0, 6000] is considered inside.) For example, you might have
0: 45.1 67.1 89.1 => 3001.1 239.1 inside
1: -90.1 291.1 89.1 => -745.7 898.5 outside
2
All floating values should be accurate to just one decimal place.
One thing this problem does not address yet is whether the points are in front of or behind the
camera, and therefore are or not truly visible. Addressing this requires finding the center of
the camera and the direction of the optical axis of the camera. Any point is considered visible
if it is in front of the plane defined by the center of projection (the center of the hypothetical
lens) and the axis direction. As an example to illustrate, in our simple model that we started
with, the center of the camera is at (0, 0, 0) and the direction of the optical axis is the positive
z-axis (direction vector (0, 0, 1)) so any point with z > 0 is visible. (Note: in this case, a
point is considered “visible” even if it is not “inside” the image coordinate system.) To test
that you have solved this, as a final step, print the indices of the points that are and are not
visible, with one line of output for each. For example, you might output
visible: 0 3 5 6
hidden: 1 2 4
If there are no visible values (or no hidden values), the output should be empty after the
word visible:. This will be at the end of your output.
To summarize your required output:
(a) Matrix M (one row per line, accurate to one decimal place)
(b) Index and (x, y, z) position of input point, followed by transformed (r, c) location and
whether it’s inside the 4, 000 × 6, 000 frame
(c) Visible point indices (sorted ascending)
(d) Hidden point indices (sorted ascending)
2. (25 points) Implement the RANSAC algorithm for fitting a line to a set of points. We will
start our discussion with the command line.
python p2_ransac.py points.txt samples tau [seed]
where points.txt is a text file containing the x, y coordinates of one points per line, samples
is a positive integer indicating the number of random pairs of two points to generate, tau
is the bound on the distance from a point to a line for a point, and seed is an optional
parameter giving the seed to the random number generator.
After reading the input, if the seed is provided, your first call to a NumPy function must be
np.random.seed(seed)
otherwise, do not call the seed function. Doing this will allow us to create consistent output.
For each of samples iterations of your outer loop you must make the call
sample = np.random.randint(0, N, 2)
to generate two random indices into the points. If the two indices are equal, skip the rest of
the loop iteration (it still counts as one of the samples though). Otherwise, generate the line
and run the rest of the inner loop of RANSAC. Each time you get a new best line estimate
according to the RANSAC criteria, print out the following values, one per line, with a blank
line afterward:
3
• the sample number (from 0 up to but not including samples)
• the indices into the point array (in the order provided by randint),
• the values of a, b, c for the line (ensure c ≤ 0 and a
2 + b
2 = 1) accurate to three decimal
places, and
• the number of inliers.
At the end, output a few final statistics on the best fitting line, in particular output the
average distances of the inliers and the outliers from the line. Keep all of your output floating
point values accurate to three decimal places.
In the interest of saving you some work, I’ve not asked you to generate any plots for this
assignment, but it would not hurt for you to do so just to show yourself that things are
working ok. For similar reasons, no least-squares fit is required at the end — no need to
repeat the exercise from Lecture 04.
Here is an example execution result
python p2_ransac.py test0_in.txt 25 2.5 999 > p4_test1_out.txt
and output
Sample 0:
indices (0,28)
line (-0.983,0.184,-26.286)
inliers 13
Sample 3:
indices (27,25)
line (0.426,0.905,-4.913)
inliers 19
Sample 10:
indices (23,4)
line (0.545,0.838,-0.944)
inliers 21
avg inlier dist 0.739
avg outlier dist 8.920
3. (15 points — Students in 4270 ONLY) You are given a series of images (all in one folder)
taken of the same scene, and your problem is to simply determine which image is focused the
best. Since defocus blurring is similar to Gaussian smoothing and we know that Gaussian
smoothing reduces the magnitude of the image’s intensity gradients, our approach is simply
to find the image that has the largest average squared gradient magnitude across all images.
This is value is closely related to what is referred to as the “energy” of the image. More
specifically, this is
E(I) = 1
MN
M
X−1
i=0
N
X−1
j=0
∂I
∂x(i, j)
2
+
∂I
∂y (i, j)
2
.
Note that using the squared gradient magnitude is important here. In order to ensure
consistency across our implementations, use the two OpenCV Sobel kernels to compute the
x and y derivatives and then combine into the square gradient magnitude as in the above
equation. Specifically, the calls to the Sobel function should be
im_dx = cv2.Sobel(im, cv2.CV_32F, 1, 0)
im_dy = cv2.Sobel(im, cv2.CV_32F, 0, 1)
The command-line of your program will be
python p3_best_focus.py image_dir
where image_dir is the path to the directory that contains the images to test. Assume all
images are JPEGs with the extension .jpg (in any combination of capital and small letters).
Sort the image names using the python list sort function. Read the images as grayscale using
the built-in cv2.imread. Then output for each image the average squared gradient magnitude
across all pixels. (On each line output just the name of the image and the average squared
gradient magnitude, accurate to just one decimal place.) Finally output the name of the
best-focused image.
Here is an example:
python p3_best_focus.py evergreen
produces the output
DSC_1696.JPG: 283.9
DSC_1697.JPG: 312.7
DSC_1698.JPG: 602.4
DSC_1699.JPG: 2137.2
DSC_1700.JPG: 10224.8
DSC_1701.JPG: 18987.1
Image DSC_1701.JPG is best focused.
4. (30 points — 6270 ONLY) You are given a series of images (all the images in one folder)
taken of the same scene but with different objects in focus in different images. In some images,
the foregound objects are in focus, in others objects in the middle are in focus, and in still
others, objects far away are in focus. Here are three examples from one such series:
5
Your goal in this problem is to use these images to create a single composite image where
everything is as well focused as possible.
The key idea is to note that the blurring you see in defocused image region is similar to
Gaussian smoothing, and we know that Gaussian smoothing reduces the magnitude of the
intensity gradients. Therefore, if we look at the weighted average of the intensity gradients
in the neighborhood of a pixel, we can get a measure of image energy of the pixel. Higher
energy implies better focus.
The equation for this energy is
E(I; x, y) =
Px+k
u=x−k
Py+k
v=y−k w(u − x, v − y)
∂I
∂x (u, v)
2
+
∂I
∂y (u, v)
2
Px+k
u=x−k
Py+k
v=y−k w(u − x, v − y)
,
where w(·, ·) is a Gaussian weight function, whose standard deviation σ will be a commandline parameter. Use k = b2.5σc to define the bounds on the Gaussian. More specifically, use
cv2.GaussianBlur, let h = b2.5σc and define
ksize = (2*h+1, 2*h+1).
See our class examples from the lecture on image processing.
Note that using the squared gradient magnitude, as written above, is important here. In
order to ensure consistency with our implementation, use the two OpenCV Sobel kernels to
compute the x and y derivatives and then combine into the square gradient magnitude as in
the above equation. Specifically, the calls to the Sobel kernel should be
im_dx = cv2.Sobel(im, cv2.CV_32F, 1, 0)
im_dy = cv2.Sobel(im, cv2.CV_32F, 0, 1)
Then, after computing E(I0; x, y), . . . E(In−1; x, y) across the n images in a sequence, there
are a number of choices to combine the images into a final image. Please use
I
∗
(x, y) =
Pn−1
i=0 E(Ii
; x, y)
p
Ii(x, y)
Pn−1
i=0 E(Ii
; x, y)
p
,
where p > 0 is another command-line parameter. In other words, E(Ii
; x, y)
p becomes the
weight for a weighted averaging. (As p → ∞ this goes toward the maximum, and as p → 0
this becomes a simple average, with the energy measure having no effect (something we do
not want).) Note that a separate averaging is done at each pixel.
While this seems like a lot, OpenCV and NumPy tools make it relatively straight forward. You
will have to be careful of image boundaries (see lecture discussion of boundary conditions).
Also, this will have to work on color images, in the sense that the gradients are computed on
gray scale images, but the final image still needs to be in color.
The command-line of your program will be
python p4_composite image_dir out_img sigma p
6
where image_dir is the path to the directory that contains the images to test, out_img is
the output image name, sigma is the value of σ (assume σ > 0) for the weighting, and p > 0
is the exponent on the energy function E in the formation of the final image.
Now for the output. The most important, of course, is the final image. Make sure you write
this to the folder where the program is run (i.e. if you switch folders to get the images, switch
back before output). In addition, for pixels (M//3, N//3), (M//3, 2N//3), (2M//3, N//)
and (2M//3, 2N//3) where (M, N) is the shape of the image array, output the following:
• The value of E for each image
• The final value of I
∗ at that pixel
These should be accurate to 1 decimal place. Example results are posted with output from
the command-line
python p4_sharp_focus.py branches test02_p4_branches_combined.jpg 5.0 2
Finally, submit a pdf with a separate discussion of the results of your program, what
works well, what works poorly, and why this might be. Illustrate with examples where you
can. This part is worth 8 points toward your grade on this homework.