Description
As we discussed in class, photometric stereo is a computational imaging method to determine
the shape of an object from its appearance under a set of lighting directions. In the course of
this homework, we will solve a simplified version of the photometric stereo problem. We will
make certain assumptions: that the object is Lambertian and is imaged with an orthographic
camera under a set of directional lights. We will first solve calibrated photometric stereo, in
which lighting directions are given; then uncalibrated photometric stereo, where we have to
infer lighting directions and the shape of the object jointly from the given images. In this
process, we will learn about an ambiguity in the resultant shape introduced because of this
joint estimation.
Even though the homework is fairly self-contained, we strongly recommend going over the
papers in the references before starting. As always, start early.
Instructions
1. Integrity and collaboration: You are encouraged to work in groups, but everyone
must submit their own work. If you work as a group, include the names of your
collaborators in your write-up. Code should NOT be shared or copied. Please DO
NOT use external code in this homework. Plagiarism is strictly prohibited and may
lead to failure of this course.
2. Questions: If you have any questions, please look on Piazza first. Other students
may have encountered the same problem, and it may have been solved already. If not,
post your question publicly. Teaching staff will respond as soon as possible. You are
also strongly encouraged to answer your classmates’ questions on Piazza to streamline
the process.
3. Write-up: Please type your answers to the theory questions, and include experimental results and discussion, in a document <andrew-id>.pdf (preferably prepared in LaTeX).
Please note that we DO NOT accept handwritten scans for your write-up in this
homework.
4. Code: Running your code is expected to produce the figures asked for in the questions. Expect your code to be auto-graded. This requires you to stick to the function
signatures in the starter code. Read the docstrings in the code to understand more
about inputs and expected outputs. Remember to comment your code. In case the
autograder fails, we will go through your code to see if we can give you partial credit.
Even a rough understanding of what each block of code does will help us. If you
need to create helper functions, create them in the file where you need them. Make
sure you use relative paths in your code as opposed to absolute paths.
5. Submission: Create the following directory structure:
• <andrew-id>/
– <andrew-id>.pdf
– src/
∗ q1.py
∗ q2.py
Zip up the folder into <andrew-id>.zip such that unzipping recreates <andrew-id>/.
Upload the zip to Gradescope.
1 Calibrated photometric stereo (80 points)
Figure 1: 3D reconstruction result from calibrated photometric stereo
All code in this part goes in src/q1.py. Outside the function definitions, write code to
pipeline these functions and generate the results asked for in this question.
Image formation. We will now look at image formation under our specific assumptions.
We will understand and implement the process that occurs in a scene when we take a picture
with an orthographic camera.
(a, 5 points) Understanding n-dot-l lighting. In your write-up, explain the geometry of
the n-dot-l lighting model from Fig. 2a. Where does the dot product come from? Where does
projected area (Fig. 2b) come into the equation? Why does the viewing direction not matter?
Figure 2: Geometry of photometric stereo. (a) Geometry of photometric stereo, showing the surface normal n, lighting direction l, viewing direction v, and surface element dA. (b) Projected area.
Figure 3: Setup for Problem 1b
(b, 10 points) Rendering n-dot-l lighting. Consider a uniform, fully reflective Lambertian sphere with its center at the origin and a radius of 0.75 cm (Fig. 3). An orthographic
camera located at (0, 0, 10) cm looks towards the negative z-axis with its sensor axes aligned
and centered on the x- and y-axes. The pixels on the camera are squares 7 µm in size, and
the resolution of the camera is 3840 × 2160 pixels. Simulate the appearance of the sphere
under the n-dot-l model with directional light sources with incoming lighting directions
(1, 1, 1)/√3, (1, −1, 1)/√3 and (−1, −1, 1)/√3 in the function renderNDotLSphere (individual
images for all three lighting directions). Note that your rendering isn’t required to be
absolutely radiometrically accurate: we need only evaluate the n-dot-l model. Include the 3
renderings in your write-up.
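
For concreteness, here is a minimal numpy sketch of such a rendering under the stated assumptions. The function name, argument names and ordering are placeholders, not the starter-code signature of renderNDotLSphere; the docstring in q1.py is authoritative.

    import numpy as np

    def render_sphere_sketch(rad, light, pxSize, res):
        # Pixel grid in world coordinates (cm), centered on the x- and y-axes.
        X, Y = np.meshgrid(np.arange(res[0]), np.arange(res[1]))
        x = (X - res[0] / 2) * pxSize
        y = (Y - res[1] / 2) * pxSize
        # An orthographic camera sees the sphere wherever x^2 + y^2 <= rad^2.
        mask = x**2 + y**2 <= rad**2
        # On the visible hemisphere, z = sqrt(rad^2 - x^2 - y^2); for a sphere
        # centered at the origin, the outward unit normal is the surface point
        # divided by the radius.
        z = np.sqrt(np.clip(rad**2 - x**2 - y**2, 0, None))
        normals = np.dstack((x, y, z)) / rad
        # n-dot-l shading, clamped at zero (points facing away from the light
        # receive nothing).
        image = np.clip(normals @ np.asarray(light, dtype=float), 0, None)
        image[~mask] = 0
        return image

    # Example usage with one of the three given lighting directions:
    img = render_sphere_sketch(0.75, np.array([1, 1, 1]) / np.sqrt(3),
                               7e-4, (3840, 2160))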
Inverting the image formation model. Armed with this knowledge, we will now invert
the image formation process given the lighting directions. Seven images of a face lit from
different directions are given to us, along with the ground-truth directions of light sources.
These images are taken at lighting directions such that there is not a lot of occlusion or
normals being directed opposite to the lighting, so we will ignore these effects. The images
are in the data/ directory, named input_n.tif for the nth image. The source directions
are given in the file data/sources.npy.
(c, 5 points) Loading data. In the function loadData, read the images into Python.
Convert the RGB images into the XYZ color space and extract the luminance channel. Vectorize these luminance images and stack them in a 7 × P matrix, where P is the number of
pixels in each image. This is the matrix I, which is given to us by the camera. Next, load
the sources file and convert it to a 3 × 7 matrix L. For this question, include a screenshot of
your function loadData.
These images are 16-bit .tifs, and it’s important to preserve that bit depth while reading
the images for good results. Therefore, make sure that the datatype of your images is uint16
after loading them.
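
A loader along these lines would satisfy the requirements above; this is a sketch that assumes skimage is available for reading and for the RGB-to-XYZ conversion (the luminance channel is the Y channel of XYZ), and that the filenames follow the input_n.tif pattern described above.

    import numpy as np
    from skimage.io import imread
    from skimage.color import rgb2xyz

    def load_data_sketch(path='../data/'):
        # Read the seven images; imread preserves uint16 for 16-bit .tifs.
        imgs = [imread(path + 'input_%d.tif' % i) for i in range(1, 8)]
        assert imgs[0].dtype == np.uint16  # the bit depth must survive loading
        s = imgs[0].shape[:2]
        # Luminance = Y channel of XYZ; one vectorized image per row.
        I = np.stack([rgb2xyz(im)[:, :, 1].ravel() for im in imgs])  # 7 x P
        # Transpose if the stored array turns out to be 3 x 7 already.
        L = np.load(path + 'sources.npy').T                          # 3 x 7
        return I, L, s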
(d, 10 points) Initials. Recall that in general, we also need to consider the reflectance, or
albedo, of the surface we’re reconstructing. We find it convenient to group the normals and
albedos (both of which are a property of only the surface) into a pseudonormal b = a · n,
where a is the scalar albedo. We then have the relation I = LᵀB, where the 3 × P matrix
B is the set of pseudonormals in the images. With this model, explain why the rank of I
should be 3. Perform a singular value decomposition of I and report the singular values in
your write-up. Do the singular values agree with the rank-3 requirement? Explain this result.
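
Computing the singular values themselves is a one-liner (assuming I is the 7 × P matrix from loadData); interpreting them is the substance of your answer.

    import numpy as np

    # Singular values of the 7 x P intensity matrix, in decreasing order.
    singular_values = np.linalg.svd(I, compute_uv=False)
    print(singular_values)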
(e, 20 points) Estimating pseudonormals. Since we have more measurements (7 per
pixel) than variables (3 per pixel), we will estimate the pseudonormals in a least-squares
sense. Note that there is a linear relation between I and B through L: therefore, we can
write a linear system of the form Ax = y out of the relation I = LᵀB and solve it to get B.
Solve this linear system in the function estimatePseudonormalsCalibrated. Estimate per-pixel albedos and normals from this matrix in estimateAlbedosNormals. In your write-up,
mention how you constructed the matrix A and the vector y.
Note that the matrices you end up creating here might be huge and might not fit in your computer’s memory. In that case, to prevent your computer from freezing or crashing completely,
make sure to use the sparse module from scipy. You might also want to use the sparse
linear system solver in the same module.
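
One way to see the structure (a sketch, not the only valid construction): since pixels do not interact, I = LᵀB decouples into P independent least-squares problems that all share the 7 × 3 matrix Lᵀ, and numpy can solve them in a single call. The fully stacked alternative, building A as a block-diagonal sparse matrix with scipy.sparse.kron and solving with scipy.sparse.linalg.lsqr, is the memory-safe equivalent of the same system.

    import numpy as np

    def estimate_pseudonormals_sketch(I, L):
        # I = L^T B column by column: every pixel p satisfies L^T b_p = i_p,
        # so one lstsq call with the shared 7 x 3 matrix L^T and the 7 x P
        # right-hand side I recovers all pseudonormals at once.
        B, residuals, rank, sv = np.linalg.lstsq(L.T, I, rcond=None)
        return B  # 3 x P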
(f, 10 points) Albedos and normals. Note that the albedos are the magnitudes of the
pseudonormals by definition. Calculate the albedos, reshape them into the original size of
the images and display the resulting image in the function displayAlbedosNormals. Include
the image in your write-up and comment on any unusual or unnatural features you may find
in the albedo image, and on why they might be happening. Make sure to display in the gray
colormap.
The per-pixel normals can be viewed as an RGB image. Reshape the estimated normals into
an image with 3 channels and display it in the function displayAlbedosNormals. Include
this image in the write-up. Do the normals match your expectation of the curvature of the
face? Make sure to display in the rainbow colormap.
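
A sketch of the albedo/normal split and the two displays, assuming matplotlib; note that normal components live in [−1, 1] and must be rescaled to [0, 1] before being shown as an RGB image.

    import numpy as np
    import matplotlib.pyplot as plt

    def estimate_albedos_normals_sketch(B):
        albedos = np.linalg.norm(B, axis=0)   # magnitude of each pseudonormal
        normals = B / (albedos + 1e-12)       # unit normals; avoid division by 0
        return albedos, normals

    def display_sketch(albedos, normals, s):
        # s is the (height, width) of the original images.
        plt.imshow(albedos.reshape(s), cmap='gray')
        plt.show()
        # Map each normal component from [-1, 1] to [0, 1] for display.
        plt.imshow(((normals + 1) / 2).T.reshape((*s, 3)), cmap='rainbow')
        plt.show()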
(g, 5 points) Normals and depth. We will now estimate from the normals the actual
shape of the face. Represent the shape of the face as a 3D depth map given by a function
z = f(x, y). Let the normal at the point (x, y) be n = (n1, n2, n3). Explain, in your write-up,
why n is related to the partial derivatives of f at (x, y): fx = ∂f(x, y)/∂x = −n1/n3 and
fy = ∂f(x, y)/∂y = −n2/n3. You may consider the 1D case where z = f(x).
Normal integration. Given that the normals represent the derivatives of the depth map,
we will integrate the normals obtained in (g). We will use a special case of the Frankot-Chellappa algorithm for normal integration, as given in [1]. You can read the paper for the
general version of the normal integration algorithm.
(h, 5 points) Understanding integrability of gradients. Consider the 2D, discrete
function g on space given by the matrix below. Find its x and y gradient, given that
gradients are calculated as gx(xi, yj) = g(xi+1, yj) − g(xi, yj) for all i, j (and similarly for y).
Let us define (0, 0) as the top left, with x going in the horizontal direction and y in the
vertical.

    g = [  1   2   3   4
           5   6   7   8
           9  10  11  12
          13  14  15  16 ]        (1)
Note that we can reconstruct all of g given the values at its boundaries using gx and
gy. Given that g(0, 0) = 1, perform these two procedures:
1. Use gx to construct the first row of g, then use gy to construct the rest of g;
2. Use gy to construct the first column of g, then use gx to construct the rest of g.
Are these the same?
Note that these were two ways to reconstruct g from its gradients. Given arbitrary gx and
gy, these two procedures will not give the same answer, and therefore this pair of gradients
does not correspond to a true surface. Integrability implies that the value of g estimated in
both these ways (or any other way you can think of) is the same. How can we modify the
gradients you calculated above to make gx and gy non-integrable? Why may the gradients
estimated in the way of (g) be non-integrable? Note all this down in your write-up.
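
If you want to sanity-check your hand computation, the gradients and the two reconstruction orders take only a few lines of numpy (a sketch; np.cumsum plays the role of the running sums described above):

    import numpy as np

    g = np.arange(1, 17).reshape(4, 4).astype(float)
    gx = np.diff(g, axis=1)   # g(x_{i+1}, y_j) - g(x_i, y_j); shape 4 x 3
    gy = np.diff(g, axis=0)   # shape 3 x 4

    # Procedure 1: first row from gx, then each column downward from gy.
    r1 = np.zeros_like(g)
    r1[0, 0] = g[0, 0]
    r1[0, 1:] = g[0, 0] + np.cumsum(gx[0])
    r1[1:, :] = r1[0, :] + np.cumsum(gy, axis=0)

    # Procedure 2: first column from gy, then each row rightward from gx.
    r2 = np.zeros_like(g)
    r2[0, 0] = g[0, 0]
    r2[1:, 0] = g[0, 0] + np.cumsum(gy[:, 0])
    r2[:, 1:] = r2[:, [0]] + np.cumsum(gx, axis=1)

    print(np.allclose(r1, r2))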
The Frankot-Chellappa algorithm for estimating shapes from their gradient first projects the
(possibly non-integrable) gradients obtained in (h) above onto the set of all integrable gradients. The resulting (integrable) projection can then be used to solve for the depth map. The
function integrateFrankot in utils.py implements this algorithm given the two surface
gradients.
(i, 10 points) Shape estimation. Write a function estimateShape to apply the Frankot-Chellappa algorithm to your estimated normals. Once you have the function f(x, y), plot
it as a surface in the function plotSurface and include some significant viewpoints in your
write-up. The 3D projection from mpl_toolkits.mplot3d.Axes3D along with the function
plot_surface might be of help. Make sure to plot this in the coolwarm colormap.
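
A plotting sketch, assuming matplotlib's 3D toolkit; the row and column indices stand in for x and y here, which is enough for inspecting the shape.

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

    def plot_surface_sketch(surface):
        H, W = surface.shape
        X, Y = np.meshgrid(np.arange(W), np.arange(H))
        fig = plt.figure()
        ax = fig.add_subplot(111, projection='3d')
        ax.plot_surface(X, Y, surface, cmap='coolwarm')
        plt.show()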
2 Uncalibrated photometric stereo (50 points)
Figure 4: 3D reconstruction result from uncalibrated photometric stereo
We will now put aside the lighting directions and estimate shape directly from the images of
the face. As before, load the lights into L0 and the images into the matrix I. We will not use the
matrix L0 for anything except as a comparison against our estimate in this part. All code
for this question goes in src/q2.py.
(a, 10 points) Uncalibrated normal estimation. Recall the relation I = LᵀB. Here, we
know neither L nor B. Therefore, this is a matrix factorization problem with the constraint
that with the estimated L̂ and B̂, the rank of Î = L̂ᵀB̂ be 3 (as you answered in the previous
question), and the estimated L̂ and B̂ have appropriate dimensions.
It is well-known that the best rank-k approximation to an m × n matrix M, where k ≤
min{m, n}, is calculated as follows: perform a singular value decomposition (SVD)
M = UΣVᵀ, set all singular values except the top k in Σ to 0 to get the matrix Σ̂,
and reconstitute M̂ = UΣ̂Vᵀ. Explain in your write-up how this can be used to construct
a factorization of the form detailed above following the required constraints.
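
One concrete (but by no means unique) way to realize such a factorization in code, assuming the common convention of splitting the square root of the truncated singular values between the two factors:

    import numpy as np

    def factorize_sketch(I):
        # Thin SVD of the 7 x P intensity matrix.
        U, S, Vt = np.linalg.svd(I, full_matrices=False)
        # Keep the top 3 singular values/vectors (rank-3 constraint) and
        # split sqrt(S) between the factors; other splits are equally valid.
        S3 = np.sqrt(S[:3])
        L_hat = (U[:, :3] * S3).T      # 3 x 7, so that I_hat = L_hat^T @ B_hat
        B_hat = S3[:, None] * Vt[:3]   # 3 x P
        return L_hat, B_hat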
(b, 10 points) Calculation and visualization. With your method, estimate the pseudonormals B̂ in estimatePseudonormalsUncalibrated and visualize the resultant albedos and
normals in the gray and rainbow colormaps respectively. Include these in the write-up.
(c, 5 points) Comparing to ground truth lighting. In your write-up, compare the L̂
estimated by the factorization above to the ground truth lighting directions given in L0. Are
they similar? Unless a special choice of factorization is made, they will be different. Describe
a simple change to the procedure in (a) that changes L̂ and B̂ but keeps the images
rendered using them the same, using only the matrices you calculated during the singular
value decomposition (U/S/V). (No need to code anything for this part; just describe the
change in the write-up)
(d, 5 points) Reconstructing the shape, attempt 1. Use the given implementation of
the Frankot-Chellappa algorithm from the previous question to reconstruct a 3D depth map
and visualize it as a surface in the ‘coolwarm’ colormap as in the previous question. Does
this look like a face?
Enforcing integrability explicitly. The ambiguity you demonstrated in (c) can be (partially) resolved by explicitly imposing integrability on the pseudonormals recovered from the
factorization. We will follow the approach in [2] to transform our estimated pseudonormals
into a set of pseudonormals that are integrable. The function enforceIntegrability in
utils.py implements this process.
(e, 5 points) Reconstructing the shape, attempt 2. Input your pseudonormals into
the enforceIntegrability function, use the output pseudonormals to estimate the shape
with the Frankot-Chellappa algorithm and plot a surface as in the previous questions. Does
this surface look like the one output by calibrated photometric stereo? Include at least three
viewpoints of the surface and your answers in your write-up.
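
The pipeline for this attempt is short. A sketch, assuming your q1.py helpers are importable from src/ and that B_hat and the image shape s come from the previous parts; check the docstring of enforceIntegrability in utils.py for its exact signature.

    from utils import enforceIntegrability
    # Hypothetical imports: reuse your own q1 helpers for the rest.
    from q1 import estimateAlbedosNormals, estimateShape, plotSurface

    Bt = enforceIntegrability(B_hat, s)            # integrable pseudonormals
    albedos, normals = estimateAlbedosNormals(Bt)
    surface = estimateShape(normals, s)
    plotSurface(surface)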
The generalized bas-relief ambiguity. Unfortunately, the procedure in (e) resolves the
ambiguity only up to a matrix of the form
    G = [ 1  0  0
          0  1  0
          µ  ν  λ ]        (2)

where λ > 0, as shown in [3]. This means that for any µ, ν and λ > 0, if B is a set of
integrable pseudonormals producing a given appearance, G⁻ᵀB is another set of integrable
pseudonormals producing the same appearance. The ambiguity introduced in uncalibrated
photometric stereo by matrices of this kind is called the generalized bas-relief ambiguity
(French, translates to ‘low-relief’, pronounced ‘bah ree-leef’).
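
Applying a bas-relief transformation to a set of pseudonormals is then a one-liner once G is built; a sketch:

    import numpy as np

    def gbr_transform_sketch(B, mu, nu, lam):
        # G from Eq. 2; for lam > 0, B and G^{-T} B render identical images.
        G = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [mu, nu, lam]], dtype=float)
        return np.linalg.inv(G).T @ B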
(f, 5 points) Why low relief? Vary the parameters µ, ν and λ in the bas-relief transformation and visualize the corresponding surfaces. Include at least six (two with each parameter
varied) of the significant ones in your write-up. Looking at these, what is your guess for why
the bas-relief ambiguity is so named? In your write-up, describe how the three parameters
affect the surface.
(g, 5 points) Flattest surface possible. With the bas-relief ambiguity, in Eq. 2, how
would you go about designing a transformation that makes the estimated surface as flat as
possible?
(h, 5 points) More measurements. We solved the problem with 7 pictures of the face.
Will acquiring more pictures from more lighting directions help resolve the ambiguity?
Credits
This homework, in large part, was inspired by the photometric stereo homework in
Computational Photography (15-463) offered by Prof. Ioannis Gkioulekas at CMU. The
data comes from the course on Computer Vision offered by Prof. Todd Zickler at Harvard.
References
[1] R. T. Frankot and R. Chellappa. A method for enforcing integrability in shape from
shading algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence,
10(4):439–451, July 1988.
[2] Alan Yuille and Daniel Snow. Shape and albedo from multiple images using integrability.
In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, pages 158–164. IEEE, 1997.
[3] P. N. Belhumeur, D. J. Kriegman, and A. L. Yuille. The bas-relief ambiguity. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
pages 1060–1066, June 1997.