Description
In this assignment, you will implement an algorithm to reconstruct a 3D point cloud from a pair of images taken at different angles. In Part I you will answer theory questions about 3D reconstruction. In Part II you will apply the 8-point algorithm and triangulation to find and visualize the 3D locations of corresponding image points.

Part I: Theory

Before implementing our own 3D reconstruction, let's take a look at some simple theory questions that may arise. The answers to the questions below should be relatively short, consisting of a few lines of math and text (maybe a diagram if it helps your understanding).

Q1.1 [5 points] Suppose two cameras fixate on a point P (see Figure 1) in space such that their principal axes intersect at that point. Show that if the image coordinates are normalized so that the coordinate origin (0, 0) coincides with the principal point, the $F_{33}$ element of the fundamental matrix is zero.

Figure 1: Figure for Q1.1. C1 and C2 are the optical centers. The principal axes intersect at the point P.

Q1.2 [5 points] Consider the case of two cameras viewing an object such that the second camera differs from the first by a pure translation parallel to the x-axis. Show that the epipolar lines in the two cameras are also parallel. Back up your argument with relevant equations. You may assume both cameras have the same intrinsics.

Q1.3 [5 points] Suppose we have an inertial sensor that gives us the accurate position ($R_i$ and $t_i$, the rotation matrix and translation vector) of the robot at time i. What will be the effective rotation ($R_{rel}$) and translation ($t_{rel}$) between two frames at different time stamps? Supposing the camera intrinsics (K) are known, express the essential matrix (E) and the fundamental matrix (F) in terms of K, $R_{rel}$ and $t_{rel}$.

Figure 2: Figure for Q1.3. C1 and C2 are the optical centers. The rotation and translation are obtained using inertial sensors. $R_{rel}$ and $t_{rel}$ are the relative rotation and translation between the two frames.

Part II: Practice

1 Overview

In this part you will begin by implementing the 8-point algorithm seen in class to estimate the fundamental matrix from corresponding points in two images (Section 2). Next, given the fundamental matrix and calibrated intrinsics (which will be provided), you will compute the essential matrix and use it to obtain a 3D metric reconstruction from 2D correspondences using triangulation (Section 3). Then, you will implement a method to automatically match points by taking advantage of epipolar constraints, and make a 3D visualization of the results (Section 4). Finally, you will implement RANSAC and bundle adjustment to further improve your algorithm (Section 5).

2 Fundamental Matrix Estimation

In this section you will explore different methods of estimating the fundamental matrix given a pair of images. In the data/ directory, you will find two images (see Figure 3) from the Middlebury multi-view dataset (http://vision.middlebury.edu/mview/data/), which is used to evaluate the performance of modern 3D reconstruction algorithms.

Figure 3: Temple images for this assignment

2.1 The Eight Point Algorithm

The 8-point algorithm (discussed in class, and outlined in Section 8.1 of [1]) is arguably the simplest method for estimating the fundamental matrix. For this section, you can use the provided correspondences in data/some_corresp.npz.

Q2.1 [10 points] Finish the function eightpoint in q2_1_eightpoint.py. Make sure you follow the signature for this portion of the assignment:

F = eightpoint(pts1, pts2, M)

where pts1 and pts2 are N × 2 matrices corresponding to the (x, y) coordinates of the N points in the first and second image respectively, and M is a scale parameter.

• You should scale the data as discussed in class, by dividing each coordinate by M (the maximum of the image's width and height). After computing F, you will have to "unscale" the fundamental matrix. Hint: if $x_{\text{normalized}} = Tx$, then $F_{\text{unnormalized}} = T^\top F T$. You must enforce the singularity condition of F before unscaling.

• You may find it helpful to refine the solution by using local minimization. This probably won't fix a completely broken solution, but may make a good solution better by locally minimizing a geometric cost function. For this we have provided a helper function refineF in helper.py, which takes in F and the two sets of points; you can call it from eightpoint before unscaling F.

• Remember that the x-coordinate of a point in the image is its column entry, and the y-coordinate is its row entry. Also note that "eight-point" is just a figurative name: it means you need at least 8 points; your algorithm should use an over-determined system (N > 8 points).

• To visualize the correctness of your estimated F, use the function displayEpipolarF in helper.py, which takes in F and the two images. This GUI lets you select a point in one of the images and visualize the corresponding epipolar line in the other image (Figure 4).

Figure 4: displayEpipolarF in helper.py creates a GUI for visualizing epipolar lines

• In addition to visualization, we also provide a test code snippet in q2_1_eightpoint.py which uses the helper function calc_epi_error to evaluate the quality of the estimated fundamental matrix. This function calculates the distance between the estimated epipolar line and the corresponding points. For the eight point algorithm, the error should be < 1 on average.
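A minimal sketch of the scaling, constraint matrix, and singularity enforcement is shown below, assuming numpy. It is a starting point, not a reference solution; the optional refineF call is marked where it would go.

```python
import numpy as np

def eightpoint(pts1, pts2, M):
    """Normalized 8-point algorithm (minimal sketch).

    pts1, pts2: N x 2 arrays of (x, y) correspondences; M: scale factor.
    """
    # Scale coordinates by 1/M, i.e. x_normalized = T x with T below.
    T = np.diag([1.0 / M, 1.0 / M, 1.0])
    p1, p2 = pts1 / M, pts2 / M

    # Each correspondence gives one row of A from x2^T F x1 = 0
    # (the entries of F taken in row-major order).
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    A = np.stack([x2 * x1, x2 * y1, x2,
                  y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones_like(x1)], axis=1)

    # Homogeneous least squares: the right singular vector of A
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce the singularity (rank-2) condition before unscaling.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0
    F = U @ np.diag(S) @ Vt

    # Optionally: F = refineF(F, p1, p2)  # local refinement from helper.py

    # Unscale: F_unnormalized = T^T F T.
    return T.T @ F @ T
```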
Output: Save your matrix F and scale M to the file q2_1.npz.

In your write-up:

• Write your recovered F.
• Include an image of some example output of displayEpipolarF.
• Include the code snippet of your eightpoint function.

2.2 The Seven Point Algorithm (Extra Credit)

Since the fundamental matrix has only seven degrees of freedom, it is possible to calculate F using just seven point correspondences. This requires solving a polynomial equation. In this section, you will implement the seven-point algorithm (outlined in this post).

Q2.2 [Extra Credit – 15 points] Finish the function sevenpoint in q2_2_sevenpoint.py. Make sure you follow the signature for this portion of the assignment:

Farray = sevenpoint(pts1, pts2, M)

where pts1 and pts2 are 7 × 2 matrices containing the correspondences, M is the normalizer (use the maximum of the image's height and width), and Farray is a list of length either 1 or 3 containing the fundamental matrix/matrices. Use M to normalize the point values to [0, 1] and remember to "unnormalize" your computed F afterward.

Manually select 7 points from the provided points in data/some_corresp.npz, and use these points to recover a fundamental matrix F. Use calc_epi_error in helper.py to calculate the error and pick the best solution, and use displayEpipolarF to visualize and verify it.
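The structure that differs from the eight-point case is sketched below, assuming numpy: a two-dimensional null space, and a cubic in the blending coefficient obtained from the rank-deficiency constraint $\det(aF_1 + (1-a)F_2) = 0$. Recovering the cubic's coefficients by sampling the determinant at four values is one convenient trick, not the only one.

```python
import numpy as np

def sevenpoint(pts1, pts2, M):
    """7-point algorithm (minimal sketch). pts1, pts2: 7 x 2 arrays.
    Returns a list of 1 or 3 fundamental matrices."""
    T = np.diag([1.0 / M, 1.0 / M, 1.0])
    p1, p2 = pts1 / M, pts2 / M
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    A = np.stack([x2 * x1, x2 * y1, x2,
                  y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones(7)], axis=1)

    # A is 7 x 9, so its null space is (generically) two-dimensional:
    # every F = a*F1 + (1 - a)*F2 satisfies the epipolar constraints.
    _, _, Vt = np.linalg.svd(A)
    F1 = Vt[-1].reshape(3, 3)
    F2 = Vt[-2].reshape(3, 3)

    # det(a*F1 + (1 - a)*F2) = 0 is a cubic in a; recover its
    # coefficients by evaluating the determinant at four sample points.
    det = lambda a: np.linalg.det(a * F1 + (1 - a) * F2)
    coeffs = np.polyfit([-1, 0, 1, 2], [det(-1), det(0), det(1), det(2)], 3)

    Fs = []
    for a in np.roots(coeffs):
        if np.isreal(a):
            a = float(np.real(a))
            Fs.append(T.T @ (a * F1 + (1 - a) * F2) @ T)  # unscale
    return Fs
```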
Output: Save your matrix F and scale M to the file q2_2.npz.

In your write-up:

• Write your recovered F.
• Include an image of some example output of displayEpipolarF.
• Include the code snippet of your sevenpoint function.

3 Metric Reconstruction

You will compute the camera matrices and triangulate the 2D points to obtain the 3D scene structure. To obtain the Euclidean scene structure, first convert the fundamental matrix F to an essential matrix E. Examine the lecture notes and the textbook to find out how to do this when the internal camera calibration matrices K1 and K2 are known; these are provided in data/intrinsics.npz.

Q3.1 [5 points] Complete the function essentialMatrix in q3_1_essential_matrix.py to compute the essential matrix E given F, K1 and K2, with the signature:

E = essentialMatrix(F, K1, K2)

Output: Save your estimated E, using the F from the eight-point algorithm, to q3_1.npz.

In your write-up:

• Write your estimated E.
• Include the code snippet of your essentialMatrix function.

Given an essential matrix, it is possible to retrieve the projective camera matrices M1 and M2 from it. Assuming M1 is fixed at $[I \mid 0]$, M2 can be retrieved up to a scale and a four-fold rotation ambiguity. For details on recovering M2, see Section 11.3 in Szeliski [2]. We have provided you with the function camera2 in python/helper.py to recover the four possible M2 matrices given E. Note: the matrices M1 and M2 here are of the form $M_1 = [I \mid 0]$ and $M_2 = [R \mid t]$.

Q3.2 [10 points] Using the above, complete the function triangulate in q3_2_triangulate.py to triangulate a set of 2D coordinates in the image to a set of 3D points, with the signature:

[w, err] = triangulate(C1, pts1, C2, pts2)

where pts1 and pts2 are the N × 2 matrices with the 2D image coordinates and w is an N × 3 matrix with the corresponding 3D points per row. C1 and C2 are the 3 × 4 camera matrices. Remember that you will need to multiply the given intrinsics matrices with your solution for the canonical camera matrices to obtain the final camera matrices. Various methods exist for triangulation; probably the most familiar to you is based on least squares (see [2] Chapter 7 if you want to learn about other methods).

For each point i, we want to solve for the 3D coordinates $w_i = [x_i, y_i, z_i]^\top$ such that, when projected back to the two images, they are close to the original 2D points. To project the 3D coordinates back to the 2D images, we first write $w_i$ in homogeneous coordinates as $\tilde{w}_i$, and compute $C_1\tilde{w}_i$ and $C_2\tilde{w}_i$ to obtain the 2D homogeneous coordinates projected into camera 1 and camera 2, respectively. For each point i, we can write this problem in the form

$$A_i \tilde{w}_i = 0,$$

where $A_i$ is a 4 × 4 matrix and $\tilde{w}_i$ is a 4 × 1 vector of the 3D coordinates in homogeneous form. You can then obtain the homogeneous least-squares solution (discussed in class) to solve for each $w_i$.

Once you have implemented triangulation, check the performance by looking at the reprojection error

$$\mathrm{err} = \sum_i \|x_{1i} - \widehat{x}_{1i}\|^2 + \|x_{2i} - \widehat{x}_{2i}\|^2,$$

where $\widehat{x}_{1i} = \mathrm{Proj}(C_1, w_i)$ and $\widehat{x}_{2i} = \mathrm{Proj}(C_2, w_i)$. You should see an error less than 500. Ours is around 350.

Note: C1 and C2 here are projection matrices of the form $C_1 = K_1 M_1 = K_1 [I \mid 0]$ and $C_2 = K_2 M_2 = K_2 [R \mid t]$.

In your write-up:

• Write down the expression for the matrix $A_i$ for triangulating a pair of 2D coordinates in the image to a 3D point.
• Include the code snippet of your triangulate function.
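A minimal sketch of the construction of $A_i$ and the homogeneous least-squares solve is shown below, assuming numpy; the one-line essentialMatrix for Q3.1 is included alongside it for completeness. Treat it as a scaffold rather than a reference solution.

```python
import numpy as np

def essentialMatrix(F, K1, K2):
    # E = K2^T F K1 (Q3.1).
    return K2.T @ F @ K1

def triangulate(C1, pts1, C2, pts2):
    """DLT triangulation (minimal sketch).
    C1, C2: 3 x 4 projection matrices; pts1, pts2: N x 2 image points."""
    N = pts1.shape[0]
    w = np.zeros((N, 3))
    for i in range(N):
        x1, y1 = pts1[i]
        x2, y2 = pts2[i]
        # Two rows per image from the cross product x x (C w~) = 0.
        A = np.stack([x1 * C1[2] - C1[0],
                      y1 * C1[2] - C1[1],
                      x2 * C2[2] - C2[0],
                      y2 * C2[2] - C2[1]])
        _, _, Vt = np.linalg.svd(A)
        wh = Vt[-1]
        w[i] = wh[:3] / wh[3]  # dehomogenize

    # Reprojection error: squared distances summed over both images.
    W = np.hstack([w, np.ones((N, 1))])
    x1_hat = (C1 @ W.T).T
    x2_hat = (C2 @ W.T).T
    x1_hat = x1_hat[:, :2] / x1_hat[:, 2:3]
    x2_hat = x2_hat[:, :2] / x2_hat[:, 2:3]
    err = np.sum((pts1 - x1_hat) ** 2) + np.sum((pts2 - x2_hat) ** 2)
    return w, err
```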
Q3.3 [10 points] Complete the function findM2 in q3_2_triangulate.py to obtain the correct M2 from M2s by testing the four solutions through triangulation. Use the correspondences from data/some_corresp.npz.

Output: Save the correct M2, the corresponding C2, and the 3D points P to q3_3.npz.

In your write-up: Include the code snippet of your findM2 function.

4 3D Visualization

You will now create a 3D visualization of the temple images. By treating our two images as a stereo pair, we can triangulate corresponding points in each image and render their 3D locations.

Q4.1 [15 points] In q4_1_epipolar_correspondence.py, finish the function epipolarCorrespondence with the signature:

[x2, y2] = epipolarCorrespondence(im1, im2, F, x1, y1)

This function takes in the x and y coordinates of a pixel in im1 and your fundamental matrix F, and returns the coordinates of the pixel in im2 that corresponds to the input point. The match is obtained by computing the similarity of a small window around the (x1, y1) coordinates in im1 to various windows around possible matches in im2, and returning the closest one.

Instead of searching for the matching point at every possible location in im2, we can use F and simply search over the set of pixels that lie along the epipolar line (recall that the epipolar line passes through a single point in im2 which corresponds to the point (x1, y1) in im1).

There are various possible ways to compute the window similarity. For this assignment, simple methods such as the Euclidean or Manhattan distance between the intensities of the pixels should suffice. See [2] Chapter 11, on stereo matching, for a brief overview of these and other methods.

Implementation hints:

• Experiment with various window sizes.
• It may help to use a Gaussian weighting of the window, so that the center has greater influence than the periphery.
• Since the two images only differ by a small amount, it might be beneficial to consider only matches for which the distance from (x1, y1) to (x2, y2) is small.

To help you test your epipolarCorrespondence, we have included a helper function epipolarMatchGUI in q4_1_epipolar_correspondence.py, which takes in the two images and the fundamental matrix. This GUI allows you to click on a point in im1, and will use your function to display the corresponding point in im2. See Figure 5.

Figure 5: epipolarMatchGUI shows the corresponding point found by calling epipolarCorrespondence

It's not necessary for your matcher to get every possible point right, but it should get easy points (such as those with distinctive, corner-like windows). It should also be good enough to render an intelligible representation in the next question.

Output: Save the matrix F and the points pts1 and pts2 which you used to generate the screenshot to the file q4_1.npz.

In your write-up:

• Include a screenshot of epipolarMatchGUI with some detected correspondences.
• Include the code snippet of your epipolarCorrespondence function.
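One way to organize the search is sketched below, assuming numpy. The window half-size, Gaussian width, and search range are tuning assumptions; the sketch also steps along y2 and solves the line equation for x2, which assumes the epipolar lines are closer to vertical than horizontal. If they are not, step along x2 instead.

```python
import numpy as np

def epipolarCorrespondence(im1, im2, F, x1, y1):
    """Search along the epipolar line in im2 (minimal sketch).
    Assumes (x1, y1) is far enough from the border for the window."""
    x1, y1 = int(x1), int(y1)
    win = 5       # half-window size (tuning assumption)
    search = 40   # only consider matches near (x1, y1)

    # Epipolar line l = F x1 in im2.
    l = F @ np.array([x1, y1, 1.0])

    # Gaussian weighting so the window center dominates.
    g = np.arange(-win, win + 1)
    gauss = np.exp(-(g[:, None] ** 2 + g[None, :] ** 2) / (2 * 3.0 ** 2))
    if im1.ndim == 3:
        gauss = gauss[:, :, None]  # broadcast over color channels

    patch1 = im1[y1 - win:y1 + win + 1,
                 x1 - win:x1 + win + 1].astype(float)

    best, best_err = (x1, y1), np.inf
    for y2 in range(y1 - search, y1 + search + 1):
        # Solve l . (x2, y2, 1) = 0 for x2 (assumes l[0] is not ~0).
        x2 = int(round(-(l[1] * y2 + l[2]) / l[0]))
        if (win <= x2 < im2.shape[1] - win and
                win <= y2 < im2.shape[0] - win):
            patch2 = im2[y2 - win:y2 + win + 1,
                         x2 - win:x2 + win + 1].astype(float)
            err = np.sum(gauss * (patch1 - patch2) ** 2)
            if err < best_err:
                best_err, best = err, (x2, y2)
    return best
```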
Q4.2 [10 points] Included in this homework is a file data/templeCoords.npz, which contains 288 hand-selected points from im1, saved in the variables x1 and y1. We can determine the 3D locations of these point correspondences using the triangulate function, and then plot them using the Matplotlib or Plotly package.

Complete the compute3D_pts function in q4_2_visualize.py, which loads the necessary files from ../data/ to generate the 3D reconstruction, using Matplotlib's scatter function. An example is shown in Figure 6.

Figure 6: An example point cloud

Output: Again, save the matrix F and the matrices M1, M2, C1, C2 which you used to generate the screenshots to the file q4_2.npz.

In your write-up:

• Take a few screenshots of the 3D visualization so that the outline of the temple is clearly visible, and include them in your write-up.
• Include the code snippet of your compute3D_pts function.

5 Bundle Adjustment

Bundle adjustment is commonly used as the last step of every feature-based 3D reconstruction algorithm. Given a set of images depicting a number of 3D points from different viewpoints, bundle adjustment is the process of simultaneously refining the 3D coordinates along with the camera parameters. It minimizes the reprojection error, which is the squared sum of distances between image points and predicted points. In this section, you will implement the bundle adjustment algorithm yourself (make use of the q5_bundle_adjustment.py file). Specifically:

• In Q5.1, you will implement a RANSAC algorithm to estimate the fundamental matrix F and all the inliers.
• In Q5.2, you will write code to parameterize the rotation matrix R using the Rodrigues formula (please check this pdf for a detailed explanation), which will enable the joint optimization process for bundle adjustment.
• In Q5.3, you will first write down the objective function in rodriguesResidual, and then do the bundleAdjustment.

Q5.1 RANSAC for Fundamental Matrix Recovery [15 points]

In some real-world applications, manually determining correspondences is infeasible, and the correspondences will often be noisy. Fortunately, the RANSAC method seen in class can be applied to the problem of fundamental matrix estimation. Implement the algorithm with the signature:

[F, inliers] = ransacF(pts1, pts2, M, nIters, tol)

where M is defined in the same way as in Section 2 and inliers is a boolean vector whose size equals the number of points; inliers is set to true only for the points that satisfy the threshold defined for the given fundamental matrix F. nIters is the maximum number of RANSAC iterations, and tol is the error tolerance for a point to be considered an inlier. We have provided some noisy correspondences in some_corresp_noisy.npz, in which around 75% of the points are inliers.

In your write-up: Compare the result of RANSAC with the result of eightpoint when run on the noisy correspondences. Briefly explain the error metric you used, how you decided which points were inliers, and any other optimizations you may have made. Discuss the effect on the fundamental matrix of varying nIters and tol. Please include the code snippet of your ransacF function.

Hints: Use the eight- or seven-point algorithm to compute the fundamental matrix from the minimal set of points. Then compute the inliers, and refine your estimate using all the inliers.
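A minimal sketch of the loop is shown below, assuming numpy and the eightpoint function from Q2.1. The error metric here, point-to-epipolar-line distance summed over both images, and the default values for nIters and tol are assumptions; other choices are equally valid.

```python
import numpy as np

def ransacF(pts1, pts2, M, nIters=1000, tol=2.0):
    """RANSAC around eightpoint (minimal sketch)."""
    N = pts1.shape[0]
    h1 = np.hstack([pts1, np.ones((N, 1))])
    h2 = np.hstack([pts2, np.ones((N, 1))])

    def epi_dist(F):
        # Distance of x2 to the line F x1, plus x1 to F^T x2.
        l2 = (F @ h1.T).T
        l1 = (F.T @ h2.T).T
        d2 = np.abs(np.sum(l2 * h2, axis=1)) / np.linalg.norm(l2[:, :2], axis=1)
        d1 = np.abs(np.sum(l1 * h1, axis=1)) / np.linalg.norm(l1[:, :2], axis=1)
        return d1 + d2

    best_inliers = np.zeros(N, dtype=bool)
    for _ in range(nIters):
        # Fit a candidate F from a minimal sample of 8 points.
        idx = np.random.choice(N, 8, replace=False)
        F = eightpoint(pts1[idx], pts2[idx], M)
        inliers = epi_dist(F) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    # Refine: refit on all inliers of the best model.
    F = eightpoint(pts1[best_inliers], pts2[best_inliers], M)
    return F, best_inliers
```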
Q5.2 Rodrigues and Inverse Rodrigues [15 points]

So far we have independently solved for the camera matrices $M_j$ and the 3D points $w_i$. In bundle adjustment, we will jointly optimize the reprojection error with respect to the points $w_i$ and the camera matrices $C_j$:

$$\mathrm{err} = \sum_{ij} \|x_{ij} - \mathrm{Proj}(C_j, w_i)\|^2,$$

where $C_j = K_j M_j$, the same as in Q3.2.

For this homework we are only going to optimize the extrinsic matrix. To do this, we will parameterize the rotation matrix R using the Rodrigues formula, producing a vector $r \in \mathbb{R}^3$. Write a function that converts a Rodrigues vector r to a rotation matrix R,

R = rodrigues(r)

as well as the inverse function that converts a rotation matrix R to a Rodrigues vector r,

r = invRodrigues(R)

Reference: the Rodrigues formula and this pdf.

In your write-up: Include the code snippets of your rodrigues and invRodrigues functions.
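A minimal sketch of both conversions follows, assuming numpy; the $\theta = \pi$ edge case of the inverse (where the axis sign is ambiguous) is glossed over for brevity.

```python
import numpy as np

def rodrigues(r):
    """Rodrigues vector -> rotation matrix (minimal sketch)."""
    theta = np.linalg.norm(r)
    if theta < 1e-15:
        return np.eye(3)
    k = np.asarray(r).flatten() / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    # R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def invRodrigues(R):
    """Rotation matrix -> Rodrigues vector (sketch; ignores theta = pi)."""
    # Angle from the trace, axis from the skew-symmetric part of R.
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-15:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2 * np.sin(theta))
    return theta * axis
```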
Q5.3 Bundle Adjustment [10 points]

Using this parameterization, write an optimization function

residuals = rodriguesResidual(K1, M1, p1, K2, p2, x)

where x is the flattened concatenation of w, r2, and t2. Here w are the 3D points, and r2 and t2 are the rotation (in Rodrigues vector form) and translation vectors associated with the projection matrix M2. The residuals are the differences between the original image projections and the estimated projections (the square of the L2-norm of this vector corresponds to the error we computed in Q3.2):

residuals = numpy.concatenate([(p1-p1_hat).reshape([-1]), (p2-p2_hat).reshape([-1])])

Use this error function and Scipy's optimizer minimize to write a function that optimizes for the best extrinsic matrix and 3D points, using the inlier correspondences from some_corresp_noisy.npz and the RANSAC estimates of the extrinsics and 3D points as the initialization:

[M2, w, o1, o2] = bundleAdjustment(K1, M1, p1, K2, M2_init, p2, w_init)

First extract the rotation and translation from M2_init, then use the invRodrigues function you implemented previously to transform the rotation; concatenate it with the translation and the 3D points. This concatenated vector contains the variables to be optimized. After obtaining the optimized vector, decompose it back into the rotation (using the rodrigues function you implemented previously), the translation, and the 3D point coordinates.

In your write-up:

• Include an image of the original 3D points and the optimized points (use the provided plot_3D_dual function).
• Report the reprojection error with your initial M2 and w, as well as with the optimized matrices.
• Include the code snippets of your rodriguesResidual and bundleAdjustment functions.

Hint: For reference, our solution achieves a reprojection error around 10 after optimization. Your exact error may differ slightly.
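The following sketch ties the pieces together, assuming numpy, scipy, and the rodrigues/invRodrigues functions from Q5.2. Interpreting the outputs o1 and o2 as the reprojection error before and after optimization is an assumption, since the handout does not pin them down.

```python
import numpy as np
import scipy.optimize

def rodriguesResidual(K1, M1, p1, K2, p2, x):
    """Residual vector; x = [w.flatten(), r2, t2] (minimal sketch)."""
    w = x[:-6].reshape(-1, 3)
    r2, t2 = x[-6:-3], x[-3:]
    M2 = np.hstack([rodrigues(r2), t2.reshape(3, 1)])
    C1, C2 = K1 @ M1, K2 @ M2
    W = np.hstack([w, np.ones((w.shape[0], 1))])
    p1_hat = (C1 @ W.T).T
    p2_hat = (C2 @ W.T).T
    p1_hat = p1_hat[:, :2] / p1_hat[:, 2:3]
    p2_hat = p2_hat[:, :2] / p2_hat[:, 2:3]
    return np.concatenate([(p1 - p1_hat).reshape(-1),
                           (p2 - p2_hat).reshape(-1)])

def bundleAdjustment(K1, M1, p1, K2, M2_init, p2, w_init):
    """Joint optimization of extrinsics and 3D points (minimal sketch)."""
    # Parameterize: 3D points, Rodrigues vector, translation.
    x0 = np.concatenate([w_init.reshape(-1),
                         invRodrigues(M2_init[:, :3]),
                         M2_init[:, 3]])
    obj = lambda x: np.sum(rodriguesResidual(K1, M1, p1, K2, p2, x) ** 2)
    o1 = obj(x0)  # reprojection error at initialization (assumption)

    res = scipy.optimize.minimize(obj, x0)
    x = res.x

    # Decompose the optimized vector back into M2 and w.
    w = x[:-6].reshape(-1, 3)
    M2 = np.hstack([rodrigues(x[-6:-3]), x[-3:].reshape(3, 1)])
    return M2, w, o1, res.fun
```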
Figure 7: Visualization of 3D points for noisy correspondences, before and after bundle adjustment

6 Multiview Keypoint Reconstruction (Extra Credit)

You will use a multi-view capture of moving vehicles and reconstruct the motion of a car. The first part of the problem uses a single time instance captured from three views (Figure 8, top); you will reconstruct the vehicle keypoints and render them from multiple views (Figure 8, bottom). Make use of the q6_ec_multiview_reconstruction.py file; the data/q6 folder contains the images.

Q6.1 [Extra Credit – 15 points] Write a function to compute the 3D keypoint locations P given the 2D part detections pts1, pts2 and pts3 and the camera projection matrices C1, C2, C3. The camera matrices are given in the numpy files.

[P, err] = MultiviewReconstruction(C1, pts1, C2, pts2, C3, pts3, Thres)

The 2D part detections (pts) are computed using a neural network (see: Code Used For Detection and Reconstruction) and correspond to different locations on a car, such as the wheels, headlights, etc. The third column of pts is the confidence of the keypoint's 2D localization: a higher confidence value represents a more accurate localization of the keypoint in 2D. To visualize the 2D detections, run the visualize_keypoints(image, pts, Thres) helper function, where Thres is the confidence threshold for the 2D detected keypoints. The camera matrices (C) are computed by running SfM on the multiple views, and are given in the numpy files alongside the 2D locations.

By varying the confidence threshold Thres (i.e., considering only the points above the threshold), you get reconstructions of differing accuracy. Try varying the threshold and analyze its effect on the accuracy of the reconstruction. Save the best reconstruction (the 3D locations of the parts) from these parameters into a q6_1.npz file.

Hint: You can modify the triangulation function to take three views as input. After thresholding, say m points lie above the threshold and n points lie below it. Your task is to use these m good points to compute the reconstruction. For each 3D location, use two-view or three-view triangulation for initialization, based on visibility after thresholding.

In your write-up:

• Describe the method you used to compute the 3D locations.
• Include an image of the reconstructed 3D points, with the points connected, using the helper function plot_3d_keypoint(P), along with the reprojection error.
• Include the code snippet of your MultiviewReconstruction function.

Figure 8: Example detections (top) and the reconstructions from multiple views (bottom)

Q6.2 [Extra Credit – 15 points] In the previous question you computed a 3D reconstruction at a single time instance. Now you will iteratively repeat the process over time and compute a spatio-temporal reconstruction of the car. The images in the data/q6 folder show the motion of the car at an intersection, captured from multiple views. The images are given as (cam1_time0.jpg, ..., cam1_time9.jpg) for camera 1, (cam2_time0.jpg, ..., cam2_time9.jpg) for camera 2, and (cam3_time0.jpg, ..., cam3_time9.jpg) for camera 3. The corresponding detections and camera matrices are given in (time0.npz, ..., time9.npz). Use these to compute the spatio-temporal reconstruction of the car for all 10 time instances, and plot them by completing the plot_3d_keypoint_video function. A sample plot with the first and last time instance reconstructions of the car, with the reprojections, is shown in Figure 9.

Figure 9: Spatio-temporal reconstruction of the car (right), with the projections at two different time instances in a single view (left)

In your write-up:

• Plot the spatio-temporal reconstruction of the car for the 10 timesteps.
• Include the code snippet of your plot_3d_keypoint_video function.

7 Deliverables

The assignment (code and write-up) should be submitted to Gradescope. The write-up should be named hw3.pdf and the code should be a zip named hw3.zip. Please make sure that you assign the location of the answers to each question on Gradescope. The zip should have the files in the structure defined below. (Note: Neglecting to follow the submission structure will incur a huge score penalty!) You can run the included checkA4Submission.py script to ensure that your zip folder structure is correct.

• hw3.pdf: your write-up.
• q2_1_eightpoint.py: script for Q2.1.
• q2_2_sevenpoint.py: script for Q2.2.
• q3_1_essential_matrix.py: script for Q3.1.
• q3_2_triangulate.py: script for Q3.2.
• q4_1_epipolar_correspondence.py: script for Q4.1.
• q4_2_visualize.py: script for Q4.2.
• q5_bundle_adjustment.py: script for Q5.
• q6_ec_multiview_reconstruction.py: script for (extra-credit) Q6.
• helper.py: helper functions.
• q2_1.npz: file with the output of Q2.1.
• q2_2.npz: file with the output of Q2.2.
• q3_1.npz: file with the output of Q3.1.
• q3_3.npz: file with the output of Q3.3.
• q4_1.npz: file with the output of Q4.1.
• q4_2.npz: file with the output of Q4.2.
• q6_1.npz: (extra-credit) file with the output of Q6.1.

*Do not include the data directory in your submission.

8 FAQs

Credits: Paul Nadan

Q2.1: Does it matter if we unscale F before or after calling refineF? The relationship between F and the normalized F is fixed and defined by a set of transformations, so we can convert at any stage before or after refinement. The nonlinear optimization in refineF may work slightly better with a normalized F, but it should be fine either way.

Q2.1: Why does the other image disappear (or become really small) when I select a point using the displayEpipolarF GUI? This issue occurs when the epipolar line corresponding to the point you selected lies far away from the image. Something is likely wrong with your fundamental matrix.

Q2.1 Note: The GUI will show correct epipolar lines even if the program uses the wrong order of pts1 and pts2 in the eightpoint algorithm. One thing to check: the optimizer should take fewer than 10 iterations (shown in the output) to converge if the ordering is correct.

Q3.2: How can I get started formulating the triangulation equations? One possible method: from the first camera, $x_{1i} = P_1\omega_i \Rightarrow x_{1i} \times P_1\omega_i = 0 \Rightarrow A_{1i}\omega_i = 0$. This is a linear system of 3 equations, one of which is redundant (a linear combination of the other two), in 4 variables. We get a similar system from the second camera, for a total of 4 non-redundant equations in 4 variables, i.e. $A_i\omega_i = 0$.

Q3.2: What is the expected value of the reprojection error? The reprojection error for the data in some_corresp.npz should be around 352 (or 89 without using refineF). If you get a reprojection error of around 94 (or 1927 without using refineF), then you have somehow ended up with a transposed F matrix in your eightpoint function.

Q3.2: What if you are getting a high reprojection error but can't find any errors in your triangulate function? One useful trick is to temporarily comment out the call to refineF in your 8-point algorithm and make sure that the epipolar lines still match up. The refineF function can sometimes find a pretty good solution even when starting from a totally incorrect matrix, which results in the F matrix passing the sanity checks even if there is an error in the 8-point function. However, a slightly incorrect F matrix can still cause the reprojection error to be very high later on, even if your triangulate code is correct.

Q4.2 Note: Figure 7 in the assignment document is incorrect; if you look closely, you'll notice that the z coordinates are all negative. Don't worry if your solution is different from the example, as long as the 3D structure of the temple is evident.

Q5.1: How many inliers should I be getting from RANSAC? The correct number of inliers should be around 106. This provides a good sanity check for whether the chosen tolerance value is appropriate.

References

[1] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2002.
[2] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer Nature, 2022.


