16-820 Advanced Computer Vision: Homework 1 (Fall 2024) Augmented Reality with Planar Homographies


In this assignment, you will implement an AR application step by step using planar homographies. Before stepping into the implementation, we will walk you through the theory of planar homographies. In the programming section, you will first learn to find point correspondences between two images and use them to estimate the homography between the images. You will then use this homography to warp images, and finally you will implement your own AR application.

1 Theory

1.1 Planar Homographies as a Warp

Recall that a planar homography is a warp operation (a mapping of pixel coordinates from one camera frame to another) that makes the fundamental assumption that the points lie on a plane in the real world. Under this assumption, pixel coordinates of the planar points in one view can be mapped directly to pixel coordinates in another camera view of the same points.

Figure 1: A homography H links all points x_pi lying in plane pi between two camera views x and x' in cameras C and C' respectively, such that x' = Hx. [From Hartley and Zisserman]

Q1.1 (5 points): Prove that there exists a homography H that satisfies equation (1), given two 3 x 4 camera projection matrices P1 and P2 corresponding to the two cameras and a plane Pi. You do not need to produce an actual algebraic expression for H; all we are asking for is a proof of its existence.

    x1 === H x2    (1)

The === symbol stands for "identical to", up to scale. The points x1 and x2 are in homogeneous coordinates, which means they have an additional dimension: if xi is a 3D vector [xi, yi, zi]^T, it represents the 2D point [xi/zi, yi/zi] (called inhomogeneous coordinates). This additional dimension is a mathematical convenience for representing transformations (translation, rotation, scaling, etc.) in concise matrix form, and === means the equation holds up to a scaling factor.
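To make the convention concrete, here is a two-line NumPy illustration (a sketch for intuition only, not part of the assignment starter code; the helper name is ours):

```python
import numpy as np

def to_inhomogeneous(x):
    """Map a homogeneous 2D point [x, y, z] to the inhomogeneous point [x/z, y/z]."""
    return x[:2] / x[2]

x = np.array([4.0, 6.0, 2.0])
assert np.allclose(to_inhomogeneous(x), [2.0, 3.0])
# Scaling a homogeneous point by any nonzero factor leaves the 2D point unchanged,
# which is exactly the sense in which x1 === H x2 holds only up to scale.
assert np.allclose(to_inhomogeneous(5.0 * x), to_inhomogeneous(x))
```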
Note: A degenerate case happens when the plane Pi contains both cameras' centers, in which case there are infinitely many choices of H satisfying equation (1). You can ignore this special case in your answer. Answer in your write-up.

1.2 The Direct Linear Transform

A very common problem in projective geometry has the form x === Ay, where x and y are known vectors and A is a matrix containing unknowns to be solved for. Given matching points in two images, our homography relationship is clearly an instance of such a problem. Note that the equality holds only up to scale (the equations have the form x = lambda * H x' for some unknown scale lambda), which is why we cannot use an ordinary least-squares solution of the kind you may have used in the past to solve simultaneous equations. A standard approach to such problems is the Direct Linear Transform (DLT): rewrite the relation as a set of proper homogeneous linear equations, which are then solved in the standard least-squares sense. Since this process involves disentangling the structure of the H matrix, it transforms the problem into a set of linear equations, which gives the method its name.

Q1.2 (15 points): Let x1 be a set of points in an image and x2 the set of corresponding points in an image taken by another camera. Suppose there exists a homography H such that

    x1^i === H x2^i    (i in {1, ..., N})

where x1^i = [x1^i, y1^i, 1]^T are in homogeneous coordinates, x1^i is in x1, and H is a 3 x 3 matrix. For each point pair, this relation can be rewritten as

    A_i h = 0

where h is a column vector reshaped from H, and A_i is a matrix whose elements are derived from the points x1^i and x2^i. This lets us calculate H from the given point correspondences.

1. How many degrees of freedom does h have? (3 points)
2. How many point pairs are required to solve for h? (2 points)
3. Derive A_i. (5 points)
4. When solving Ah = 0, you are in essence trying to find the h that lies in the null space of A.
What that means is that there is some non-trivial solution h such that the product Ah is 0. What would be a trivial solution for h? Is the matrix A full rank? Why/why not? What impact does this have on the singular values (i.e., the eigenvalues of A^T A)? (5 points)

1.3 Using Matrix Decompositions to Calculate the Homography

A homography H transforms one set of points (in homogeneous coordinates) to another set of points. In this homework, we obtain corresponding point coordinates from feature matches and then need to calculate the homography. You have already derived that Ax = 0 in Question 1. In this section, we look at how to solve such equations using two approaches, either of which can be used in the subsequent assignment questions.

1.3.1 Eigenvalue Decomposition

One way to solve Ax = 0 is to calculate the eigenvalues and eigenvectors of A; the eigenvector corresponding to the eigenvalue 0 is the solution. Consider this example:

    A = [ 3  -6   8 ]
        [ 0   0   6 ]
        [ 0   0   2 ]

Using the numpy.linalg function eig(), we get the following eigenvectors and eigenvalues:

    V = [ 1.0000   0.8944   0.9535 ]        D = [ 3   0   2 ]
        [ 0        0.4472   0.2860 ]
        [ 0        0        0.0953 ]

Here the columns of V are the eigenvectors, and each corresponding element of D is its eigenvalue. We notice that there is an eigenvalue of 0; the eigenvector corresponding to it is the solution of Ax = 0:

    [ 3  -6   8 ] [ 0.8944 ]   [ 0 ]
    [ 0   0   6 ] [ 0.4472 ] = [ 0 ]
    [ 0   0   2 ] [ 0      ]   [ 0 ]

1.3.2 Singular Value Decomposition

The Singular Value Decomposition (SVD) of a matrix A is expressed as

    A = U Sigma V^T

Here U is a matrix whose columns are the left singular vectors, and the columns of V are the right singular vectors. The matrix Sigma is a diagonal matrix; each diagonal element sigma_i is a singular value, and the singular values are sorted in order of magnitude. In our case there are nine of them, sigma_1 >= ... >= sigma_9, and V is a 9 x 9 matrix.

  - If sigma_9 = 0, the system is exactly determined: a homography exists and all points fit it exactly.
  - If sigma_9 > 0, the system is over-determined.
A homography exists but not all points fit exactly (they fit in the least-squares error sense); sigma_9 then represents the goodness of fit.
  - Usually you will have at least four correspondences. If not, the system is under-determined; we will not deal with that case here.

The columns of U are the eigenvectors of A A^T, and the columns of V are the eigenvectors of A^T A. We can use this fact to solve for h in the equation Ah = 0. Let us reformulate our problem of solving Ah = 0 accordingly: we want to minimize the error of the solution in the least-squares sense. Ideally the product Ah should be 0, so the sum-squared error can be written as

    f(h) = (1/2) (Ah - 0)^T (Ah - 0) = (1/2) (Ah)^T (Ah) = (1/2) h^T A^T A h

Minimizing this error with respect to h, we get:

    df/dh = 0   =>   (1/2) (A^T A + (A^T A)^T) h = 0   =>   A^T A h = 0

This implies that h equals the eigenvector of A^T A corresponding to the zero eigenvalue (or the eigenvalue closest to zero in the presence of noise). Thus we choose the smallest singular value, sigma_9 in Sigma, and the least-squares solution to Ah = 0 is the corresponding right singular vector (column 9 of the matrix V).

1.4 Theory Questions

Q1.4.1 (5 points): Homography under rotation. Prove that there exists a homography H that satisfies x1 === H x2, given two cameras separated by a pure rotation. That is, for camera 1, x1 = K1 [I | 0] X, and for camera 2, x2 = K2 [R | 0] X. Note that K1 and K2 are the 3 x 3 intrinsic matrices of the two cameras and are different. I is the 3 x 3 identity matrix, 0 is a 3 x 1 zero vector, X is a point in 3D space, and R is the 3 x 3 rotation matrix of the camera.

Q1.4.2 (5 points): Understanding homographies under rotation. Suppose a camera is rotating about its center C, keeping the intrinsic parameters K constant. Let H be the homography that maps the view from one camera orientation to the view at a second orientation, and let theta be the angle of rotation between the two. Show that H^2 is the homography corresponding to a rotation of 2 * theta.
Please limit your answer to a couple of lines; a lengthy proof indicates that you are doing something too complicated (or wrong).

Q1.4.3 (5 points): Limitations of the planar homography. Why is the planar homography not completely sufficient to map any arbitrary scene image to another viewpoint? State your answer concisely, in one or two sentences.

Q1.4.4 (5 points): Behavior of lines under perspective projections. We stated in class that perspective projection preserves lines (a line in 3D is projected to a line in 2D). Verify algebraically that this is the case, i.e., verify that the projection P in x = PX preserves lines.

2 Computing Planar Homographies

2.1 Feature Detection and Matching

Before finding the homography between an image pair, we need to find corresponding point pairs between the two images. But how do we get these points? One way is to select them manually, which is tedious and inefficient. The computer-vision way is to find interest points in the image pair and match them automatically. In the interest of being able to do cool stuff, we will not reimplement a feature detector or descriptor here; instead we will use Python modules (both are provided in helper.py).

The purpose of an interest point detector (e.g., Harris, SIFT, SURF) is to find particular salient points in the images, around which we extract feature descriptors (e.g., MOPS). These descriptors try to summarize the content of the image around the feature points in as succinct yet descriptive a manner as possible. (There is often a trade-off between representational and computational complexity in computer vision: a very high-dimensional feature descriptor might ensure good matches, but computing it could be prohibitively expensive.) Matching is then the task of finding, among the descriptors computed on a new image, the one that best matches the current descriptor.
This could be something as simple as the Euclidean distance between the two descriptors, or something more complicated, depending on how the descriptor is composed. For this exercise, we shall use the widely used FAST detector in concert with the BRIEF descriptor.

Q2.1.1 (5 points): FAST Detector. How is the FAST detector different from the Harris corner detector that you have seen in the lectures? Can you comment on its computational performance compared to the Harris corner detector? Reference links: Original Paper, OpenCV Tutorial.

Q2.1.2 (5 points): BRIEF Descriptor. How is the BRIEF descriptor different from the filter banks you have seen in the lectures? Could you use any one of those filter banks as a descriptor? Reference links: Original Paper, OpenCV Tutorial.

Q2.1.3 (5 points): Matching Methods. The BRIEF descriptor belongs to a category called binary descriptors: the image region corresponding to a detected feature point is represented as a binary string of 1s and 0s. A commonly used metric for such descriptors is the Hamming distance. Please search online to learn about the Hamming distance and nearest-neighbor matching, and describe how they can be used to match interest points with BRIEF descriptors. What benefits does the Hamming distance have over a more conventional Euclidean distance measure in our setting?

Q2.1.4 (10 points): Feature Matching.

Figure 2: A few matched FAST feature points with the BRIEF descriptor.

Please implement a function in matchPics.py:

    matches, locs1, locs2 = matchPics(I1, I2, opts)

where I1 and I2 are the images you want to match. opts stores two parameters: sigma is the threshold for corner detection using the FAST feature detector, and ratio is the ratio used when matching BRIEF feature descriptors. locs1 and locs2 are N x 2 matrices containing the x and y coordinates of the matched point pairs.
matches is a p x 2 matrix in which the first column contains indices into the descriptors of features in I1, and the second column contains the corresponding indices for I2. Use the provided helper function corner_detection() to compute the features, then build descriptors using computeBrief(), and finally compare them using briefMatch(). Use plotMatches() to visualize your matched points and include the result image in your write-up; an example is shown in Fig. 2. Run the provided starter code displayMatch.py to display matched features.

There are two tunable parameters, both stored in the opts variable and loaded from opts.py. You can change their values by editing the default fields or via command-line arguments, for example: python displayMatch.py --sigma 0.15 --ratio 0.7. The number of matches between the two images varies with the values of sigma in corner_detection() and ratio in briefMatch(); you can vary these to get the best results. The example shown in Fig. 2 uses sigma = 0.15 and ratio = 0.7.

We provide the following helper functions in helper.py:

    locs = corner_detection(img, sigma)
    desc, locs = computeBrief(img, locs)
    matches = briefMatch(desc1, desc2, ratio)
    plotMatches(im1, im2, matches, locs1, locs2)

locs is an N x 2 matrix in which each row is the location (x, y) of a feature point. Please note that the number of valid output feature points can be less than the number of input feature points. desc is the corresponding matrix of BRIEF descriptors for the interest points. Please include the code snippet and your best output of displayMatch in your write-up.

Q2.1.5 (10 points): Feature Matching and Parameter Tuning. Conduct a small ablation study by running displayMatch.py with various sigma and ratio values. Include figures displaying the matched features for the various parameters in your write-up, and explain the effect of each of the two parameters.

Q2.1.6 (10 points): BRIEF and Rotations. Let's investigate how BRIEF works with rotations.
Write a script briefRotTest.py that:

  - Takes cv_cover.jpg and matches it to itself rotated from 0 to 360 degrees in increments of 10 degrees. [Hint: use scipy.ndimage.rotate]
  - Stores a histogram of the count of matches for each orientation.
  - Plots the histogram using matplotlib.pyplot.hist (x axis: rotation, y axis: number of matches).

Visualize the histogram and the feature-matching result at three sufficiently different orientations and include them in your write-up. Explain why you think the BRIEF descriptor behaves this way. Please include the code snippet in your write-up.

Q2.1.7 (Extra Credit – 10 points): Improving Performance. The extra-credit opportunities described below are optional and provide an avenue to explore computer vision and improve the performance of the techniques developed above.

1. (5 pts) As we have seen, BRIEF is not rotation invariant. Design a simple fix to solve this problem using the tools you have developed so far (think back to edge detection and/or the Harris corner detector's covariance matrix). You are not allowed to use any additional OpenCV or scikit-image functions. Include the code in your PDF, and explain your design decisions and how you selected any parameters that you use. Demonstrate the effectiveness of your algorithm on image pairs related by large rotations.

2. (5 pts) This implementation of BRIEF has some scale invariance, but there are limits. What happens when you match a picture to the same picture at half the size? Look to Section 3 of [Lowe2004] for a technique that will make your detector more robust to changes in scale. Implement it and demonstrate it in action on several test images. Include your code and the test images in your PDF. You are not allowed to call any additional OpenCV or scikit-image functions. You may simply rescale some of the test images we have given you.
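For reference, the FAST + BRIEF pipeline of Section 2.1 can be sketched directly with scikit-image primitives. This is a rough, illustrative stand-in (the function name and defaults are ours): in the assignment itself you should use the provided corner_detection(), computeBrief(), and briefMatch() helpers.

```python
import numpy as np
from skimage.feature import BRIEF, corner_fast, corner_peaks, match_descriptors

def match_pics(im1, im2, sigma=0.15, ratio=0.7):
    """Toy FAST + BRIEF matcher for 2D grayscale float images.

    `sigma` is reused as the FAST threshold and `ratio` as the
    nearest-neighbor ratio, mirroring the opts of the assignment.
    """
    def detect_and_describe(im):
        # FAST corner response, then non-maximum suppression to (row, col) keypoints
        locs = corner_peaks(corner_fast(im, threshold=sigma), min_distance=1)
        extractor = BRIEF()
        extractor.extract(im, locs)
        # BRIEF silently drops keypoints too close to the image border
        return extractor.descriptors, locs[extractor.mask]

    desc1, locs1 = detect_and_describe(im1)
    desc2, locs2 = detect_and_describe(im2)
    # Hamming-distance nearest neighbors with cross-check and ratio test
    matches = match_descriptors(desc1, desc2, cross_check=True, max_ratio=ratio)
    return matches, locs1, locs2
```

Note that, as FAQ item 3 below warns, the returned locations are (row, col), i.e. (y, x).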
2.2 Homography Computation

Q2.2.1 (15 points): Computing the Homography. Write a function computeH in planarH.py that estimates the planar homography from a set of matched point pairs:

    H2to1 = computeH(x1, x2)

x1 and x2 are N x 2 matrices containing the coordinates (x, y) of point pairs between the two images. H2to1 should be the 3 x 3 matrix for the best homography from image 2 to image 1 in the least-squares sense. The numpy.linalg functions eig() or svd() will be useful for obtaining the eigenvectors (see Section 1 of this handout for details). Please include the code snippet in your write-up.

Q2.2.2 (10 points): Homography Normalization. Normalization improves the numerical stability of the solution, and you should always normalize your coordinate data. Normalization has two steps:

1. Translate the mean of the points to the origin.
2. Scale the points so that the largest distance to the origin is sqrt(2).

This is a linear transformation and can be written as

    x~1 = T1 x1,    x~2 = T2 x2

where x~1 and x~2 are the normalized homogeneous coordinates of x1 and x2, and T1 and T2 are 3 x 3 matrices. The homography H computed by computeH from x~2 to x~1 satisfies

    x~1 = H x~2

Substituting x~1 = T1 x1 and x~2 = T2 x2, we have:

    T1 x1 = H T2 x2
    x1 = T1^{-1} H T2 x2

Implement the function computeH_norm:

    H2to1 = computeH_norm(x1, x2)

This function should normalize the coordinates in x1 and x2 and call computeH(x1, x2) as described above. Please include the code snippet in your write-up.

Q2.2.3 (25 points): Implement RANSAC. The RANSAC algorithm can, in general, fit any model to noisy data; you will implement it for (planar) homographies between images. Remember that a minimum of 4 point pairs is required to compute a homography. Write a function:

    bestH2to1, inliers = computeH_ransac(locs1, locs2, opts)

where locs1 and locs2 are N x 2 matrices containing the matched points, and opts stores two RANSAC parameters.
max_iters is the number of iterations to run RANSAC for, and inlier_tol is the tolerance value for considering a point an inlier. bestH2to1 should be the homography H with the most inliers found during RANSAC: if x2 is a point in locs2 and x1 is the corresponding point in locs1, then x1 === H x2. inliers is a vector of length N with a 1 at the matches that are part of the consensus set, and 0 elsewhere. Use computeH_norm to compute the homography. Please include the code snippet in your write-up.

Figure 3: An ordinary textbook (left). Harry-Potterized book (right).

Q2.2.4 (10 points): Automated Homography Estimation and Warping. Write a function warpImage() in HarryPotterize.py that:

1. Reads cv_cover.jpg, cv_desk.png, and hp_cover.jpg.
2. Computes a homography automatically using matchPics and computeH_ransac.
3. Uses the computed homography to warp hp_cover.jpg to the dimensions of the cv_desk.png image using the OpenCV function cv2.warpPerspective.
4. At this point you should notice that although the image is warped to the correct location, it does not fill the same space as the book. Why do you think this is happening? How would you modify hp_cover.jpg to fix this issue?
5. Implements the function

    composite_img = compositeH(H2to1, template, img)

to compose this warped image with the desk image, as in Figure 3.
6. Include your code and result in your write-up.

Q2.2.5 (10 points): RANSAC Parameter Tuning. Just as we tuned parameters for feature matching, there are two tunable parameters in RANSAC. You can change their values by editing the default fields or via command-line arguments, for example: python HarryPotterize.py --max_iters 500 --inlier_tol 2.0. Conduct a small ablation study by running HarryPotterize.py with various max_iters and inlier_tol values. Include the result images in your write-up, and explain the effect of each of the two parameters.
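The DLT and RANSAC steps of Section 2.2 can be sketched as follows. This is a minimal NumPy sketch, not the required implementation: it omits the coordinate normalization of Q2.2.2 (which you should add for stability), and the names, defaults, and the fixed seed are illustrative.

```python
import numpy as np

def compute_h(x1, x2):
    """DLT estimate of H such that x1 ~ H x2 (both N x 2, N >= 4)."""
    rows = []
    for (x, y), (u, v) in zip(x1, x2):
        # From x1 ~ H x2: h1.p - x*(h3.p) = 0 and h2.p - y*(h3.p) = 0, with p = (u, v, 1)
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, vh = np.linalg.svd(np.asarray(rows, dtype=float))
    return vh[-1].reshape(3, 3)  # right singular vector of the smallest singular value

def compute_h_ransac(locs1, locs2, max_iters=500, inlier_tol=2.0, seed=0):
    """RANSAC over random 4-point minimal samples; returns (best H, boolean inlier mask)."""
    rng = np.random.default_rng(seed)
    n = len(locs1)
    x2h = np.column_stack([locs2, np.ones(n)])
    best_h, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(max_iters):
        sample = rng.choice(n, size=4, replace=False)
        H = compute_h(locs1[sample], locs2[sample])
        proj = x2h @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - locs1, axis=1)
        inliers = err < inlier_tol  # NaNs from degenerate samples compare as False
        if inliers.sum() > best_inliers.sum():
            best_h, best_inliers = H, inliers
    return best_h, best_inliers
```

A common refinement (not shown) is to refit H on the full consensus set of the best sample before returning.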
3 Creating Your Augmented Reality Application

3.1 Incorporating video (20 points)

With the code you now have, you are able to create your own augmented reality application: you are going to Harry-Potterize the video ar_source.mov onto the video book.mov. More specifically, you are going to track the computer vision textbook in each frame of book.mov, and overlay each frame of ar_source.mov onto the book in book.mov. Please write a script ar.py that implements this AR application and saves the result video as ar.avi in the result/ directory. You may use the provided function loadVid() to load the videos. Your result should be similar to the LifePrint project. You will receive full credit if you can put the video together correctly. See Figure 4 for an example frame of what the final video should look like.

Hint for saving videos with OpenCV: make sure the size of the images you add to the VideoWriter matches the size of the image used to instantiate the VideoWriter. Also, if you are using a Mac, you may need to specify a different codec than MPV4; AVI usually works.

Figure 4: Rendering video on a moving target.

Note that the book and the videos we have provided have very different aspect ratios (the ratio of image width to image height). You must crop each frame to fit onto the book cover, such that only the central region of the image is used in the final output. See Figure 5 for an example.

Figure 5: Rendering video on a moving target.

The two videos have different lengths; you can use the shorter length for your video. Also, the video book.mov only has translation of objects. If you wanted to account for rotation, scaling, etc., you would have to pick a better feature point representation (such as ORB). Finally, note that this is a very time-intensive job and may take many hours on a single core (parallel processing on 8 cores takes around 30 minutes). Debug before running your full script (e.g.
by saving a few early AR frames and verifying that they look correct). In your write-up, include three screenshots of your ar.avi at three distinct timestamps (e.g., when the overlay is near the center, left, and right of the video frame); see Figure 4 for an example with the overlay in the center. Please also include the code snippet in your write-up. If the video is too large, include a Google Drive link to it in the write-up instead, and ensure the shared link gives the TAs viewing permission.

3.2 Make Your AR Real Time (Extra Credit – 15 points)

Write a script ar_ec.py that implements the AR program described in Q3.1 in real time. As the output of the script, you should process the videos frame by frame and play the combined frames in real time; you do not need to save the result video for this question. Extra credit will be given to fast programs, measured in FPS (frames per second): 5 points for programs faster than 10 FPS, 10 points for programs faster than 20 FPS, and 15 points for programs faster than 30 FPS. Make sure to note the achieved FPS in your write-up, in addition to all the steps taken to achieve real-time performance. Please also include the code snippet in your write-up.

4 Create a Simple Panorama (10 points)

Take two pictures with your own camera, separated as closely as possible by a pure rotation, and then construct a panorama with panorama.py. Be sure that objects in the images are far enough away that there are no parallax effects. You can use the Python module cpselect to select matching points on each image, or some automatic method. Submit the original images, the final panorama, and the script panorama.py that loads the images and assembles the panorama. We have provided two images to get you started (data/pano_left.jpg and data/pano_right.jpg). Please use your own images when submitting this project.
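The warping and compositing at the core of both compositeH (Q2.2.4) and the panorama can be illustrated with a tiny NumPy inverse-mapping sketch. This is a toy, grayscale, nearest-neighbor stand-in for cv2.warpPerspective plus masking, not the required implementation; the function name is ours.

```python
import numpy as np

def composite_warp(H2to1, template, img):
    """Overlay grayscale `template` onto `img`: each output pixel of `img` is
    inverse-mapped through H2to1 (x1 = H x2 convention) and, if it lands inside
    `template`, replaced by the nearest template pixel."""
    Hinv = np.linalg.inv(H2to1)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous (x, y, 1) coordinates of every output pixel
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = Hinv @ pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < template.shape[1]) & (sy >= 0) & (sy < template.shape[0])
    out = img.ravel().copy()
    out[inside] = template[sy[inside], sx[inside]]
    return out.reshape(h, w)
```

Inverse mapping (looping over output pixels and sampling the source) avoids the holes that a forward warp would leave; this is also how cv2.warpPerspective works internally, with better interpolation.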
In your write-up, include your code, the original images, and the panorama result image. See Figure 6 below for an example.

Figure 6: Original Image 1 (top left). Original Image 2 (top right). Panorama (bottom).

5 HW1 Distribution Checklist

After cloning the homework repo, you should have one folder for the data (data), one for your code (python), and one for extra-credit questions (ec). In the python folder, where you will primarily work, you will find:

  - ar.py: script for Section 3
  - briefRotTest.py: script to test BRIEF with rotations
  - displayMatch.py: script to display matched features
  - HarryPotterize.py: script to automate homography estimation and warping between source and target images
  - helper.py: some helper functions
  - matchPics.py: script to match features between a pair of images
  - opts.py: some command-line arguments
  - planarH.py: script to estimate the planar homography

6 HW1 Submission Checklist

Submit your write-up and code to Gradescope.

  - Write-up: a pdf file named hw1.pdf.
  - Code: a zip file named .zip. Extracting the zip file should produce the structure defined below. (Note: neglecting to follow the submission structure will incur a huge score penalty!) When you submit, remove the folder data/ if applicable, as well as any large temporary files that we did not ask you to create.

      /           # a directory inside the .zip file
        python/   # all of your .py files
        ec/       # all of your .py files
        hw1.pdf   # make sure you upload this pdf file to Gradescope

Please assign the locations of answers to each question on Gradescope.

7 FAQs

Credits: Cherie Ho

1. In matchPics, locs stands for pixel locations. locs1 and locs2 can have different sizes, since matches gives the mapping between the two for corresponding matches. We use skimage.feature.match_descriptors (API) to calculate the correspondences.
2. Normalized homography: the function computeH_norm should return the homography H between the unnormalized coordinates, not the normalized homography Hnorm. As mentioned in the write-up, you can use the following steps as a reference:

    Hnorm = computeH(x1_normalized, x2_normalized)
    H = T1^{-1} Hnorm T2

3. The locs produced by matchPics are in the form [row, col], which is (y, x) in coordinates. Therefore, you should first swap the columns returned by matchPics and then feed them into the homography estimation.

4. Note that the third output of np.linalg.svd is vh (V transposed) when computing homographies.

5. When debugging homographies, it helps to visualize the matches and to check homographies computed between an image and itself. If there are not enough matches, try tuning the parameters.

6. If your images look blue-ish, your red and blue channels may be flipped. This is common when using cv2.imread. You can flip the last channel like so: hp_cover = hp_cover[:, :, [2, 1, 0]], or use another library (skimage.io.imread).

7. For extra credit Q3.2, we would like you to speed up AR so that the processing happens in real time. This means each iteration of the loop in ar.py should run in less than 1/30 seconds. You should not need to use multiprocessing. Take a look at your Q3.1 timings: which step or steps take the most time? Can you replace them with faster functions? You are allowed to use functions from other libraries.

8. A common bug is reversing the direction of the homography. Make sure it is what you expect!

8 Helpful Concepts

Credits: Jack Good

  - Projection vs. homography: A projection, usually written as a 3 x 4 matrix P, maps homogeneous 3D world coordinates to the view of a camera in homogeneous 2D coordinates. A planar homography, usually written as a 3 x 3 matrix H, maps, under certain assumptions, the view of one camera in 2D homogeneous coordinates to the view of another camera in 2D homogeneous coordinates.
  - Deriving a homography from projections: When deriving a homography from projections, given assumptions about the world points or the camera orientations, make sure to include the intrinsic matrices of the two cameras, K1 and K2, which are 3 x 3 and invertible but generally cannot be assumed to be identity or diagonal. Note the rule for inverting matrix products: (AB)^{-1} = B^{-1} A^{-1}. When this rule is applied, even when both views come from the same camera and K = K1 = K2, K is still part of H and does not cancel out.

  - Conditions for a planar homography to exist: For a planar homography to exist between two views, we need either that the points in the 3D world lie on a plane (as shown in Section 1.1 and applied in the Harry-Potterize task), or that there is a pure rotation between the two views (as shown in Q1.4.1 and applied in the panorama task). We do not require both conditions to hold, only one or the other.

  - Definition of a line in 3D: While we can define a line in 2D as the points (x, y) satisfying ax + by + c = 0, or equivalently in homogeneous coordinates, this does not generalize to 3D: the points (x, y, z) such that ax + by + cz + d = 0 define a plane in 3D. Moreover, while a line is uniquely identified by two or more collinear points, that does not define the line in its entirety. The simplest way to do so is to specify the line as all points x in R^3 such that x = x1 + lambda (x2 - x1), where x1 and x2 are two different points lying on the line and lambda is in R. Several equivalent forms exist, and these definitions can be extended to homogeneous coordinates by appending a 1 to each point.
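The pure-rotation condition above can be checked numerically. For x1 = K1 [I | 0] X and x2 = K2 [R | 0] X, the matrix H = K2 R K1^{-1} maps view-1 pixels to view-2 pixels (the inverse of this H gives the x1 === H x2 convention of Q1.4.1). A quick NumPy sanity check, not a proof, with made-up intrinsics and rotation:

```python
import numpy as np

# Two different intrinsic matrices and a 10-degree rotation about the y axis
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[600.0, 0.0, 300.0], [0.0, 600.0, 200.0], [0.0, 0.0, 1.0]])
t = np.deg2rad(10.0)
R = np.array([[np.cos(t), 0.0, np.sin(t)],
              [0.0, 1.0, 0.0],
              [-np.sin(t), 0.0, np.cos(t)]])

H = K2 @ R @ np.linalg.inv(K1)  # maps view-1 pixels to view-2 pixels

# Random 3D points in front of both cameras (z around 5)
X = np.random.default_rng(0).uniform(-1.0, 1.0, (3, 20)) + np.array([[0.0], [0.0], [5.0]])
x1 = K1 @ X          # camera 1: K1 [I | 0] X
x2 = K2 @ (R @ X)    # camera 2: K2 [R | 0] X

# H x1 agrees with x2 after dehomogenizing, independent of the depth of X
x2_pred = H @ x1
assert np.allclose(x2[:2] / x2[2], x2_pred[:2] / x2_pred[2])
```

Note, as the first bullet warns, that the distinct K1 and K2 do not cancel out of H.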