Description
1 Overview
In this homework you will be implementing a 3D dense SLAM system using pointbased fusion, which can be regarded as a simplified version of [1]. There are mainly
three steps to setup the system:
Localization: projective ICP, a common implementation of visual odometry
that is similar to [2].
Mapping: point-based fusion, a simplified map fusion algorithm.
SLAM: glue the components and build the entire system.
Generally SLAM is a tightly coupled system and hard to debug. We simplify the
problem by giving you the ground truth poses for testing. In the ICP step, you
will be working on two RGB-D frames (without a map). You may verify your
implementation with the loss and the visualization. In the map-based fusion step,
you will be working on a sequence of RGB-D frames with ground truth, and generate
a fused map. Finally, you may test your algorithm by putting everything together.
You will be testing the system on the synthethic ICL-NUIM [3] dataset. Unlike
real-world data, you don’t have to care about holes and noise, which can be tedious
to deal with. Please download the dataset from this link. Prior to running the
system, use python preprocess.py /path/to/dataset to generate normal maps.
2 Iterative Closest Point (ICP) (50 points)
Please read Section 5 in [1] before working on the following problems. Note, many
equations in [1] are using unconventional parameterizations, and the notations are
not easy to follow. We provide our derivation below as a reference.
In general, as its name indicates, ICP runs iteratively. Here is the general flow:
1
1. Setup data correspondences between source and target point clouds given the
current pose [Rt
| t
t
] (usually initialized with identity).
2. Linearize the loss function at [Rt
| t
t
] and solve for incremental update [δR | δt].
3. Update by left multiplying the incremental transformation [Rt+1 | t
t+1] = [δR |
δt][Rt
| t
t
].
4. Go back to step 1 until convergence.
You will be working on icp.py in this section.
2.1 Projective data association
Data association is critical for point cloud registration, since the overall spirit is
to push together correspondent points. In the conventional point cloud ICP, this
is achieved by nearest neighbor search, usually depending on a KDTree. Although
there are various fast implementations, building and searching in a KDTree can be
relatively slow.
In KinectFusion, a more efficient variation, projective data association is used.
Suppose the target point cloud is in the form of a vertex map (where each pixel
correpsonds to a point in 3D), then naturally there is an indexable geometry layout.
Now given a point p, instead of searching for q in the 3D space, after initial transformation, we can project p to the 2D image and obtain the pixel coordinate (u, v),
and directly read the corresponding point q from the vertex map. In this case, p
and q are not strictly nearest neighbors in 3D, but projective nearest neighbors. In
practice, this is very efficient and works pretty well.
In case you are unfamiliar with projective pinhole camera model, we have provided you with the project function.
Question (5 points): Suppose you have projected a point p to a vertex map
and obtained the u, v coordinate with a depth d. The vertex map’s height and
width are H and W. Write down the conditions u, v, d must satisfy to setup a
valid correspondence. Also implement the TODO : first filter section in function
find projective correspondence.
After obtaining the correspondences q from the vertex map and the corresponding normal nq in the normal map, you will need to additionally filter them by distance
thresholds and angle thresholds, so that |p − q| < dthr.
Question (5 points): Why is this step necessary? Implement this filter in
TODO : second filter section in function find projective correspondence.
2.2 Linearization
KinectFusion seeks to minimize the point-to-plane error between associated points
(p, q) where p is from the source and q is from the target. The error function can
be written by:
X
i∈Ω
r
2
i
(R, t) =
n
>
qi
(Rpi + t − qi)
2
, (1)
where (pi
, qi) is the i-th associated point pair in the assocation set Ω = {(pn, qn)},
and nqi
is the estimated normal at qi (we have covered the data association step,
i.e., how to associate pi and qi).
2
It is hard to directly solve R ∈ SO(3). So we rely on the small-angle assumption
by parameterizing δR with
δR =
1 −γ β
γ 1 −α
−β α 1
. (2)
and also introduce δt = [tx, ty, tz]. Note this parameterization is unlike [2] and is
closer to the conventional Lie Algebra’s formulation [4].
With δR and δt, we can write down the incremental update version
X
i∈Ω
r
2
i
(δR, δt) =
n
>
qi
(δR)p
0
i + δt − qi
2
, (3)
where p
0
i = R0pi + t
0 aiming at solving α, β, γ, tx, ty, tz with the given initial R0
, t0
.
Question (15 points): Now reorganize the parameters and rewrite ri(δR, δt) in
the form of
ri(α, β, γ, tx, ty, tz) = Ai
α
β
γ
tx
ty
tz
+ bi
,
where Ai
is a 1 × 6 matrix and bi
is a scalar.
Note: You don’t have to explicitly write down all the entries, and feel free to
introduce intermediate variables. A cross-product operator may be useful here:
[w]× =
0 −w2 w1
w2 0 −w0
−w1 w0 0
, w = [w0, w1, w2]
> ∈ R
3
.
2.3 Optimization
The aforementioned step in fact does the linearization. Now suppose we have collected n associated pairs, we move on to optimize
X
i=1n
r
2
i
(α, β, γ, tx, ty, tz) = Xn
i=1
Ai
α
β
γ
tx
ty
tz
+ bi
2
. (4)
Question (15 points): Write down the linear system that provides a closed form
solution of α, β, γ, tx, ty, tz in terms of Ai and bi
. You may either choose a QR
formulation by expanding a matrix and filling in rows (resulting in a n × 6 linear
system), or a LU formulation by summing up n matrices (resulting in a 6 × 6
system). Implement build linear system and the corresponding solve with your
derivation.
By solving the linear system you derived above, you can obtain the incremental
transformation update in the tangent space. This 6D vector can be converted to a
4 × 4 matrix by pose2transformation provided by us.
3
Figure 1: Point clouds before and after registration.
Now, run python icp.py /path/to/dataset. If everything works fine, you will
find ICP’s loss converges in less than 10 iterations and the inlier count increases
correspondingly. The two point clouds fit better, resulting in an interleaved pattern
shown in Fig. 1.
Question (10 points): Report your visualization before and after ICP with the
default source and target (frame 10 and 50). Then, choose another more challenging
source and target by yourself (e.g., frame 10 and 100) and report the visualization.
Analyize the reason for its failure or success.
Note: Press P to take a screenshot.
3 Point-based Fusion (40 points)
Please read Section 4 in [1] before working on the following problems.
KinectFusion uses volumetric TSDF as the map representation, which is nontrivial to implement. Point-based fusion, on the otherhand, maintains a weighted
point cloud and actively merges incoming points, hence is easy to implement.
The essential operations of point-based fusion are very similar to projective data
association. Given an estimated pose, we first project the point p in the map to the
image and obtain its corresponding (u, v) coordinates. Then, after proper filtering,
we compute a weighted average of the properties (positions, normals, colors).
You will be working on fusion.py in this section.
3.1 Filter
Similar to projective data association, we need a transformation. In a typical SLAM
system, this is obtained from accumulative ICP. In this section, however, we assume
ground truth transformations are known. The transformation T
w
c = [Rw
c
| t
w
c
] is from
the input camera frame’s coordinate system to the worlds’s coordinate system.
With the available transformation, you need to perform filtering similar to projective data association. The only difference is that you will need to add a normal
angle constraint to be more strict with filtering.
Question (5 points): Implement filter pass1, filter pass2 to obtain mask
arrays before merging and adding input points.
3.2 Merge
The merge operation updates existing points in the map by calculating a weight
average on the desired properties.
4
Figure 2: Fusion with ground truth poses with a normal map.
Question (15 points): Given p ∈ R
3
in the map coordinate system with a weight
w and its corresponding point q ∈ R
3
(read from the vertex map) in the frame’s
coordinate system with a weight 1, write down the weight average of the positions in
terms of p, q, Rw
c
, tw
c
, w. Similarly, write down the weight average of normals in terms
of np, nq, Rw
c
, w. Implement the corresponding part in merge. Note a normalization
of normals is required after weight average.
3.3 Addition
The projective data association may cover a big portion of the input vertex map,
but still there are uncovered regions. These points have to be added to the map.
Question (10 points): Implement the corresponding part in add. You will need
to select the unassociated points and concatenate the properties to the existing map.
3.4 Results
Now run python fusion.py /path/to/dataset. The system will take the ground
truth poses and produce the resulting image after fusing 200 frames, see Fig. 2.
Question (10 points): Report your visualization with a normal map, and the
final number of points in the map. Estimate the compression ratio by comparing the
number of points you obtain and the number of points if you naively concatenate
all the input. Note to speed up we use a downsample factor 2 for all the input.
Note: Press N to visualize the normal map.
4 The dense SLAM system (20 points + 10 points)
Now we put together both ICP and fusion code in main.py. We have called the
relevant functions implemented by you for your convenience.
Question (5 points): Which is the source and which is the target, for ICP
between the map and the input RGBD-frame? Can we swap their roles, why or why
not?
Run python main.py /path/to/dataset. By default, 200 frames will be tracked
and mapped.
Question (15 points): Report your visualization. Also, record the estimated
trajectory and compare against the ground truth. Drift is acceptable in the case, to
the extent shown in Fig. 3.
Bonus (10 points): Can you try to reduce the drift? There could be several
methods, from improving the RGBD odometry to improving the fusion by rejecting
5
Figure 3: Fusion with poses estimated from frame-to-model ICP.
outliers and eliminating dormant points (i.e., not associated to any points for a
certain period). Report your updated visualization and trajectories.
5 Code Submission Rules
Upload all of your python code in a folder. Include all your visualizations in your
PDF file. Do not upload the dataset when you submit.
References
[1] Keller, Maik, et al. “Real-time 3D reconstruction in dynamic scenes using
point-based fusion.” International Conference on 3D vision (3DV), 2013. Link:
http://ieeexplore.ieee.org/document/6599048/.
[2] Newcombe, Richard A., et al. “KinectFusion: Real-time dense surface mapping
and tracking.” 10th IEEE International Symposium on Mixed and Augmented
Reality (ISMAR), 2011. Link: http://ieeexplore.ieee.org/document/6162880/.
[3] Handa, Ankur, et al. “A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM.” IEEE International Conference on Robotics and Automation (ICRA), 2014. Website:
https://www.doc.ic.ac.uk/˜ahanda/VaFRIC/iclnuim.html.
[4] Sola, Joan and Deray, Jeremie and Atchuthan, Dinesh. “A micro Lie theory for
state estimation in robotics,” arXiv 2018.

