Handling Uncertainty in Estimation Problems with Lie Groups
Motivation
Measurements/models with uncertainties → optimization-based estimation algorithm → estimated variables with uncertainties
How to ensure everything is consistent?
Contents
Part 1: Factor graphs
Part 2: Lie groups
Part 3: Bringing everything together
Part 1: Factor graphs
Lightspeed review
[Figure: a factor graph with variable nodes X1, X2, X3, …, Xn, Xn+1 and factor nodes p1, p2]
A factor graph factorizes a function.
The variables are the unknowns; the factors are known, and indicate how the variables relate to each other.
Dellaert & Kaess (2017), “Factor Graphs for Robot Perception”
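The factorization formula itself appears only as a figure in the original deck; in standard notation it reads
$$ f(X_1, \dots, X_{n+1}) = \prod_i p_i(\mathcal{X}_i), $$
where each factor $p_i$ depends only on the subset of variables $\mathcal{X}_i$ connected to it.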
Factor graphs can be used to describe many problems
Dellaert (2020), “Factor graphs: Exploiting structure in robotics”, Annual Reviews
DRS examples
[Figure: three DRS factor-graph examples with variables Xi, X1…X4 — KINS (work in progress), VTR (under review), TOFG (work in progress)]
Factorized function
A factor graph factorizes a function of many variables into a product of smaller factors, each involving only a subset of the variables.
Factors as distributions
In general, the factors are given by probability distributions, so the factor graph represents a factorization of the joint distribution of many variables.
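The corresponding equation is a figure in the deck; a standard way to write it is
$$ p(X_1, \dots, X_{n+1}) \propto \prod_i \phi_i(\mathcal{X}_i), $$
with each factor $\phi_i$ a distribution (or likelihood) over its subset of variables $\mathcal{X}_i$.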
Finding the unknowns
In the factor graph, the variables represent parameters of the distribution that must be found.
We assume the parameters are single quantities (i.e., not distributions themselves).
Any parameter estimation method can be used.
Finding the unknowns with MAP
The most common solution is maximum a posteriori (MAP).
Since maximizing a product is difficult, we generally minimize the negative log-likelihood instead.
*Please note the negative sign converts the maximization into a minimization.
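The MAP equations are shown as figures in the original slides; under the factorization above they take the standard form
$$ X^{\mathrm{MAP}} = \arg\max_X \prod_i \phi_i(\mathcal{X}_i) = \arg\min_X \; -\sum_i \log \phi_i(\mathcal{X}_i). $$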
Now come the assumptions
To make things easier, we assume the distributions are Gaussian.
However, we don’t say that Xi is Gaussian itself.
Instead, we say the error or residual between a function of Xi and some prior knowledge zi (e.g., measurements) is Gaussian, where Sigma corresponds to the sensor or model covariance.
(We’ll save this trick for later.)
So, the probability of each factor is given by a Gaussian in the residual.
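Written out (the equations are images in the deck; this is the standard Gaussian-factor form):
$$ r_i = h_i(\mathcal{X}_i) - z_i \sim \mathcal{N}(0, \Sigma_i), \qquad \phi_i(\mathcal{X}_i) \propto \exp\!\left( -\tfrac{1}{2} \, \| h_i(\mathcal{X}_i) - z_i \|^2_{\Sigma_i} \right), $$
where $\| r \|^2_{\Sigma} = r^\top \Sigma^{-1} r$ is the squared Mahalanobis distance.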
Least squares minimization
Under the Gaussian assumption, this converts the problem into a (nonlinear) least squares minimization.
(Here we ignore some constant terms.)
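Explicitly (reconstructed in standard notation, since the slide equation is a figure):
$$ X^{\mathrm{MAP}} = \arg\min_X \sum_i \tfrac{1}{2} \, \| h_i(\mathcal{X}_i) - z_i \|^2_{\Sigma_i}. $$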
Inspecting the Least Squares (LS) problem
The term $h_i(\mathcal{X}_i) - z_i$ is the error or residual of the factor.
The constant term involving Sigma (the Gaussian normalization) is usually ignored since Sigma is constant*.
These sigmas (measurement/model covariances) are the only uncertainties we plug into the system.
The solution only returns a value but not an uncertainty estimate (LS is a point estimator).
*We can keep the term and also optimize it (but it would require other solvers/strategies, such as expectation-maximization). In this case, we would be optimizing the sensor models (“learning the covariances”).
Solving the Least Squares (LS) problem
LS can be solved in closed form if h(X) is linear.
By setting the derivative to zero, we find the normal equations.
The resulting system matrix is known as the information matrix, Fisher information, or Hessian.
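For a linear model $h(X) = AX$ with stacked measurements $b$ (standard notation; the slide shows this as a figure), the normal equations read
$$ A^\top \Sigma^{-1} A \, X = A^\top \Sigma^{-1} b, \qquad \Lambda = A^\top \Sigma^{-1} A, $$
where $\Lambda$ is the information matrix.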
Analyzing the Information Matrix
Obs. 1: Matrix A cannot have zero columns, since they would make the linear system indeterminate.
Obs. 2: The matrix $A^\top \Sigma^{-1} A$ is denser than A, so it is expensive to invert when solving the linear system. Factoring the matrix and reordering the rows and columns is the usual trick to solve by back-substitution without inverting (e.g., iSAM); see the sketch below.
Obs. 3: The matrix approximates* the information (inverse covariance) of the solution of the LS problem. However, we need to invert the matrix to obtain the covariance of the solution. Marginal information matrices can be obtained, but still require inversion to get covariances.
*It approximates the covariance of the solution because we’re not explicitly optimizing the covariance of the solution; this is called a “Laplace approximation”. To optimize the covariance as well, we can use variational inference (Barfoot has some recent papers on it).
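As a concrete illustration of solving the normal equations by factorization and back-substitution rather than by inversion, here is a minimal NumPy/SciPy sketch; the matrices and sizes are made up for the example, and real solvers such as iSAM additionally exploit sparsity and variable reordering:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical small linear least-squares problem ||A x - b||^2_Sigma
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))     # measurement Jacobian (no zero columns)
b = rng.standard_normal(20)          # stacked measurements
Sigma_inv = np.eye(20)               # measurement information (Sigma^-1)

Lambda = A.T @ Sigma_inv @ A         # information matrix A^T Sigma^-1 A
eta = A.T @ Sigma_inv @ b            # information vector

# Solve the normal equations by Cholesky factorization + back-substitution,
# never forming Lambda^{-1} explicitly
L, low = cho_factor(Lambda)
x = cho_solve((L, low), eta)

# Only if we actually need the covariance of the solution do we invert:
cov = cho_solve((L, low), np.eye(5))  # Lambda^{-1}, the Laplace-approximation covariance
```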
Nonlinear factor graphs
If h(X) is a nonlinear function, we must use a nonlinear optimization algorithm.
Algorithms such as Gauss-Newton or Levenberg-Marquardt will linearize the system around the initial solution (the linearization point), using the Jacobian of h(X).
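The linearization shown in the slide figure has the standard first-order form
$$ h(X_0 + \Delta x) \approx h(X_0) + J(X_0)\, \Delta x, $$
where $X_0$ is the linearization point and $J$ is the Jacobian of $h$.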
Solving nonlinear factor graphs
Applying the linearization, we obtain a linear system as before, so we can solve it with the same normal equations (with Jacobians instead of the A matrix).
The Jacobians must be well-defined (i.e., no zero columns).
Levenberg-Marquardt can handle ill-conditioned systems (but cannot be used in incremental problems with iSAM).
Updating the solution
The solution of the linear system is a small correction to the current linearization point that improves the solution.
We relinearize at the new linearization point and repeat until convergence.
All the covariances that can be extracted from the information matrix are valid for the linearization point only.
Summary
The covariance of the solution is obtained from the information matrix of the linear system.
Factors (measurements and covariances) are the inputs; variables (means and covariances) are the outputs.
Part 2: Lie groups
Dealing with rotations and transformations
When designing a factor graph, we need factors with residual functions that output vector quantities.
However, some factors involve rotations or rigid-body transformations.
3D Rotations
Rotations can be represented in many ways, but have 3 inherent DoF:
- Euler angles (α, β, γ): 3×1 vector
- Axis-angle (v, θ): 3×1 vector
- Quaternions q = (w, x, y, z): unit 4×1 vector
- Rotation matrices R: 3×3 orthogonal matrix
How to measure the difference of rotations? What about poses?
Lie Groups
Lie groups offer a principled way to solve these issues within a common framework.
Rotation matrices, quaternions, and rigid-body transformations are Lie groups, so the same principles can be applied to all of them:
- SO(3), the Special Orthogonal Group (rotations)
- SE(3), the Special Euclidean Group (rigid-body transformations)
Lie Groups, roughly
A Lie group is both:
- A smooth manifold: a differentiable surface that “looks Euclidean” at any point
- A group: a set with a composition operation and basic properties (closure, identity, inverse, associativity)
Solà, Deray, Atchuthan (2018), “A micro Lie theory for state estimation in robotics”, arXiv
Lie Groups
The main advantage of Lie groups is that any element on the manifold can be mapped into a Euclidean vector space (the Lie algebra), so we can use the same tools from vector spaces, plus some extra rules.
Lie Groups - with figures
[Figure: the Logarithm map takes elements from the Lie group to a vector space (the “Lie algebra”*); the Exponential map takes them back to the group]
*Note: This is not exactly the Lie algebra, but a mapping can be established through the hat and vee operators.
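In the capitalized notation of Solà et al., these maps compose the matrix exponential/logarithm with the hat and vee operators:
$$ \mathrm{Exp}: \mathbb{R}^n \to G, \;\; \mathrm{Exp}(\tau) = \exp(\tau^\wedge), \qquad \mathrm{Log}: G \to \mathbb{R}^n, \;\; \mathrm{Log}(T) = (\log T)^\vee. $$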
Lie Groups - with figures
[Figure: the group and its tangent space defined at the identity]
[Figure: the group and its tangent space defined at a general element T]
Lie Groups - with figures
Right-hand convention: used in GTSAM (Dellaert, Carlone, Scaramuzza, Solà) and in DRS.
Left-hand convention: used by Barfoot, Mangelson, and other authors.
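The equations for the two conventions are figures in the deck; in the usual notation (e.g., Solà et al.) they read
$$ \text{right:} \;\; T = \bar{T}\,\mathrm{Exp}(\tau), \qquad \text{left:} \;\; T = \mathrm{Exp}(\tau)\,\bar{T}, $$
so right-hand perturbations live in the local (body) frame and left-hand perturbations in the global frame.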
Lie Groups - Adjoint operator
The Adjoint operator relates the left-hand and right-hand conventions, moving tangent-space quantities from one side of a group element to the other.
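The defining identity (shown as a figure in the deck; standard in the Lie-theory references cited) is
$$ T\,\mathrm{Exp}(\tau) = \mathrm{Exp}(\mathrm{Ad}_T\,\tau)\,T, $$
which converts a right-hand perturbation $\tau$ into the equivalent left-hand perturbation $\mathrm{Ad}_T\,\tau$.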
Importance of the convention in optimization problems
The Exponential and Logarithm maps allow us to map quantities between the Lie group and the Lie algebra.
We can define residuals for the factor graph using the Logarithm map.
Note: The definition of the Logarithm map determines the order of the variables (in GTSAM, via the right-hand convention).
The Logarithm map also determines the ordering of the covariance matrices (input matrices in the factors, and output matrices in the information matrix obtained from the linear system).
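The GTSAM residual shown on the slide is an image; with the right-hand convention, the relative (local) coordinates between two elements take the form
$$ \tau = \mathrm{Log}\!\left( T_1^{-1} T_2 \right), $$
i.e., the order of $T_1$ and $T_2$ inside the Logarithm fixes both the sign of the residual and the frame in which its covariance is expressed.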
Importance of the convention in optimization problems
The optimization loop is also affected.
The linearized system is solved in the tangent space, using vectors.
However, instead of using the additive update rule, we map the correction from the tangent space back onto the manifold (the right-hand convention is also important here).
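Concretely (the update equations are figures in the deck; this is the standard on-manifold form under the right-hand convention):
$$ \text{instead of} \;\; X \leftarrow X + \Delta x, \qquad \text{we use} \;\; X \leftarrow X \, \mathrm{Exp}(\Delta x). $$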
Graphically
1. Linearization of all the factors evaluated at the current estimate (“lifting”)
2. Computation of the optimization correction using the linear system (normal equations)
3. Update of the variables back on the manifold (“retracting”)
Forster et al. (2016), “On-Manifold Preintegration for Real-Time Visual-Inertial Odometry”, T-RO
Part 3: Bringing everything together
A pose graph SLAM problem
Let’s return to modelling estimation problems with factor graphs: a SLAM graph.
We’ll focus on how to model the odometry factor.
[Figure: pose graph with poses T1, T2, T3, a prior factor, a sensor factor, and odometry factors]
Odometry factor
The odometry factor establishes the relationship between T1 and T2 given an odometry measurement ΔT. They are poses in SE(3).
It seems straightforward that the equation is $T_2 = T_1 \, \Delta T$.
Here we’re using the right-hand notation to apply the measurement… but why?
The importance of the frames
We mentioned that GTSAM uses this notation, so it makes sense to follow it.
But we need to be completely sure that we are doing the right thing.
Here we’re missing an important physical concept: the reference frames.
The importance of the frames
When we write down $T_2 = T_1 \, \Delta T$, we are implying the following: $T_{WB_2} = T_{WB_1} \, T_{B_1 B_2}$, where W is a fixed world frame and B1, B2 are the body frames at times 1 and 2.
Furgale (2014), “Representing Robot Pose: The good, the bad, and the ugly.”
Graphically
[Figure: world frame W and body frames B1, B2]
We are expressing poses of the body B in a fixed frame W.
Our odometry measurements are relative with respect to the previous frame B1.
The resulting expression represents the pose of the body B at time 2 w.r.t. the same fixed frame W.
Furgale (2014), “Representing Robot Pose: The good, the bad, and the ugly.”
The importance of the frames
We can confirm the frames are right: subscripts should match between transformations and “eliminate” each other*.
*Please note that writing the reference frame on the left is redundant for poses; it is mainly relevant for vectors (Furgale, 2014). Thanks to Marco Camurri for mentioning this.
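In symbols (reconstructed; the slide shows this as a figure):
$$ T_{WB_2} = T_{WB_1} \, T_{B_1 B_2}, $$
where the inner subscripts $B_1$ match and “eliminate” each other, leaving a pose of $B_2$ in $W$.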
Creating Gaussian factors
The odometry expression that we have is consistent with the frames, but it’s not a probability distribution.
Hence, we cannot use it as a factor. What can we do?
Probability distributions in SE(3)
Let’s recall what we did in the linear case a few slides ago: we added a noise term with the desired distribution (zero-mean Gaussian).
Even though we are now using Lie groups/manifolds, we can do the same trick: we add a zero-mean Gaussian distribution defined on the tangent space, and project it back to the group using the Exponential map.
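Written out (standard form, matching the right-hand convention used above):
$$ T_2 = T_1 \, \Delta T \, \mathrm{Exp}(\varepsilon), \qquad \varepsilon \sim \mathcal{N}(0, \Sigma). $$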
Important considerations
We apply the noise on the right side to be consistent with the right-hand notation.
The covariance we set here is expressed in the base frame B at time 2.
Graphical interpretation of the distribution
1. We define a distribution in the tangent space.
2. We project it back to the group using the Exponential map to define a distribution on the manifold*.
*If we sample this distribution and plot the poses, we’ll obtain the “banana-shaped” distribution expected for poses.
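To make the sampling interpretation concrete, here is a minimal sketch assuming GTSAM’s Python bindings; the poses and covariance values are made up for illustration:

```python
import numpy as np
import gtsam

# Noise-free chain: T2 = T1 * dT (identity T1 for simplicity; values are illustrative)
T1 = gtsam.Pose3()
dT = gtsam.Pose3(gtsam.Rot3.Yaw(0.3), np.array([1.0, 0.0, 0.0]))

# Tangent-space covariance; GTSAM orders Pose3 tangent vectors as (rotation, translation)
Sigma = np.diag([0.0, 0.0, 0.05, 0.01, 0.01, 0.0])

rng = np.random.default_rng(42)
samples = []
for _ in range(1000):
    eps = rng.multivariate_normal(np.zeros(6), Sigma)     # zero-mean Gaussian in the tangent space
    T2 = T1.compose(dT).compose(gtsam.Pose3.Expmap(eps))  # right-multiplicative noise
    samples.append(T2.translation())

# Plotting the x-y components of `samples` shows the banana-shaped spread around T1 * dT
```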
Probability distributions in SE(3)
We isolate the noise on the right to obtain a zero-mean distribution on both sides.
We can check everything is consistent by applying the inverses (which swap the subscripts and the reference frame) and then applying the subscript-elimination trick.
The result is an object defined in the base frame B at time 2.
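Reconstructed in symbols (the derivation appears as figures in the deck):
$$ \mathrm{Exp}(\varepsilon) = \Delta T^{-1} \, T_1^{-1} \, T_2 \quad\Longleftrightarrow\quad \varepsilon = \mathrm{Log}\!\left( T_{B_1 B_2}^{-1} \, T_{W B_1}^{-1} \, T_{W B_2} \right) \sim \mathcal{N}(0, \Sigma), $$
where the inverses swap the subscripts (e.g., $T_{WB_1}^{-1} = T_{B_1 W}$) and the remaining subscripts eliminate, leaving a quantity in $B_2$.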
Defining Gaussian factors
Now this expression is a Gaussian distribution in SE(3), which we can use to define the factor.
This is the definition of the “BetweenFactor” in GTSAM; a usage sketch follows below.
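As a concrete illustration, here is a minimal pose-graph sketch using GTSAM’s Python bindings; all numeric values are made up, so treat this as an assumption-laden sketch rather than canonical usage:

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()

# Prior on T1 to anchor the graph
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
graph.add(gtsam.PriorFactorPose3(1, gtsam.Pose3(), prior_noise))

# Odometry factor: T2 = T1 * dT * Exp(eps), eps ~ N(0, Sigma)
dT = gtsam.Pose3(gtsam.Rot3.Yaw(0.3), np.array([1.0, 0.0, 0.0]))
odom_noise = gtsam.noiseModel.Gaussian.Covariance(np.diag([0.01] * 3 + [0.05] * 3))
graph.add(gtsam.BetweenFactorPose3(1, 2, dT, odom_noise))

# Initial values and optimization
initial = gtsam.Values()
initial.insert(1, gtsam.Pose3())
initial.insert(2, dT)
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()

# Covariances of the solution, from the (inverted) information matrix
marginals = gtsam.Marginals(graph, result)
print(marginals.marginalCovariance(2))  # 6x6 covariance of T2, in the local (base) frame
```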
Extracting uncertainties from the solution
From our analysis, we can define any probability distribution on SE(3) in the same way: a mean element on the group with a Gaussian perturbation in the tangent space.
This expression helped us to define the Gaussian factors used in the graph, but it also applies when we want to extract covariances from the solution.
Let’s say we managed to invert the information matrix from the factor graph solution. The inverse matrix will keep all the covariances and cross-covariances of the variables involved; this is the probability distribution of the solution.
In GTSAM, they correspond to covariances in the “base” frame (right-hand side).
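In symbols (reconstructed from the preceding definitions):
$$ T_i = \hat{T}_i \, \mathrm{Exp}(\varepsilon_i), \qquad \varepsilon_i \sim \mathcal{N}(0, \Sigma_{ii}), $$
where $\hat{T}_i$ is the estimate and $\Sigma_{ii}$ the corresponding block of the inverted information matrix.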
Last comments
The definition of the probability distribution is useful to compute other operations and manipulate the uncertainties accordingly. For instance, the distribution of the inverse:
We use the Adjoint to move the Exp( ) to the right, yielding a proper distribution in the right-hand convention.
The covariance gets transformed by the Adjoint, and now it represents the uncertainty in the world frame W.
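Reconstructed derivation (shown as figures in the deck; it follows from the Adjoint identity above):
$$ T^{-1} = \left( \bar{T}\,\mathrm{Exp}(\varepsilon) \right)^{-1} = \mathrm{Exp}(-\varepsilon)\,\bar{T}^{-1} = \bar{T}^{-1}\,\mathrm{Exp}\!\left( -\mathrm{Ad}_{\bar{T}}\,\varepsilon \right), $$
so the inverse is distributed with mean $\bar{T}^{-1}$ and covariance $\mathrm{Ad}_{\bar{T}}\, \Sigma \, \mathrm{Ad}_{\bar{T}}^\top$.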
Other operations
A similar analysis and the same tools (Exponential and Logarithm maps, Adjoint) can be used to derive other expressions for distributions*.
In general, the means are easy to compute, but the covariances are tricky**.
But if we follow the math and conventions, the resulting formulas should match the physical interpretation (i.e., reference frames).
*Not covered here, but papers are attached at the end (happy to discuss them as well!)
**Some properties of the Exponential map do not follow the usual properties of the exponential function: in general, $\mathrm{Exp}(\tau_1 + \tau_2) \neq \mathrm{Exp}(\tau_1)\,\mathrm{Exp}(\tau_2)$ on non-commutative groups.
Conclusions
1. Lie groups are useful to unify many operations we usually do*.
*GTSAM is actually based on manifolds and retractions (a more general view).
2. Conventions are super important: they allow us to make sense of the quantities we plug in and extract from our estimation problems.
[Figure: simple example of potential problems]
3. Conventions are not necessarily well-documented.
Resources - papers: Lie Groups; Manipulating uncertainty on Lie Groups; Other papers
Resources - libraries: Lie Group libraries