Handling Uncertainty in Estimation Problems with Lie Groups
Motivation
Measurements/models with uncertainties → optimization-based estimation algorithm → estimated variables with uncertainties
How to ensure everything is consistent?
Contents
Part 1: Factor graphs
Part 2: Lie groups
Part 3: Bringing everything together
Part 1: Factor graphs
Lightspeed review
[Figure: a factor graph with variable nodes X1, X2, X3, …, Xn, Xn+1 and factor nodes p1, p2]
A factor graph factorizes a function.
The variables are the unknowns; the factors are known, and indicate how the variables relate to each other.
Dellaert & Kaess (2017), “Factor Graphs for Robot Perception”
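The factorization formula itself appears only as a figure in the original deck; in standard notation it reads
$$ f(X_1, \dots, X_{n+1}) = \prod_i p_i(\mathcal{X}_i), $$
where each factor $p_i$ depends only on the subset of variables $\mathcal{X}_i$ connected to it.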
Factor graphs can be used to describe many problems
Dellaert (2020), “Factor graphs: Exploiting structure in robotics”, Annual Reviews
DRS examples
[Figure: three DRS factor-graph examples with variables Xi, X1…X4 — KINS (work in progress), VTR (under review), TOFG (work in progress)]
Factorized function
A factor graph factorizes a function of many variables into a product of smaller factors, each involving only a subset of the variables.
Factors as distributions
In general, the factors are given by probability distributions, so the factor graph represents a factorization of the joint distribution of many variables.
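The corresponding equation is a figure in the deck; a standard way to write it is
$$ p(X_1, \dots, X_{n+1}) \propto \prod_i \phi_i(\mathcal{X}_i), $$
with each factor $\phi_i$ a distribution (or likelihood) over its subset of variables $\mathcal{X}_i$.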
Finding the unknowns
In the factor graph, the variables represent parameters of the distribution that must be found.
We assume the parameters are single quantities (i.e., not distributions themselves).
Any parameter estimation method can be used.
Finding the unknowns with MAP
The most common solution is maximum a posteriori (MAP).
Since maximizing a product is difficult, we generally minimize the negative log-likelihood instead.
*Please note the negative sign converts the maximization into a minimization.
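The MAP equations are shown as figures in the original slides; under the factorization above they take the standard form
$$ X^{\mathrm{MAP}} = \arg\max_X \prod_i \phi_i(\mathcal{X}_i) = \arg\min_X \; -\sum_i \log \phi_i(\mathcal{X}_i). $$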
Now come the assumptions
To make things easier, we assume the distributions are Gaussian.
However, we don’t say that Xi is Gaussian itself.
Instead, we say the error or residual between a function of Xi and some prior knowledge zi (e.g., measurements) is Gaussian, where Sigma corresponds to the sensor or model covariance.
(We’ll save this trick for later.)
So, the probability of each factor is given by a Gaussian in the residual.
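Written out (the equations are images in the deck; this is the standard Gaussian-factor form):
$$ r_i = h_i(\mathcal{X}_i) - z_i \sim \mathcal{N}(0, \Sigma_i), \qquad \phi_i(\mathcal{X}_i) \propto \exp\!\left( -\tfrac{1}{2} \, \| h_i(\mathcal{X}_i) - z_i \|^2_{\Sigma_i} \right), $$
where $\| r \|^2_{\Sigma} = r^\top \Sigma^{-1} r$ is the squared Mahalanobis distance.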
Least squares minimization
Under the Gaussian assumption, this converts the problem into a (nonlinear) least squares minimization.
(Here we ignore some constant terms.)
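Explicitly (reconstructed in standard notation, since the slide equation is a figure):
$$ X^{\mathrm{MAP}} = \arg\min_X \sum_i \tfrac{1}{2} \, \| h_i(\mathcal{X}_i) - z_i \|^2_{\Sigma_i}. $$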
Inspecting the Least Squares (LS) problem
The term $h_i(\mathcal{X}_i) - z_i$ is the error or residual of the factor.
The constant term involving Sigma (the Gaussian normalization) is usually ignored since Sigma is constant*.
These sigmas (measurement/model covariances) are the only uncertainties we plug into the system.
The solution only returns a value but not an uncertainty estimate (LS is a point estimator).
*We can keep the term and also optimize it (but it would require other solvers/strategies, such as expectation-maximization). In this case, we would be optimizing the sensor models (“learning the covariances”).
Solving the Least Squares (LS) problem
LS can be solved in closed form if h(X) is linear.
By setting the derivative to zero, we find the normal equations.
The resulting system matrix is known as the information matrix, Fisher information, or Hessian.
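For a linear model $h(X) = AX$ with stacked measurements $b$ (standard notation; the slide shows this as a figure), the normal equations read
$$ A^\top \Sigma^{-1} A \, X = A^\top \Sigma^{-1} b, \qquad \Lambda = A^\top \Sigma^{-1} A, $$
where $\Lambda$ is the information matrix.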
Analyzing the Information Matrix
Obs. 1: Matrix A cannot have zero columns, since they would make the linear system indeterminate.
Obs. 2: The matrix $A^\top \Sigma^{-1} A$ is denser than A, so it is expensive to invert when solving the linear system. Factoring the matrix and reordering the rows and columns is the usual trick to solve by back-substitution without inverting (e.g., iSAM); see the sketch below.
Obs. 3: The matrix approximates* the information (inverse covariance) of the solution of the LS problem. However, we need to invert the matrix to obtain the covariance of the solution. Marginal information matrices can be obtained, but still require inversion to get covariances.
*It approximates the covariance of the solution because we’re not explicitly optimizing the covariance of the solution; this is called a “Laplace approximation”. To optimize the covariance as well, we can use variational inference (Barfoot has some recent papers on it).
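As a concrete illustration of solving the normal equations by factorization and back-substitution rather than by inversion, here is a minimal NumPy/SciPy sketch; the matrices and sizes are made up for the example, and real solvers such as iSAM additionally exploit sparsity and variable reordering:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical small linear least-squares problem ||A x - b||^2_Sigma
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))     # measurement Jacobian (no zero columns)
b = rng.standard_normal(20)          # stacked measurements
Sigma_inv = np.eye(20)               # measurement information (Sigma^-1)

Lambda = A.T @ Sigma_inv @ A         # information matrix A^T Sigma^-1 A
eta = A.T @ Sigma_inv @ b            # information vector

# Solve the normal equations by Cholesky factorization + back-substitution,
# never forming Lambda^{-1} explicitly
L, low = cho_factor(Lambda)
x = cho_solve((L, low), eta)

# Only if we actually need the covariance of the solution do we invert:
cov = cho_solve((L, low), np.eye(5))  # Lambda^{-1}, the Laplace-approximation covariance
```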
Nonlinear factor graphs
If h(X) is a nonlinear function, we must use a nonlinear optimization algorithm.
Algorithms such as Gauss-Newton or Levenberg-Marquardt will linearize the system around the initial solution (the linearization point), using the Jacobian of h(X).
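The linearization shown in the slide figure has the standard first-order form
$$ h(X_0 + \Delta x) \approx h(X_0) + J(X_0)\, \Delta x, $$
where $X_0$ is the linearization point and $J$ is the Jacobian of $h$.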
Solving nonlinear factor graphs
Applying the linearization, we obtain a linear system as before, so we can solve it with the same normal equations (with Jacobians instead of the A matrix).
The Jacobians must be well-defined (i.e., no zero columns).
Levenberg-Marquardt can handle ill-conditioned systems (but cannot be used in incremental problems with iSAM).
Updating the solution
The solution of the linear system is a small correction to the current linearization point that improves the solution.
We relinearize at the new linearization point and repeat until convergence.
All the covariances that can be extracted from the information matrix are valid for the linearization point only.
Summary
The covariance of the solution is obtained from the information matrix of the linear system.
Factors (measurements and covariances) are the inputs; variables (means and covariances) are the outputs.
Part 2: Lie groups
Dealing with rotations and transformations
When designing a factor graph, we need factors with residual functions that output vector quantities.
However, some factors involve rotations or rigid-body transformations.
3D Rotations
Rotations can be represented in many ways, but have 3 inherent DoF:
- Euler angles (α, β, γ): 3×1 vector
- Axis-angle (v, θ): 3×1 vector
- Quaternions q = (w, x, y, z): unit 4×1 vector
- Rotation matrices R: 3×3 orthogonal matrix
How to measure the difference of rotations? What about poses?
Lie Groups
Lie groups offer a principled way to solve these issues within a common framework.
Rotation matrices, quaternions, and rigid-body transformations are Lie groups, so the same principles can be applied to all of them:
- SO(3), the Special Orthogonal Group (rotations)
- SE(3), the Special Euclidean Group (rigid-body transformations)
Lie Groups, roughly
A Lie group is both:
- A smooth manifold: a differentiable surface that “looks Euclidean” at any point
- A group: a set with a composition operation and basic properties (closure, identity, inverse, associativity)
Solà, Deray, Atchuthan (2018), “A micro Lie theory for state estimation in robotics”, arXiv
Lie Groups
The main advantage of Lie groups is that any element on the manifold can be mapped into a Euclidean vector space (the Lie algebra), so we can use the same tools from vector spaces, plus some extra rules.
Lie Groups - with figures
[Figure: the Logarithm map takes elements from the Lie group to a vector space (the “Lie algebra”*); the Exponential map takes them back to the group]
*Note: This is not exactly the Lie algebra, but a mapping can be established through the hat and vee operators.
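In the capitalized notation of Solà et al., these maps compose the matrix exponential/logarithm with the hat and vee operators:
$$ \mathrm{Exp}: \mathbb{R}^n \to G, \;\; \mathrm{Exp}(\tau) = \exp(\tau^\wedge), \qquad \mathrm{Log}: G \to \mathbb{R}^n, \;\; \mathrm{Log}(T) = (\log T)^\vee. $$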
Lie Groups - with figures
[Figure: the group and its tangent space defined at the identity]
[Figure: the group and its tangent space defined at a general element T]
Lie Groups - with figures
Right-hand convention: used in GTSAM (Dellaert, Carlone, Scaramuzza, Solà) and in DRS.
Left-hand convention: used by Barfoot, Mangelson, and other authors.
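The equations for the two conventions are figures in the deck; in the usual notation (e.g., Solà et al.) they read
$$ \text{right:} \;\; T = \bar{T}\,\mathrm{Exp}(\tau), \qquad \text{left:} \;\; T = \mathrm{Exp}(\tau)\,\bar{T}, $$
so right-hand perturbations live in the local (body) frame and left-hand perturbations in the global frame.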
Lie Groups - Adjoint operator
The Adjoint operator relates the left-hand and right-hand conventions, moving tangent-space quantities from one side of a group element to the other.
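The defining identity (shown as a figure in the deck; standard in the Lie-theory references cited) is
$$ T\,\mathrm{Exp}(\tau) = \mathrm{Exp}(\mathrm{Ad}_T\,\tau)\,T, $$
which converts a right-hand perturbation $\tau$ into the equivalent left-hand perturbation $\mathrm{Ad}_T\,\tau$.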
Importance of the convention in optimization problems
The Exponential and Logarithm maps allow us to map quantities between the Lie group and the Lie algebra.
We can define residuals for the factor graph using the Logarithm map.
Note: The definition of the Logarithm map determines the order of the variables (in GTSAM, via the right-hand convention).
The Logarithm map also determines the ordering of the covariance matrices (input matrices in the factors, and output matrices in the information matrix obtained from the linear system).
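The GTSAM residual shown on the slide is an image; with the right-hand convention, the relative (local) coordinates between two elements take the form
$$ \tau = \mathrm{Log}\!\left( T_1^{-1} T_2 \right), $$
i.e., the order of $T_1$ and $T_2$ inside the Logarithm fixes both the sign of the residual and the frame in which its covariance is expressed.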
Importance of the convention in optimization problems
The optimization loop is also affected.
The linearized system is solved in the tangent space, using vectors.
However, instead of using the additive update rule, we map the correction from the tangent space back onto the manifold (the right-hand convention is also important here).
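Concretely (the update equations are figures in the deck; this is the standard on-manifold form under the right-hand convention):
$$ \text{instead of} \;\; X \leftarrow X + \Delta x, \qquad \text{we use} \;\; X \leftarrow X \, \mathrm{Exp}(\Delta x). $$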
Graphically
1. Linearization of all the factors evaluated at the current estimate (“lifting”)
2. Computation of the optimization correction using the linear system (normal equations)
3. Update of the variables back on the manifold (“retracting”)
Forster et al. (2016), “On-Manifold Preintegration for Real-Time Visual-Inertial Odometry”, T-RO
Part 3: Bringing everything together
A pose graph SLAM problem
Let’s return to modelling estimation problems with factor graphs: a SLAM graph.
We’ll focus on how to model the odometry factor.
[Figure: pose graph with poses T1, T2, T3, a prior factor, a sensor factor, and odometry factors]
Odometry factor
The odometry factor establishes the relationship between T1 and T2 given an odometry measurement ΔT. They are poses in SE(3).
It seems straightforward that the equation is $T_2 = T_1 \, \Delta T$.
Here we’re using the right-hand notation to apply the measurement… but why?
The importance of the frames
We mentioned that GTSAM uses this notation, so it makes sense to follow it.
But we need to be completely sure that we are doing the right thing.
Here we’re missing an important physical concept: the reference frames.
The importance of the frames
When we write down $T_2 = T_1 \, \Delta T$, we are implying the following: $T_{WB_2} = T_{WB_1} \, T_{B_1 B_2}$, where W is a fixed world frame and B1, B2 are the body frames at times 1 and 2.
Furgale (2014), “Representing Robot Pose: The good, the bad, and the ugly.”
Graphically
[Figure: world frame W and body frames B1, B2]
We are expressing poses of the body B in a fixed frame W.
Our odometry measurements are relative with respect to the previous frame B1.
The resulting expression represents the pose of the body B at time 2 w.r.t. the same fixed frame W.
Furgale (2014), “Representing Robot Pose: The good, the bad, and the ugly.”
The importance of the frames
We can confirm the frames are right: subscripts should match between transformations and “eliminate” each other*.
*Please note that writing the reference frame on the left is redundant for poses; it is mainly relevant for vectors (Furgale, 2014). Thanks to Marco Camurri for mentioning this.
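In symbols (reconstructed; the slide shows this as a figure):
$$ T_{WB_2} = T_{WB_1} \, T_{B_1 B_2}, $$
where the inner subscripts $B_1$ match and “eliminate” each other, leaving a pose of $B_2$ in $W$.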
Creating Gaussian factors
The odometry expression that we have is consistent with the frames, but it’s not a probability distribution.
Hence, we cannot use it as a factor. What can we do?
Probability distributions in SE(3)
Let’s recall what we did in the linear case a few slides ago: we added a noise term with the desired distribution (zero-mean Gaussian).
Even though we are now using Lie groups/manifolds, we can do the same trick: we add a zero-mean Gaussian distribution defined on the tangent space, and project it back to the group using the Exponential map.
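Written out (standard form, matching the right-hand convention used above):
$$ T_2 = T_1 \, \Delta T \, \mathrm{Exp}(\varepsilon), \qquad \varepsilon \sim \mathcal{N}(0, \Sigma). $$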
Important considerations
We apply the noise on the right side to be consistent with the right-hand notation.
The covariance we set here is expressed in the base frame B at time 2.
Graphical interpretation of the distribution
1. We define a distribution in the tangent space.
2. We project it back to the group using the Exponential map to define a distribution on the manifold*.
*If we sample this distribution and plot the poses, we’ll obtain the “banana-shaped” distribution expected for poses.
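To make the sampling interpretation concrete, here is a minimal sketch assuming GTSAM’s Python bindings; the poses and covariance values are made up for illustration:

```python
import numpy as np
import gtsam

# Noise-free chain: T2 = T1 * dT (identity T1 for simplicity; values are illustrative)
T1 = gtsam.Pose3()
dT = gtsam.Pose3(gtsam.Rot3.Yaw(0.3), np.array([1.0, 0.0, 0.0]))

# Tangent-space covariance; GTSAM orders Pose3 tangent vectors as (rotation, translation)
Sigma = np.diag([0.0, 0.0, 0.05, 0.01, 0.01, 0.0])

rng = np.random.default_rng(42)
samples = []
for _ in range(1000):
    eps = rng.multivariate_normal(np.zeros(6), Sigma)     # zero-mean Gaussian in the tangent space
    T2 = T1.compose(dT).compose(gtsam.Pose3.Expmap(eps))  # right-multiplicative noise
    samples.append(T2.translation())

# Plotting the x-y components of `samples` shows the banana-shaped spread around T1 * dT
```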
Probability distributions in SE(3)
We isolate the noise on the right to obtain a zero-mean distribution on both sides.
We can check everything is consistent by applying the inverses (which swap the subscripts and the reference frame) and then applying the subscript-elimination trick.
The result is an object defined in the base frame B at time 2.
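Reconstructed in symbols (the derivation appears as figures in the deck):
$$ \mathrm{Exp}(\varepsilon) = \Delta T^{-1} \, T_1^{-1} \, T_2 \quad\Longleftrightarrow\quad \varepsilon = \mathrm{Log}\!\left( T_{B_1 B_2}^{-1} \, T_{W B_1}^{-1} \, T_{W B_2} \right) \sim \mathcal{N}(0, \Sigma), $$
where the inverses swap the subscripts (e.g., $T_{WB_1}^{-1} = T_{B_1 W}$) and the remaining subscripts eliminate, leaving a quantity in $B_2$.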
Defining Gaussian factors
Now this expression is a Gaussian distribution in SE(3), which we can use to define the factor.
This is the definition of the “BetweenFactor” in GTSAM; a usage sketch follows below.
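As a concrete illustration, here is a minimal pose-graph sketch using GTSAM’s Python bindings; all numeric values are made up, so treat this as an assumption-laden sketch rather than canonical usage:

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()

# Prior on T1 to anchor the graph
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
graph.add(gtsam.PriorFactorPose3(1, gtsam.Pose3(), prior_noise))

# Odometry factor: T2 = T1 * dT * Exp(eps), eps ~ N(0, Sigma)
dT = gtsam.Pose3(gtsam.Rot3.Yaw(0.3), np.array([1.0, 0.0, 0.0]))
odom_noise = gtsam.noiseModel.Gaussian.Covariance(np.diag([0.01] * 3 + [0.05] * 3))
graph.add(gtsam.BetweenFactorPose3(1, 2, dT, odom_noise))

# Initial values and optimization
initial = gtsam.Values()
initial.insert(1, gtsam.Pose3())
initial.insert(2, dT)
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()

# Covariances of the solution, from the (inverted) information matrix
marginals = gtsam.Marginals(graph, result)
print(marginals.marginalCovariance(2))  # 6x6 covariance of T2, in the local (base) frame
```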
Extracting uncertainties from the solution
From our analysis, we can define any probability distribution on SE(3) in the same way: a mean element on the group with a Gaussian perturbation in the tangent space.
This expression helped us to define the Gaussian factors used in the graph, but it also applies when we want to extract covariances from the solution.
Let’s say we managed to invert the information matrix from the factor graph solution. The inverse matrix will keep all the covariances and cross-covariances of the variables involved; this is the probability distribution of the solution.
In GTSAM, they correspond to covariances in the “base” frame (right-hand side).
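In symbols (reconstructed from the preceding definitions):
$$ T_i = \hat{T}_i \, \mathrm{Exp}(\varepsilon_i), \qquad \varepsilon_i \sim \mathcal{N}(0, \Sigma_{ii}), $$
where $\hat{T}_i$ is the estimate and $\Sigma_{ii}$ the corresponding block of the inverted information matrix.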
Last comments
The definition of the probability distribution is useful to compute other operations and manipulate the uncertainties accordingly. For instance, the distribution of the inverse:
We use the Adjoint to move the Exp( ) to the right, yielding a proper distribution in the right-hand convention.
The covariance gets transformed by the Adjoint, and now it represents the uncertainty in the world frame W.
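Reconstructed derivation (shown as figures in the deck; it follows from the Adjoint identity above):
$$ T^{-1} = \left( \bar{T}\,\mathrm{Exp}(\varepsilon) \right)^{-1} = \mathrm{Exp}(-\varepsilon)\,\bar{T}^{-1} = \bar{T}^{-1}\,\mathrm{Exp}\!\left( -\mathrm{Ad}_{\bar{T}}\,\varepsilon \right), $$
so the inverse is distributed with mean $\bar{T}^{-1}$ and covariance $\mathrm{Ad}_{\bar{T}}\, \Sigma \, \mathrm{Ad}_{\bar{T}}^\top$.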
Other operations
A similar analysis and the same tools (Exponential and Logarithm maps, Adjoint) can be used to derive other expressions for distributions*.
In general, the means are easy to compute, but the covariances are tricky**.
But if we follow the math and conventions, the resulting formulas should match the physical interpretation (i.e., reference frames).
*Not covered here, but papers are attached at the end (happy to discuss them as well!)
**Some properties of the Exponential map do not follow the usual properties of the exponential function: in general, $\mathrm{Exp}(\tau_1 + \tau_2) \neq \mathrm{Exp}(\tau_1)\,\mathrm{Exp}(\tau_2)$ on non-commutative groups.
Conclusions
1. Lie groups are useful to unify many operations we usually do*.
*GTSAM is actually based on manifolds and retractions (a more general view).
2. Conventions are super important: they allow us to make sense of the quantities we plug in and extract from our estimation problems.
[Figure: simple example of potential problems]
3. Conventions are not necessarily well-documented.
Resources - papers: Lie Groups; Manipulating uncertainty on Lie Groups; Other papers
Resources - libraries: Lie Group libraries