The Structure of Optimal Nonlinear Feedback Control and its Implications
Suman Chakravorty
Professor, Aerospace Engineering
Texas A&M University
College Station, TX
Acknowledgements
D. Yu, Nanjing University of Aeronautics and Astronautics
M. RafeiSakhaei, Vicarious Robotics
R. Wang, Rockwell Automation
K. Parunandi, Cruise
* M. N. Gul Mohamed, TAMU
* A. Sharma, TAMU
R. Goyal, PARC
D. Kalathil, TAMU
Bob Skelton, TAMU
P. R. Kumar, TAMU
Erik Blasch and Frederica Darema, AFOSR DDIP Program
Kishan Baheti, NSF EPCN and NRI Program
Daryl Hess, NSF DMR and CDS&E Program
Marc Steinberg, ONR Science of Autonomy Program
Introduction
Search for optimal control law
Unknown dynamics
Learning under uncertainty
Partial observation
Example: material microstructure governed by the Cahn-Hilliard equation
The Case for Reinforcement Learning / Data-based Control
Introduction
Very High DOF Systems
Bionic fish robot
Bionic snake robot
Tensegrity airfoil
Tensegrity arm
High dimensionality
Complex models
Data-based
Limited sensing
Partial observation
Model, Process and Sensing uncertainty
Learning under uncertainty
Material Microstructures
Background
Curse of dimensionality: with K grid points per dimension in a d-dimensional state space, the number of variables grows exponentially, as K^d.
[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, vols I and II. Cambridge, MA: Athena Scientific, 2012
[2] R. E. Bellman. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957
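To make the blow-up concrete, a minimal sketch (K = 100 points per dimension is an arbitrary illustrative choice, not a number from the talk):

```python
# Curse of dimensionality: a value function tabulated on a grid with
# K points per state dimension requires K**d entries.
K = 100  # grid points per dimension (hypothetical)
for d in (1, 2, 4, 8, 12):
    print(f"d = {d:2d}: {K**d:.3e} grid points")
```

Even a modest 12-dimensional system already needs 10^24 grid points, which is why grid-based dynamic programming is infeasible for the high-DOF systems above.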
Background
[3] R. Goyal, R. Wang, and S. Chakravorty, "On the Convergence of Reinforcement Learning," IEEE International Conference on Decision and Control, Austin, TX, 2021.
Background
Reinforcement learning - DDPG [4]
System identification
[4] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," International Conference on Learning Representations (ICLR), 2016.
[5] J.-N. Juang and R. S. Pappa, “An eigensystem realization algorithm for modal parameter identification and model reduction,” Journal of Guidance, Control, and Dynamics, vol. 8, no. 5, pp. 620–627, 1985.
[6] S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932–3937, 2016.
Training still takes a very long time and solution has high variance
Background
[7] D. Q. Mayne, "Model predictive control: Recent developments and future promise," Automatica, 2014.
[8] D. Q. Mayne, "Robust and stochastic MPC: Are we going in the right direction?," IFAC-PapersOnLine, 2015 (5th IFAC Conference on Nonlinear Model Predictive Control, NMPC).
Background
Gradient descent [9]
Differential dynamic programming (DDP) [10]
Iterative linear quadratic regulator (ILQR) [11]
[9] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv:1609.04747, 2017.
[10] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming. Elsevier, 1970.
[11] Y. Tassa, T. Erez, and E. Todorov, "Synthesis and stabilization of complex behaviors through online trajectory optimization," 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4906–4913, 2012.
The Roadmap
Outline
Problem Formulation
Bellman equation
[15] Naveed Gul Mohamed, M., Chakravorty, S., Goyal, R., and Wang, R., "On the Optimal Feedback Law in Stochastic Optimal Nonlinear Control", arXiv:2004.01041, under revision for the IEEE Transactions on Automatic Control.
[16] Naveed Gul Mohamed, M., Chakravorty, S., Goyal, R., and Wang, R., "On the Optimal Feedback Law in Stochastic Optimal Nonlinear Control", American Control Conference, 2022.
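The slide's equations are rendered graphically; as a sketch in standard notation (a generic discrete-time form with control-affine dynamics and additive noise scaled by ε, consistent with the perturbation results below, not necessarily the talk's exact symbols):

```latex
\min_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{T-1} l(x_t, u_t) + \Phi(x_T)\Big],
\qquad x_{t+1} = f(x_t) + g(x_t)\,u_t + \epsilon\, w_t,
\qquad
J_t(x) = \min_{u}\Big\{\, l(x,u) + \mathbb{E}\big[\,J_{t+1}\big(f(x) + g(x)\,u + \epsilon\, w_t\big)\big]\Big\}.
```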
Near Optimality of Deterministic Law
Perturbation Structure of Deterministic Law
Global optimality: if the dynamics f, g and the cost l are C², then the solution of the characteristic ODE exists and is unique; i.e., satisfying the minimum principle is sufficient for global optimality, regardless of how nonlinear the dynamics and cost are.
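For reference, the characteristic ODEs in question are the standard minimum-principle equations (generic notation, a sketch rather than the talk's exact statement), with Hamiltonian H(x, λ, u) = l(x,u) + λᵀ(f(x) + g(x)u):

```latex
\dot{x} = f(x) + g(x)\,u^{*}, \qquad
\dot{\lambda} = -\left.\frac{\partial H}{\partial x}\right|_{u = u^{*}}, \qquad
u^{*} = \arg\min_{u} H(x, \lambda, u),
```

with boundary conditions x(0) = x₀ and λ(T) = ∂Φ/∂x(x(T)).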
Perturbation Structure of Deterministic Law
and so on: the series is a perturbation expansion around the deterministic law.
Decoupling Principle
Remarks
Training efficiency
Not LQR
The Stochastic Problem
The stochastic optimal feedback law has to be expanded to a high enough order for accuracy! The first-order perturbation term vanishes (= 0) under the Minimum Principle (MP).
[15] Naveed Gul Mohamed, M., Chakravorty, S., Goyal, R., and Wang, R., "On the Optimal Feedback Law in Stochastic Optimal Nonlinear Control", arXiv:2004.01041, under revision for the IEEE Transactions on Automatic Control.
Does it make sense to do stochastic MPC?
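Schematically (a hedged sketch of the expansion in [15], not its exact statement), writing the optimal cost as a power series in the noise scaling ε:

```latex
J^{\epsilon}(x) \;=\; J_0(x) \;+\; \epsilon\, J_1(x) \;+\; \epsilon^{2} J_2(x) \;+\; \cdots,
\qquad J_1 = 0 \ \text{under the MP},
```

so the deterministic (ε = 0) design is already accurate to low order, while a genuinely stochastic law would have to be expanded to high order to improve on it.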
Decoupled Data-based Control (D2C)
Necessary condition
Iteration till convergence:
Taylor expansion
Backward pass
Forward pass
Co-state
Actor
Critic
[16] Wang, R., Parunandi, K. S., Sharma, A., Goyal, R., and Chakravorty, S., “On the Search for Feedback in Reinforcement Learning”, arXiv:2002.09478, under review for the IEEE Transactions on Automatic Control
[17] Wang, R., Parunandi, K. S., Sharma, A., Goyal, R., and Chakravorty, S., “On the Search for Feedback in Reinforcement Learning”, IEEE International Conference on Decision and Control, 2021
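A minimal sketch of such a backward/forward (critic/actor) sweep, assuming the linearizations A[t], B[t] and quadratic cost weights Q, R, Qf are already available (illustrative names and simplifications; regularization and line search are omitted):

```python
import numpy as np

def ilqr_sweep(f, x_nom, u_nom, A, B, Q, R, Qf, alpha=1.0):
    """One iLQR backward/forward sweep about a nominal trajectory.

    f(x, u)      -- one-step simulator (can be a black box, as in D2C)
    x_nom, u_nom -- nominal trajectory: lists of length T+1 and T
    A[t], B[t]   -- linearizations of f about the nominal at time t
    Q, R, Qf     -- quadratic running/terminal cost weights
    alpha        -- step size on the open-loop correction
    """
    T = len(u_nom)
    S, s = Qf, Qf @ x_nom[T]                 # terminal value-function expansion
    k = [None] * T
    K = [None] * T
    # Backward pass ("critic"): Riccati-like recursion for the gains.
    for t in reversed(range(T)):
        Qx = Q @ x_nom[t] + A[t].T @ s
        Qu = R @ u_nom[t] + B[t].T @ s
        Qxx = Q + A[t].T @ S @ A[t]
        Quu = R + B[t].T @ S @ B[t]
        Qux = B[t].T @ S @ A[t]
        k[t] = -np.linalg.solve(Quu, Qu)     # open-loop correction
        K[t] = -np.linalg.solve(Quu, Qux)    # linear feedback gain
        S = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
        s = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
    # Forward pass ("actor"): roll out the updated policy on the simulator.
    x_new, u_new = [x_nom[0]], []
    for t in range(T):
        du = alpha * k[t] + K[t] @ (x_new[t] - x_nom[t])
        u_new.append(u_nom[t] + du)
        x_new.append(f(x_new[t], u_new[t]))
    return x_new, u_new, K
```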
Decoupled Data-based Control (D2C)
Collect data from simulation experiments:
Solve for the linearized dynamics:
Data-based iLQR
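The linearizations themselves can be obtained purely from rollout data, which is what makes the iLQR data-based; a least-squares sketch under small random perturbations about the nominal (function names are hypothetical):

```python
import numpy as np

def estimate_linearization(f, x_bar, u_bar, n_samples=200, sigma=1e-3):
    """Fit dx_next ~ A dx + B du from black-box rollouts (least squares)."""
    n, m = x_bar.shape[0], u_bar.shape[0]
    X, Y = [], []
    y_bar = f(x_bar, u_bar)                    # nominal next state
    for _ in range(n_samples):
        dx = sigma * np.random.randn(n)        # small state perturbation
        du = sigma * np.random.randn(m)        # small control perturbation
        X.append(np.concatenate([dx, du]))
        Y.append(f(x_bar + dx, u_bar + du) - y_bar)
    # Least squares: find W minimizing ||X W - Y||; W.T = [A  B].
    W, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
    return W.T[:, :n], W.T[:, n:]              # A (n x n), B (n x m)
```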
Decoupled Data-based Control (D2C) Algorithm
Optimality and Convergence
ILQR is Sequential Quadratic Programming (SQP); DDP is overkill for getting to an open-loop minimum.
[16] Wang, R., Parunandi, K. S., Sharma, A., Goyal, R., and Chakravorty, S., "On the Search for Feedback in Reinforcement Learning", arXiv:2002.09478, under review for the IEEE Transactions on Automatic Control.
[17] Wang, R., Parunandi, K. S., Sharma, A., Goyal, R., and Chakravorty, S., “On the Search for Feedback in Reinforcement Learning”, IEEE International Conference on Decision and Control, 2021
Optimality and Convergence
[18] P. T. Boggs and J. W. Tolle, "Sequential quadratic programming," Acta Numerica, vol. 4, pp. 1–51, 1995.
Assumptions:
Global convergence to a stationary point
[16] Wang, R., Parunandi, K. S., Sharma, A., Goyal, R., and Chakravorty, S., "On the Search for Feedback in Reinforcement Learning", arXiv:2002.09478, under review for the IEEE Transactions on Automatic Control.
[17] Wang, R., Parunandi, K. S., Sharma, A., Goyal, R., and Chakravorty, S., “On the Search for Feedback in Reinforcement Learning”, IEEE International Conference on Decision and Control, 2021
Optimality and Convergence
Using the Method of Characteristics result from before regarding the sufficiency of the Minimum Principle [15, 16]:
Global convergence to the global minimum
Training reliability
Efficient and reliable training can be combined with replanning as in MPC to obtain a global feedback policy.
[15] Naveed Gul Mohamed, M., Chakravorty, S., Goyal, R., and Wang, R., "On the Optimal Feedback Law in Stochastic Optimal Nonlinear Control", American Control Conference, June 2022.
[16] Naveed Gul Mohamed, M., Chakravorty, S., Goyal, R., and Wang, R., "On the Optimal Feedback Law in Stochastic Optimal Nonlinear Control", arXiv:2004.01041, under revision for the IEEE Transactions on Automatic Control.
Empirical Results
Fish
6-link Swimmer
Cartpole
We use the simulation model as a black box in the physics engine MuJoCo [19].
[19] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033
Material Microstructure Control
Cahn-Hilliard Equation:
Allen-Cahn Equation:
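The equations appeared graphically on the slide; their standard textbook forms (generic nondimensionalized versions, not necessarily the talk's exact parameterization) are:

```latex
% Cahn-Hilliard (conserved order parameter c)
\frac{\partial c}{\partial t} = \nabla^{2}\!\left(c^{3} - c - \gamma\,\nabla^{2} c\right),
\qquad
% Allen-Cahn (non-conserved order parameter \phi)
\frac{\partial \phi}{\partial t} = \varepsilon^{2}\,\nabla^{2}\phi - \left(\phi^{3} - \phi\right).
```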
Training Efficiency
Advantages of D2C in training efficiency compared with DDPG (both trained on a PC)
System            DDPG training time   D2C training time
Cartpole          6306.7 s             0.55 s
6-link Swimmer    88160.0 s            127.2 s
Fish              124367.6 s           54.8 s
DDPG and D2C have exactly the same access to data from the model.
Training Efficiency and Variance
Advantages of D2C in training efficiency and reliability
Training variance comparison
Training reliability
Training efficiency
Is sample efficiency the right metric for evaluating RL? What about the quality of the answer to which RL converges?
Closed-loop Performance Under Noise
Cartpole
6-link swimmer
Fish
Robustness
Remarks
D2C Performance Summary
Data-based
Efficient and reliable training
High-dimensional nonlinear stochastic systems
Robust to process noise
Global optimality
The Connection to MPC
Replan only when necessary while employing the optimal feedback law (T-PFC).
We replan for stochasticity, not to account for the infinite horizon
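A sketch of the replan-only-when-necessary logic (purely illustrative; the deviation threshold and the Plan container are hypothetical, not the exact T-PFC trigger):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Plan:
    x_nom: list  # nominal states
    u_nom: list  # nominal controls
    K: list      # linear feedback gains from the backward pass

def tpfc_step(x, t, plan, replan, threshold=0.1):
    """Apply the local linear feedback; replan only on large deviation."""
    dx = x - plan.x_nom[t]
    if np.linalg.norm(dx) > threshold:  # deviation too large for the local law:
        plan = replan(x, t)             # re-solve the open-loop problem (MPC-style)
        dx = x - plan.x_nom[t]
    u = plan.u_nom[t] + plan.K[t] @ dx  # otherwise: cheap linear feedback only
    return u, plan
```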
The Connection to MPC
Performance comparison across several different robotic planning problems.
The Connection to MPC
A comparison of fixed and shrinking horizon MPC for different (fixed) horizon lengths
The empirical evidence suggests that shrinking the horizon in MPC results in much better performance, and it should be feasible to maintain the stability and feasibility guarantees of traditional MPC.
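For concreteness, the difference between the two schemes is only the horizon passed to the optimizer; a minimal sketch (solve_ocp and step are assumed user-supplied):

```python
def shrinking_horizon_mpc(x0, T, solve_ocp, step):
    """Shrinking-horizon MPC: at time t, optimize over the remaining T - t
    steps and apply only the first control. A fixed-horizon variant would
    pass a constant horizon H instead of T - t."""
    x = x0
    for t in range(T):
        u_seq = solve_ocp(x, horizon=T - t)  # horizon shrinks as t grows
        x = step(x, u_seq[0])
    return x
```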
Intractability of the Stochastic Problem
We compare the performance of the approximate optimal feedback law, found by computationally solving the stochastic HJB, with the deterministic feedback law implemented using MPC. The results show that the MPC feedback law performs much better in higher-noise regimes: empirical evidence of the sensitivity of the stochastic DP solution.
Takeaway: Local is Key!
Optimal feedback control is equivalent to solving the HJB PDE. We can try to solve the PDE globally, as in ADP/RL, in which case we run into the curse of dimensionality for most practical problems and obtain unreliable solutions. Alternatively, we can compute a local solution and modify it online when required, à la MPC. The classical method for solving a first-order PDE is the Method of Characteristics (MOC): MPC repeatedly finds the characteristic curve (the open loop) from the current state, whereas we advocate finding a local solution (linear feedback) around the nominal curve and replanning only when necessary. The local approach is far more scalable, accurate, and reliable, while MPC-style replanning ensures global applicability. The stochastic problem is fundamentally intractable because it loses this perturbation structure: unlike the deterministic case, it admits no notion of a local solution.
Future Directions
Acrobot
Thank you
Training Under Noise
DDPG training under noise
Cartpole – process noise in control
Cartpole – process noise in state and control
Pendulum – process noise in control
Pendulum – process noise in state and control
Remarks
Partially observed fish model with DDPG
Direct RL method
Comparison with model-based tensegrity control
Model-based shape control
D2C closed-loop policy
T2D1 Tensegrity Model
Faster
Slower
Comparison with model-based tensegrity control
Reacher – closed-loop performance
Reacher – control energy
Fish model: 27 total states, partially observed (red ball: target).
Simulation results: the open-loop nominal alone failed; POD2C with ARMA-LQG as the closed loop succeeded.
Exact Match of ARMA Model and LTV System
For the LTV system linearized around the nominal trajectory, the ARMA model matches the input-output behavior exactly, provided the associated observability-type matrix has full column rank.
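Schematically, with an assumed lag depth q (a sketch of the model class, not the paper's exact statement):

```latex
y_t = \sum_{i=1}^{q} \alpha_{t,i}\, y_{t-i} + \sum_{j=1}^{q} \beta_{t,j}\, u_{t-j},
```

i.e., a time-varying ARMA model in past outputs and inputs that, under the rank condition above, reproduces the LTV system's response exactly.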
Background
Finite difference (FD) [6]
[6] E. H. N., "The Calculus of Finite Differences," Nature, vol. 134, no. 3381, pp. 231–233, 1934. doi:10.1038/134231a0.
Eigen realization algorithm (ERA) [7]
[7] J.-N. Juang and R. S. Pappa, "An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction," Journal of Guidance, Control, and Dynamics, vol. 8, no. 5, pp. 620–627, 1985.
Partially observable Markov decision process (POMDP) [8]
[8] K. J. Åström, "Optimal Control of Markov Processes with Incomplete State Information," Journal of Mathematical Analysis and Applications, vol. 10, pp. 174–205, 1965.
Information State Optimization
Stochastic system
Information state
Past observations
Past inputs
Nominal trajectory
Cost on information state
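Concretely, consistent with the bullets above (the lag depth q is an assumption), the information state stacks a finite window of past observations and inputs:

```latex
Z_t = \big(\, y_t,\ y_{t-1},\ \ldots,\ y_{t-q+1},\quad u_{t-1},\ \ldots,\ u_{t-q+1} \,\big),
```

and the optimal control problem, including the nominal trajectory and the cost, is posed directly on Z_t rather than on the unobserved state x_t.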
Global Optimal Solution
Expand the output
Implicit function theorem
Expand the state:
Information state
Unique function
Unique mapping
Biased Nature of the Partially Observed Case