1 of 30

Logical Specifications meet Learning-based Control

Jyo Deshmukh,

Thomas Lord Dept. of CS, University of Southern California (USC)

CDC Workshop: Formal Methods and Decision-making in the age of AI

2 of 30

Paradigm shift in design of autonomous systems


Classical approach: model-based design with well-defined uncertainty models.

Brave new world: highly uncertain environments that do not lend themselves easily to modeling; designs are increasingly data-driven and model-free.


3 of 30

Safe control problem


4 of 30

How do we solve these problems?


5 of 30

How do we solve these problems?


6 of 30

Signal Temporal Logic [1,2,3]


[Figure: two example signal traces on t in [0, 100]. Left: a signal annotated "Always between time 0 and 100" staying below a threshold (axis ticks at 1 and 3): an invariance property, G_[0,100]. Right: a signal annotated "Eventually at some time t between time 20 and 60" and "From that time t, always till the end of the signal trace", settling into the band 1 ± 0.1 by time 60: a settling property of the form F_[20,60] G (|x − 1| ≤ 0.1).]

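To make the two properties above concrete, here is a minimal Boolean monitor over a sampled trace (an illustrative sketch: the example signal, the 1-sample-per-time-unit discretization, and the thresholds are my assumptions, not taken from the slide):

    # Sketch: Boolean STL monitoring over a sampled trace (1 sample per time unit).

    def always(trace, lo, hi, pred):
        # G_[lo,hi] pred: pred holds at every sample in the window [lo, hi]
        return all(pred(v) for v in trace[lo:hi + 1])

    def eventually_always(trace, lo, hi, pred):
        # F_[lo,hi] G pred: at some t in [lo, hi], pred holds from t onward
        return any(all(pred(v) for v in trace[t:]) for t in range(lo, hi + 1))

    x = [1 + 1.5 * 0.85 ** t for t in range(101)]  # example signal settling to 1
    print(always(x, 0, 100, lambda v: v < 3))                         # invariance
    print(eventually_always(x, 20, 60, lambda v: abs(v - 1) <= 0.1))  # settling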
[1] Oded Maler and Dejan Nickovic. "Monitoring temporal properties of continuous signals." In Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems, pp. 152-166, 2004.

[2] Alexandre Donzé and Oded Maler. "Robust satisfaction of temporal logic over real-valued signals." In Formal Modeling and Analysis of Timed Systems (FORMATS), 2010.

[3] Georgios E. Fainekos and George J. Pappas. "Robustness of temporal logic specifications for continuous-time signals." Theoretical Computer Science 410(42): 4262-4291, 2009.


7 of 30

Specifying autonomous driving constraints and specifications

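As one illustrative example (my own, not necessarily a spec from the talk), a headway-and-reaction requirement can be written in STL as

    G_[0,T] ( dist(ego, lead) >= d_safe )
    and
    G_[0,T] ( brake_lead -> F_[0,1.5] ( a_ego <= -a_min ) )

i.e., always maintain a safe headway, and whenever the lead vehicle brakes, begin decelerating within 1.5 seconds. Here d_safe, a_min, and the 1.5 s reaction bound are placeholder constants.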


8 of 30

STL: Quantitative semantics


[Figure: two signal traces on t in [0, 100], each relative to the threshold 3 near t = 50. One trace stays on the GOOD side and the annotation asks "How far is bad?" (the satisfaction margin); the other crosses into BAD and the annotation asks "How bad is the violation?" (the violation margin), motivating a quantitative notion of satisfaction.]


9 of 30

Quantitative Semantics of STL or “Robustness” value

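In the standard quantitative semantics, atomic predicates evaluate to signed margins, conjunction/"always" become min, and disjunction/"eventually" become max: a positive robustness value means the spec is satisfied with that much room to spare, and a negative value measures how bad the violation is. A minimal sketch (signal and thresholds illustrative):

    # Sketch: STL robustness over a sampled trace.

    def rho_leq(v, c):
        # robustness of the atomic predicate (x <= c) is the margin c - v
        return c - v

    def rho_always_leq(trace, lo, hi, c):
        # rho(G_[lo,hi] (x <= c)): worst-case (min) margin over the window
        return min(rho_leq(v, c) for v in trace[lo:hi + 1])

    def rho_eventually_leq(trace, lo, hi, c):
        # rho(F_[lo,hi] (x <= c)): best-case (max) margin over the window
        return max(rho_leq(v, c) for v in trace[lo:hi + 1])

    x = [2.0 + 1.5 * 0.9 ** t for t in range(101)]  # overshoots 3 early on
    print(rho_always_leq(x, 0, 100, 3.0))           # -0.5: violated by 0.5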


10 of 30

How do we solve these problems?


11 of 30

Neuro-symbolic Safe Controller Synthesis: High-level idea [1,2]

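The high-level recipe in [1,2], as I would sketch it: compose a differentiable model of the plant with a candidate NN controller, roll the closed loop out over the task horizon, score the trajectory with a differentiable STL robustness, and run gradient ascent on robustness. Everything below (the double-integrator dynamics, horizon, and spec) is a stand-in chosen to show the shape of the computation, not the setup used in the papers:

    import torch

    # Stand-in differentiable plant: double integrator, x = (position, velocity)
    A = torch.tensor([[0.0, 1.0], [0.0, 0.0]])
    B = torch.tensor([[0.0], [1.0]])
    dt, horizon = 0.1, 50

    controller = torch.nn.Sequential(
        torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

    def rollout(x0):
        xs, x = [x0], x0
        for _ in range(horizon):
            u = controller(x)
            x = x + dt * (x @ A.T + u @ B.T)   # differentiable closed-loop step
            xs.append(x)
        return torch.stack(xs)                  # (horizon+1, batch, 2)

    def robustness(traj):
        # Illustrative spec G (|position| <= 1): min over time of the margin
        return (1.0 - traj[..., 0].abs()).min(dim=0).values

    opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
    for _ in range(200):
        x0 = 0.5 * torch.rand(64, 2)             # sample initial states
        loss = -robustness(rollout(x0)).mean()   # ascend robustness
        opt.zero_grad(); loss.backward(); opt.step()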
[1] N. Hashemi, X. Qin, J. Deshmukh, G. Fainekos, B. Hoxha, D. Prokhorov, T. Yamaguchi. "Risk-Awareness in Learning Neural Controllers for Temporal Logic Objectives." Proc. of ACC 2023.

[2] N. Hashemi, B. Hoxha, T. Yamaguchi, D. V. Prokhorov, G. Fainekos, J. V. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023: 98-109.


12 of 30

Neuro-symbolic representation of STL


[1] N. Hashemi, B. Hoxha, T. Yamaguchi, D. V. Prokhorov, G. Fainekos, J. V. Deshmukh: A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems. Proc. of ICCPS 2023: 98-109
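The enabling observation (my paraphrase of [1]): min and max are exactly representable with ReLUs, via max(a, b) = a + relu(b − a) and min(a, b) = a − relu(a − b). Folding these identities over the robustness recursion turns the entire STL robustness computation into one piecewise-linear feed-forward network, so it composes cleanly with the plant and controller networks for verification. A small sketch:

    import torch
    import torch.nn.functional as F

    def relu_max(a, b):
        # max(a, b) = a + relu(b - a): exact, piecewise-linear
        return a + F.relu(b - a)

    def relu_min(a, b):
        # min(a, b) = a - relu(a - b): exact, piecewise-linear
        return a - F.relu(a - b)

    # e.g. rho(G over a window) folds relu_min over per-sample margins:
    margins = torch.tensor([0.4, -0.2, 0.7])
    rho = margins[0]
    for m in margins[1:]:
        rho = relu_min(rho, m)
    print(rho)  # tensor(-0.2000): the worst margin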


13 of 30

Neuro-symbolic Safe Controller Synthesis: High-level idea [1,2] (continued)


14 of 30

Results on synthesis for 6D and 3D quadcopter systems [1]


[1] N. Hashemi, B. Hoxha, T. Yamaguchi, D. Prokhorov, G. Fainekos, J. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023.

15 of 30

Meditation on results

Scalability of controller synthesis:

  • Synthesis is inherently a highly non-convex optimization problem
    • Relies on stochastic gradient descent to scale
    • Convergence proportional to graduate student hours
  • Higher dimensions and longer time horizons are challenging
    • Preliminary results using some tricks look promising


16 of 30

Better optimization and longer horizons

Challenges:

  • ReLU-based STL robustness is piecewise-linear and non-smooth, which makes it unpleasant to optimize
    • Idea 1: smooth underapproximations of robustness (see the sketch at the end of this slide)
  • Long horizons lead to vanishing/exploding gradients
    • Idea 2: gradient approximation using dropout-like ideas

Training results (under submission):

  • Quadcopter controller that lands on a moving platform (task horizon of 1500 time-steps, i.e., 75 seconds), trained in < 7 minutes
  • Centralized coordinator for 10 cars navigating a congested environment (task horizon of 60 seconds), trained in < 2 hours
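For Idea 1, a standard trick (this sketch is illustrative; the exact formulation in the submission may differ) is to replace the hard min/max in the robustness recursion with log-sum-exp-style soft versions chosen to underapproximate the true values, so a positive smooth robustness still certifies satisfaction:

    import torch

    def soft_min(x, beta=10.0):
        # -(1/beta) * logsumexp(-beta * x) <= min(x): a smooth underapproximation
        return -torch.logsumexp(-beta * x, dim=0) / beta

    def soft_max_under(x, beta=10.0):
        # softmax-weighted average <= max(x); note plain logsumexp/beta would
        # OVERapproximate max, which is the wrong direction for soundness
        w = torch.softmax(beta * x, dim=0)
        return (w * x).sum(dim=0)

    margins = torch.tensor([0.4, 0.05, 0.7])
    print(soft_min(margins), margins.min())         # soft_min <= true min
    print(soft_max_under(margins), margins.max())   # soft_max <= true max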


17 of 30

Verification problem


18 of 30

Neuro-symbolic Controller Verification (Deterministic)

[1] N. Hashemi, B. Hoxha, G. Fainekos, D. Prokhorov, T. Yamaguchi, J. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023: 98-109.

19 of 30

Scalability of verification

  • Looks promising [1], but suffers as trajectory length and the number of states grow


[1] N. Hashemi, B. Hoxha, G. Fainekos, D. Prokhorov, T. Yamaguchi, J. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023: 98-109.

Property | Time Horizon | Plant NN | Controller NN | Method | Verified? | Runtime (secs)
φ1 | 100 | [3,10,10,10,2] | [2,50,1,2,1,2,1,1] | Exact | Y | 1167
φ1 | 100 | [3,10,10,10,2] | [2,50,1,2,1,2,1,1] | Approx | N | 35
φ2 | 50 | [4,10,10,3] | [3,100,1,2,1,2,1,1] | Exact | Y | 1903
φ2 | 50 | [4,10,10,3] | [3,100,1,2,1,2,1,1] | Approx | N | 43
φ3 | 53 | [2,10,10,2] | [5,20,20,20,1] | Exact | Y | 260
φ3 | 53 | [2,10,10,2] | [5,20,20,20,1] | Approx | Y | 24
φ4 | 32 | [4,8,2] | [2,8,2] | Exact | Y | 78
φ5 | 35 | — | — | Exact | Y | 1955
φ6 | 35 | — | — | Exact | Y | 2369
φ7 | 35 | — | — | Exact | Disproven | 1023

(The Property column held STL formulas rendered as math in the original slide and did not survive extraction; φ1–φ7 are placeholders. Dashes mark architecture entries that also did not survive.)


20 of 30

Elephant in the room: the NN plant model


21 of 30

How do we solve these problems?


22 of 30

RL from Logical Specifications (LTL, STL, MTL)

  • RL from TL idea spectrum [1-11]
    • Convert satisfaction of a temporal logic property into a 0/1 reward
    • Convert temporal specifications into MILP
    • Break specifications into sub-goals and train each sub-goal sequentially
    • Use reward signals based on temporal logic robustness
  • Several RL from TL papers use sparse rewards
    • High training variance, poor convergence
    • Sub-optimal policies may have zero probability of satisfying specification!
  • Our approach: robustify the rewards using symbolic potential functions defined over symbolic automata specs


  1. B. Lacerda, D. Parker, and N. Hawes. Optimal policy generation for partially satisfiable co-safe LTL specifications. IJCAI 2015.
  2. M. Guo and M. Zavlanos. Probabilistic motion planning under temporal tasks and soft constraints. IEEE TAC 2018.
  3. R. Brafman, G. De Giacomo, and F. Patrizi. LTLf/LDLf non-markovian rewards. AAAI 2018.
  4. M. Hasanbeig, A. Abate, and D. Kroening. Logically-Constrained Reinforcement Learning. arXiv 2018.
  5. E. M. Hahn, M Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. Omega-Regular Objectives in Model-Free Reinforcement Learning. TACAS 2019.
  6. K. Jothimurugan, S. Bansal, O. Bastani, and R. Alur. Compositional Reinforcement Learning from Logical Specifications, NeurIPS 2021.
  7. K. C Kalagarla, R. Jain, and P. Nuzzo. Synthesis of discounted-reward optimal policies for Markov decision processes under linear temporal logic specifications, arXiv 2020.
  8. W. Liu, N. Mehdipour, and C. Belta, “Recurrent Neural Network Controllers for Signal Temporal Logic Specifications Subject to Safety Constraints,” IEEE Control Systems Letters, vol. 6, pp. 91–96, 2022, doi: 10.1109/LCSYS.2021.3049917. 
  9. C. Sun, X. Li, and C. Belta, “Automata Guided Semi-Decentralized Multi-Agent Reinforcement Learning,” in 2020 American Control Conference (ACC), Jul. 2020, pp. 3900–3905. doi: 10.23919/ACC45564.2020.9147704.
  10. X. Li, Z. Serlin, G. Yang, and C. Belta, “A formal methods approach to interpretable reinforcement learning for robotic planning,” Science Robotics, vol. 4, no. 37, Dec. 2019, doi: 10.1126/scirobotics.aay6276.  
  11. X. Li, Y. Ma, and C. Belta, "Automata-Guided Hierarchical Reinforcement Learning for Skill Composition," arXiv:1711.00129, 2018.


23 of 30

STL's easier-to-digest cousin: Symbolic Automata [1,2]

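To make the object concrete, here is a minimal (assumed, illustrative) encoding: a symbolic automaton is a finite-state machine whose transition guards are predicates over real-valued observations, rather than letters of a finite alphabet:

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    Guard = Callable[[dict], bool]  # predicate over a real-valued observation

    @dataclass
    class SymbolicAutomaton:
        initial: str
        accepting: set
        # transitions[q] = [(guard, q_next), ...]
        transitions: Dict[str, List[Tuple[Guard, str]]] = field(default_factory=dict)

        def step(self, q: str, obs: dict) -> str:
            for guard, q_next in self.transitions.get(q, []):
                if guard(obs):
                    return q_next
            return q  # no guard fires: self-loop (a modeling assumption here)

    # Illustrative spec: reach the goal region, then avoid the unsafe region.
    aut = SymbolicAutomaton(
        initial="seek", accepting={"done"},
        transitions={"seek": [(lambda o: o["dist_goal"] < 0.5, "done")],
                     "done": [(lambda o: o["dist_unsafe"] < 0.2, "fail")]})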


[1] Anand Balakrishnan, Stefan Jakšić, Edgar A. Aguilar, Dejan Ničković, Jyotirmoy V. Deshmukh. "Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives." Proc. of CDC 2023.

[2] Stefan Jakšić, Ezio Bartocci, Radu Grosu, Thang Nguyen, and Dejan Ničković. "Quantitative monitoring of STL with edit distance." Formal Methods in System Design 53: 83-112, 2018.


24 of 30

Reward shaping with symbolic automaton objectives

Main ideas [1]:

    • Quantify how far a state in the product MDP is from an accept state
    • Define a symbolic potential function for each state
      • Like the A* heuristic, but defined symbolically over the spec automaton
    • Define rewards that incentivize the agent to move to a state with lower potential

Theorem [1]:

An optimal policy found by an RL algorithm w.r.t. the symbolic potential-based reward function also maximizes the probability of satisfying the automaton objective*

Shameless plug: Talk on Friday (FrA23.2)

* The automaton must have a terminally accepting state; this holds, e.g., for any finitely bounded-horizon STL specification
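A sketch of the shaping step (the potentials below are hypothetical stand-ins; in [1] they are computed symbolically from the spec automaton). It instantiates the classic potential-based form r' = r + γΦ(q') − Φ(q), which is known to preserve optimal policies (Ng et al., 1999):

    # Potential-based reward shaping on the product MDP (illustrative sketch).
    GAMMA = 0.99

    def phi(aut_state: str) -> float:
        # Stand-in symbolic potentials: smaller = "closer to an accept state".
        return {"seek": 2.0, "done": 0.0, "fail": 5.0}[aut_state]

    def shaped_reward(base_r: float, q: str, q_next: str) -> float:
        # r' = r + GAMMA * Phi(q') - Phi(q) with Phi = -phi, so moving to a
        # lower-potential (closer-to-accepting) state earns a positive bonus.
        return base_r + GAMMA * (-phi(q_next)) + phi(q)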


[1] Anand Balakrishnan, Stefan Jakšić, Edgar A. Aguilar, Dejan Ničković, Jyotirmoy V. Deshmukh. "Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives." Proc. of CDC 2023.


25 of 30

LfD (Learning from Demonstrations) + STL


Combining STL specs with demonstrations allows generalization over continuous and stochastic spaces, while reducing sample complexity and automatically inferring rewards.
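One way to read the core mechanism (a sketch under assumptions; see the papers for the actual algorithms): score each demonstration by the robustness of the STL specs it satisfies, and use the induced ranking to infer rewards over the task. All names below are illustrative:

    # Sketch: rank demonstrations by STL robustness to drive reward inference.
    # `rho_spec(spec, traj)` is any robustness monitor (e.g., the min/max
    # recursion from the earlier STL slides); `demos` is a list of trajectories.

    def score_demo(traj, specs, rho_spec):
        # conjunction of specs => worst-case (min) robustness across them
        return min(rho_spec(spec, traj) for spec in specs)

    def rank_demos(demos, specs, rho_spec):
        # best-performing demonstrations first
        return sorted(demos, key=lambda d: score_demo(d, specs, rho_spec),
                      reverse=True)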

A. G. Puranic, J. V. Deshmukh, and S. Nikolaidis. "Learning from demonstrations using signal temporal logic." Conference on Robot Learning (CoRL), PMLR, 2021. (Oral presentation at CoRL 2021)

A. G. Puranic, J. V. Deshmukh, and S. Nikolaidis. "Learning from demonstrations using signal temporal logic in stochastic and continuous domains." IEEE Robotics and Automation Letters 6(4): 6250-6257, 2021. (Oral presentation at IROS 2021)

A. G. Puranic, J. V. Deshmukh, and S. Nikolaidis. "Learning Performance Graphs From Demonstrations via Task-Based Evaluations." IEEE Robotics and Automation Letters 8(1): 336-343, 2022. (Presented at ICRA 2023)


26 of 30

Meditation on results

  • Scalability:
    • RL scales to symbolic automata with hundreds of states
    • LfD has much lower sample complexity due to rich structure provided by specs
  • Guarantees:
    • Depend on optimality/convergence of underlying RL procedure
    • If we learn an optimal policy, then it maximizes probability of satisfying specs
      • Is the learned policy optimal? Needs verification


27 of 30

Ushering in a neuro-symbolic era for safe autonomy?

  • This talk: just a sample of some initial efforts; a lot more needs to be done
  • Open problems
    • Scaling: system-wide safety assurance is still quite a challenge
    • Expressivity: are existing spatio-temporal logics expressive/interesting enough?
    • Robust perception: how do we encode epistemic uncertainty about data?
    • The Sim2Real Gap
  • What does logic buy us?


28 of 30

Confluence of logic and learning: will it be a happily ever after?

  • Logic-based methods can improve learning-based techniques
    • By reducing the learning effort (pruning the sample space, providing structure)
    • By improving explainability
    • By providing a pathway to assurance
  • Learning-based methods can assist in solving problems formulated using logic
    • By providing probabilistic/statistical guarantees (instead of absolute guarantees)
    • By increasing the scalability of verification
    • By expanding the applicability of logic-based methods to previously inapplicable domains (e.g., model-free controller synthesis)


29 of 30

Credits

  • USC Ph.D. students
    • Navid Hashemi
    • Anand Balakrishnan
    • Aniruddh Puranic
  • Colleagues
    • Lars Lindemann, Stefanos Nikolaidis (USC)
    • Georgios Fainekos, Bardh Hoxha (Toyota)
    • Dejan Nickovic, Cristinel Mateis, Edgar Aguilar (AIT)
  • Sponsors
    • NSF (CAREER, FMitF, SHF)
    • USC Center for Autonomy and AI
    • Toyota R&D, Airbus Institute for Engineering Research


30 of 30

Thank you!


Some post-credits scenes

An example perception specification: if a person is detected with a confidence score greater than or equal to c1 in a particular frame, then in the next 4 frames, the probability of detecting the same person should never drop below c2.
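Sketched as a (hypothetical) discrete-time monitor over per-frame detection confidences, one frame per step:

    def person_tracking_ok(conf, c1, c2):
        # If conf[t] >= c1, then confidence must stay >= c2 over the next 4 frames.
        for t, c in enumerate(conf):
            if c >= c1 and any(nxt < c2 for nxt in conf[t + 1:t + 5]):
                return False
        return True

    print(person_tracking_ok([0.9, 0.8, 0.85, 0.7, 0.8, 0.9], c1=0.85, c2=0.75))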
