1 of 30

Logical Specifications meet Learning-based Control

Jyo Deshmukh,

Thomas Lord Dept. of CS, University of Southern California (USC)

CDC Workshop: Formal Methods and Decision-making in the age of AI

2 of 30

Paradigm shift in design of autonomous systems


Classical approach: model-based design with well-defined uncertainty models.

Brave new world: highly uncertain environments that do not lend themselves easily to modeling; designs are increasingly data-driven and model-free.


3 of 30

Safe control problem


4 of 30

How do we solve these problems?


5 of 30

How do we solve these problems?


6 of 30

Signal Temporal Logic [1,2,3]


[Figure: two example signal traces on t in [0, 100]. Left: a signal annotated "Always between time 0 and 100" staying below a threshold (axis ticks at 1 and 3): an invariance property, G_[0,100]. Right: a signal annotated "Eventually at some time t between time 20 and 60" and "From that time t, always till the end of the signal trace", settling into the band 1 ± 0.1 by time 60: a settling property of the form F_[20,60] G (|x − 1| ≤ 0.1).]

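To make the two properties above concrete, here is a minimal Boolean monitor over a sampled trace (an illustrative sketch: the example signal, the 1-sample-per-time-unit discretization, and the thresholds are my assumptions, not taken from the slide):

    # Sketch: Boolean STL monitoring over a sampled trace (1 sample per time unit).

    def always(trace, lo, hi, pred):
        # G_[lo,hi] pred: pred holds at every sample in the window [lo, hi]
        return all(pred(v) for v in trace[lo:hi + 1])

    def eventually_always(trace, lo, hi, pred):
        # F_[lo,hi] G pred: at some t in [lo, hi], pred holds from t onward
        return any(all(pred(v) for v in trace[t:]) for t in range(lo, hi + 1))

    x = [1 + 1.5 * 0.85 ** t for t in range(101)]  # example signal settling to 1
    print(always(x, 0, 100, lambda v: v < 3))                         # invariance
    print(eventually_always(x, 20, 60, lambda v: abs(v - 1) <= 0.1))  # settling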
[1] Oded Maler and Dejan Nickovic. "Monitoring temporal properties of continuous signals." In Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems, pp. 152-166, 2004.

[2] Alexandre Donzé and Oded Maler. "Robust satisfaction of temporal logic over real-valued signals." In Formal Modeling and Analysis of Timed Systems (FORMATS), 2010.

[3] Georgios E. Fainekos and George J. Pappas. "Robustness of temporal logic specifications for continuous-time signals." Theoretical Computer Science 410(42): 4262-4291, 2009.


7 of 30

Specifying autonomous driving constraints and specifications

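As one illustrative example (my own, not necessarily a spec from the talk), a headway-and-reaction requirement can be written in STL as

    G_[0,T] ( dist(ego, lead) >= d_safe )
    and
    G_[0,T] ( brake_lead -> F_[0,1.5] ( a_ego <= -a_min ) )

i.e., always maintain a safe headway, and whenever the lead vehicle brakes, begin decelerating within 1.5 seconds. Here d_safe, a_min, and the 1.5 s reaction bound are placeholder constants.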


8 of 30

STL: Quantitative semantics


[Figure: two signal traces on t in [0, 100], each relative to the threshold 3 near t = 50. One trace stays on the GOOD side and the annotation asks "How far is bad?" (the satisfaction margin); the other crosses into BAD and the annotation asks "How bad is the violation?" (the violation margin), motivating a quantitative notion of satisfaction.]


9 of 30

Quantitative Semantics of STL or “Robustness” value

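In the standard quantitative semantics, atomic predicates evaluate to signed margins, conjunction/"always" become min, and disjunction/"eventually" become max: a positive robustness value means the spec is satisfied with that much room to spare, and a negative value measures how bad the violation is. A minimal sketch (signal and thresholds illustrative):

    # Sketch: STL robustness over a sampled trace.

    def rho_leq(v, c):
        # robustness of the atomic predicate (x <= c) is the margin c - v
        return c - v

    def rho_always_leq(trace, lo, hi, c):
        # rho(G_[lo,hi] (x <= c)): worst-case (min) margin over the window
        return min(rho_leq(v, c) for v in trace[lo:hi + 1])

    def rho_eventually_leq(trace, lo, hi, c):
        # rho(F_[lo,hi] (x <= c)): best-case (max) margin over the window
        return max(rho_leq(v, c) for v in trace[lo:hi + 1])

    x = [2.0 + 1.5 * 0.9 ** t for t in range(101)]  # overshoots 3 early on
    print(rho_always_leq(x, 0, 100, 3.0))           # -0.5: violated by 0.5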


10 of 30

How do we solve these problems?


11 of 30

Neuro-symbolic Safe Controller Synthesis: High-level idea [1,2]

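The high-level recipe in [1,2], as I would sketch it: compose a differentiable model of the plant with a candidate NN controller, roll the closed loop out over the task horizon, score the trajectory with a differentiable STL robustness, and run gradient ascent on robustness. Everything below (the double-integrator dynamics, horizon, and spec) is a stand-in chosen to show the shape of the computation, not the setup used in the papers:

    import torch

    # Stand-in differentiable plant: double integrator, x = (position, velocity)
    A = torch.tensor([[0.0, 1.0], [0.0, 0.0]])
    B = torch.tensor([[0.0], [1.0]])
    dt, horizon = 0.1, 50

    controller = torch.nn.Sequential(
        torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

    def rollout(x0):
        xs, x = [x0], x0
        for _ in range(horizon):
            u = controller(x)
            x = x + dt * (x @ A.T + u @ B.T)   # differentiable closed-loop step
            xs.append(x)
        return torch.stack(xs)                  # (horizon+1, batch, 2)

    def robustness(traj):
        # Illustrative spec G (|position| <= 1): min over time of the margin
        return (1.0 - traj[..., 0].abs()).min(dim=0).values

    opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
    for _ in range(200):
        x0 = 0.5 * torch.rand(64, 2)             # sample initial states
        loss = -robustness(rollout(x0)).mean()   # ascend robustness
        opt.zero_grad(); loss.backward(); opt.step()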
[1] N. Hashemi, X. Qin, J. Deshmukh, G. Fainekos, B. Hoxha, D. Prokhorov, T. Yamaguchi. "Risk-Awareness in Learning Neural Controllers for Temporal Logic Objectives." Proc. of ACC 2023.

[2] N. Hashemi, B. Hoxha, T. Yamaguchi, D. V. Prokhorov, G. Fainekos, J. V. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023: 98-109.


12 of 30

Neuro-symbolic representation of STL


[1] N. Hashemi, B. Hoxha, T. Yamaguchi, D. V. Prokhorov, G. Fainekos, J. V. Deshmukh: A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems. Proc. of ICCPS 2023: 98-109
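The enabling observation (my paraphrase of [1]): min and max are exactly representable with ReLUs, via max(a, b) = a + relu(b − a) and min(a, b) = a − relu(a − b). Folding these identities over the robustness recursion turns the entire STL robustness computation into one piecewise-linear feed-forward network, so it composes cleanly with the plant and controller networks for verification. A small sketch:

    import torch
    import torch.nn.functional as F

    def relu_max(a, b):
        # max(a, b) = a + relu(b - a): exact, piecewise-linear
        return a + F.relu(b - a)

    def relu_min(a, b):
        # min(a, b) = a - relu(a - b): exact, piecewise-linear
        return a - F.relu(a - b)

    # e.g. rho(G over a window) folds relu_min over per-sample margins:
    margins = torch.tensor([0.4, -0.2, 0.7])
    rho = margins[0]
    for m in margins[1:]:
        rho = relu_min(rho, m)
    print(rho)  # tensor(-0.2000): the worst margin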


13 of 30

Neuro-symbolic Safe Controller Synthesis: High-level idea [1,2] (continued)


14 of 30

Results on synthesis for 6D and 3D quadcopter systems [1]


[1] N. Hashemi, B. Hoxha, T. Yamaguchi, D. Prokhorov, G. Fainekos, J. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023.

15 of 30

Meditation on results

Scalability of controller synthesis:

  • Synthesis is inherently a highly non-convex optimization problem
    • Relies on stochastic gradient descent to scale
    • Convergence proportional to graduate student hours
  • Higher dimensions and longer time horizons are challenging
    • Preliminary results using some tricks look promising


16 of 30

Better optimization and longer horizons

Challenges:

  • ReLU-based STL robustness is piecewise-linear and non-smooth, which makes it unpleasant to optimize
    • Idea 1: smooth underapproximations of robustness (see the sketch at the end of this slide)
  • Long horizons lead to vanishing/exploding gradients
    • Idea 2: gradient approximation using dropout-like ideas

Training results (under submission):

  • Quadcopter controller that lands on a moving platform (task horizon of 1500 time-steps, i.e., 75 seconds), trained in < 7 minutes
  • Centralized coordinator for 10 cars navigating a congested environment (task horizon of 60 seconds), trained in < 2 hours
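For Idea 1, a standard trick (this sketch is illustrative; the exact formulation in the submission may differ) is to replace the hard min/max in the robustness recursion with log-sum-exp-style soft versions chosen to underapproximate the true values, so a positive smooth robustness still certifies satisfaction:

    import torch

    def soft_min(x, beta=10.0):
        # -(1/beta) * logsumexp(-beta * x) <= min(x): a smooth underapproximation
        return -torch.logsumexp(-beta * x, dim=0) / beta

    def soft_max_under(x, beta=10.0):
        # softmax-weighted average <= max(x); note plain logsumexp/beta would
        # OVERapproximate max, which is the wrong direction for soundness
        w = torch.softmax(beta * x, dim=0)
        return (w * x).sum(dim=0)

    margins = torch.tensor([0.4, 0.05, 0.7])
    print(soft_min(margins), margins.min())         # soft_min <= true min
    print(soft_max_under(margins), margins.max())   # soft_max <= true max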


17 of 30

Verification problem


18 of 30

Neuro-symbolic Controller Verification (Deterministic)

[1] N. Hashemi, B. Hoxha, G. Fainekos, D. Prokhorov, T. Yamaguchi, J. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023: 98-109.

19 of 30

Scalability of verification

  • Looks promising [1], but suffers as trajectory length and the number of states grow


[1] N. Hashemi, B. Hoxha, G. Fainekos, D. Prokhorov, T. Yamaguchi, J. Deshmukh. "A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning-enabled Control Systems." Proc. of ICCPS 2023: 98-109.

Property | Time Horizon | Plant NN | Controller NN | Method | Verified? | Runtime (secs)
φ1 | 100 | [3,10,10,10,2] | [2,50,1,2,1,2,1,1] | Exact | Y | 1167
φ1 | 100 | [3,10,10,10,2] | [2,50,1,2,1,2,1,1] | Approx | N | 35
φ2 | 50 | [4,10,10,3] | [3,100,1,2,1,2,1,1] | Exact | Y | 1903
φ2 | 50 | [4,10,10,3] | [3,100,1,2,1,2,1,1] | Approx | N | 43
φ3 | 53 | [2,10,10,2] | [5,20,20,20,1] | Exact | Y | 260
φ3 | 53 | [2,10,10,2] | [5,20,20,20,1] | Approx | Y | 24
φ4 | 32 | [4,8,2] | [2,8,2] | Exact | Y | 78
φ5 | 35 | — | — | Exact | Y | 1955
φ6 | 35 | — | — | Exact | Y | 2369
φ7 | 35 | — | — | Exact | Disproven | 1023

(The Property column held STL formulas rendered as math in the original slide and did not survive extraction; φ1–φ7 are placeholders. Dashes mark architecture entries that also did not survive.)


20 of 30

Elephant in the room: the NN plant model


21 of 30

How do we solve these problems?


22 of 30

RL from Logical Specifications (LTL, STL, MTL)

  • RL from TL idea spectrum [1-11]
    • Convert satisfaction of a temporal logic property into a 0/1 reward
    • Convert temporal specifications into MILP
    • Break specifications into sub-goals and train each sub-goal sequentially
    • Use reward signals based on temporal logic robustness
  • Several RL from TL papers use sparse rewards
    • High training variance, poor convergence
    • Sub-optimal policies may have zero probability of satisfying specification!
  • Our approach: robustify the rewards using symbolic potential functions defined over symbolic automata specs


  1. B. Lacerda, D. Parker, and N. Hawes. Optimal policy generation for partially satisfiable co-safe LTL specifications. IJCAI 2015.
  2. M. Guo and M. Zavlanos. Probabilistic motion planning under temporal tasks and soft constraints. IEEE TAC 2018.
  3. R. Brafman, G. De Giacomo, and F. Patrizi. LTLf/LDLf non-markovian rewards. AAAI 2018.
  4. M. Hasanbeig, A. Abate, and D. Kroening. Logically-Constrained Reinforcement Learning. arXiv 2018.
  5. E. M. Hahn, M Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. Omega-Regular Objectives in Model-Free Reinforcement Learning. TACAS 2019.
  6. K. Jothimurugan, S. Bansal, O. Bastani, and R. Alur. Compositional Reinforcement Learning from Logical Specifications, NeurIPS 2021.
  7. K. C Kalagarla, R. Jain, and P. Nuzzo. Synthesis of discounted-reward optimal policies for Markov decision processes under linear temporal logic specifications, arXiv 2020.
  8. W. Liu, N. Mehdipour, and C. Belta, “Recurrent Neural Network Controllers for Signal Temporal Logic Specifications Subject to Safety Constraints,” IEEE Control Systems Letters, vol. 6, pp. 91–96, 2022, doi: 10.1109/LCSYS.2021.3049917. 
  9. C. Sun, X. Li, and C. Belta, “Automata Guided Semi-Decentralized Multi-Agent Reinforcement Learning,” in 2020 American Control Conference (ACC), Jul. 2020, pp. 3900–3905. doi: 10.23919/ACC45564.2020.9147704.
  10. X. Li, Z. Serlin, G. Yang, and C. Belta, “A formal methods approach to interpretable reinforcement learning for robotic planning,” Science Robotics, vol. 4, no. 37, Dec. 2019, doi: 10.1126/scirobotics.aay6276.  
  11. X. Li, Y. Ma, and C. Belta, "Automata-Guided Hierarchical Reinforcement Learning for Skill Composition," arXiv:1711.00129, 2018.


23 of 30

STL's easier-to-digest cousin: Symbolic Automata [1,2]

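To make the object concrete, here is a minimal (assumed, illustrative) encoding: a symbolic automaton is a finite-state machine whose transition guards are predicates over real-valued observations, rather than letters of a finite alphabet:

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    Guard = Callable[[dict], bool]  # predicate over a real-valued observation

    @dataclass
    class SymbolicAutomaton:
        initial: str
        accepting: set
        # transitions[q] = [(guard, q_next), ...]
        transitions: Dict[str, List[Tuple[Guard, str]]] = field(default_factory=dict)

        def step(self, q: str, obs: dict) -> str:
            for guard, q_next in self.transitions.get(q, []):
                if guard(obs):
                    return q_next
            return q  # no guard fires: self-loop (a modeling assumption here)

    # Illustrative spec: reach the goal region, then avoid the unsafe region.
    aut = SymbolicAutomaton(
        initial="seek", accepting={"done"},
        transitions={"seek": [(lambda o: o["dist_goal"] < 0.5, "done")],
                     "done": [(lambda o: o["dist_unsafe"] < 0.2, "fail")]})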


[1] Anand Balakrishnan, Stefan Jakšić, Edgar A. Aguilar, Dejan Ničković, Jyotirmoy V. Deshmukh. "Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives." Proc. of CDC 2023.

[2] Stefan Jakšić, Ezio Bartocci, Radu Grosu, Thang Nguyen, and Dejan Ničković. "Quantitative monitoring of STL with edit distance." Formal Methods in System Design 53: 83-112, 2018.


24 of 30

Reward shaping with symbolic automaton objectives

Main ideas [1]:

    • Quantify how far a state in the product MDP is from an accept state
    • Define a symbolic potential function for each state
      • Like the A* heuristic, but defined symbolically over the spec automaton
    • Define rewards that incentivize the agent to move to a state with lower potential

Theorem [1]:

An optimal policy found by an RL algorithm w.r.t. the symbolic potential-based reward function also maximizes the probability of satisfying the automaton objective*

Shameless plug: Talk on Friday (FrA23.2)

* The automaton must have a terminally accepting state; this holds, e.g., for any finitely bounded-horizon STL specification
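A sketch of the shaping step (the potentials below are hypothetical stand-ins; in [1] they are computed symbolically from the spec automaton). It instantiates the classic potential-based form r' = r + γΦ(q') − Φ(q), which is known to preserve optimal policies (Ng et al., 1999):

    # Potential-based reward shaping on the product MDP (illustrative sketch).
    GAMMA = 0.99

    def phi(aut_state: str) -> float:
        # Stand-in symbolic potentials: smaller = "closer to an accept state".
        return {"seek": 2.0, "done": 0.0, "fail": 5.0}[aut_state]

    def shaped_reward(base_r: float, q: str, q_next: str) -> float:
        # r' = r + GAMMA * Phi(q') - Phi(q) with Phi = -phi, so moving to a
        # lower-potential (closer-to-accepting) state earns a positive bonus.
        return base_r + GAMMA * (-phi(q_next)) + phi(q)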


[1] Anand Balakrishnan, Stefan Jakšić, Edgar A. Aguilar, Dejan Ničković, Jyotirmoy V. Deshmukh. "Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives." Proc. of CDC 2023.


25 of 30

LfD (Learning from Demonstrations) + STL


Combining STL specs with demonstrations allows generalization over continuous and stochastic spaces, while reducing sample complexity and automatically inferring rewards.
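One way to read the core mechanism (a sketch under assumptions; see the papers for the actual algorithms): score each demonstration by the robustness of the STL specs it satisfies, and use the induced ranking to infer rewards over the task. All names below are illustrative:

    # Sketch: rank demonstrations by STL robustness to drive reward inference.
    # `rho_spec(spec, traj)` is any robustness monitor (e.g., the min/max
    # recursion from the earlier STL slides); `demos` is a list of trajectories.

    def score_demo(traj, specs, rho_spec):
        # conjunction of specs => worst-case (min) robustness across them
        return min(rho_spec(spec, traj) for spec in specs)

    def rank_demos(demos, specs, rho_spec):
        # best-performing demonstrations first
        return sorted(demos, key=lambda d: score_demo(d, specs, rho_spec),
                      reverse=True)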

A. G. Puranic, J. V. Deshmukh, and S. Nikolaidis. "Learning from demonstrations using signal temporal logic." Conference on Robot Learning (CoRL), PMLR, 2021. (Oral presentation at CoRL 2021)

A. G. Puranic, J. V. Deshmukh, and S. Nikolaidis. "Learning from demonstrations using signal temporal logic in stochastic and continuous domains." IEEE Robotics and Automation Letters 6(4): 6250-6257, 2021. (Oral presentation at IROS 2021)

A. G. Puranic, J. V. Deshmukh, and S. Nikolaidis. "Learning Performance Graphs From Demonstrations via Task-Based Evaluations." IEEE Robotics and Automation Letters 8(1): 336-343, 2022. (Presented at ICRA 2023)


26 of 30

Meditation on results

  • Scalability:
    • RL scales to symbolic automata with hundreds of states
    • LfD has much lower sample complexity due to rich structure provided by specs
  • Guarantees:
    • Depend on optimality/convergence of underlying RL procedure
    • If we learn an optimal policy, then it maximizes probability of satisfying specs
      • Is the learned policy optimal? Needs verification


27 of 30

Ushering in a neuro-symbolic era for safe autonomy?

  • This talk: just a sample of some initial efforts; a lot more needs to be done
  • Open problems
    • Scaling: system-wide safety assurance is still quite a challenge
    • Expressivity: are existing spatio-temporal logics expressive/interesting enough?
    • Robust perception: how do we encode epistemic uncertainty about data?
    • The Sim2Real Gap
  • What does logic buy us?


28 of 30

Confluence of logic and learning: will it be a happily ever after?

  • Logic-based methods can improve learning-based techniques
    • By reducing the learning effort (pruning the sample space, providing structure)
    • By improving explainability
    • By providing a pathway to assurance
  • Learning-based methods can assist in solving problems formulated using logic
    • By providing probabilistic/statistical guarantees (instead of absolute guarantees)
    • By increasing the scalability of verification
    • By expanding the applicability of logic-based methods to previously inapplicable domains (e.g., model-free controller synthesis)


29 of 30

Credits

  • USC Ph.D. students
    • Navid Hashemi
    • Anand Balakrishnan
    • Aniruddh Puranic
  • Colleagues
    • Lars Lindemann, Stefanos Nikolaidis (USC)
    • Georgios Fainekos, Bardh Hoxha (Toyota)
    • Dejan Nickovic, Cristinel Mateis, Edgar Aguilar (AIT)
  • Sponsors
    • NSF (CAREER, FMitF, SHF)
    • USC Center for Autonomy and AI
    • Toyota R&D, Airbus Institute for Engineering Research


30 of 30

Thank you!


Some post-credits scenes

An example perception specification: if a person is detected with a confidence score greater than or equal to c1 in a particular frame, then in the next 4 frames, the probability of detecting the same person should never drop below c2.
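Sketched as a (hypothetical) discrete-time monitor over per-frame detection confidences, one frame per step:

    def person_tracking_ok(conf, c1, c2):
        # If conf[t] >= c1, then confidence must stay >= c2 over the next 4 frames.
        for t, c in enumerate(conf):
            if c >= c1 and any(nxt < c2 for nxt in conf[t + 1:t + 5]):
                return False
        return True

    print(person_tracking_ok([0.9, 0.8, 0.85, 0.7, 0.8, 0.9], c1=0.85, c2=0.75))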
