1 of 41

An Efficient Framework for Modular Autonomous Vehicle Risk Assessment (MAVRA)

Robert Moss

Stanford University

(collaborating with Shubh Gupta, Marc R. Schlichting, Kyu-Young Kim, Anthony Corso, Grace X. Gao, and Mykel J. Kochenderfer)

ITSC Workshop on Safety Validation of Connected and Automated Vehicles

October 7, 2022

2 of 41

Motivation: Real-world setting

Problem:

Estimate risk of autonomous vehicle policies in realistic environments.

3 of 41

Motivation: Realistic simulation

Problem:

Estimate risk of autonomous vehicle policies in high-fidelity simulators.

4 of 41

Motivation: Efficient assessment

Problem:

Efficiently estimate risk of autonomous vehicle policies in high-fidelity simulators.

(faster than real-time simulation)

5 of 41

Motivation

“Too often we see solid tool chains but no tangible test strategies.”

  • Christof Ebert and Michael Weyrich. "Validation of Autonomous Systems." IEEE Software, 2019.

6 of 41

Objective

  • Develop strategy for autonomous vehicle risk assessment as a modular framework:
    • Modular simulated components
      • Allow for different levels of model fidelity based on availability in simulation
    • Scenario-based validation
      • Condition on stressing cases
    • Efficient algorithms for scenario search and falsification
      • Plug in different search algorithms based on computational availability
    • Well-defined cost metric
      • Estimate the cost of a failure given limitations of the simulator
    • Unified risk metrics
      • Calculate risk to compare against other AV policies (agnostic of chosen cost metric)

7 of 41

Modular framework: Risk assessment (MAVRA)

19 of 41

Scenario-based assessment

2D low-fidelity case

  • Perform scenario-based testing, also called “use-case” testing.
    • Used by government agencies and industry[1,2,3]
  • Initial work[4] demonstrated the framework on a 2D driving simulator: AutomotiveSimulator.jl[5]
    • We investigate stressing scenarios inspired by the U.S. NHTSA pre-crash typology[6]
    • Ran across all scenarios
      • Feasible because computation is inexpensive in 2D

[1] Z. Zhong, et al. “A Survey on Scenario-Based Testing for Automated Driving Systems in High-Fidelity Simulation.” arXiv:2112.00964, 2021.

[2] P. Junietz, et al. "Evaluation of Different Approaches to Address Safety Validation of Automated Driving." IEEE ITSC, 2018.

[3] C. Ebert and M. Weyrich. “Validation of Autonomous Systems.” IEEE Software, 2019.

[4] R. Moss, et al. “Autonomous Vehicle Risk Assessment.” Tech. Report, Stanford Center for AI Safety, 2021.

[5] https://github.com/sisl/AutomotiveSimulator.jl

[6] W. G. Najm, J. D. Smith, and M. Yanagisawa. "Pre-crash Scenario Typology for Crash Avoidance Research." No. DOT-VNTSC-NHTSA-06-02. United States National Highway Traffic Safety Administration, 2007.

20 of 41

Scenario-based assessment

3D high-fidelity case

  • Modularity in the framework allows for different driving simulators (demonstrated on the open-source CARLA simulator[1])
    • Using the scenario runner[2] to sample scenarios inspired by the NHTSA pre-crash typology[3], along with weather parameters.
    • When scaling to high-fidelity 3D simulators, intelligent sampling is required (instead of exhaustive search)

Image credits: https://carlachallenge.org/challenge/nhtsa/

[1] https://carla.org/

[2] https://github.com/carla-simulator/scenario_runner

[3] W. G. Najm, J. D. Smith, and M. Yanagisawa. "Pre-crash Scenario Typology for Crash Avoidance Research." No. DOT-VNTSC-NHTSA-06-02. United States National Highway Traffic Safety Administration, 2007.

21 of 41

Scenario-based assessment

Weather models

  • Weather parameters can also be sampled as part of the “scenario”
    • Using models of weather characteristics from different geographical locations
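
As a rough illustration of treating weather as part of the sampled scenario, the sketch below draws a scenario type together with weather parameters from a per-location model. The scenario names, locations, parameter ranges, and Gaussian weather models are all hypothetical placeholders, not the models used in the talk.

```python
import random
from dataclasses import dataclass

# Hypothetical scenario types (placeholders for pre-crash-inspired scenarios).
SCENARIO_TYPES = ["lead_vehicle_stopped", "pedestrian_crossing", "unprotected_left_turn"]

# Hypothetical per-location weather models: mean rain and fog on a 0-100 scale.
WEATHER_MODELS = {
    "location_a": {"rain_mean": 10.0, "fog_mean": 40.0},
    "location_b": {"rain_mean": 60.0, "fog_mean": 15.0},
}


@dataclass
class Scenario:
    scenario_type: str
    location: str
    rain: float
    fog: float


def clamp(x, lo=0.0, hi=100.0):
    """Keep a weather parameter inside its assumed valid range."""
    return max(lo, min(hi, x))


def sample_scenario(rng):
    """Sample a scenario type, then weather conditioned on a sampled location."""
    scenario_type = rng.choice(SCENARIO_TYPES)
    location = rng.choice(sorted(WEATHER_MODELS))
    model = WEATHER_MODELS[location]
    rain = clamp(rng.gauss(model["rain_mean"], 15.0))
    fog = clamp(rng.gauss(model["fog_mean"], 15.0))
    return Scenario(scenario_type, location, rain, fog)


rng = random.Random(0)  # fixed seed so the sampled scenarios are reproducible
scenarios = [sample_scenario(rng) for _ in range(5)]
```

Each sampled `Scenario` could then be handed to whichever simulator is plugged into the framework.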

22 of 41

Scenario-based assessment

AV-specific stressing scenarios (see [1])

[1] F. M. Favarò, et al. “Examining Accident Reports Involving Autonomous Vehicles in California.” PLOS One, 2017.

23 of 41

AV policies

  • Previous work[1] used the intelligent driver model (IDM) in a 2D simulated environment
  • Current work focuses on image-based neural network policies in the high-fidelity simulator CARLA[2]
    • We use open-source policies that competed in the CARLA Autonomous Driving Challenge
    • Namely, the World-on-Rails[3] and NEAT[4] policies.

Video credit: https://github.com/autonomousvision/neat

[1] R. Moss, et al. “Autonomous Vehicle Risk Assessment.” Tech. Report, Stanford Center for AI Safety, 2021.

[2] https://carla.org/

[3] D. Chen, V. Koltun, and P. Krähenbühl. "Learning to Drive from a World on Rails." International Conference on Computer Vision (ICCV), 2021.

[4] K. Chitta, A. Prakash, and A. Geiger. "NEAT: Neural Attention Fields for End-to-End Autonomous Driving." International Conference on Computer Vision (ICCV), 2021.

24 of 41

Observation models

Sensors

  • The sensor models depend on the choice of AV policy
    • The World-on-Rails[1] and NEAT[2] policies use multiple cameras and GNSS for localization
  • We apply adversarial noise disturbances to the sensor outputs (e.g., camera saturation and static noise)
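
To make the two named camera disturbances concrete, here is a minimal sketch on a tiny grayscale image stored as nested lists with intensities in 0 to 255. The function names, the gain-based saturation model, and the pixel-replacement static model are illustrative assumptions, not the framework's implementation.

```python
import random


def saturate(image, gain):
    """Scale pixel intensities by `gain` and clip at 255 (overexposure)."""
    return [[min(255, int(p * gain)) for p in row] for row in image]


def add_static(image, fraction, rng):
    """Replace roughly `fraction` of the pixels with random intensities."""
    noisy = [row[:] for row in image]  # copy so the input image is untouched
    for row in noisy:
        for j in range(len(row)):
            if rng.random() < fraction:
                row[j] = rng.randint(0, 255)
    return noisy


rng = random.Random(0)
image = [[100, 150], [200, 50]]
saturated = saturate(image, gain=2.0)  # [[200, 255], [255, 100]]
static = add_static(image, fraction=0.5, rng=rng)
```

A search algorithm would choose `gain` and `fraction` (or the noise itself) adversarially rather than fixing them by hand.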

[1] D. Chen, V. Koltun, and P. Krähenbühl. "Learning to Drive from a World on Rails." International Conference on Computer Vision (ICCV), 2021.

[2] K. Chitta, A. Prakash, and A. Geiger. "NEAT: Neural Attention Fields for End-to-End Autonomous Driving." International Conference on Computer Vision (ICCV), 2021.

25 of 41

Failure metric

  • We define failure as a collision with another agent or object.
    • Failure could be re-defined as lane violations, hard braking, swerving events, etc.

26 of 41

Cost metric

For collision failures

  • Impact speed: Useful proxy for collision severity[1]
    • Strictly non-negative
    • Often easy to obtain in low- and high-fidelity simulators
  • delta-V (δv): Industry often uses delta-V
    • Defined as difference in speed between time-of-collision and the subsequent time step
    • Shown to be a good measure of crash severity[1]; used as a proxy for impact force (when recreating crashes)
    • Potential (unforeseen) issues with negative delta-V (e.g., vehicle speeds into pedestrian)
  • Collision force: Normal impulse (i.e., change in momentum) of the collision
    • The value we are trying to approximate using the metrics above
    • Recommended to use if available in simulation (often requires more sophisticated simulated physics engines)
  • Parameterized cost model: Combines multiple inputs
    • Could include severity of impact, involved participants (other vehicles or pedestrians), environment surroundings (urban vs. rural).
    • Hierarchical cost that outputs a combined measure based on fatality/fatality likelihood, bodily injury expenses, and property damage.
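
The first two cost options can be sketched as small functions that return zero whenever no collision occurred, so that a cost of zero means “no failure”. The function names are hypothetical, speeds are assumed to be in m/s, and taking the absolute value of delta-V is one assumed way to sidestep the negative delta-V issue noted above.

```python
def impact_speed_cost(collided, ego_speed_at_impact):
    """Impact speed as the cost; zero when there was no collision."""
    return abs(ego_speed_at_impact) if collided else 0.0


def delta_v_cost(collided, speed_at_collision, speed_next_step):
    """Magnitude of the speed change between the collision step and the next step.

    Using the magnitude (an assumption of this sketch) keeps the cost
    non-negative even if the vehicle speeds up after the collision.
    """
    if not collided:
        return 0.0
    return abs(speed_at_collision - speed_next_step)


no_failure = impact_speed_cost(False, 12.0)        # 0.0
slowing_down = delta_v_cost(True, 10.0, 4.0)       # 6.0
speeding_up = delta_v_cost(True, 4.0, 10.0)        # 6.0
```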

[1] D. C. Richards. "Relationship between Speed and Risk of Fatal Injury: Pedestrians and Car Occupants." Department for Transport: London, 2010.

Objective: cost of zero should equal “no failure” using what’s available in simulation

27 of 41

Risk metric

  • Adopting risk metrics used in the financial industry and robotics community[1]
  • Using a distribution of costs Z (where z > 0 indicates “failure”)

  • Mean cost:
  • Value at risk (VaR):
  • Conditional value at risk (CVaR):
  • Worst case cost:

  • Conditional value at risk (CVaR) provides a well-defined risk metric for computing relative risk between AV policies
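
The four metrics listed above can be estimated from a finite sample of costs. The sketch below uses one common empirical convention, stated here as an assumption: `alpha` is a confidence level, VaR is the smallest sampled cost with at least an `alpha` fraction of samples at or below it, and CVaR is the mean of the samples at or above VaR. Other conventions (for example, `alpha` as a tail probability) exist in the literature.

```python
import math


def risk_metrics(costs, alpha):
    """Return (mean, VaR, CVaR, worst case) for a list of sampled costs."""
    z = sorted(costs)
    n = len(z)
    mean = sum(z) / n
    # Empirical VaR: the ceil(alpha * n)-th smallest cost.
    k = max(0, math.ceil(alpha * n) - 1)
    var = z[k]
    # Empirical CVaR: average cost in the tail at or beyond VaR.
    tail = [c for c in z if c >= var]
    cvar = sum(tail) / len(tail)
    worst = z[-1]
    return mean, var, cvar, worst


# Toy cost sample: zeros are non-failures, positive values are failures.
costs = [0, 0, 0, 0, 1, 3, 5, 7]
mean, var, cvar, worst = risk_metrics(costs, alpha=0.75)
# mean = 2.0, var = 3, cvar = 5.0, worst = 7
```

Because CVaR averages over the tail, it separates two policies with the same VaR but different worst-case behavior, which is why it suits relative comparisons.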

[1] A. Majumdar and M. Pavone. "How should a robot assess risk? Towards an axiomatic theory of risk in robotics." Robotics Research, 2020.

28 of 41

Validation Search

Scenario-level search / sampling

  • With a fast simulator and/or enough parallel compute:
    • Perform exhaustive or Monte Carlo search over scenarios
  • With limited compute and expensive simulators:
    • A more intelligent scenario sampling scheme is required
      • We estimate risk using tree-based importance sampling
        • Using ongoing work from the Stanford NAV Lab[1]
      • Idea: focus on important areas that impact the CVaR estimate
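
The tree-based method itself is ongoing work and is not reproduced here. As a hedged stand-in, the sketch below shows plain importance sampling over three hypothetical scenario classes: rare, costly scenarios are sampled more often under a proposal distribution, and each sample is reweighted by the likelihood ratio so the expected-cost estimate stays unbiased. All probabilities and costs are made-up toy values.

```python
import random

# Hypothetical nominal scenario probabilities, proposal probabilities, and costs.
nominal = {"benign": 0.90, "moderate": 0.09, "stressing": 0.01}
proposal = {"benign": 0.40, "moderate": 0.30, "stressing": 0.30}
cost = {"benign": 0.0, "moderate": 5.0, "stressing": 50.0}


def importance_sampling_mean(n, rng):
    """Estimate the expected cost under `nominal` using samples from `proposal`."""
    names = list(proposal)
    weights = [proposal[s] for s in names]
    total = 0.0
    for s in rng.choices(names, weights=weights, k=n):
        likelihood_ratio = nominal[s] / proposal[s]
        total += likelihood_ratio * cost[s]
    return total / n


true_mean = sum(nominal[s] * cost[s] for s in nominal)
estimate = importance_sampling_mean(20000, random.Random(0))
```

In this toy setup the proposal spends 30% of its samples on the stressing class that occurs only 1% of the time nominally, which is the sense in which sampling focuses on the regions that drive the tail of the cost distribution.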

[1] S. Gupta, et al. "Tree-based importance sampling for risk estimation." Work-in-Progress, 2022.

29 of 41

Validation Search

Step-level search (episode)

  • Given a scenario, apply sensor noise disturbances at each time step when running the scenario.
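
A toy version of step-level disturbance can be sketched as follows. The episode is a hypothetical 1D setup, not a CARLA scenario: the ego vehicle approaches a stopped obstacle, a noise value corrupts the observed gap at every time step, and the policy brakes only when the observed gap looks small. The returned cost is the impact speed, or zero if there is no collision.

```python
import random


def rollout(disturbances, dt=0.1):
    """Run one toy episode; return impact speed in m/s (0.0 if no collision)."""
    gap, speed = 30.0, 10.0  # meters to obstacle, ego speed in m/s
    for noise in disturbances:
        observed_gap = gap + noise  # disturbance applied at this time step
        if observed_gap < 15.0:
            speed = max(0.0, speed - 8.0 * dt)  # brake at 8 m/s^2
        gap -= speed * dt
        if gap <= 0.0:
            return speed  # collision: cost is the impact speed
    return 0.0


def monte_carlo_costs(n_episodes, sigma, rng, steps=60):
    """Sample independent Gaussian noise per step and collect episode costs."""
    costs = []
    for _ in range(n_episodes):
        disturbances = [rng.gauss(0.0, sigma) for _ in range(steps)]
        costs.append(rollout(disturbances))
    return costs


costs = monte_carlo_costs(100, sigma=5.0, rng=random.Random(0))
nominal_cost = rollout([0.0] * 60)        # no noise: the ego stops in time
adversarial_cost = rollout([20.0] * 60)   # gap always overestimated: no braking
```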

[Figure: frames at time steps t = 1 through t = 5, with a bicyclist labeled]

35 of 41

Validation Search

Step-level search (episode)

  • Existing algorithms can be used for the step-level falsification (i.e., adversarial sensor noise disturbances):
    • Monte Carlo noise
    • Adaptive stress testing[1]
    • Cross-entropy method[2]
    • Rapidly-exploring random trees (RRTs)[3]
    • (see other black-box falsification methods in the survey by A. Corso, et al.[4])
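
As one example from the list, a bare-bones cross-entropy method can be sketched on a toy problem. Everything here is an illustrative assumption: the 1D episode is hypothetical, the disturbance is simplified to a single constant sensor bias rather than a per-step sequence, and the robustness score (final gap if no collision, negative impact speed otherwise) is one possible choice of objective to minimize.

```python
import random
import statistics


def robustness(bias, steps=60, dt=0.1):
    """Lower is worse for the ego: final gap if it stops, -impact speed if it hits."""
    gap, speed = 30.0, 10.0
    for _ in range(steps):
        if gap + bias < 15.0:  # policy brakes based on the biased observation
            speed = max(0.0, speed - 8.0 * dt)
        gap -= speed * dt
        if gap <= 0.0:
            return -speed
    return gap


def cross_entropy_search(rng, iterations=20, samples=50, elites=10):
    """Fit a Gaussian over the bias to the lowest-robustness samples, repeatedly."""
    mu, sigma = 0.0, 5.0
    for _ in range(iterations):
        candidates = [rng.gauss(mu, sigma) for _ in range(samples)]
        candidates.sort(key=robustness)  # most failure-prone candidates first
        elite = candidates[:elites]
        mu = statistics.mean(elite)
        # Floor on sigma keeps the search from collapsing prematurely.
        sigma = max(statistics.stdev(elite), 1.0)
    return mu


found_bias = cross_entropy_search(random.Random(0))
found_robustness = robustness(found_bias)
```

Because the search only calls `robustness` as a black box, the same loop could wrap a higher-fidelity simulator at a much higher cost per evaluation.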

[1] R. Lee, et al. "Adaptive Stress Testing: Finding Likely Failure Events with Reinforcement Learning." Journal of Artificial Intelligence Research, 2020.

[2] R. Y. Rubinstein and D. P. Kroese. "The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte Carlo Simulation and Machine Learning." Springer Science and Business Media, 2013.

[3] S. M. Lavalle. "Rapidly-Exploring Random Trees: A New Tool for Path Planning." Iowa State University, Tech. Report, 1998.

[4] A. Corso, et al. “A Survey of Algorithms for Black-Box Safety Validation of Cyber-Physical Systems.” Journal of Artificial Intelligence Research, 2021.

36 of 41

Preliminary results

Tree-IS in 2D case

  • In the 2D driving simulator, S. Gupta et al.[1] showed that tree importance sampling converges in this case (among other examples)

[1] S. Gupta, et al. "Tree-based importance sampling for risk estimation." Work-in-Progress, 2022.

37 of 41

Preliminary results

Tree-IS in 3D CARLA case

  • In the 3D driving simulator, tree-IS for scenario selection has a small benefit in risk metric convergence

  • (Note: showing NEAT agent when using Monte Carlo scenario selection vs. tree importance sampling)

38 of 41

Preliminary results

World-on-Rails vs. NEAT (relative risk)

  • Now comparing risk given different AV policies (World-on-Rails and NEAT)

39 of 41

Example cases

CARLA

[Figure panels: Non-failure, Failure]

40 of 41

Recap and conclusions

  • Developed unified strategy for autonomous vehicle risk assessment and validation (MAVRA)
    • Modular by design to adapt to in-simulation restrictions
  • Proposed the use of risk metrics from the financial and robotics communities
    • Use of conditional value at risk (CVaR) to estimate relative tail-risk of AV policies
  • Plug-and-play support for different intelligent scenario search and step-level falsification methods
    • Can research and develop better scenario search and falsification methods in low-fidelity simulation, then scale up to high-fidelity
    • Could be used as feedback for training AV policies (minimize risk)
  • Reminder that this approach is one aspect of autonomous vehicle safety assessment (see Transferring Aviation Safety Lessons to the Road[1])
  • Framework is open-sourced at https://github.com/sisl/AutonomousRiskFramework.jl
  • Submitting journal article to IEEE Transactions on Intelligent Transportation Systems

[1] M. J. Kochenderfer and R. J. Moss, “Transferring Aviation Safety Lessons to the Road.” Automated Road Transportation Symposium, 2021. [http://web.stanford.edu/~mossr/pdf/kochenderfer-arts21.pdf]

41 of 41

Thank you!

Questions?

mossr@cs.stanford.edu

Robert Moss