1 of 14

Biased Tree Expansion Using Reinforcement Learning for Efficient Motion Planning

Minsung Yoon1, Daehyung Park1, Sung-Eui Yoon1

1School of Computing, KAIST

2 of 14

Motion Planning

Motion planning (MP) is the computational problem of finding a sequence of valid configurations that moves an object from a start configuration to a goal configuration.

3 of 14

Background

Sampling-based motion planning expands a tree structure by drawing random samples from the configuration space.

[Figure: tree expansion. Legend: randomly sampled node; start configuration; goal configuration; found path.]

4 of 14

Background

Recently, networks trained on near-optimal demonstration data have been used to bias the expansion of the tree.

[Figure: tree expansion with a learned bias. Legend: randomly sampled node; node generated by network; start configuration; goal configuration.]
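Mixing a learned bias with uniform sampling can be sketched as follows. The `make_network_bias` stand-in policy (a straight step toward the goal) is a hypothetical placeholder for the trained network, and the default mixing ratio is only an assumption:

```python
import random

def make_network_bias(goal, step=0.3):
    """Hypothetical stand-in for a learned policy: proposes a node that
    moves from the current configuration straight toward the goal."""
    def propose(current):
        dx, dy = goal[0] - current[0], goal[1] - current[1]
        norm = max((dx * dx + dy * dy) ** 0.5, 1e-9)
        return (current[0] + step * dx / norm, current[1] + step * dy / norm)
    return propose

def next_sample(current, network_propose, bounds, bias_ratio=0.5):
    """With probability `bias_ratio`, use the network's proposal;
    otherwise fall back to a uniform random sample over the map bounds."""
    if random.random() < bias_ratio:
        return network_propose(current)
    (lo_x, hi_x), (lo_y, hi_y) = bounds
    return (random.uniform(lo_x, hi_x), random.uniform(lo_y, hi_y))
```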

5 of 14

Related work

Motion Planning Network (T-RO 2020)

6 of 14

Motivation

  • Behavior cloning (supervised learning)
  • Limitations:
    • (-) Distribution mismatch on unseen states
    • (-) Performance bounded by the demonstrations
    • (-) Tends to overfit the demonstrations

Stanford CS234: Reinforcement Learning Winter 2020

7 of 14

RL-RRT*

  • Train a policy network in the RL framework.
  • The trained RL agent is then used as a bias module when expanding the tree structure of RRT*.

[Figure: RL-RRT* expansion progress. Legend: randomly sampled node; node generated by network.]
8 of 14

RL-RRT*

  • Architecture

9 of 14

RL-RRT*

  • Markov Decision Process (MDP) formulation of the goal-reaching task
  • RL algorithm: Soft Actor-Critic (SAC)
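The slides do not spell out the MDP, so the following is only a hedged sketch of a plausible goal-reaching formulation for a holonomic 2D robot; the state, action, and reward definitions (`goal_radius`, `max_step`, the arrival bonus) are assumptions, not the paper's actual design:

```python
import math

def step(state, action, goal_radius=0.2, max_step=0.5):
    """One MDP transition. state = ((x, y), goal); action = 2D displacement.
    Reward is the negative distance to the goal, plus a bonus on arrival."""
    (x, y), goal = state
    norm = math.hypot(action[0], action[1])
    if norm > max_step:  # clip to the robot's maximum step length
        action = (action[0] * max_step / norm, action[1] * max_step / norm)
    nxt = (x + action[0], y + action[1])
    dist = math.dist(nxt, goal)
    done = dist < goal_radius
    reward = -dist + (10.0 if done else 0.0)
    return (nxt, goal), reward, done
```

SAC then optimizes a stochastic policy on transitions like these, trading off expected return against policy entropy.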

10 of 14

Experiment setting

  • Planners for experiments:
    [1] RRT*: random sampling 100%
    [2] MPNet: random sampling 50% + network bias 50%
    [3] RL-RRT* (ours): random sampling 50% + network bias 50%

  • Evaluation maps: 50 maps (2D environments)

  • Robot: holonomic mobile robot

11 of 14

Training efficiency

  • Data efficiency:
    • MPNet (supervised learning): 24,746,538 demonstration steps (24.7M)
    • RL-RRT* (reinforcement learning): 1,000,000 action steps (1M)
  • Time to obtain the trained network:
    • MPNet (supervised learning): 5 days to collect demonstrations + 7 hours for training
    • RL-RRT* (reinforcement learning): 7.5 hours in total

12 of 14

Experiment Result - 2D

  • Evaluate on 50 maps, with 100 random start and goal pairs per map.
  • Run each algorithm for 3 seconds per query.

13 of 14

Network bias quality

  • Map #391 (in the evaluation set)

[Figure: network bias quality on Map #391. Left: MPNet, trained with supervised learning (SL); right: RL-RRT*, trained with reinforcement learning (RL). Legend: goal configuration; network bias.]

14 of 14

Thanks for listening.

FP3-2-13

Minsung Yoon

Sung-Eui Yoon

Daehyung Park