1 of 14

Biased Tree Expansion Using Reinforcement Learning for Efficient Motion Planning

Minsung Yoon1, Daehyung Park1, Sung-Eui Yoon1

1School of Computing, KAIST

2 of 14

Motion Planning

Motion planning (MP) is the computational problem of finding a sequence of valid configurations that moves an object from a start configuration to a goal configuration.

3 of 14

Background

Sampling-based motion planning expands a tree structure by drawing random samples from the configuration space.

[Figure: tree expansion. Legend: randomly sampled node; start configuration; goal configuration; found path.]

4 of 14

Background

Recently, networks trained on near-optimal demonstration data have been used to bias the expansion of the tree.

[Figure: tree expansion with a learned bias. Legend: randomly sampled node; node generated by network; start configuration; goal configuration.]
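Mixing a learned bias with uniform sampling can be sketched as follows. The `make_network_bias` stand-in policy (a straight step toward the goal) is a hypothetical placeholder for the trained network, and the default mixing ratio is only an assumption:

```python
import random

def make_network_bias(goal, step=0.3):
    """Hypothetical stand-in for a learned policy: proposes a node that
    moves from the current configuration straight toward the goal."""
    def propose(current):
        dx, dy = goal[0] - current[0], goal[1] - current[1]
        norm = max((dx * dx + dy * dy) ** 0.5, 1e-9)
        return (current[0] + step * dx / norm, current[1] + step * dy / norm)
    return propose

def next_sample(current, network_propose, bounds, bias_ratio=0.5):
    """With probability `bias_ratio`, use the network's proposal;
    otherwise fall back to a uniform random sample over the map bounds."""
    if random.random() < bias_ratio:
        return network_propose(current)
    (lo_x, hi_x), (lo_y, hi_y) = bounds
    return (random.uniform(lo_x, hi_x), random.uniform(lo_y, hi_y))
```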

5 of 14

Related work

Motion Planning Network (T-RO 2020)

6 of 14

Motivation

  • Behavior cloning (supervised learning)
  • Limitations:
    • (-) Distribution mismatch on unseen states
    • (-) Performance bounded by the demonstrations
    • (-) Tends to overfit the demonstrations

Stanford CS234: Reinforcement Learning Winter 2020

7 of 14

RL-RRT*

  • Train a policy network in the RL framework.
  • The trained RL agent is then used as a bias module when expanding the tree structure of RRT*.

[Figure: RL-RRT* expansion progress. Legend: randomly sampled node; node generated by network.]
8 of 14

RL-RRT*

  • Architecture

9 of 14

RL-RRT*

  • Markov Decision Process (MDP) formulation of the goal-reaching task
  • RL algorithm: Soft Actor-Critic (SAC)
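The slides do not spell out the MDP, so the following is only a hedged sketch of a plausible goal-reaching formulation for a holonomic 2D robot; the state, action, and reward definitions (`goal_radius`, `max_step`, the arrival bonus) are assumptions, not the paper's actual design:

```python
import math

def step(state, action, goal_radius=0.2, max_step=0.5):
    """One MDP transition. state = ((x, y), goal); action = 2D displacement.
    Reward is the negative distance to the goal, plus a bonus on arrival."""
    (x, y), goal = state
    norm = math.hypot(action[0], action[1])
    if norm > max_step:  # clip to the robot's maximum step length
        action = (action[0] * max_step / norm, action[1] * max_step / norm)
    nxt = (x + action[0], y + action[1])
    dist = math.dist(nxt, goal)
    done = dist < goal_radius
    reward = -dist + (10.0 if done else 0.0)
    return (nxt, goal), reward, done
```

SAC then optimizes a stochastic policy on transitions like these, trading off expected return against policy entropy.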

10 of 14

Experiment setting

  • Planners for experiments:
    [1] RRT*: random sampling 100%
    [2] MPNet: random sampling 50% + network bias 50%
    [3] RL-RRT* (ours): random sampling 50% + network bias 50%

  • Evaluation maps: 50 maps (2D environments)

  • Robot: holonomic mobile robot

11 of 14

Training efficiency

  • Data efficiency:
    • MPNet (supervised learning): 24,746,538 demonstration steps (24.7M)
    • RL-RRT* (reinforcement learning): 1,000,000 action steps (1M)
  • Time to obtain the trained network:
    • MPNet (supervised learning): 5 days to collect demonstrations + 7 hours for training
    • RL-RRT* (reinforcement learning): 7.5 hours in total

12 of 14

Experiment Result - 2D

  • Evaluate on 50 maps, with 100 random start and goal pairs per map.
  • Run each algorithm for 3 seconds per query.

13 of 14

Network bias quality

  • Map #391 (in the evaluation set)

[Figure: network bias quality on Map #391. Left: MPNet, trained with supervised learning (SL); right: RL-RRT*, trained with reinforcement learning (RL). Legend: goal configuration; network bias.]

14 of 14

Thanks for listening.

FP3-2-13

Minsung Yoon

Sung-Eui Yoon

Daehyung Park