1 of 17

Paper review

  • Configurable Crowd Profiles (SIGGRAPH 2022)

Yongwoo Lee

2 of 17

Introduction

  • Configurable Crowd Profiles (CCP)
    • Each agent exhibits multiple behaviors concurrently
      • Goal seeking, Collision avoidance, Grouping, Interaction
    • Each agent has an individual (variable) profile -> heterogeneous crowds

3 of 17

Background

  • Major categories of crowd simulation models
    • Microscopic : each agent is modeled individually with local rules (e.g., social forces, velocity obstacles)
    • Macroscopic : the crowd is modeled as an aggregate flow or continuum
    • Mesoscopic : an intermediate level between the two, e.g., modeling the crowd as groups of agents

7 of 17

Introduction

  • Problem definition
    • Reach a goal while avoiding collisions
      • Collision with other agents
      • Collision with environment obstacles & attractions (points of interest)

    • Heterogeneity
      • Each agent has its own personality, e.g., taking initiative vs. being careful
    • Variation of behaviors
      • Group properties
      • Interactions
      • Secondary actions

8 of 17

Introduction

  • The rise of reinforcement learning
    • Observation : position, speed, obstacle info
    • Action : moving direction
    • Reward : reaching the goal (+) / colliding with obstacles (-)
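  • A minimal sketch of this generic formulation (illustrative only, not the paper's exact reward; the constants are arbitrary):

    def basic_reward(reached_goal: bool, collided: bool) -> float:
        """Per-step reward: bonus for reaching the goal, penalty for a collision."""
        if reached_goal:
            return 1.0    # sparse positive reward
        if collided:
            return -1.0   # sparse penalty
        return 0.0        # no signal otherwise

    # e.g. basic_reward(reached_goal=False, collided=True) -> -1.0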

9 of 17

Introduction

  • Challenges of deep RL methods
    • Trial & error to reach a desirable result is time-consuming
      • Finding the best tuning of the reward terms
      • Hard to mix multiple behaviors in a single policy
        • Requires sensitive adjustment and balancing of the reward functions

    • Inflexible : less controllability
      • Crowds end up with a uniform behavior
      • If we want to change an agent's strategy or characteristics? -> training again

10 of 17

Introduction

  • RL Policy Parameterization : more controllability!
    • Policy parameterization techniques
      • More exploration of the state, action, and reward spaces
      • Learning more generalizable and complex tasks
  • Objective : with this technique,
      • Capture multiple behaviors concurrently, with flexible importance weights for each behavior
      • Instead of manually defining reward combinations, vary the weights of the different reward signals during training (sketched below)
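  • A minimal sketch of the parameterization idea, assuming the subtask rewards are mixed linearly and the weights are appended to the observation (variable names are mine, not the paper's):

    import numpy as np

    # Per-step rewards of the four subtasks: goal seeking, collision
    # avoidance, grouping, interaction (computed by the simulator).
    subtask_rewards = np.array([0.8, -0.2, 0.1, 0.0])

    # Profile weights (w_g, w_c, w_gr, w_i): randomized during training,
    # chosen by the user at run time to control behavior.
    profile_weights = np.array([1.0, 0.5, 0.0, 0.2])

    # Scalar reward seen by the policy: a weighted mix of the behaviors.
    total_reward = float(profile_weights @ subtask_rewards)

    # The same weights are concatenated to the observation, so a single
    # policy learns to condition its behavior on the profile.
    base_observation = np.array([2.5, 0.3, 1.2])   # e.g. goal distance, angle, speed
    observation = np.concatenate([base_observation, profile_weights])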

11 of 17

Methods

  • Framework : Configurable Crowd Profiles (CCP)
    • Environment setup

    • Then, what are the state, action, and reward?

12 of 17

Methods

  • State : expressed in the agent's local coordinate system (aligned with its facing direction)
    • Relative goal position (ρ, θ)
    • Local velocity
    • Set of rays measuring distances to other agents, obstacles, and interactions
    • Profile parameters (the reward weights, concatenated to the observation)

  • Action : one of the 7 discrete options below (see the sketch after this list)
    • Stand still
    • Move forward & backward (max 1.3 / 0.13 m/s)
    • Rotate left & right
    • Move left & right (max 0.13 m/s)
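  • A sketch of how the observation vector and discrete action set could be assembled (the values follow the slides; the code structure, shapes, and names are my assumptions):

    import numpy as np

    def build_observation(goal_rho, goal_theta, local_velocity, ray_distances, profile_weights):
        """Observation in the agent's local frame: goal in polar coordinates,
        local velocity, ray distances to agents/obstacles/interactions,
        and the profile (reward-weight) parameters."""
        return np.concatenate([
            [goal_rho, goal_theta],
            local_velocity,      # (vx, vy) in the agent's frame
            ray_distances,       # one distance per cast ray
            profile_weights,     # (w_g, w_c, w_gr, w_i)
        ])

    # The 7 discrete actions described above.
    ACTIONS = [
        "stand still",
        "move forward (max 1.3 m/s)",
        "move backward (max 0.13 m/s)",
        "rotate left",
        "rotate right",
        "move left (max 0.13 m/s)",
        "move right (max 0.13 m/s)",
    ]

    obs = build_observation(2.5, 0.3, np.array([1.0, 0.0]),
                            np.zeros(8), np.array([1.0, 0.5, 0.2, 0.0]))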

14 of 17

Methods

  • Reward function
    • Primitive subtasks
      • Goal seeking
      • Collision avoidance
      • Grouping
      • Interaction
    • Define both sparse and dense reward signals
      • Sparse : large positive bonus (e.g., reaching the goal) or large penalty (e.g., a collision)
      • Dense : small positive / negative shaping signals given at every step (sketched below)
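  • A hedged sketch of per-subtask reward terms combining sparse and dense signals (sign conventions follow the slide; the constants and helper arguments are illustrative assumptions):

    def goal_reward(reached: bool, prev_dist: float, dist: float) -> float:
        # Sparse: large bonus on arrival. Dense: small reward for progress toward the goal.
        return (2.0 if reached else 0.0) + 0.05 * (prev_dist - dist)

    def collision_reward(collided: bool, min_ray_dist: float) -> float:
        # Sparse: large penalty on contact. Dense: small penalty for getting too close.
        return (-2.0 if collided else 0.0) - 0.05 * max(0.0, 1.0 - min_ray_dist)

    def grouping_reward(dist_to_group: float) -> float:
        # Dense: reward for staying near the assigned group.
        return -0.05 * dist_to_group

    def interaction_reward(dist_to_poi: float) -> float:
        # Dense: reward for approaching the point of interest.
        return -0.05 * dist_to_poi

  • The four terms would then be mixed with the profile weights (w_g, w_c, w_gr, w_i) as in the earlier sketch.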

15 of 17

Methods

  • Training strategy
    • Each agent is trained individually.
      • When an agent finishes an episode, it respawns and the simulation does not stop.
      • Episode end conditions (per agent) : collision with an obstacle, or reaching the maximum number of steps
    • Curriculum-based approach
      • (1) Random initial position and goal position. Fixed environment.
      • (2) From a small number of agents, interactions, and obstacles to a large number of them
      • (3) More complex environments
      • (4) Random environments
      • (5) Random reward weights near their minimum and maximum values, gradually broadening the spectrum we want to cover. -> Weights are kept constant for several training episodes (not resampled every step or every episode); see the sketch after this list
      • (6) Randomize interactions and obstacles.
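  • A sketch of step (5), assuming weights are drawn near their extremes first, the sampled range widens as the curriculum progresses, and each draw is held fixed for several episodes:

    import random

    def sample_profile_weights(stage: int, n_stages: int, n_weights: int = 4):
        """Early stages: weights near their minimum (0) or maximum (1).
        Later stages: the whole [0, 1] spectrum is covered."""
        spread = (stage + 1) / n_stages          # fraction of the range that is open
        weights = []
        for _ in range(n_weights):
            if random.random() < 0.5:
                weights.append(random.uniform(0.0, 0.5 * spread))        # near the minimum
            else:
                weights.append(random.uniform(1.0 - 0.5 * spread, 1.0))  # near the maximum
        return weights

    EPISODES_PER_PROFILE = 20   # weights stay constant for several episodes, not per step

    for stage in range(4):      # the sampled spectrum broadens with each stage
        for _ in range(5):      # a few profile draws per curriculum stage
            w = sample_profile_weights(stage, n_stages=4)
            # ... run EPISODES_PER_PROFILE training episodes with the fixed profile w ...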

16 of 17

Experiment & Evaluation

  • Please refer to the project page! [Link]
    • i5-9600K, RTX 2070, trained for 4 days
    • PPO implementation from Unity's ML-Agents framework
    • Character animation for visualization : Motion matching
  • Quantitative metrics w/ Density sensitivity test
    • Speed
    • Density
    • Distance to the closest neighbor (DCN)
    • Distance to the closest point of interest (DPOI)
  • When each reward weight is dominant,
    • Goal seeking : reflected in speed, DCN, and DPOI ⇒ agents prefer moving to the goal over grouping or interacting
    • Collision avoidance : affects speed, and agents keep a larger distance to other agents (but tested with fixed w_g & w_gr)
    • Grouping : affects speed, and agents prefer to form groups
    • Interaction with POI : affects speed, and agents keep shorter distances to other agents and to interaction points
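  • A sketch of how the two distance metrics could be computed per frame from agent and POI positions (array shapes and function names are my assumptions):

    import numpy as np

    def distance_to_closest_neighbor(positions: np.ndarray) -> np.ndarray:
        """DCN: for each agent, the distance to its nearest other agent.
        positions has shape (n_agents, 2)."""
        d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # ignore self-distances
        return d.min(axis=1)

    def distance_to_closest_poi(positions: np.ndarray, pois: np.ndarray) -> np.ndarray:
        """DPOI: for each agent, the distance to the nearest point of interest.
        pois has shape (n_pois, 2)."""
        d = np.linalg.norm(positions[:, None, :] - pois[None, :, :], axis=-1)
        return d.min(axis=1)

    agents = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
    pois = np.array([[0.5, 0.0]])
    print(distance_to_closest_neighbor(agents))   # [1.0, 1.0, ~6.4]
    print(distance_to_closest_poi(agents, pois))  # [0.5, 0.5, ~6.7]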

17 of 17

Discussion

(+) Multiple concurrent and mixed behaviors per agent

(+) Heterogeneous profiles

(+) Run-time modification of profiles

( - ) More varied scenarios could be considered.

( - ) RL exploration could be more careful and efficient.

Future works

  • Advanced styles of reward formulation
  • Different character animations depending on agent styles.
  • More varied features in the observations
  • Still far from real-world data