1 of 17

Paper review

  • Configurable Crowd Profiles (SIGGRAPH 2022)

Yongwoo Lee

2 of 17

Introduction

  • Configurable Crowd Profiles (CCP)
    • Each agent exhibits multiple behaviors concurrently
      • Goal seeking, Collision avoidance, Grouping, Interaction
    • Each agent has an individual (variable) profile -> heterogeneous crowds

3 of 17

Background

  • Major categories of crowd simulation models
    • Microscopic : each agent is modeled individually with local rules (e.g., social forces, velocity obstacles)
    • Macroscopic : the crowd is modeled as an aggregate flow or continuum
    • Mesoscopic : an intermediate level between the two, e.g., modeling the crowd as groups of agents

7 of 17

Introduction

  • Problem definition
    • Reach a goal while avoiding collisions
      • Collision with other agents
      • Collision with environment obstacles & attractions (points of interest)

    • Heterogeneity
      • Each agent has its own personality, e.g., taking initiative vs. being careful
    • Variation of behaviors
      • Group properties
      • Interactions
      • Secondary actions

8 of 17

Introduction

  • The rise of reinforcement learning
    • Observation : position, speed, obstacle info
    • Action : moving direction
    • Reward : reaching the goal (+) / colliding with obstacles (-)
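  • A minimal sketch of this generic formulation (illustrative only, not the paper's exact reward; the constants are arbitrary):

    def basic_reward(reached_goal: bool, collided: bool) -> float:
        """Per-step reward: bonus for reaching the goal, penalty for a collision."""
        if reached_goal:
            return 1.0    # sparse positive reward
        if collided:
            return -1.0   # sparse penalty
        return 0.0        # no signal otherwise

    # e.g. basic_reward(reached_goal=False, collided=True) -> -1.0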

9 of 17

Introduction

  • Challenges of deep RL methods
    • Trial & error to reach a desirable result is time-consuming
      • Finding the best tuning of the reward terms
      • Hard to mix multiple behaviors in a single policy
        • Requires sensitive adjustment and balancing of the reward functions

    • Inflexible : less controllability
      • Crowds end up with a uniform behavior
      • If we want to change an agent's strategy or characteristics? -> training again

10 of 17

Introduction

  • RL Policy Parameterization : more controllability!
    • Policy parameterization techniques
      • More exploration of the state, action, and reward spaces
      • Learning more generalizable and complex tasks
  • Objective : with this technique,
      • Capture multiple behaviors concurrently, with flexible importance weights for each behavior
      • Instead of manually defining reward combinations, vary the weights of the different reward signals during training (sketched below)
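  • A minimal sketch of the parameterization idea, assuming the subtask rewards are mixed linearly and the weights are appended to the observation (variable names are mine, not the paper's):

    import numpy as np

    # Per-step rewards of the four subtasks: goal seeking, collision
    # avoidance, grouping, interaction (computed by the simulator).
    subtask_rewards = np.array([0.8, -0.2, 0.1, 0.0])

    # Profile weights (w_g, w_c, w_gr, w_i): randomized during training,
    # chosen by the user at run time to control behavior.
    profile_weights = np.array([1.0, 0.5, 0.0, 0.2])

    # Scalar reward seen by the policy: a weighted mix of the behaviors.
    total_reward = float(profile_weights @ subtask_rewards)

    # The same weights are concatenated to the observation, so a single
    # policy learns to condition its behavior on the profile.
    base_observation = np.array([2.5, 0.3, 1.2])   # e.g. goal distance, angle, speed
    observation = np.concatenate([base_observation, profile_weights])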

11 of 17

Methods

  • Framework : Configurable Crowd Profiles (CCP)
    • Environment setup

    • Then, what are the state, action, and reward?

12 of 17

Methods

  • State : expressed in the agent's local coordinate system (aligned with its facing direction)
    • Relative goal position (ρ, θ)
    • Local velocity
    • Set of rays measuring distances to other agents, obstacles, and interactions
    • Profile parameters (the reward weights, concatenated to the observation)

  • Action : one of the 7 discrete options below (see the sketch after this list)
    • Stand still
    • Move forward & backward (max 1.3 / 0.13 m/s)
    • Rotate left & right
    • Move left & right (max 0.13 m/s)
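  • A sketch of how the observation vector and discrete action set could be assembled (the values follow the slides; the code structure, shapes, and names are my assumptions):

    import numpy as np

    def build_observation(goal_rho, goal_theta, local_velocity, ray_distances, profile_weights):
        """Observation in the agent's local frame: goal in polar coordinates,
        local velocity, ray distances to agents/obstacles/interactions,
        and the profile (reward-weight) parameters."""
        return np.concatenate([
            [goal_rho, goal_theta],
            local_velocity,      # (vx, vy) in the agent's frame
            ray_distances,       # one distance per cast ray
            profile_weights,     # (w_g, w_c, w_gr, w_i)
        ])

    # The 7 discrete actions described above.
    ACTIONS = [
        "stand still",
        "move forward (max 1.3 m/s)",
        "move backward (max 0.13 m/s)",
        "rotate left",
        "rotate right",
        "move left (max 0.13 m/s)",
        "move right (max 0.13 m/s)",
    ]

    obs = build_observation(2.5, 0.3, np.array([1.0, 0.0]),
                            np.zeros(8), np.array([1.0, 0.5, 0.2, 0.0]))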

14 of 17

Methods

  • Reward function
    • Primitive subtasks
      • Goal seeking
      • Collision avoidance
      • Grouping
      • Interaction
    • Define both sparse and dense reward signals
      • Sparse : large positive bonus (e.g., reaching the goal) or large penalty (e.g., a collision)
      • Dense : small positive / negative shaping signals given at every step (sketched below)
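  • A hedged sketch of per-subtask reward terms combining sparse and dense signals (sign conventions follow the slide; the constants and helper arguments are illustrative assumptions):

    def goal_reward(reached: bool, prev_dist: float, dist: float) -> float:
        # Sparse: large bonus on arrival. Dense: small reward for progress toward the goal.
        return (2.0 if reached else 0.0) + 0.05 * (prev_dist - dist)

    def collision_reward(collided: bool, min_ray_dist: float) -> float:
        # Sparse: large penalty on contact. Dense: small penalty for getting too close.
        return (-2.0 if collided else 0.0) - 0.05 * max(0.0, 1.0 - min_ray_dist)

    def grouping_reward(dist_to_group: float) -> float:
        # Dense: reward for staying near the assigned group.
        return -0.05 * dist_to_group

    def interaction_reward(dist_to_poi: float) -> float:
        # Dense: reward for approaching the point of interest.
        return -0.05 * dist_to_poi

  • The four terms would then be mixed with the profile weights (w_g, w_c, w_gr, w_i) as in the earlier sketch.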

15 of 17

Methods

  • Training strategy
    • Each agent is trained individually.
      • When an agent finishes an episode, it respawns and the simulation does not stop.
      • Episode end conditions (per agent) : collision with an obstacle, or reaching the maximum number of steps
    • Curriculum-based approach
      • (1) Random initial position and goal position. Fixed environment.
      • (2) From a small number of agents, interactions, and obstacles to a large number of them
      • (3) More complex environments
      • (4) Random environments
      • (5) Random reward weights near their minimum and maximum values, gradually broadening the spectrum we want to cover. -> Weights are kept constant for several training episodes (not resampled every step or every episode); see the sketch after this list
      • (6) Randomize interactions and obstacles.
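  • A sketch of step (5), assuming weights are drawn near their extremes first, the sampled range widens as the curriculum progresses, and each draw is held fixed for several episodes:

    import random

    def sample_profile_weights(stage: int, n_stages: int, n_weights: int = 4):
        """Early stages: weights near their minimum (0) or maximum (1).
        Later stages: the whole [0, 1] spectrum is covered."""
        spread = (stage + 1) / n_stages          # fraction of the range that is open
        weights = []
        for _ in range(n_weights):
            if random.random() < 0.5:
                weights.append(random.uniform(0.0, 0.5 * spread))        # near the minimum
            else:
                weights.append(random.uniform(1.0 - 0.5 * spread, 1.0))  # near the maximum
        return weights

    EPISODES_PER_PROFILE = 20   # weights stay constant for several episodes, not per step

    for stage in range(4):      # the sampled spectrum broadens with each stage
        for _ in range(5):      # a few profile draws per curriculum stage
            w = sample_profile_weights(stage, n_stages=4)
            # ... run EPISODES_PER_PROFILE training episodes with the fixed profile w ...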

16 of 17

Experiment & Evaluation

  • Please refer to the project page! [Link]
    • i5-9600K, RTX 2070, trained for 4 days
    • PPO implementation from Unity's ML-Agents framework
    • Character animation for visualization : Motion matching
  • Quantitative metrics w/ Density sensitivity test
    • Speed
    • Density
    • Distance to the closest neighbor (DCN)
    • Distance to the closest point of interest (DPOI)
  • When each reward weight is dominant,
    • Goal seeking : reflected in speed, DCN, and DPOI ⇒ agents prefer moving to the goal over grouping or interacting
    • Collision avoidance : affects speed, and agents keep a larger distance to other agents (but tested with fixed w_g & w_gr)
    • Grouping : affects speed, and agents prefer to form groups
    • Interaction with POI : affects speed, and agents keep shorter distances to other agents and to interaction points
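  • A sketch of how the two distance metrics could be computed per frame from agent and POI positions (array shapes and function names are my assumptions):

    import numpy as np

    def distance_to_closest_neighbor(positions: np.ndarray) -> np.ndarray:
        """DCN: for each agent, the distance to its nearest other agent.
        positions has shape (n_agents, 2)."""
        d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # ignore self-distances
        return d.min(axis=1)

    def distance_to_closest_poi(positions: np.ndarray, pois: np.ndarray) -> np.ndarray:
        """DPOI: for each agent, the distance to the nearest point of interest.
        pois has shape (n_pois, 2)."""
        d = np.linalg.norm(positions[:, None, :] - pois[None, :, :], axis=-1)
        return d.min(axis=1)

    agents = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
    pois = np.array([[0.5, 0.0]])
    print(distance_to_closest_neighbor(agents))   # [1.0, 1.0, ~6.4]
    print(distance_to_closest_poi(agents, pois))  # [0.5, 0.5, ~6.7]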

17 of 17

Discussion

(+) Multiple concurrent and mixed behaviors per agent

(+) Heterogeneous profiles

(+) Run-time modification of profiles

( - ) More varied scenarios could be considered.

( - ) RL exploration could be more careful and efficient.

Future works

  • Advanced styles of reward formulation
  • Different character animations depending on agent styles.
  • More varied features in the observations
  • Still far from real-world data