1 of 29

Special Topics Track

Reinforcement Learning in Games

Brought to you by: Ming and Henry

2 of 29

Today’s Plan

  • Explain the general idea behind reinforcement learning (RL)
  • Show a cool demo of what’s possible with RL within Unity
  • Break down the different components that make RL possible

3 of 29

What is Reinforcement Learning?

4 of 29

(ChatGPT is not a product of the kind of reinforcement learning we cover today, though RL from human feedback is part of its fine-tuning)

5 of 29

What is Reinforcement Learning

  • Very different from supervised learning methods (computer vision, sentiment analysis in NLP, etc.)!
  • Based on a mathematical framework known as the Markov Decision Process (MDP)

6 of 29

What is Reinforcement Learning

Simple game setup:

  • You are the :)
  • Circle = 1 pt
  • Pentagon = 5 pts
  • Triangle = -1 pt
  • X is a barrier

[Figure: grid map with the +5 pentagon, +1 circle, and -1 triangle tiles]

7 of 29

What is Reinforcement Learning

The agent repeats the following actions:

  • Observation
  • Prediction
  • Action
  • Reward calculation

After the game ends, the agent evaluates its decisions
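The observe → predict → act → reward loop above can be sketched in plain Python (a toy stand-alone example, not Unity ML-Agents code; the map layout and the greedy policy are our own assumptions):

```python
import random

# Toy 1-D "map": dict of position -> reward. A minimal sketch of the
# observe -> predict -> act -> reward loop; not Unity ML-Agents code.
def run_episode(tiles, policy):
    """tiles: dict position -> reward; policy: fn(observation, pos) -> move."""
    pos, total, history = 0, 0, []
    for _ in range(10):                      # cap episode length
        observation = dict(tiles)            # 1. Observation: see the map
        move = policy(observation, pos)      # 2. Prediction -> 3. Action
        pos += move
        reward = tiles.pop(pos, 0)           # 4. Reward calculation
        total += reward
        history.append((move, reward))
    return total, history                    # evaluated after the game ends

def greedy_policy(observation, pos):
    # Move toward the nearest positive-reward tile; wander randomly otherwise.
    targets = [p for p, r in observation.items() if r > 0]
    if not targets:
        return random.choice([-1, 1])
    nearest = min(targets, key=lambda p: abs(p - pos))
    return 1 if nearest > pos else -1

total, history = run_episode({2: 1, 5: 5, -1: -1}, greedy_policy)
```

With this layout the greedy policy collects the +1 tile on its way to the +5 tile, for a total of 6 points.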


8 of 29

What is Reinforcement Learning

Observation: The agent notes the position of the three shapes.


9 of 29

What is Reinforcement Learning

Prediction: The agent notes its own possible actions:

  • Moving right earns it ?? pt
  • Moving down earns it ?? pt


10 of 29

What is Reinforcement Learning

Action: The agent moves down.


11 of 29

What is Reinforcement Learning

Reward: The agent earns 1 point!


12 of 29

What is Reinforcement Learning

If the game ends here, the agent now remembers that moving towards the circle earned it a point!


13 of 29

What is Reinforcement Learning

  • At the start of training, the agent has no idea which action to choose
  • It experiments randomly, then readjusts its approach based on the rewards it earns!
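One classic way to implement "experiment randomly, then readjust" is tabular Q-learning, sketched below on a toy one-dimensional track (purely illustrative: ML-Agents itself trains neural-network policies with PPO, not a Q-table):

```python
import random

random.seed(0)

# Tiny 1-D track: positions 0..4, +1 reward at position 4, episode ends there.
# A minimal tabular Q-learning sketch of "experiment randomly, then readjust".
ACTIONS = [-1, 1]                    # move left / move right
q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != 4:
        # Early on this picks randomly; later it exploits learned values.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(4, max(0, s + a))
        r = 1.0 if s2 == 4 else 0.0
        # Readjust: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2
```

After training, moving right is valued higher than moving left in every state, so the learned policy heads straight for the reward.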


14 of 29

What is Reinforcement Learning

More complex game setup:

  • You are the :)
  • Circle = 1 pt
  • Pentagon = 5 pts
  • Triangle = -1 pt
  • X is a barrier
  • Agent assumes the map can change at any time


15 of 29

What is Reinforcement Learning

Prediction: Assume the agent knows the point values:

  • Moving right loses it 1 pt
    • But in another 2 steps it can potentially earn 5 points!
  • Moving down earns it 1 pt

How should the agent choose?


16 of 29

Hyperparameters

  • Config settings for the agent that we set prior to training
  • They influence the agent’s decision-making inclinations, learning habits, brain size, training time, etc.
    • Choosing the right values can make a big difference!
  • There are a lot, so we will briefly explain two: Gamma and Beta
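For reference, gamma and beta both live in an ML-Agents trainer configuration file; a sketch of where they sit (the behavior name and the other values are placeholders):

```yaml
behaviors:
  MyAgent:                # placeholder behavior name
    trainer_type: ppo
    hyperparameters:
      learning_rate: 3.0e-4
      beta: 5.0e-3        # entropy regularization strength (exploration)
    reward_signals:
      extrinsic:
        gamma: 0.9        # discount factor for future rewards
        strength: 1.0
    max_steps: 500000
```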

17 of 29

Hyperparameters - Gamma

  • Since we assume the map can change at any time, future rewards aren’t guaranteed
  • Gamma is the discount factor, dictating how much the agent cares about future rewards
    • Usually between 0.8 and 0.995, according to Unity


18 of 29

Hyperparameters - Gamma

  • Moving right, the agent immediately loses 1 point, then can gain 5 points two steps later
  • With gamma = 0.8, the discounted reward becomes:
    • -1 + 5 × 0.8² = 2.2
  • Thus, higher gamma → the agent values future rewards more!
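Spelled out in plain Python, the two options compare as follows (just the arithmetic from the slide):

```python
gamma = 0.8

# Right path: lose 1 point now, gain 5 points two steps later
# (so the +5 is discounted by gamma twice).
right = -1 + 5 * gamma**2     # -1 + 5 * 0.64 = 2.2
# Down path: gain 1 point immediately.
down = 1
```

With gamma = 0.8 the right path scores 2.2 versus 1, so the agent prefers the delayed +5.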


19 of 29

Hyperparameters - Beta

  • Say the agent has an extremely low Gamma value, like 0.5 (unrealistic, but it proves a point)
    • The right path toward the +5 tile then never looks optimal: -1 + 5 × 0.5² = 0.25, less than the +1 from moving down
  • How can we ensure the agent tries to explore the environment?


20 of 29

Hyperparameters - Beta

  • The hyperparameter Beta injects randomness into the agent’s decisions
  • Beta sets the strength of entropy regularization: the agent is rewarded for keeping some randomness in its action choices, instead of always taking the action with the highest perceived reward
  • Usually between 1e-4 and 1e-2, according to Unity
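In ML-Agents, beta is applied as an entropy-regularization coefficient: the training objective gets a bonus of beta × policy entropy, so a policy that keeps some randomness scores slightly higher than one that always picks the same action. A toy illustration (the probabilities and beta value are made up):

```python
import math

# Sketch: how an entropy bonus (scaled by beta) rewards exploratory policies.
def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.98, 0.01, 0.01]   # has collapsed onto action 0
uncertain = [0.4, 0.3, 0.3]      # still exploring all three actions

beta = 5e-3
# During training, beta * entropy is added to the objective, so the
# more exploratory policy receives the larger bonus.
bonus_confident = beta * entropy(confident)
bonus_uncertain = beta * entropy(uncertain)
```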


21 of 29

How does ML Agents come into play?

  • ML-Agents is Unity’s plugin for reinforcement learning!
  • Built-in neural networks for streamlined RL training

22 of 29

How does ML Agents come into play?

  • To train an Agent, ML-Agents requires:
    • A Python virtual environment to host the neural network trainer
    • A Unity Editor with the ML-Agents package installed
    • An open Unity scene with an Agent present in the scene

23 of 29

How can we make it better?

  • Adjusting hyperparameters
    • Changing hyperparameters can potentially improve efficiency drastically
    • However, fine-tuning may not yield the expected results!
    • RL is very unstable and extremely sensitive to hyperparameters – sometimes even changing the random seed results in vastly different performance!

24 of 29

How can we make it better?

[Figure: reward-over-time training curves]

2 sets of runs, exact same settings, different random seeds!

Source: https://arxiv.org/pdf/1709.06560.pdf

25 of 29

How can we make it better?

  • Modifying the environment
    • How will the agent be able to navigate different map layouts? Randomize obstacle placements!
    • What if the agent needs to track the goal from any starting point on the map? Randomize the agent’s spawn point!
  • Note: Generally, more random factors can slow down the training process by a lot and may introduce additional “errors” in training.

26 of 29

How can we make it better?

  • Tweaking the agent’s rewards
    • RL agents perform well when there are dense rewards!
    • They generally train faster when each step grants a small reward or punishment, as opposed to a single large reward/punishment only for completing or failing the end goal
    • In our example, the agent gains a scaling reward for being closer to the goal after each step
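A dense, distance-based step reward might be shaped like this (our own sketch; the constants and shaping function are assumptions, not Unity's API):

```python
# Dense reward sketch: each step, reward the agent for reducing its distance
# to the goal, instead of granting a single large reward only on arrival.
# The scale factor and terminal bonus are illustrative assumptions.
def step_reward(prev_dist, new_dist, reached_goal, scale=0.1):
    if reached_goal:
        return 5.0                         # terminal reward still exists
    return scale * (prev_dist - new_dist)  # positive if closer, negative if farther
```

Moving from 4 tiles away to 3 tiles away earns a small positive reward, while backtracking is punished a little, so every step gives the agent a learning signal.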

27 of 29

How can we make it better?

  • Providing more observation data
    • The agent can perceive only what you explicitly feed it!
    • Add raycasts that detect objects at different angles
    • Pass in any other environment-related info that the agent should know
  • Note: Only pass in observations that the agent will reasonably know when running outside of training!
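Conceptually, the observation is just a flat vector of normalized numbers. A Python sketch of assembling one (in a real project this lives in C#, inside the Agent's CollectObservations method; the helper name and layout here are hypothetical):

```python
# Hypothetical sketch of building an observation vector. Values are
# normalized so the network always sees inputs in a consistent range.
def build_observation(agent_pos, goal_pos, map_size, ray_hits):
    obs = []
    obs.append(agent_pos[0] / map_size)                  # normalized agent x
    obs.append(agent_pos[1] / map_size)                  # normalized agent y
    obs.append((goal_pos[0] - agent_pos[0]) / map_size)  # relative goal x
    obs.append((goal_pos[1] - agent_pos[1]) / map_size)  # relative goal y
    obs.extend(ray_hits)            # e.g. 1.0 if a raycast hit an obstacle
    return obs

obs = build_observation((2, 3), (8, 3), 10, [0.0, 1.0, 0.0])
```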

28 of 29

More Resources

29 of 29

Thanks for coming!