1 of 29

Special Topics Track

Reinforcement Learning in Games

Brought to you by: Ming and Henry

2 of 29

Today’s Plan

  • Explain the general idea behind reinforcement learning (RL)
  • Show a cool demo of what’s possible with RL within Unity
  • Break down the different components that make RL possible

3 of 29

What is Reinforcement Learning?

4 of 29

(ChatGPT is not a product of the kind of reinforcement learning we cover today, though RL from human feedback is part of its fine-tuning)

5 of 29

What is Reinforcement Learning

  • Very different from supervised learning methods (computer vision, sentiment analysis in NLP, etc.)!
  • Based on a mathematical framework known as the Markov Decision Process (MDP)

6 of 29

What is Reinforcement Learning

Simple game setup:

  • You are the :)
  • Circle = 1 pt
  • Pentagon = 5 pts
  • Triangle = -1 pt
  • X is a barrier

[Figure: grid map with the +5 pentagon, +1 circle, and -1 triangle tiles]

7 of 29

What is Reinforcement Learning

The agent repeats the following actions:

  • Observation
  • Prediction
  • Action
  • Reward calculation

After the game ends, the agent evaluates its decisions
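The observe → predict → act → reward loop above can be sketched in plain Python (a toy stand-alone example, not Unity ML-Agents code; the map layout and the greedy policy are our own assumptions):

```python
import random

# Toy 1-D "map": dict of position -> reward. A minimal sketch of the
# observe -> predict -> act -> reward loop; not Unity ML-Agents code.
def run_episode(tiles, policy):
    """tiles: dict position -> reward; policy: fn(observation, pos) -> move."""
    pos, total, history = 0, 0, []
    for _ in range(10):                      # cap episode length
        observation = dict(tiles)            # 1. Observation: see the map
        move = policy(observation, pos)      # 2. Prediction -> 3. Action
        pos += move
        reward = tiles.pop(pos, 0)           # 4. Reward calculation
        total += reward
        history.append((move, reward))
    return total, history                    # evaluated after the game ends

def greedy_policy(observation, pos):
    # Move toward the nearest positive-reward tile; wander randomly otherwise.
    targets = [p for p, r in observation.items() if r > 0]
    if not targets:
        return random.choice([-1, 1])
    nearest = min(targets, key=lambda p: abs(p - pos))
    return 1 if nearest > pos else -1

total, history = run_episode({2: 1, 5: 5, -1: -1}, greedy_policy)
```

With this layout the greedy policy collects the +1 tile on its way to the +5 tile, for a total of 6 points.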


8 of 29

What is Reinforcement Learning

Observation: The agent notes the position of the three shapes.


9 of 29

What is Reinforcement Learning

Prediction: The agent notes its own possible actions:

  • Moving right earns it ?? pt
  • Moving down earns it ?? pt


10 of 29

What is Reinforcement Learning

Action: The agent moves down.


11 of 29

What is Reinforcement Learning

Reward: The agent earns 1 point!


12 of 29

What is Reinforcement Learning

If the game ends here, the agent now remembers that moving towards the circle earned it a point!


13 of 29

What is Reinforcement Learning

  • At the start of training, the agent has no idea which action to choose
  • It experiments randomly, then readjusts its approach based on the rewards it earns!
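One classic way to implement "experiment randomly, then readjust" is tabular Q-learning, sketched below on a toy one-dimensional track (purely illustrative: ML-Agents itself trains neural-network policies with PPO, not a Q-table):

```python
import random

random.seed(0)

# Tiny 1-D track: positions 0..4, +1 reward at position 4, episode ends there.
# A minimal tabular Q-learning sketch of "experiment randomly, then readjust".
ACTIONS = [-1, 1]                    # move left / move right
q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != 4:
        # Early on this picks randomly; later it exploits learned values.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(4, max(0, s + a))
        r = 1.0 if s2 == 4 else 0.0
        # Readjust: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2
```

After training, moving right is valued higher than moving left in every state, so the learned policy heads straight for the reward.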


14 of 29

What is Reinforcement Learning

More complex game setup:

  • You are the :)
  • Circle = 1 pt
  • Pentagon = 5 pts
  • Triangle = -1 pt
  • X is a barrier
  • Agent assumes the map can change at any time


15 of 29

What is Reinforcement Learning

Prediction: Assume the agent knows the point values:

  • Moving right loses it 1 pt
    • But in another 2 steps it can potentially earn 5 points!
  • Moving down earns it 1 pt

How should the agent choose?


16 of 29

Hyperparameters

  • Config settings for the agent that we set prior to training
  • They influence the agent’s decision-making inclinations, learning habits, brain size, training time, etc.
    • Choosing the right values can make a big difference!
  • There are a lot, so we will briefly explain two: Gamma and Beta
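For reference, gamma and beta both live in an ML-Agents trainer configuration file; a sketch of where they sit (the behavior name and the other values are placeholders):

```yaml
behaviors:
  MyAgent:                # placeholder behavior name
    trainer_type: ppo
    hyperparameters:
      learning_rate: 3.0e-4
      beta: 5.0e-3        # entropy regularization strength (exploration)
    reward_signals:
      extrinsic:
        gamma: 0.9        # discount factor for future rewards
        strength: 1.0
    max_steps: 500000
```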

17 of 29

Hyperparameters - Gamma

  • Since we assume the map can change at any time, future rewards aren’t guaranteed
  • Gamma is the discount factor, dictating how much the agent cares about future rewards
    • Usually between 0.8 and 0.995, according to Unity


18 of 29

Hyperparameters - Gamma

  • Moving right, the agent immediately loses 1 point, then can gain 5 points two steps later
  • With gamma = 0.8, the discounted reward becomes:
    • -1 + 5 × 0.8² = 2.2
  • Thus, higher gamma → the agent values future rewards more!
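Spelled out in plain Python, the two options compare as follows (just the arithmetic from the slide):

```python
gamma = 0.8

# Right path: lose 1 point now, gain 5 points two steps later
# (so the +5 is discounted by gamma twice).
right = -1 + 5 * gamma**2     # -1 + 5 * 0.64 = 2.2
# Down path: gain 1 point immediately.
down = 1
```

With gamma = 0.8 the right path scores 2.2 versus 1, so the agent prefers the delayed +5.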


19 of 29

Hyperparameters - Beta

  • Say the agent has an extremely low Gamma value, like 0.5 (unrealistic, but it proves a point)
    • The right path toward the +5 tile then never looks optimal: -1 + 5 × 0.5² = 0.25, less than the +1 from moving down
  • How can we ensure the agent tries to explore the environment?


20 of 29

Hyperparameters - Beta

  • The hyperparameter Beta injects randomness into the agent’s decisions
  • Beta sets the strength of entropy regularization: the agent is rewarded for keeping some randomness in its action choices, instead of always taking the action with the highest perceived reward
  • Usually between 1e-4 and 1e-2, according to Unity
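In ML-Agents, beta is applied as an entropy-regularization coefficient: the training objective gets a bonus of beta × policy entropy, so a policy that keeps some randomness scores slightly higher than one that always picks the same action. A toy illustration (the probabilities and beta value are made up):

```python
import math

# Sketch: how an entropy bonus (scaled by beta) rewards exploratory policies.
def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.98, 0.01, 0.01]   # has collapsed onto action 0
uncertain = [0.4, 0.3, 0.3]      # still exploring all three actions

beta = 5e-3
# During training, beta * entropy is added to the objective, so the
# more exploratory policy receives the larger bonus.
bonus_confident = beta * entropy(confident)
bonus_uncertain = beta * entropy(uncertain)
```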


21 of 29

How does ML Agents come into play?

  • ML-Agents is Unity’s plugin for reinforcement learning!
  • Built-in neural networks for streamlined RL training

22 of 29

How does ML Agents come into play?

  • To train an Agent, ML-Agents requires:
    • A Python virtual environment to host the neural network trainer
    • A Unity Editor with the ML-Agents package installed
    • An open Unity scene with an Agent present in the scene

23 of 29

How can we make it better?

  • Adjusting hyperparameters
    • Changing hyperparameters can potentially improve efficiency drastically
    • However, fine-tuning may not yield the expected results!
    • RL is very unstable and extremely sensitive to hyperparameters – sometimes even changing the random seed results in vastly different performance!

24 of 29

How can we make it better?

[Figure: reward-over-time training curves]

2 sets of runs, exact same settings, different random seeds!

Source: https://arxiv.org/pdf/1709.06560.pdf

25 of 29

How can we make it better?

  • Modifying the environment
    • How will the agent be able to navigate different map layouts? Randomize obstacle placements!
    • What if the agent needs to track the goal from any starting point on the map? Randomize the agent’s spawn point!
  • Note: Generally, more random factors can slow down the training process by a lot and may introduce additional “errors” in training.

26 of 29

How can we make it better?

  • Tweaking the agent’s rewards
    • RL agents perform well when there are dense rewards!
    • They generally train faster when each step grants a small reward or punishment, as opposed to a single large reward/punishment only for completing or failing the end goal
    • In our example, the agent gains a scaling reward for being closer to the goal after each step
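A dense, distance-based step reward might be shaped like this (our own sketch; the constants and shaping function are assumptions, not Unity's API):

```python
# Dense reward sketch: each step, reward the agent for reducing its distance
# to the goal, instead of granting a single large reward only on arrival.
# The scale factor and terminal bonus are illustrative assumptions.
def step_reward(prev_dist, new_dist, reached_goal, scale=0.1):
    if reached_goal:
        return 5.0                         # terminal reward still exists
    return scale * (prev_dist - new_dist)  # positive if closer, negative if farther
```

Moving from 4 tiles away to 3 tiles away earns a small positive reward, while backtracking is punished a little, so every step gives the agent a learning signal.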

27 of 29

How can we make it better?

  • Providing more observation data
    • The agent can perceive only what you explicitly feed it!
    • Add raycasts that detect objects at different angles
    • Pass in any other environment-related info that the agent should know
  • Note: Only pass in observations that the agent will reasonably know when running outside of training!
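Conceptually, the observation is just a flat vector of normalized numbers. A Python sketch of assembling one (in a real project this lives in C#, inside the Agent's CollectObservations method; the helper name and layout here are hypothetical):

```python
# Hypothetical sketch of building an observation vector. Values are
# normalized so the network always sees inputs in a consistent range.
def build_observation(agent_pos, goal_pos, map_size, ray_hits):
    obs = []
    obs.append(agent_pos[0] / map_size)                  # normalized agent x
    obs.append(agent_pos[1] / map_size)                  # normalized agent y
    obs.append((goal_pos[0] - agent_pos[0]) / map_size)  # relative goal x
    obs.append((goal_pos[1] - agent_pos[1]) / map_size)  # relative goal y
    obs.extend(ray_hits)            # e.g. 1.0 if a raycast hit an obstacle
    return obs

obs = build_observation((2, 3), (8, 3), 10, [0.0, 1.0, 0.0])
```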

28 of 29

More Resources

29 of 29

Thanks for coming!