
Made by: Ziyi Zhang

v01.04.2024

Advanced AI Task Helpers

REINFORCEMENT LEARNING

An agent learns to complete a task by interacting with its environment and maximizing the reward it receives.

Think Like a Learner: Can you find reinforcement learning examples in daily life?

Think Like a Coder: How can you implement Q-learning on your SPIKE Prime?

VOCABULARY

Agent: The subject of learning and decision making

Environment: The object or surroundings that the agent interacts with

Action: What the agent decides to do in the next time step

State: The current situation of the agent

Reward: Feedback from the environment to the agent

HOW IT WORKS

Markov Decision Process: At each time step, the agent observes its current state, chooses an action, and the environment responds with a reward and the next state. This loop repeats until the task is finished.
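To make the loop concrete, here is a minimal runnable sketch in Python. The number-line environment below is an invented stand-in for the robot's real surroundings; on a SPIKE Prime, step() would instead drive the motors and read the sensors.

  import random

  # Toy environment standing in for the robot's surroundings:
  # states 0..4 on a number line; actions 0 = left, 1 = right; goal = state 4.
  N_STATES, N_ACTIONS, GOAL = 5, 2, 4

  def step(state, action):
      """Environment: apply the agent's action, return (next state, reward)."""
      move = 1 if action == 1 else -1
      next_state = min(max(state + move, 0), N_STATES - 1)
      reward = 1 if next_state == GOAL else 0   # feedback from the environment
      return next_state, reward

  # One episode of the MDP loop with a purely random agent (no learning yet):
  state = 0                                     # start state
  while state != GOAL:                          # episode ends at the goal
      action = random.choice([0, 1])            # agent decides what to do next
      state, reward = step(state, action)       # environment responds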

Flip over for more details!


HOW IT WORKS: CONTINUED

Q-Learning: For each action, calculate a Q value based on the rewards received; the Q value reflects the long-term reward of taking that action.

Q table: Stores one Q value for every state-action pair. The initial value of all Q values in the Q table is zero.
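A simple way to hold the Q table in Python is a list of lists, using the state and action counts of the toy environment above:

  # Q table: one Q value per (state, action) pair, all zero before training.
  Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

  print(Q[3][1])   # Q value of going right in state 3 -> 0.0 initially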

ε-greedy algorithm: The agent needs to both explore and exploit.

EXPLORE: Randomly pick actions to explore the environment

EXPLOIT: Pick the action with the highest Q value to make use of what the agent has learned

STEPS:

  1. Set an ε value
  2. Randomly generate a number k in (0, 1)
  3. If ε > k, explore. Else, exploit.
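These steps map directly onto a small Python helper; a sketch that works with the random import and the list-of-lists Q table from the snippets above:

  def choose_action(Q, state, epsilon):
      """ε-greedy action selection, following steps 1-3 above."""
      k = random.random()                       # step 2: random k in (0, 1)
      if epsilon > k:                           # step 3: ε > k, so explore
          return random.randrange(len(Q[state]))
      # otherwise exploit: the action with the highest Q value in this state
      return max(range(len(Q[state])), key=lambda a: Q[state][a])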

Bellman Equation: Updating the Q value

α, Learning rate: Determines the extent to which the previous Q value is kept when it is updated

γ, Discount factor: Determines how much importance we give to future rewards
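Written out in its standard Q-learning form (consistent with the α and γ definitions above), the update is

  Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]

where s is the current state, a the action taken, r the reward received, and s' the next state. As a Python helper, in the same sketch style as above:

  def update_q(Q, state, action, reward, next_state, alpha, gamma):
      """Bellman update: move Q toward reward + discounted best future value."""
      target = reward + gamma * max(Q[next_state])
      Q[state][action] += alpha * (target - Q[state][action])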

HOW YOU USE IT:

CODE IT!

PSEUDO CODE:

> Initialize α, γ, ε and the Q table
> While not finished training k episodes:
>     > Move the robot to the start position
>     > While this episode is not finished:
>         > If ε is greater than a random value: Explore
>         > If ε is less than or equal to the random value: Exploit
>         > Move the robot to the next state
>         > Update the Q value of the current state in the Q table with the Bellman equation
>     > Episode + 1
> Agent finishes training
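Putting it all together on the toy number-line environment, reusing step(), Q, choose_action(), and update_q() from the sketches above. The hyperparameter values here are illustrative, not prescribed by the card:

  # Initialize α, γ, ε and the number of training episodes.
  alpha, gamma, epsilon, k_episodes = 0.5, 0.9, 0.3, 200

  for episode in range(k_episodes):             # while not finished training k episodes
      state = 0                                 # move the robot to the start position
      while state != GOAL:                      # while this episode is not finished
          action = choose_action(Q, state, epsilon)    # explore or exploit
          next_state, reward = step(state, action)     # move to the next state
          update_q(Q, state, action, reward,           # Bellman update
                   next_state, alpha, gamma)
          state = next_state

  print(Q)   # action 1 (right) should now score highest in every state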