Made by: Ziyi Zhang
v01.04.2024
Advanced
AI
Task Helpers
REINFORCEMENT LEARNING
An agent learns how to complete a task by interacting with its environment and maximizing the reward it receives.
Think Like a Learner:
Can you find reinforcement learning examples in daily life?
Think Like a Coder: How can you implement Q-learning on your SPIKE Prime?
HOW IT WORKS
Flip over for more details!
Markov Decision Process: [diagram]
VOCABULARY
Agent: The learner and decision maker
Environment: The object or surroundings that the agent interacts with
Action: What the agent decides to do at the next time step
State: The current situation of the agent
Reward: Feedback from the environment to the agent after each action
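To make the five vocabulary words concrete, here is a minimal Python sketch of a made-up environment. The 5-cell track, the reward values, and the names `step`, `N_STATES`, `ACTIONS`, and `GOAL` are hypothetical and not from the card; on a SPIKE Prime, the body of `step` would drive the motors instead of just updating a number.

```python
import random

# Hypothetical environment (not from the card): the robot sits on a straight
# track of 5 cells (states 0..4). It starts at cell 0; the goal is cell 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move back one cell, 1 = move forward one cell
GOAL = N_STATES - 1


def step(state, action):
    """Environment: apply the agent's action, return (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    # Reward: +1 for reaching the goal, a small penalty for every other move
    reward = 1.0 if next_state == GOAL else -0.1
    done = next_state == GOAL
    return next_state, reward, done


# One episode with an agent that acts at random, showing the
# state -> action -> (next state, reward) cycle of the decision process
state = 0
done = False
while not done:
    action = random.choice(ACTIONS)            # agent picks an action
    state, reward, done = step(state, action)  # environment responds
    print("state:", state, "reward:", reward)
```

The agent here does not learn anything yet; it only shows how the agent and the environment take turns.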
CODE IT!
> Initialize α, γ, ε and the Q table
> While fewer than k training episodes have finished:
    > Move the robot to the start position
    > While this episode has not finished:
        > If ε is greater than a random value between 0 and 1:
            > Explore
        > If ε is less than or equal to that random value:
            > Exploit
        > Move the robot to the next state
        > Update the Q value of the current state and chosen action in the Q table with the Bellman equation
    > Episode + 1
> The agent finishes training
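The pseudocode above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the robot's movement is simulated on a hypothetical 5-cell track (start at cell 0, goal at cell 4), and the parameter values (`k=200`, `alpha=0.5`, `gamma=0.9`, `epsilon=0.2`) are illustrative choices, not values from the card.

```python
import random

# Hypothetical setup (not from the card): a 5-cell track, start at 0, goal at 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = back, 1 = forward
GOAL = N_STATES - 1


def step(state, action):
    """Simulated robot move; on a real robot this would drive the motors."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward, next_state == GOAL


def train(k=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Initialize the Q table: one row per state, one column per action, all zeros
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for episode in range(k):            # loop until k episodes are finished
        state, done = 0, False          # move the robot to the start position
        while not done:                 # while this episode is not finished
            if epsilon > rng.random():  # explore: pick a random action
                action = rng.choice(ACTIONS)
            else:                       # exploit: pick the highest-Q action
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)  # move to next state
            # Bellman update; reaching the goal ends the episode, so no future reward
            future = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for every non-goal cell (1 means "move forward")
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should be "move forward" in every cell, because each extra step costs -0.1 while reaching the goal pays +1.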
HOW IT WORKS: CONTINUED
Q table
HOW YOU USE IT:
PSEUDO CODE:
ε-greedy algorithm: The agent needs to both explore and exploit. With probability ε it explores; otherwise it exploits.
EXPLORE: Randomly pick actions to explore the environment
EXPLOIT: Pick the action with the highest Q value to use what the agent has already learned
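A minimal Python sketch of the ε-greedy choice for a single state follows. The function name `choose_action` and the tie-breaking rule (the lowest action index wins) are assumptions made for illustration.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """ε-greedy choice over one row of the Q table (one Q value per action).

    With probability epsilon: explore (return a random action index).
    Otherwise: exploit (return the index of the highest Q value;
    ties go to the lowest index).
    """
    if epsilon > rng.random():
        return rng.randrange(len(q_row))  # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit


row = [0.2, 0.7, -0.1]
print(choose_action(row, epsilon=0.0))  # ε = 0 always exploits, so this prints 1
```

Because `rng.random()` returns a value in [0, 1), ε = 0 never explores and ε = 1 always explores; a common choice is to start ε high and lower it as training goes on.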
STEPS:
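Bellman Equation: Updating the Q value
α, learning rate: Determines how much the new estimate overrides the previous Q value (α = 0 keeps the old value unchanged; α = 1 replaces it completely)
γ, discount factor: Determines how much importance is given to future rewards (γ = 0 considers only the immediate reward; γ close to 1 weighs future rewards almost as much as the immediate one)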
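Written out, the standard Q-learning update that these two parameters control is shown below, rearranged so that α visibly balances the previous Q value against the new estimate. The symbols s, a, r, and s' (current state, chosen action, reward, next state) are notation assumed here rather than taken from the card, and the numbers in the worked example are made up for illustration.

```latex
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right]

% Worked example (illustrative numbers):
% \alpha = 0.5,\ \gamma = 0.9,\ Q(s,a) = 0,\ r = 1,\ \max_{a'} Q(s',a') = 2
Q(s,a) \leftarrow (1-0.5)(0) + 0.5\,(1 + 0.9 \times 2) = 0.5 \times 2.8 = 1.4
```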
Q-Learning: For each state and action, the agent keeps a Q value, updated from the rewards it receives, that estimates the long-term (discounted) reward of taking that action in that state.
All Q values in the Q table start at zero.
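Here is a minimal Python sketch of a Q table that starts at zero, plus a single Bellman update. The table size (5 states, 2 actions), the function name `update_q`, and the example move are hypothetical, chosen only to show the idea.

```python
# Hypothetical sizes (not from the card): 5 track cells, 2 actions (back, forward)
N_STATES, N_ACTIONS = 5, 2

# Every Q value starts at zero
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def update_q(q, state, action, reward, next_state, alpha=0.5, gamma=0.9, done=False):
    """One Bellman (Q-learning) update of Q(state, action); returns the new value."""
    future = 0.0 if done else max(q[next_state])  # no future reward after the goal
    q[state][action] = (1 - alpha) * q[state][action] + alpha * (reward + gamma * future)
    return q[state][action]


# The robot moves forward (action 1) from cell 3 into the goal cell 4 and earns +1
print(update_q(q_table, state=3, action=1, reward=1.0, next_state=4, done=True))  # 0.5
```

Only the entry for the action actually taken changes; every other Q value stays at zero until the robot tries that state and action.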