Lunar Lander problem using a Deep Q-learning Neural Network
Srinivas Rahul Sapireddy
2021 Fall Hack-A-Roo
Problem Statement
Solving the Lunar Lander problem provided by OpenAI Gym using a Deep Q-Learning Neural Network (DQN).
DQN Algorithm
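The algorithm figure did not survive extraction, so here is a minimal sketch of the DQN training loop in the spirit of this project. The `Agent` interface (`act`, `step`), the epsilon schedule, and the function name are illustrative assumptions, not the exact code from train.py:

```python
import gym
import numpy as np

def train_dqn(agent, n_episodes=1000, max_t=1000,
              eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Generic DQN loop (illustrative; names differ from train.py)."""
    env = gym.make('LunarLander-v2')
    scores, eps = [], eps_start
    for i_episode in range(1, n_episodes + 1):
        state = env.reset()
        score = 0
        for _ in range(max_t):
            action = agent.act(state, eps)          # epsilon-greedy action
            next_state, reward, done, _ = env.step(action)
            agent.step(state, action, reward, next_state, done)  # store + learn
            state = next_state
            score += reward
            if done:
                break
        scores.append(score)
        eps = max(eps_end, eps_decay * eps)          # decay exploration
        if np.mean(scores[-100:]) >= 200.0:          # LunarLander "solved" score
            break                                    # early stopping
    return scores
```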
Action Space
The lander has four discrete actions: do nothing, fire the left orientation engine, fire the main (bottom) engine, and fire the right orientation engine.
Data
Environment Source: https://gym.openai.com/envs/LunarLander-v2/
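For reference, the environment can be created and inspected with the standard Gym API (using the pre-2022 Gym interface that matches this project's era):

```python
import gym

env = gym.make('LunarLander-v2')
print(env.observation_space)  # Box(8,): position, velocity, angle, angular velocity, leg contacts
print(env.action_space)       # Discrete(4): do nothing, fire left, fire main, fire right
```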
Code and Tools Used
Requires Python 3 or above.
Code tested in the Spyder IDE from Anaconda.
Run train.py to see the code in action.
Source Code: GitHub repository
Model
| Hyperparameter | Value |
| --- | --- |
| episodes | 1000 |
| buffer_size | 100000 |
| batch_size | 64 |
| gamma | 0.99 |
| learning rate | 1e-3 |
| tau | 1e-3 |
| steps | 4 |
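As a sketch of how these values are typically wired together (assuming a PyTorch implementation with separate local and target Q-networks; the constants and the helper below are illustrative, not copied from train.py): `tau` controls the soft update of the target network, and `steps` is read here as running a learning step every 4 environment steps.

```python
BUFFER_SIZE = int(1e5)   # replay buffer size
BATCH_SIZE = 64          # minibatch size
GAMMA = 0.99             # discount factor
LR = 1e-3                # learning rate
TAU = 1e-3               # soft-update coefficient for the target network
UPDATE_EVERY = 4         # learn every 4 environment steps (assumed meaning of "steps")

def soft_update(local_model, target_model, tau=TAU):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    for t_param, l_param in zip(target_model.parameters(), local_model.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)
```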
Q-Learning
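DQN approximates the classic Q-learning update, which nudges the estimate of Q(s, a) toward the observed reward plus the discounted value of the best next action (with learning rate α and discount factor γ):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```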
Observation Space
1. State – the current state of the environment (8-dimensional state space).
2. Action – the agent acts based on the current state.
3. Reward – if the lander crashes or comes to rest, the episode is considered complete and a final reward is received.
Neural Network Model
[Figure: fully connected network mapping the 8 observations through weight layers W1, W2, W3 to the four actions: Do Nothing, Fire Bottom, Fire Left, Fire Right.]
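The diagram above corresponds to a small fully connected network; a minimal PyTorch version is sketched below. The hidden layer size is an assumption, since the poster only shows the three weight matrices W1, W2, W3:

```python
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps the 8-dim observation to Q-values for the 4 actions."""
    def __init__(self, state_size=8, action_size=4, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden)   # W1
        self.fc2 = nn.Linear(hidden, hidden)       # W2
        self.fc3 = nn.Linear(hidden, action_size)  # W3

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # one Q-value per action
```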
Results
Rewards obtained at each training episode with early stopping
Blue line – reward value per episode.
Orange line – rolling mean over the last 1000 episodes.
The reward becomes positive after about 300 episodes.
Knowledge Gained
Training the agent takes a long time.
Improved the neural network model to solve the environment in less time.
Used different hyperparameter values to see how training changes over time.
With a replay memory of length 100000, the lunar lander solves the problem in less time.
Extension – Double DQN
Here I extended the existing network with two function approximators that are trained on different samples: one is used to select the best action, and the other to evaluate the value of that action. Since the two approximators have seen different samples, it is less likely that they overestimate the same action; hence the name Double Q-Learning.
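A sketch of the Double DQN target computation (PyTorch, assuming the same local/target network pair as above; tensor shapes and names are illustrative): the online network selects the action, the target network evaluates it.

```python
import torch

def double_dqn_targets(q_local, q_target, rewards, next_states, dones, gamma=0.99):
    """Double DQN: action selection by the online net, evaluation by the target net."""
    with torch.no_grad():
        # 1) pick the argmax actions with the online (local) network
        best_actions = q_local(next_states).argmax(dim=1, keepdim=True)
        # 2) evaluate those actions with the target network
        next_q = q_target(next_states).gather(1, best_actions)
    # standard TD target; (1 - dones) zeroes out terminal transitions
    return rewards + gamma * next_q * (1 - dones)
```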
Conclusion
Reference: https://gym.openai.com/docs/
Future Work
Team Name: STARLANDER: The Millennium Falcon