Distributed Learning in Realistic Virtual Environments
VDL Group
Weichao Qiu, Yuan Jing Vincent Yan, Kaiyue Wu
Outline
Background
The AI
Powerful machine learning algorithms make it possible to teach robots complex tasks, such as flying a quadcopter or walking on two legs.
Training robots requires a lot of time and effort. The training is usually done by trial and error, an approach called reinforcement learning.
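The trial-and-error idea can be sketched with tabular Q-learning on a toy environment. The 1-D corridor below is a hypothetical stand-in, far simpler than the quadcopter or biped tasks above, but it shows the same loop: act, observe a reward, and update value estimates.

```python
import random

# Hypothetical toy environment: a 1-D corridor with states 0..4 and a
# reward only for reaching state 4 (illustrative, not a real robot task).
N_STATES = 5
ACTIONS = [+1, -1]  # move right / move left

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    random.seed(0)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # trial and error: take a random action with probability eps
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r, done = step(s, a)
            best_next = max(q[(s2, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# The learned greedy policy moves right in every non-terminal state.
```

Each update nudges the value of a tried action toward the observed reward plus the discounted value of the next state; many such trials are exactly why real-robot training is slow.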
Training with real robots
Training a robot arm to grasp with reinforcement learning [Levine et al. 2016]
http://www.theverge.com/2016/3/9/11186940/google-robotic-arms-neural-network-hand-eye-coordination (from Google)
Training with real robots
Slow and very expensive!
Background
Instead of training with real robots, it is popular to train in a video game (a virtual environment), as in the Nature paper published by DeepMind [Mnih et al., 2015].
But virtual environments vary greatly in quality
The challenges of using realistic virtual environments
CartPole:
Memory: 560KB
FPS: 170563
Humanoid-v1:
Memory: 81280KB
FPS: 1206
RealisticRendering:
Memory: 2152132KB (~2GB)
FPS: 200 (without physics simulation)
Motivation from UnrealCV
Tools we used
The NeonRace virtual environment
5 frames per second
Actions: [Stop, Left, Right, Up, Top Left, Top Right, Back]
The agent makes decisions based on the image
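To illustrate how a policy output maps onto these discrete actions, here is a minimal sketch. The logits are placeholders for what a network looking at the rendered frame would produce, and `select_action` is a hypothetical helper, not part of the actual NeonRace code.

```python
import numpy as np

# The seven discrete NeonRace actions listed above.
ACTIONS = ["Stop", "Left", "Right", "Up", "Top Left", "Top Right", "Back"]

def select_action(logits, greedy=True, rng=None):
    """Pick an action index from per-action policy logits."""
    shifted = logits - np.max(logits)            # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    if greedy:
        return int(np.argmax(probs))
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(ACTIONS), p=probs))

# In the real agent these logits would come from a network that looks
# at the game image; here they are made up for illustration.
logits = np.array([0.1, 0.0, 0.0, 2.0, 0.3, 0.3, -1.0])
chosen = ACTIONS[select_action(logits)]  # "Up" has the largest logit
```

Sampling from the softmax (instead of the greedy argmax) is what gives the agent exploration during training.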
Two-stage distributed learning architecture
Background of Technical Details
Original Architecture
Problem
Problem
Our Architecture
Our Architecture
P2P Advantages
P2P Use Case 1
P2P Use Case 2
“Federated Learning”
Multi-Actor Training
Our Architecture
Question:
Question:
Simulation
Simulation
Simulation
Experiment Background
Multi-actor Result
Multi-actor Result
Multi-actor Result
Multi-Learner Training
P2P Multi-learner Implementation
MNIST Task
Our task: train a model that looks at images and predicts which digits they show
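A minimal sketch of the kind of classifier involved: softmax regression trained by gradient descent. The data here is a synthetic stand-in (random class templates plus noise) so the sketch runs without downloading the real MNIST images; the sizes and learning rate are illustrative assumptions, not the actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: each "image" is a 64-dim vector built
# from one of 3 class templates plus noise (illustrative only).
templates = rng.normal(size=(3, 64))
labels = rng.integers(0, 3, size=600)
X = templates[labels] + 0.5 * rng.normal(size=(600, 64))

# Softmax regression trained by plain gradient descent.
W = np.zeros((64, 3))
b = np.zeros(3)
onehot = np.eye(3)[labels]
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(X)                 # cross-entropy gradient
    W -= 1.0 * (X.T @ grad)
    b -= 1.0 * grad.sum(axis=0)

accuracy = (np.argmax(X @ W + b, axis=1) == labels).mean()
```

The distributed question is then how several learners can run this update loop in parallel and still end up with one consistent `W`.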
MNIST Results
Computer Vision Tasks on Tensorflow
Why?
Integration into NeonRace
Issue 1: Message Loss
We observe a message loss rate of 1.5-2%. Message loss is a serious issue here, since we fragment each weight matrix into ~1600 packets.
Fix: Python multicast library (unreliable) ⇒ Spread Python wrapper (reliable)
15% slower, but it’s okay
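To see why fragmentation makes packet loss so damaging, here is a hypothetical fragment/reassemble sketch. The MTU value and the (seq, total, payload) packet layout are illustrative assumptions, not the actual wire format used in the project.

```python
import numpy as np

MTU = 1400  # assumed payload bytes per multicast packet (illustrative)

def fragment(matrix, mtu=MTU):
    """Split a weight matrix into (seq, total, payload) packets."""
    raw = matrix.astype(np.float32).tobytes()
    chunks = [raw[i:i + mtu] for i in range(0, len(raw), mtu)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]

def reassemble(packets, shape):
    """Rebuild the matrix; fail loudly if any fragment was lost."""
    total = packets[0][1]
    by_seq = {seq: chunk for seq, _, chunk in packets}
    if len(by_seq) != total:
        raise ValueError("lost %d packets" % (total - len(by_seq)))
    raw = b"".join(by_seq[i] for i in range(total))
    return np.frombuffer(raw, dtype=np.float32).reshape(shape)

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
pkts = fragment(w)          # this 16KB matrix becomes 12 packets
restored = reassemble(pkts, w.shape)
# At ~1600 packets per matrix, a 1.5% per-packet loss rate makes it
# almost certain (1 - 0.985**1600 ≈ 1) that some fragment is missing,
# which is why a reliable transport such as Spread is needed.
```

Losing any single fragment invalidates the whole matrix, so the effective matrix-level loss rate is far worse than the packet-level rate.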
Issue 2: Learner Divergence
We can only synchronize as frequently as one data exchange per 10 steps; otherwise the communication overhead becomes too large. Learners diverge as learning goes on, and even with weight exchanges they cannot converge.
English-Chinese Analogy
Fix: use a Parameter Server (storage object) provided by TensorFlow to synchronize a central model periodically. This is a temporary fix; ideally we should come up with better synchronization methods.
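A toy sketch of the periodic-synchronization idea: learners take local steps, and every few steps their weights are averaged into a central model that everyone then re-reads. The `ParameterServer` class and update rule below are illustrative stand-ins, not the actual TensorFlow parameter-server API.

```python
import numpy as np

SYNC_EVERY = 10  # the budget from the slide: one exchange per 10 steps

class ParameterServer:
    """Illustrative stand-in for a central parameter store (not the
    actual TensorFlow parameter-server API)."""
    def __init__(self, shape):
        self.weights = np.zeros(shape)

    def pull(self):
        return self.weights.copy()

def local_step(w, rng):
    # hypothetical noisy update pulling the weights toward all-ones
    return w + 0.1 * (1.0 - w) + 0.005 * rng.normal(size=w.shape)

rng = np.random.default_rng(0)
server = ParameterServer((4,))
learners = [server.pull() for _ in range(3)]
for t in range(1, 101):
    learners = [local_step(w, rng) for w in learners]
    if t % SYNC_EVERY == 0:
        # average the learners into the central model, then re-sync,
        # so individual learners cannot drift apart indefinitely
        server.weights = np.mean(learners, axis=0)
        learners = [server.pull() for _ in range(3)]

spread = max(np.abs(a - b).max() for a in learners for b in learners)
```

Between syncs the learners drift because of their independent noise; the periodic pull resets that drift to zero, which is exactly what pure peer-to-peer weight exchange failed to do here.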
Combination of the multi-learner and multi-actor systems
Adaptation between the real and virtual domains
Domain adaptation
Can a model trained in the virtual world adapt to the real world?
Yes and No.
$30
$300,000 [1] [2]
[1] http://www.willowgarage.com/pages/pr2/order
[2] Tzeng, Eric, et al. "Adapting deep visuomotor representations with weak pairwise constraints." Workshop on the Algorithmic Foundations of Robotics (WAFR). 2016.
A virtual + real robot arm platform
For domain-adaptation research
(a side product)
Two components in the platform
Download link: https://cs.jhu.edu/~qiuwch/RoboArm.zip
Purchase link: https://www.amazon.com/OWI-OWI-535-Robotic-Arm-Edge/dp/B0017OFRCY
A quick demo
A domain adaptation research
10-degree accuracy: a prediction is considered correct if the error is within 10 degrees.
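The 10-degree criterion can be sketched as follows. The sample angles are made up for illustration, and the wrap-around handling at 360 degrees is an assumption about how angular error is measured, not necessarily how the arm's joint angles were evaluated.

```python
import numpy as np

def within_tolerance(pred_deg, true_deg, tol=10.0):
    """True where the angular error is at most tol degrees, with
    wrap-around at 360 handled (e.g. 355 vs 3 is an 8-degree error)."""
    err = np.abs((pred_deg - true_deg + 180.0) % 360.0 - 180.0)
    return err <= tol

# Made-up predictions and ground-truth angles for illustration.
preds = np.array([12.0, 90.0, 355.0, 200.0])
truth = np.array([5.0, 110.0, 3.0, 200.0])
accuracy = within_tolerance(preds, truth).mean()  # 3 of 4 within 10 degrees
```

The reported metric is then just this boolean mean over the test set.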
Other progress with Realistic Virtual Environments
Deployment of Realistic Virtual Environments
Conclusion
Conclusion
Future work