Distributed Learning in Realistic Virtual Environments

VDL Group

Weichao Qiu, Yuan Jing Vincent Yan, Kaiyue Wu

Outline

  • Background
    • Reinforcement learning
    • Realistic virtual environment
  • The two-stage distributed learning architecture
    • Distribute actors to multiple machines
    • Distribute learners to multiple machines
    • The complete system
  • The virtual-real arm challenge
  • Conclusion

Background

The AI

Powerful machine learning algorithms make it possible to teach robots complex tasks, such as flying a quadcopter or walking on two legs.

Training robots requires a lot of time and effort. The training is usually done by trial and error, an approach called reinforcement learning.

Training with real robots

Training a robot arm to grasp with reinforcement learning [Levine et al. 2016] (from Google)
http://www.theverge.com/2016/3/9/11186940/google-robotic-arms-neural-network-hand-eye-coordination

Training with real robots

Slow and very expensive!

Background

Instead of training with real robots, it has become popular to train in video games (virtual environments), as in the Nature paper published by DeepMind [Mnih et al., 2015].

But virtual environments vary greatly in quality

The challenges of using realistic virtual environments

  • CartPole: memory 560 KB, 170,563 FPS
  • Humanoid-v1: memory 81,280 KB (~80 MB), 1,206 FPS
  • RealisticRendering: memory 2,152,132 KB (~2 GB), 200 FPS (without physics simulation)

Motivation from UnrealCV

Tools we used

  • TensorFlow: machine learning library in Python
  • OpenAI Gym + Universe: virtual environments for reinforcement learning
  • Unreal Engine + UnrealCV: realistic virtual environments

The NeonRace virtual environment

5 frames per second

Actions: [Stop, Left, Right, Up, Top Left, Top Right, Back]

The agent makes decisions based on the image (a minimal interaction sketch follows).
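
Below is a minimal sketch of driving NeonRace through OpenAI Gym + Universe; the environment id and keyboard events follow Universe's conventions, and the details are illustrative rather than our exact training loop:

```python
import gym
import universe  # registers the flashgames.* environments (needs Docker)

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)              # one local game instance
observation_n = env.reset()

for _ in range(1000):
    # Universe actions are lists of keyboard events, one list per instance;
    # here every instance simply holds the accelerator down.
    action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
```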

Two-stage distributed learning architecture

Technical Background

  • Learner: the program that runs the learning algorithm. It processes the data generated by applying actions to the environment.
  • Actor: the program that interacts with the environment. It generates the data for the learner to process. (A minimal sketch of both roles follows this list.)
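
A minimal sketch of the two roles and the hand-off between them; the class names, the queue, and model.update() are our own illustration, not a specific library's API:

```python
# Sketch of the learner/actor split. Everything here (Actor, Learner,
# model.update) is illustrative; the real system wires these roles to
# gym/universe environments and a TensorFlow model.
import queue

class Actor:
    """Interacts with the environment and emits (state, action, reward) tuples."""
    def __init__(self, env, policy, out_queue):
        self.env, self.policy, self.out = env, policy, out_queue

    def run_episode(self):
        state = self.env.reset()
        done = False
        while not done:
            action = self.policy(state)
            next_state, reward, done, _ = self.env.step(action)
            self.out.put((state, action, reward))  # hand data to the learner
            state = next_state

class Learner:
    """Consumes experience from actors and updates the model."""
    def __init__(self, model, in_queue):
        self.model, self.inq = model, in_queue

    def train_step(self):
        state, action, reward = self.inq.get()    # data from any actor
        self.model.update(state, action, reward)  # e.g., one gradient step
```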

Original Architecture

Problem

  • A one-learner, one-actor system is limited by the resources of a single machine.
  • The learner can be parallelized into multiple learners working at the same time.
  • The actor can be separated out of the original learner-actor program and parallelized into multiple actors working at the same time for each learner.

Problem

  • Motivation for the separation: learners usually work much faster than actors, especially in the realistic virtual environments we target, so one learner can keep up with multiple actors at the same time (see the sketch below).
  • The original system does not make full use of the learners' capacity.
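
A toy sketch of that imbalance, assuming a shared queue: several slow actors fill it concurrently while one fast learner drains it. The timings are made up:

```python
# Hypothetical illustration: actors spend ~0.2s per environment step, so a
# single learner can comfortably consume samples from several of them.
import threading, queue, time

experience = queue.Queue()

def actor(actor_id):
    while True:
        time.sleep(0.2)                       # slow environment interaction
        experience.put((actor_id, 'sample'))

def learner():
    while True:
        sample = experience.get()             # fast: the learner rarely waits
        # ...one gradient step on `sample` would go here...

for i in range(3):                            # 3 actors feed 1 learner
    threading.Thread(target=actor, args=(i,), daemon=True).start()
threading.Thread(target=learner, daemon=True).start()
time.sleep(2)                                 # let the toy system run briefly
```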

Our Architecture

Our Architecture

  • Learner-learner communication

P2P Advantages

  • Decentralized
  • Fault tolerant (makes progress despite partitions and crashes)
  • Intrusion tolerant (BFT protocols, low-latency requirements)
  • Self-scaling
  • Cost effective

P2P Use Case 1

P2P Use Case 2

“Federated Learning”

Multi-Actor Training

Our Architecture

  • Learner-actor communication

Question:

  • Can every task be sped up by the separation?

Question:

  • No.
  • The separation adds communication time to the whole process. If the communication time is significant compared to the original computation time of the interactions, there is no worthwhile speedup (see the model below).
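
A back-of-the-envelope model (our own simplification, not a measurement) makes the trade-off explicit. Let t_comp be the per-sample computation time at an actor, t_comm the per-sample communication overhead, and n the number of remote actors:

```latex
% Fused architecture: one sample per t_comp.
% Separated architecture: n actors in parallel, each sample also pays t_comm.
\[
  S(n) = \frac{n / (t_{\mathrm{comp}} + t_{\mathrm{comm}})}{1 / t_{\mathrm{comp}}}
       = \frac{n\, t_{\mathrm{comp}}}{t_{\mathrm{comp}} + t_{\mathrm{comm}}}
\]
```

If t_comm ≪ t_comp the speedup is near-linear in n, while if t_comm ≫ t_comp many actors are needed just to break even.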

Simulation

  • We do not have enough real tasks to investigate this question.
  • Hence we simulated the process, using sleep() to stand in for the computation time at the actors, to study the relation among communication time, computation time, and speedup (sketch below).
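
A sketch of that simulation: sleep() stands in for both the actor computation and the communication delay, and threads let the simulated remote actors overlap. All times are made up for illustration:

```python
# Toy version of the simulation: compare one fused actor against n remote
# actors whose per-sample cost includes a communication delay.
import threading, time

T_COMP, T_COMM, SAMPLES = 0.05, 0.01, 40

def fused():
    for _ in range(SAMPLES):
        time.sleep(T_COMP)                    # original: compute in-process

def remote_actor(n_samples):
    for _ in range(n_samples):
        time.sleep(T_COMP + T_COMM)           # compute, then ship the result

def distributed(n_actors):
    threads = [threading.Thread(target=remote_actor,
                                args=(SAMPLES // n_actors,))
               for _ in range(n_actors)]
    for t in threads: t.start()
    for t in threads: t.join()

start = time.time(); fused();        t_fused = time.time() - start
start = time.time(); distributed(4); t_dist  = time.time() - start
print('speedup: %.2f' % (t_fused / t_dist))   # ~3.3x with these numbers
```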

Simulation

Simulation

Experiment Background

  • A3C (Asynchronous Advantage Actor-Critic): an open-source deep reinforcement learning algorithm released by the Google DeepMind group (its asynchronous-update pattern is sketched after this list).
  • NeonRace: a video game that gives us a car-driving task.
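
The heart of A3C is asynchronous updates to shared weights. The toy below shows only that pattern, on a made-up quadratic loss, not the full actor-critic algorithm:

```python
# Workers compute gradients on private data and apply them to shared
# weights without global synchronization (Hogwild-style, as in A3C).
import threading
import numpy as np

shared_w = np.zeros(4)                        # parameters shared by all workers

def worker(seed, steps=1000, lr=0.01):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        w = shared_w.copy()                   # snapshot the shared weights
        x = rng.normal(size=4)                # stand-in observation
        grad = 2.0 * (w @ x - 1.0) * x        # gradient of toy loss (w.x - 1)^2
        shared_w[:] -= lr * grad              # asynchronous in-place update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(shared_w)
```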

Multi-actor Result

  • Original: actor not separated
  • One actor on a separate machine

Multi-actor Result

  • Two actors

  • Three actors

Multi-actor Result

Multi-Learner Training

P2P Multi-learner Implementation

MNIST Task

  • A large database of handwritten digits
  • Commonly used for training image-processing and machine-learning systems
  • Each image comes with a label

Our task: train a model that looks at an image and predicts which digit it shows (minimal sketch below).
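
A minimal TensorFlow sketch of the task: a softmax classifier mapping a 28x28 image to one of 10 digits. This is a generic baseline, not our exact training script:

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 784 input pixels
    tf.keras.layers.Dense(10, activation='softmax'),  # one output per digit
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))                 # [loss, accuracy]
```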

MNIST Results

Computer Vision Tasks on TensorFlow

Why?

Integration into NeonRace

Issue 1: Message Loss

We observed a message loss rate of 1.5-2%. Message loss is a serious issue here, since each weight matrix is fragmented into ~1600 packets, so losing a single packet corrupts an entire matrix.

Fix: replace the unreliable Python multicast library with the reliable Spread toolkit's Python wrapper (fragmentation sketch below).

About 15% slower, but acceptable.
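
To see why loss is so damaging, here is a sketch of the fragmentation; the packet size and header layout are our own illustration, not Spread's wire format:

```python
# A weight matrix serialized and split into fixed-size packets, each tagged
# with (matrix id, sequence number, total count) so the receiver can detect
# a missing fragment. One lost packet invalidates the whole matrix.
import struct
import numpy as np

PACKET_BYTES = 1400                    # fits under a typical Ethernet MTU

def fragment(matrix_id, w):
    payload = w.astype(np.float32).tobytes()
    chunks = [payload[i:i + PACKET_BYTES]
              for i in range(0, len(payload), PACKET_BYTES)]
    header = lambda seq: struct.pack('!III', matrix_id, seq, len(chunks))
    return [header(seq) + chunk for seq, chunk in enumerate(chunks)]

w = np.zeros((750, 750))               # ~2.25 MB of float32 weights
print(len(fragment(0, w)))             # ~1600 packets
```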

Issue 2: Learner Divergence

We can only synchronize as frequently as one data exchange per 10 steps; otherwise the communication overhead becomes too high. The learners diverge as learning goes on, and even with weight exchanges they cannot converge.


Fix: use a parameter server (a storage object provided by TensorFlow) to synchronize with a central model periodically (sketch below). This is a temporary fix; ideally we should develop better synchronization methods.
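
A sketch of the periodic synchronization; the ParameterServer class below is a plain-Python stand-in for TensorFlow's parameter-server mechanism, written out to show the pattern:

```python
# Every SYNC_EVERY steps a learner blends its weights with a central model
# and continues from the blend, which keeps learners from drifting apart.
import numpy as np

SYNC_EVERY = 10

class ParameterServer:
    def __init__(self, dim):
        self.central = np.zeros(dim)

    def sync(self, local_w, alpha=0.5):
        self.central = (1 - alpha) * self.central + alpha * local_w
        return self.central.copy()

ps = ParameterServer(dim=4)

def learner_loop(w, steps=100):
    for step in range(1, steps + 1):
        w = w - 0.01 * np.random.randn(4)   # stand-in for a gradient step
        if step % SYNC_EVERY == 0:
            w = ps.sync(w)                  # periodic pull toward the center
    return w

print(learner_loop(np.zeros(4)))
```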

Combination of the multi-learner and multi-actor systems

Adaptation between the real and virtual domains

Domain adaptation

Can a model trained in the virtual world adapt to the real world?

Yes and No.

$30

$300,000 [1] [2]

[1] http://www.willowgarage.com/pages/pr2/order

[2] Tzeng, Eric, et al. "Adapting deep visuomotor representations with weak pairwise constraints." Workshop on the Algorithmic Foundations of Robotics (WAFR). 2016.

A virtual + real robot arm platform for domain-adaptation research

(a side product)

Two components in the platform

  • A low-cost real robot arm
    • No sensors, five motors, inaccurate motion, inexpensive
  • A realistic virtual arm
    • Can be placed into many realistic virtual environments and can interact with objects
    • Similar in appearance to the real arm
    • The appearance can be controlled

Download link: https://cs.jhu.edu/~qiuwch/RoboArm.zip
Purchase link: https://www.amazon.com/OWI-OWI-535-Robotic-Arm-Edge/dp/B0017OFRCY

A quick demo

Domain adaptation research

10-degree accuracy: a prediction is considered correct if its error is within 10 degrees (metric sketch below).
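
The metric is easy to state in code; the arrays below are illustrative:

```python
import numpy as np

def degree_accuracy(pred_deg, true_deg, tol=10.0):
    """Fraction of angle predictions within `tol` degrees of ground truth."""
    return np.mean(np.abs(pred_deg - true_deg) <= tol)

pred = np.array([12.0, 47.0, 90.0, 178.0])
true = np.array([15.0, 60.0, 85.0, 175.0])
print(degree_accuracy(pred, true))   # 0.75: one prediction is off by 13 degrees
```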

Other progress with Realistic Virtual Environments

Deployment of Realistic Virtual Environments

  • Nvidia-docker
    • Makes realistic virtual environments easy to deploy on Linux servers
  • Headless Linux server
    • Makes it possible to use a Linux server farm to run many rendering tasks at the same time

Conclusion

Conclusion

  • Distributed Systems + Computer Vision (AI) + Virtual Reality
  • Multi-actor provides linear speedup
  • Multi-learner provides linear speedup on non-complex tasks
  • Learner divergence remains an open issue

Future work

  • Domain adaptation research based on the robot arm
  • Better synchronization methods for multi-learner