Distributed Learning in Realistic Virtual Environments

VDL Group

Weichao Qiu, Yuan Jing Vincent Yan, Kaiyue Wu

Outline

  • Background
    • Reinforcement learning
    • Realistic virtual environment
  • The two-stage distributed learning architecture
    • Distribute actors to multiple machines
    • Distribute learners to multiple machines
    • The complete system
  • The virtual-real arm challenge
  • Conclusion

Background

The AI

Powerful machine learning algorithms make it possible to teach robots complex tasks, such as flying a quadcopter or walking on two legs.

Training robots requires a lot of time and effort. The training is usually done by trial and error, an approach called reinforcement learning.

Training with real robots

Training a robot arm to grasp with reinforcement learning [Levine et al. 2016] (from Google)
http://www.theverge.com/2016/3/9/11186940/google-robotic-arms-neural-network-hand-eye-coordination

Training with real robots

Slow and very expensive!

Background

Instead of training with real robots, it has become popular to train in video games (virtual environments), as in the Nature paper published by DeepMind [Mnih et al., 2015].

But virtual environments vary greatly in quality

The challenges of using realistic virtual environments

  • CartPole: memory 560 KB, 170,563 FPS
  • Humanoid-v1: memory 81,280 KB (~80 MB), 1,206 FPS
  • RealisticRendering: memory 2,152,132 KB (~2 GB), 200 FPS (without physics simulation)

Motivation from UnrealCV

Tools we used

  • TensorFlow: machine learning library in Python
  • OpenAI Gym + Universe: virtual environments for reinforcement learning
  • Unreal Engine + UnrealCV: realistic virtual environments

The NeonRace virtual environment

5 frames per second

Actions: [Stop, Left, Right, Up, Top Left, Top Right, Back]

The agent makes decisions based on the image (a minimal interaction sketch follows).
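
Below is a minimal sketch of driving NeonRace through OpenAI Gym + Universe; the environment id and keyboard events follow Universe's conventions, and the details are illustrative rather than our exact training loop:

```python
import gym
import universe  # registers the flashgames.* environments (needs Docker)

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)              # one local game instance
observation_n = env.reset()

for _ in range(1000):
    # Universe actions are lists of keyboard events, one list per instance;
    # here every instance simply holds the accelerator down.
    action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
```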

Two-stage distributed learning architecture

Technical Background

  • Learner: the program that runs the learning algorithm. It processes the data generated by applying actions to the environment.
  • Actor: the program that interacts with the environment. It generates the data for the learner to process. (A minimal sketch of both roles follows this list.)
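
A minimal sketch of the two roles and the hand-off between them; the class names, the queue, and model.update() are our own illustration, not a specific library's API:

```python
# Sketch of the learner/actor split. Everything here (Actor, Learner,
# model.update) is illustrative; the real system wires these roles to
# gym/universe environments and a TensorFlow model.
import queue

class Actor:
    """Interacts with the environment and emits (state, action, reward) tuples."""
    def __init__(self, env, policy, out_queue):
        self.env, self.policy, self.out = env, policy, out_queue

    def run_episode(self):
        state = self.env.reset()
        done = False
        while not done:
            action = self.policy(state)
            next_state, reward, done, _ = self.env.step(action)
            self.out.put((state, action, reward))  # hand data to the learner
            state = next_state

class Learner:
    """Consumes experience from actors and updates the model."""
    def __init__(self, model, in_queue):
        self.model, self.inq = model, in_queue

    def train_step(self):
        state, action, reward = self.inq.get()    # data from any actor
        self.model.update(state, action, reward)  # e.g., one gradient step
```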

Original Architecture

Problem

  • A one-learner, one-actor system is limited by the resources of a single machine.
  • The learner can be parallelized into multiple learners working at the same time.
  • The actor can be separated out of the original learner-actor program and parallelized into multiple actors working at the same time for each learner.

Problem

  • Motivation for the separation: learners usually work much faster than actors, especially in the realistic virtual environments we target, so one learner can keep up with multiple actors at the same time (see the sketch below).
  • The original system does not make full use of the learners' capacity.
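
A toy sketch of that imbalance, assuming a shared queue: several slow actors fill it concurrently while one fast learner drains it. The timings are made up:

```python
# Hypothetical illustration: actors spend ~0.2s per environment step, so a
# single learner can comfortably consume samples from several of them.
import threading, queue, time

experience = queue.Queue()

def actor(actor_id):
    while True:
        time.sleep(0.2)                       # slow environment interaction
        experience.put((actor_id, 'sample'))

def learner():
    while True:
        sample = experience.get()             # fast: the learner rarely waits
        # ...one gradient step on `sample` would go here...

for i in range(3):                            # 3 actors feed 1 learner
    threading.Thread(target=actor, args=(i,), daemon=True).start()
threading.Thread(target=learner, daemon=True).start()
time.sleep(2)                                 # let the toy system run briefly
```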

Our Architecture

Our Architecture

  • Learner-learner communication

P2P Advantages

  • Decentralized
  • Fault tolerant (makes progress despite partitions and crashes)
  • Intrusion tolerant (BFT protocols, low-latency requirements)
  • Self-scaling
  • Cost effective

P2P Use Case 1

P2P Use Case 2

“Federated Learning”

Multi-Actor Training

Our Architecture

  • Learner-actor communication

Question:

  • Can every task be sped up by the separation?

Question:

  • No.
  • The separation adds communication time to the whole process. If the communication time is significant compared to the original computation time of the interactions, there is no worthwhile speedup (see the model below).
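
A back-of-the-envelope model (our own simplification, not a measurement) makes the trade-off explicit. Let t_comp be the per-sample computation time at an actor, t_comm the per-sample communication overhead, and n the number of remote actors:

```latex
% Fused architecture: one sample per t_comp.
% Separated architecture: n actors in parallel, each sample also pays t_comm.
\[
  S(n) = \frac{n / (t_{\mathrm{comp}} + t_{\mathrm{comm}})}{1 / t_{\mathrm{comp}}}
       = \frac{n\, t_{\mathrm{comp}}}{t_{\mathrm{comp}} + t_{\mathrm{comm}}}
\]
```

If t_comm ≪ t_comp the speedup is near-linear in n, while if t_comm ≫ t_comp many actors are needed just to break even.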

Simulation

  • We do not have enough real tasks to investigate this question.
  • Hence we simulated the process, using sleep() to stand in for the computation time at the actors, to study the relation among communication time, computation time, and speedup (sketch below).
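
A sketch of that simulation: sleep() stands in for both the actor computation and the communication delay, and threads let the simulated remote actors overlap. All times are made up for illustration:

```python
# Toy version of the simulation: compare one fused actor against n remote
# actors whose per-sample cost includes a communication delay.
import threading, time

T_COMP, T_COMM, SAMPLES = 0.05, 0.01, 40

def fused():
    for _ in range(SAMPLES):
        time.sleep(T_COMP)                    # original: compute in-process

def remote_actor(n_samples):
    for _ in range(n_samples):
        time.sleep(T_COMP + T_COMM)           # compute, then ship the result

def distributed(n_actors):
    threads = [threading.Thread(target=remote_actor,
                                args=(SAMPLES // n_actors,))
               for _ in range(n_actors)]
    for t in threads: t.start()
    for t in threads: t.join()

start = time.time(); fused();        t_fused = time.time() - start
start = time.time(); distributed(4); t_dist  = time.time() - start
print('speedup: %.2f' % (t_fused / t_dist))   # ~3.3x with these numbers
```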

Simulation

Simulation

Experiment Background

  • A3C (Asynchronous Advantage Actor-Critic): an open-source deep reinforcement learning algorithm released by the Google DeepMind group (its asynchronous-update pattern is sketched after this list).
  • NeonRace: a video game that gives us a car-driving task.
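
The heart of A3C is asynchronous updates to shared weights. The toy below shows only that pattern, on a made-up quadratic loss, not the full actor-critic algorithm:

```python
# Workers compute gradients on private data and apply them to shared
# weights without global synchronization (Hogwild-style, as in A3C).
import threading
import numpy as np

shared_w = np.zeros(4)                        # parameters shared by all workers

def worker(seed, steps=1000, lr=0.01):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        w = shared_w.copy()                   # snapshot the shared weights
        x = rng.normal(size=4)                # stand-in observation
        grad = 2.0 * (w @ x - 1.0) * x        # gradient of toy loss (w.x - 1)^2
        shared_w[:] -= lr * grad              # asynchronous in-place update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(shared_w)
```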

Multi-actor Result

  • Original: actor not separated
  • One actor on a separate machine

Multi-actor Result

  • Two actors

  • Three actors

Multi-actor Result

Multi-Learner Training

P2P Multi-learner Implementation

MNIST Task

  • A large database of handwritten digits
  • Commonly used for training image-processing and machine-learning systems
  • Each image comes with a label

Our task: train a model that looks at an image and predicts which digit it shows (minimal sketch below).
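
A minimal TensorFlow sketch of the task: a softmax classifier mapping a 28x28 image to one of 10 digits. This is a generic baseline, not our exact training script:

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 784 input pixels
    tf.keras.layers.Dense(10, activation='softmax'),  # one output per digit
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))                 # [loss, accuracy]
```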

MNIST Results

Computer Vision Tasks on TensorFlow

Why?

Integration into NeonRace

Issue 1: Message Loss

We observed a message loss rate of 1.5-2%. Message loss is a serious issue here, since each weight matrix is fragmented into ~1600 packets, so losing a single packet corrupts an entire matrix.

Fix: replace the unreliable Python multicast library with the reliable Spread toolkit's Python wrapper (fragmentation sketch below).

About 15% slower, but acceptable.
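
To see why loss is so damaging, here is a sketch of the fragmentation; the packet size and header layout are our own illustration, not Spread's wire format:

```python
# A weight matrix serialized and split into fixed-size packets, each tagged
# with (matrix id, sequence number, total count) so the receiver can detect
# a missing fragment. One lost packet invalidates the whole matrix.
import struct
import numpy as np

PACKET_BYTES = 1400                    # fits under a typical Ethernet MTU

def fragment(matrix_id, w):
    payload = w.astype(np.float32).tobytes()
    chunks = [payload[i:i + PACKET_BYTES]
              for i in range(0, len(payload), PACKET_BYTES)]
    header = lambda seq: struct.pack('!III', matrix_id, seq, len(chunks))
    return [header(seq) + chunk for seq, chunk in enumerate(chunks)]

w = np.zeros((750, 750))               # ~2.25 MB of float32 weights
print(len(fragment(0, w)))             # ~1600 packets
```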

Issue 2: Learner Divergence

We can only synchronize as frequently as one data exchange per 10 steps; otherwise the communication overhead becomes too high. The learners diverge as learning goes on, and even with weight exchanges they cannot converge.


Fix: use a parameter server (a storage object provided by TensorFlow) to synchronize with a central model periodically (sketch below). This is a temporary fix; ideally we should develop better synchronization methods.
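
A sketch of the periodic synchronization; the ParameterServer class below is a plain-Python stand-in for TensorFlow's parameter-server mechanism, written out to show the pattern:

```python
# Every SYNC_EVERY steps a learner blends its weights with a central model
# and continues from the blend, which keeps learners from drifting apart.
import numpy as np

SYNC_EVERY = 10

class ParameterServer:
    def __init__(self, dim):
        self.central = np.zeros(dim)

    def sync(self, local_w, alpha=0.5):
        self.central = (1 - alpha) * self.central + alpha * local_w
        return self.central.copy()

ps = ParameterServer(dim=4)

def learner_loop(w, steps=100):
    for step in range(1, steps + 1):
        w = w - 0.01 * np.random.randn(4)   # stand-in for a gradient step
        if step % SYNC_EVERY == 0:
            w = ps.sync(w)                  # periodic pull toward the center
    return w

print(learner_loop(np.zeros(4)))
```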

Combination of the multi-learner and multi-actor systems

Adaptation between the real and virtual domains

Domain adaptation

Can a model trained in the virtual world adapt to the real world?

Yes and No.

$30

$300,000 [1] [2]

[1] http://www.willowgarage.com/pages/pr2/order

[2] Tzeng, Eric, et al. "Adapting deep visuomotor representations with weak pairwise constraints." Workshop on the Algorithmic Foundations of Robotics (WAFR). 2016.

A virtual + real robot arm platform for domain-adaptation research

(a side product)

Two components in the platform

  • A low-cost real robot arm
    • No sensors, five motors, inaccurate motion, inexpensive
  • A realistic virtual arm
    • Can be placed into many realistic virtual environments and can interact with objects
    • Similar in appearance to the real arm
    • The appearance can be controlled

Download link: https://cs.jhu.edu/~qiuwch/RoboArm.zip
Purchase link: https://www.amazon.com/OWI-OWI-535-Robotic-Arm-Edge/dp/B0017OFRCY

A quick demo

Domain adaptation research

10-degree accuracy: a prediction is considered correct if its error is within 10 degrees (metric sketch below).
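
The metric is easy to state in code; the arrays below are illustrative:

```python
import numpy as np

def degree_accuracy(pred_deg, true_deg, tol=10.0):
    """Fraction of angle predictions within `tol` degrees of ground truth."""
    return np.mean(np.abs(pred_deg - true_deg) <= tol)

pred = np.array([12.0, 47.0, 90.0, 178.0])
true = np.array([15.0, 60.0, 85.0, 175.0])
print(degree_accuracy(pred, true))   # 0.75: one prediction is off by 13 degrees
```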

Other progress with Realistic Virtual Environments

Deployment of Realistic Virtual Environments

  • Nvidia-docker
    • Makes realistic virtual environments easy to deploy on Linux servers
  • Headless Linux server
    • Makes it possible to use a Linux server farm to run many rendering tasks at the same time

Conclusion

Conclusion

  • Distributed Systems + Computer Vision (AI) + Virtual Reality
  • Multi-actor provides linear speedup
  • Multi-learner provides linear speedup on non-complex tasks
  • Learner divergence remains an open issue

Future work

  • Domain adaptation research based on the robot arm
  • Better synchronization methods for multi-learner