1 of 22

Building RL applications with RLlib

Eric Liang

http://rllib.io

RLlib tutorial

2 of 22

This tutorial

  • Overview of reinforcement learning and RLlib
  • Focus on applied RL
    • Understanding how to model problems as RL environments
    • How to use RLlib for distributed training in simulation
  • Learn more
    • Documentation at http://rllib.io
      • First page has 60-second overview
      • Links to API, algorithms, examples, etc.

3 of 22

RL vs Supervised Learning

Input: cat picture

Output: it's a cat

4 of 22

RL vs Supervised Learning

Users

Service

Search for item

Results

Add item to cart

Suggestions

Checkout

($$$)

5 of 22

RL vs Supervised Learning

Users

Service

Search for item

Results

Add item to cart

Suggestions

Checkout

($$$)

Environment

Agent

Observations

Actions

Reward

($$$)

6 of 22

Reinforcement Learning is centered around interaction

observation + reward

agent

environment

policy

actions
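
The interaction loop in this diagram can be sketched in plain Python. This is a minimal illustration, not RLlib code: the `CoinGuessEnv` environment, the `random_policy` function, and the `run_episode` helper are all hypothetical names invented here, and the `reset`/`step` method names are an assumed convention.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess the side of a coin at each step."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return a dummy first observation.
        self.steps = 0
        return 0

    def step(self, action):
        # The environment flips a coin and rewards a correct guess.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.episode_length
        # The observation is the coin side that was just revealed.
        return coin, reward, done


def random_policy(observation, rng):
    """Placeholder policy: ignores the observation and guesses at random."""
    return rng.randint(0, 1)


def run_episode(env, policy, seed=0):
    """The agent loop: observe, act, receive a reward, repeat until done."""
    rng = random.Random(seed)
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation, rng)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


env = CoinGuessEnv()
episode_return = run_episode(env, random_policy)
```

A learned policy would replace `random_policy` and use the observations and rewards to choose better actions over time.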

7 of 22

Applications of RL

AlphaGo (2016)

    • Observations:
      • board state
    • Actions:
      • where to place stone
    • Rewards:
      • win / lose

8 of 22

Applications of RL

Database query optimization: DQ (2018), Neo (2019)

    • Observations:
      • relations joined so far
      • remaining relations
    • Actions:
      • which relations to join
    • Rewards:
      • cost of query
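
To make the mapping above concrete, here is a rough sketch of join ordering posed as an environment. It is not the DQ or Neo implementation. The `JoinOrderEnv` class, the row counts, the cost model (product of input sizes), and the output-size rule (size of the smaller input) are all assumptions chosen to keep the example small, and actions are simplified to picking the next relation of a left-deep plan.

```python
class JoinOrderEnv:
    """Hypothetical toy model of join ordering as an RL environment."""

    def __init__(self, table_rows):
        # Relation name -> row count (made-up statistics).
        self.table_rows = dict(table_rows)
        self.joined = []
        self.remaining = []
        self.current_rows = 0

    def reset(self):
        self.joined = []
        self.remaining = sorted(self.table_rows)
        self.current_rows = 0
        return self._observation()

    def _observation(self):
        # Observation: relations joined so far, and remaining relations.
        return (tuple(self.joined), tuple(self.remaining))

    def step(self, relation):
        # Action: which remaining relation to join next.
        if relation not in self.remaining:
            raise ValueError(f"{relation!r} is not available to join")
        rows = self.table_rows[relation]
        if not self.joined:
            cost = 0.0
            self.current_rows = rows
        else:
            # Assumed cost model: product of the two input sizes.
            cost = float(self.current_rows * rows)
            # Assumed output size: the smaller of the two inputs.
            self.current_rows = min(self.current_rows, rows)
        self.joined.append(relation)
        self.remaining.remove(relation)
        done = not self.remaining
        # Reward: negative cost, so cheaper plans score higher.
        return self._observation(), -cost, done


def plan_reward(env, order):
    """Sum the rewards of joining relations in the given order."""
    env.reset()
    total = 0.0
    for relation in order:
        _, reward, _ = env.step(relation)
        total += reward
    return total


env = JoinOrderEnv({"orders": 1000, "customers": 100, "nation": 10})
small_first = plan_reward(env, ["nation", "customers", "orders"])
large_first = plan_reward(env, ["orders", "customers", "nation"])
```

Under these assumed statistics, joining the small relations first yields the higher total reward, which is the kind of signal an agent would learn from.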

9 of 22

Applications of RL

Optimizing Data Structures: NeuroCuts (2019)

    • Observations:
      • state of current tree node
    • Actions:
      • cut or partition current node
    • Rewards:
      • depth + size of tree

Train a NeuroCuts agent with RL

optimized tree data structure

packet classification rules

deploy artifact

10 of 22

RL vs Supervised Learning

  • Something in common: both require lots of training data
  • Supervised learning
    • need labeled data
  • Reinforcement learning
    • can generate data through simulations
    • or, large-scale interactions with the environment
  • Both require scalable software libraries for training

11 of 22

What is RLlib?

Framework for scalable applied reinforcement learning

12 of 22

What is RLlib?

13 of 22

Unified framework for scalable RL

  • Evolution Strategies (vs Redis-based)
  • Distributed PPO (vs OpenMPI)
  • Ape-X Distributed DQN, DDPG

14 of 22

Broad range of scalable algorithms

  • High-throughput architectures
    • Distributed Prioritized Experience Replay (Ape-X)
    • Importance Weighted Actor-Learner Architecture (IMPALA)
  • Gradient-based
    • Advantage Actor-Critic (A2C, A3C)
    • Deep Deterministic Policy Gradients (DDPG)
    • Deep Q Networks (DQN, Rainbow)
    • Proximal Policy Optimization (PPO)
    • Soft Actor Critic (SAC)
  • Derivative-free
    • Augmented Random Search (ARS)
    • Evolution Strategies

15 of 22

General purpose APIs

Training in Simulation

Batch RL

Batch Data

Multi-Agent

16 of 22

Growing number of users

RLlib User Metrics

  • 50 -> 175 GitHub Issues
  • 75 -> 141 Pull Requests
  • 11 -> 55 Dev List Threads

Growing Number of Organizations using RLlib in Research & Product

17 of 22

How to apply RL to solve problems?

18 of 22

The goal of reinforcement learning

Learn this function through experience

19 of 22

Step 1: Problem setup

  • Define the environment
    • action space: discrete, vector of floats, tuple, etc.
    • observation space: vector, image, etc.
    • implementation: interface with simulator or system
  • RL algorithm
    • choose from library
  • Policy model
    • neural network architecture
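
The environment part of this setup can be sketched as follows. RLlib environments commonly follow the OpenAI Gym interface, but to stay dependency-free this example uses small stand-in classes: `Discrete`, `Box`, and `CorridorEnv` are hypothetical names defined here, not library imports, and the reward values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrete:
    """Stand-in for a discrete space with choices 0..n-1."""
    n: int

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


@dataclass(frozen=True)
class Box:
    """Stand-in for a bounded vector-of-floats space."""
    low: float
    high: float
    size: int

    def contains(self, x):
        return len(x) == self.size and all(self.low <= v <= self.high for v in x)


class CorridorEnv:
    """Hypothetical environment: walk right along a corridor to the goal."""

    def __init__(self, length=5):
        self.length = length
        # Action space: discrete, 0 = step left, 1 = step right.
        self.action_space = Discrete(2)
        # Observation space: a one-element vector holding the position.
        self.observation_space = Box(0.0, float(length), 1)
        self.position = 0

    def reset(self):
        self.position = 0
        return [float(self.position)]

    def step(self, action):
        # Implementation: a real environment would call a simulator or
        # live system here instead of updating a counter.
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action: {action!r}")
        if action == 1:
            self.position += 1
        else:
            self.position = max(0, self.position - 1)
        done = self.position >= self.length
        # Arbitrary reward: small penalty per step, bonus at the goal.
        reward = 1.0 if done else -0.1
        return [float(self.position)], reward, done, {}


env = CorridorEnv(length=3)
first_observation = env.reset()
```

With the environment defined, the remaining two choices on this slide (algorithm and policy model) are made in the training library rather than in the environment code.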

20 of 22

Step 2: Collecting experiences

policy

observation + reward

agent

Action: recommend items A B C

Returns: Pages visited, total time, revenue
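
Experience collection for the recommendation example might look like the sketch below. Everything numeric here is invented for illustration: the purchase probabilities, the prices, and the `simulate_user` function are assumptions standing in for a real simulator or live system, and only revenue is modeled as the return.

```python
import random

ITEMS = ["A", "B", "C"]
# Assumed user behavior, invented for this example.
BUY_PROBABILITY = {"A": 0.1, "B": 0.5, "C": 0.3}
PRICE = {"A": 20.0, "B": 5.0, "C": 10.0}


def simulate_user(item, rng):
    """Hypothetical simulator: revenue from recommending one item."""
    return PRICE[item] if rng.random() < BUY_PROBABILITY[item] else 0.0


def uniform_policy(observation, rng):
    """Placeholder policy: recommend an item uniformly at random."""
    return rng.choice(ITEMS)


def collect_experiences(policy, num_steps, seed=0):
    """Run the policy and record (observation, action, reward) samples."""
    rng = random.Random(seed)
    batch = []
    # Placeholder observation; a real one might encode pages visited.
    observation = "home_page"
    for _ in range(num_steps):
        action = policy(observation, rng)
        reward = simulate_user(action, rng)
        batch.append({"obs": observation, "action": action, "reward": reward})
    return batch


batch = collect_experiences(uniform_policy, num_steps=100)
```

The resulting batch is the raw material for the policy improvement step; in a distributed setting, many such collectors would run in parallel.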

21 of 22

Step 3: Improving your policy

  • Improve the policy network by increasing the probability of taking actions that lead to good rewards
    • Gradient ascent over estimated reward surface
  • RLlib manages the process of distributed experience collection and policy improvement for you
  • Learn more:
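
The policy improvement idea above can be illustrated with a small policy-gradient sketch. This is not how RLlib implements any of its algorithms: it is a REINFORCE-style update with a running-average baseline on a three-action problem, with a table of logits in place of a policy network. The reward values, learning rate, and baseline step size are assumptions chosen for illustration.

```python
import math
import random

# Assumed reward for each of three actions, invented for this example.
TRUE_REWARDS = [0.0, 1.0, 0.2]


def softmax(logits):
    """Convert logits into action probabilities."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(num_steps=2000, learning_rate=0.1, seed=0):
    """Gradient ascent on expected reward using sampled actions."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0  # running average of observed rewards
    for _ in range(num_steps):
        probs = softmax(logits)
        action = rng.choices(range(3), weights=probs)[0]
        reward = TRUE_REWARDS[action]
        advantage = reward - baseline
        for i in range(3):
            # Gradient of log pi(action) with respect to logit i.
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            # Raise the probability of actions that beat the baseline.
            logits[i] += learning_rate * advantage * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


final_probs = train()
```

After training, most of the probability mass should sit on the action with the highest assumed reward.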

22 of 22

Tutorial Overview

  • Exercise 1: Markov Decision Processes
  • Exercise 2: Training in simulation with PPO
  • Exercise 3 [optional]: Custom Environments and Reward Shaping

Go to https://github.com/ray-project/tutorial
