1 of 22

Building RL applications with RLlib

Eric Liang

http://rllib.io

RLlib tutorial

2 of 22

This tutorial

Overview of reinforcement learning and RLlib
Focus on applied RL

Understanding how to model problems as RL environments
How to use RLlib for distributed training in simulation

Learn more

Documentation at http://rllib.io

First page has 60-second overview
Links to API, algorithms, examples, etc.

3 of 22

RL vs Supervised Learning

Input: cat picture

Output: it's a cat

4 of 22

RL vs Supervised Learning

Users

Service

Search for item

Results

Add item to cart

Suggestions

Checkout

($$$)

5 of 22

RL vs Supervised Learning

Users

Service

Search for item

Results

Add item to cart

Suggestions

Checkout

($$$)

Environment

Agent

Observations

Actions

Reward

($$$)

6 of 22

Reinforcement Learning is centered around interaction

observation + reward

agent

environment

policy

actions
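
The loop on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not RLlib code: `CoinFlipEnv`, `always_one_policy`, and `run_episode` are hypothetical names, and the 80% coin bias is an assumption chosen only so there is something to learn.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # the observation carries no information in this toy

    def step(self, action):
        # Assumed dynamics: the coin lands on 1 about 80% of the time.
        coin = 1 if self.rng.random() < 0.8 else 0
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done


def always_one_policy(observation):
    """A fixed policy: always guess 1, whatever the observation."""
    return 1


def run_episode(env, policy):
    """Agent-environment loop: observe, act, receive reward, repeat."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


episode_return = run_episode(CoinFlipEnv(), always_one_policy)
print("episode return:", episode_return)
```

The policy is the only part an RL algorithm changes; the environment and the loop stay fixed.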

7 of 22

Applications of RL

AlphaGo (2016)

Observations:

board state

Actions:

where to place stone

Rewards:

win / lose

8 of 22

Applications of RL

Database query optimization: DQ (2018), Neo (2019)

Observations:

relations joined so far
remaining relations

Actions:

which relations to join

Rewards:

cost of query

9 of 22

Applications of RL

Optimizing Data Structures: NeuroCuts (2019)

Observations:

state of current tree node

Actions:

cut or partition current node

Rewards:

depth + size of tree

Diagram: packet classification rules -> train a NeuroCuts agent with RL -> optimized tree data structure (deploy artifact)

10 of 22

RL vs Supervised Learning

Something in common: both require lots of training data
Supervised learning

need labeled data

Reinforcement learning

can generate data through simulations
or, large-scale interactions with the environment

Both require scalable software libraries for training

11 of 22

What is RLlib?

Framework for scalable applied reinforcement learning

12 of 22

What is RLlib?

13 of 22

Unified framework for scalable RL

Evolution Strategies (vs Redis-based)
Distributed PPO (vs OpenMPI)
Ape-X Distributed DQN, DDPG

14 of 22

Broad range of scalable algorithms

High-throughput architectures

Distributed Prioritized Experience Replay (Ape-X)
Importance Weighted Actor-Learner Architecture (IMPALA)

Gradient-based

Advantage Actor-Critic (A2C, A3C)
Deep Deterministic Policy Gradients (DDPG)
Deep Q Networks (DQN, Rainbow)
Proximal Policy Optimization (PPO)
Soft Actor Critic (SAC)

Derivative-free

Augmented Random Search (ARS)
Evolution Strategies

15 of 22

General purpose APIs

Training in Simulation

Batch RL

Batch Data

Multi-Agent

16 of 22

Growing number of users

RLlib User Metrics

50 -> 175 GitHub Issues
75 -> 141 Pull Requests
11 -> 55 Dev List Threads

Growing Number of Organizations using RLlib in Research & Product

17 of 22

How to apply RL to solve problems?

18 of 22

The goal of reinforcement learning

Learn this function through experience

19 of 22

Step 1: Problem setup

Define the environment

action space: discrete, vector of floats, tuple, etc.
observation space: vector, image, etc.
implementation: interface with simulator or system

RL algorithm

choose from library

Policy model

neural network architecture
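
As a rough sketch of "define the environment", the class below spells out an action space, an observation space, and an implementation. `CorridorEnv` and its reward values are hypothetical, and it uses plain Python only. In practice the environment would typically follow the Gym interface, with spaces declared through objects such as `gym.spaces.Discrete` and `gym.spaces.Box`; that mapping is an assumption here, not something this slide specifies.

```python
class CorridorEnv:
    """Hypothetical environment: walk right along a corridor to reach the exit."""

    def __init__(self, length=5):
        self.length = length
        # Action space: discrete, 0 = move left, 1 = move right.
        self.num_actions = 2
        # Observation space: one integer, the position in [0, length].
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position >= self.length
        # Assumed reward: small cost per step, bonus for reaching the exit.
        reward = 1.0 if done else -0.1
        return self.position, reward, done, {}


env = CorridorEnv(length=3)
first_observation = env.reset()
trace = [env.step(1) for _ in range(3)]  # always move right
print(trace[-1])
```

Here the "simulator" is three lines of arithmetic; for a real system, `step` would call into the simulator or service instead.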

20 of 22

Step 2: Collecting experiences

policy

observation + reward

agent

Action: recommend items A B C

Returns: Pages visited, total time, revenue
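
Experience collection for the recommendation example might look like the sketch below. Everything in it is an assumption made for illustration: the simulated user, the purchase probabilities, the prices, and the use of "pages visited so far" as the observation. The policy is random, since nothing has been learned yet.

```python
import random

ITEMS = ["A", "B", "C"]
# Assumed per-item purchase probability and price for the simulated user.
PURCHASE_PROB = {"A": 0.1, "B": 0.5, "C": 0.3}
PRICE = {"A": 10.0, "B": 4.0, "C": 6.0}


def simulate_user(item, rng):
    """Return the revenue from recommending `item` once."""
    return PRICE[item] if rng.random() < PURCHASE_PROB[item] else 0.0


def random_policy(observation, rng):
    """Untrained policy: recommend an item uniformly at random."""
    return rng.choice(ITEMS)


def collect_experiences(policy, num_steps, seed=0):
    """Run the policy and record (observation, action, reward) tuples."""
    rng = random.Random(seed)
    batch = []
    pages_visited = 0  # observation: pages visited so far in the session
    for _ in range(num_steps):
        action = policy(pages_visited, rng)
        reward = simulate_user(action, rng)
        batch.append((pages_visited, action, reward))
        pages_visited += 1
    return batch


batch = collect_experiences(random_policy, num_steps=100)
total_revenue = sum(reward for _, _, reward in batch)
print(len(batch), "experiences, total revenue", total_revenue)
```

The recorded batch is what a training step consumes; in a distributed setting many such collectors would run in parallel.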

21 of 22

Step 3: Improving your policy

Improve the policy network by increasing the probability of taking actions that lead to high rewards

Gradient ascent over estimated reward surface

RLlib manages the process of distributed experience collection and policy improvement for you
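
To make the policy improvement idea concrete, here is a minimal sketch of a policy-gradient update on a two-armed bandit, with a softmax over two preferences standing in for the policy network. The payoff probabilities, learning rate, and running-average baseline are assumptions for illustration; this is not how RLlib implements any of its algorithms.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(num_steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    mean_reward = [0.2, 0.8]  # assumed: arm 1 pays off more often
    prefs = [0.0, 0.0]        # policy parameters (action preferences)
    baseline = 0.0            # running average of observed rewards
    for step in range(1, num_steps + 1):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < mean_reward[action] else 0.0
        advantage = reward - baseline
        baseline += (reward - baseline) / step
        # Gradient ascent step: d log pi(action) / d prefs[a]
        # equals 1[a == action] - probs[a] for a softmax policy.
        for a in range(2):
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad_log_prob
    return softmax(prefs)


final_probs = train_bandit()
print("action probabilities after training:", final_probs)
```

Actions followed by above-average reward become more likely and the rest less likely, which is the update described above in its simplest form.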
Learn more:

RLlib documentation: http://rllib.io
Deep RL course: http://rail.eecs.berkeley.edu/deeprlcourse/

22 of 22

Tutorial Overview

Exercise 1: Markov Decision Processes
Exercise 2: Training in simulation with PPO
Exercise 3 [optional]: Custom Environments and Reward Shaping

Go to https://github.com/ray-project/tutorial
