Building RL applications with RLlib
Eric Liang
http://rllib.io
RLlib tutorial
This tutorial
RL vs Supervised Learning
Input: cat picture
Output:
it's a cat
RL vs Supervised Learning
Users
Service
Search for item
Results
Add item to cart
Suggestions
Checkout
($$$)
Environment: Users
Agent: Service
Observations: Search for item, Add item to cart
Actions: Results, Suggestions
Reward: Checkout ($$$)
Reinforcement Learning is centered around interaction
observation + reward
agent
environment
policy
actions
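The interaction loop on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not RLlib code: `ToyShopEnv`, its three item categories, and the reward rule are hypothetical stand-ins chosen only to make the loop runnable.

```python
import random

NUM_CATEGORIES = 3  # hypothetical number of item categories


class ToyShopEnv:
    """Hypothetical environment: a user whose interest changes every step."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.interest = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps = 0
        self.interest = self.rng.randrange(NUM_CATEGORIES)
        return self.interest

    def step(self, action):
        """Apply the agent's action; return (observation, reward, done)."""
        # Reward 1.0 when the recommended category matches the user's interest.
        reward = 1.0 if action == self.interest else 0.0
        self.steps += 1
        self.interest = self.rng.randrange(NUM_CATEGORIES)
        done = self.steps >= self.max_steps
        return self.interest, reward, done


def run_episode(env, policy):
    """Run one episode: the policy maps each observation to an action."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)                  # agent acts
        obs, reward, done = env.step(action)  # environment responds
        total_reward += reward
    return total_reward


def matching_policy(obs):
    """Recommend the category the user is currently interested in."""
    return obs


episode_return = run_episode(ToyShopEnv(), matching_policy)
```

In this toy setting the observation reveals the user's interest directly, so `matching_policy` collects the maximum reward on every step. Real problems rarely expose the right action this plainly, which is why the policy has to be learned.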
Applications of RL
AlphaGo (2016)
Database query optimization: DQ (2018), Neo (2019)
Optimizing Data Structures: NeuroCuts (2019)
Train a NeuroCuts Agent with RL
optimized tree data structure
packet classification rules
deploy artifact
RL vs Supervised Learning
What is RLlib?
Framework for scalable applied reinforcement learning
Unified framework for scalable RL
Evolution Strategies (vs Redis-based)
Distributed PPO (vs OpenMPI)
Ape-X Distributed DQN, DDPG
Broad range of scalable algorithms
General purpose APIs
Training in Simulation
Batch RL
Batch Data
Multi-Agent
Growing number of users
RLlib User Metrics
Growing Number of Organizations using RLlib in Research & Product
How to apply RL to solve problems?
The goal of reinforcement learning
Learn this function through experience
Step 1: Problem setup
Step 2: Collecting experiences
policy
observation + reward
agent
Action: recommend items A B C
Returns: Pages visited, total time, revenue
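Collecting experiences for the recommendation example might look like the sketch below. It is an assumption-laden toy rather than RLlib's rollout machinery: `simulate_user`, the `PURCHASE_PROB` table, and the use of revenue as the only return signal are all hypothetical.

```python
import random

ITEMS = ["A", "B", "C"]

# Hypothetical purchase probabilities, known only to the simulator.
PURCHASE_PROB = {"A": 0.1, "B": 0.5, "C": 0.2}


def simulate_user(item, rng):
    """Stand-in for a real user session: return revenue for one recommendation."""
    return 1.0 if rng.random() < PURCHASE_PROB[item] else 0.0


def random_policy(rng):
    """Untrained policy: recommend an item uniformly at random."""
    return rng.choice(ITEMS)


def collect_experiences(policy, num_steps, seed=0):
    """Run the policy against the simulator and record what happened."""
    rng = random.Random(seed)
    batch = []
    for _ in range(num_steps):
        action = policy(rng)
        reward = simulate_user(action, rng)
        batch.append({"action": action, "reward": reward})
    return batch


batch = collect_experiences(random_policy, num_steps=100)
```

Each record pairs the action taken with the reward observed. A fuller version would also log the observation and the other return signals named on the slide, such as pages visited and total time.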
Step 3: Improving your policy
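One simple way to improve a policy from collected experiences is to estimate the average reward of each action and then act greedily on those estimates. The sketch below shows that idea only; it is not the update rule of any particular RLlib algorithm, and the hand-written `batch` is hypothetical data in an assumed `{"action", "reward"}` record format.

```python
from collections import defaultdict


def estimate_action_values(batch):
    """Average the observed reward for each action in the batch."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for step in batch:
        totals[step["action"]] += step["reward"]
        counts[step["action"]] += 1
    return {action: totals[action] / counts[action] for action in counts}


def improve_policy(batch):
    """Return a policy that always picks the action with the best estimate."""
    values = estimate_action_values(batch)
    best_action = max(values, key=values.get)
    return lambda: best_action


# Hypothetical experiences: recommended item and resulting revenue.
batch = [
    {"action": "A", "reward": 0.0},
    {"action": "B", "reward": 1.0},
    {"action": "B", "reward": 0.0},
    {"action": "C", "reward": 0.0},
    {"action": "A", "reward": 0.0},
    {"action": "B", "reward": 1.0},
]

action_values = estimate_action_values(batch)
improved_policy = improve_policy(batch)
```

A purely greedy policy stops trying the other items, so practical methods keep some exploration and repeat the collect-then-improve cycle many times.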
Tutorial Overview
Go to https://github.com/ray-project/tutorial