OpenAssistant

Vision & Roadmap

OpenAssistant is a chat-based assistant that understands tasks, interacts with third-party systems, and dynamically retrieves information in order to do so.

It is easy to extend and personalize, and it is developed as free, open-source software.

OpenAssistant unifies all knowledge work in one place

  • Uses modern deep learning
  • Runs on consumer hardware
  • Trains on human feedback
  • Free and open

Retrieval via Search Engines

External, upgradeable knowledge: no need for billions of parameters. (A minimal sketch follows below.)

Your Conversational Assistant

A state-of-the-art chat assistant that can be personalized to your needs.

Interfacing with External Systems

Uses APIs and third-party applications, described via language & demonstrations.

A Building Block for Developers

Integrate OpenAssistant into your application.
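
To make the retrieval idea concrete, here is a minimal retrieval-augmented prompting sketch. Everything in it is an assumption for illustration: `web_search` is a hypothetical stand-in for whatever search-engine backend gets wired up, and `generate` stands in for the assistant's text-generation call.

```python
# Minimal retrieval-augmented prompting sketch (illustrative only).
# The point: fresh knowledge lives outside the model, so the model itself
# can stay small enough to run on consumer hardware.
from typing import Callable, List


def web_search(query: str, k: int = 3) -> List[str]:
    """Hypothetical helper: return the top-k result snippets for `query`."""
    raise NotImplementedError("wire this up to a real search backend")


def answer_with_retrieval(question: str, generate: Callable[[str], str]) -> str:
    # Retrieve up-to-date snippets and pack them into the prompt as context.
    snippets = web_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer the question using the web results below.\n"
        f"Results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)
```

Because the knowledge sits in the retrieval results rather than in the weights, it can be upgraded without retraining the model.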

Our Vision

We want OpenAssistant to be the single, unifying platform that all other systems use to interface with humans.

Our Roadmap

Minimum Viable Prototype (ASAP)

  • Data Collection Pipeline
  • RL from Human Feedback
  • Assistant v1 usable
  • Out January 2023!

Growing Up (Q1 2023)

  • Retrieval Augmentation
  • Rapid Personalization
  • Using External Tools

Growing Out (Q2 2023)

  • Third-Party Extensions
  • Device Control
  • Multi-Modality

How did we get here?

  • What do you need?

Getting to MVP

We follow InstructGPT

[Figure: InstructGPT's three-step training pipeline: supervised fine-tuning, reward model training, and reinforcement learning via PPO. Source: InstructGPT]

1) Supervised Fine-Tuning on Human Demonstrations

  • We need to collect (human) demonstrations of assistant interactions
    • Read our Data Structures Overview to see how
    • We estimate needing about 50k* demonstrations
  • Fine-tune a base model on the collected data (a minimal sketch follows below)
    • Candidates: GPT-J, CodeGen (surprisingly promising), FlanT5, GPT-JT
    • Can use pseudo-data (e.g. from QA datasets) before we have the real data
  • Additionally, collect instruction datasets
    • Quora, StackOverflow, appropriate subreddits, …
    • Training an "instruction detector" would allow us to e.g. filter Twitter for good data

* InstructGPT has 13k, 33k, and 31k samples for the three steps, respectively
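
As a concrete illustration of the fine-tuning step, here is a minimal supervised fine-tuning sketch using the Hugging Face transformers Trainer. The demonstration format and hyperparameters are assumptions, the real format will come from the data collection pipeline, and "gpt2" is used purely as a small stand-in for the actual candidate base models.

```python
# Minimal supervised fine-tuning sketch (illustrative).
# Assumption: demonstrations arrive as (prompt, response) pairs; we train a
# causal LM to continue each prompt with the demonstrated response.
# "gpt2" is a small stand-in; swap in a real candidate such as GPT-J.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical demonstration data, standing in for the collected dataset.
demonstrations = [
    {"prompt": "User: What causes tides?\nAssistant:",
     "response": " Mainly the gravitational pull of the moon and sun."},
]

class DemoDataset(torch.utils.data.Dataset):
    def __init__(self, pairs):
        self.examples = []
        for p in pairs:
            enc = tokenizer(p["prompt"] + p["response"] + tokenizer.eos_token,
                            truncation=True, max_length=1024,
                            return_tensors="pt")
            self.examples.append({
                "input_ids": enc["input_ids"][0],
                "attention_mask": enc["attention_mask"][0],
                # Standard causal-LM objective: predict every next token.
                # (A refinement would mask the prompt tokens out of the loss.)
                "labels": enc["input_ids"][0].clone(),
            })

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return self.examples[i]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=DemoDataset(demonstrations),
)
trainer.train()
```

The same loop works for the pseudo-data warm-up: point `demonstrations` at QA pairs until real collected data is available.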

2) Training a Reward Model & RLHF

  • We need to collect rankings of interactions (a minimal loss sketch follows below)
    • Again, read our Data Structures Overview to see how
  • Reward Model training could also use Active Learning
    • Keeps humans in the loop
    • Drastically decreases the amount of data needed
  • Reinforcement Learning against the Reward Model
    • Follow InstructGPT and use PPO
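
To make the reward-model step concrete, here is a minimal sketch of the pairwise ranking loss InstructGPT uses: for a prompt where humans preferred response A over response B, minimize -log sigmoid(r_A - r_B). The base model ("gpt2") and the example data are stand-ins for illustration.

```python
# Minimal reward-model sketch (illustrative).
# Given a human ranking where response A beat response B for the same prompt,
# train a scalar reward head with the pairwise loss from InstructGPT:
#   loss = -log sigmoid(r_A - r_B)
# "gpt2" is a small stand-in for the actual base model.
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2", num_labels=1)  # num_labels=1 -> a single scalar reward
reward_model.config.pad_token_id = tokenizer.pad_token_id

def pairwise_loss(prompt: str, chosen: str, rejected: str):
    # Score both (prompt, response) pairs with the same model.
    enc = tokenizer([prompt + chosen, prompt + rejected],
                    padding=True, truncation=True, return_tensors="pt")
    rewards = reward_model(**enc).logits.squeeze(-1)  # shape (2,)
    # Push the preferred response's reward above the rejected one's.
    return -F.logsigmoid(rewards[0] - rewards[1])

loss = pairwise_loss("User: Tell me a joke.\nAssistant:",
                     " Why did the scarecrow win an award? ...",
                     " No.")
loss.backward()  # in practice: optimizer steps over many ranked pairs
```

For the subsequent RL step, an existing PPO implementation such as the one in the trl library could optimize the fine-tuned model against this reward.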

Main Efforts

  • Data Collection Code → Backend, website, and Discord bot to collect data
  • Instruction Dataset Gathering → Scraping & cleaning web data
  • Gamification → Leaderboards & more, to make data collection more fun
  • Model Training → Experiments on pseudo- and real-data
  • Infrastructure → Collection, training, and inference
  • Data Collection → This is the bulk of the work
  • Data Augmentation → Making more data from little data
  • Privacy and Safety → Protecting sensitive data

Principles

  • We put the human at the center
  • We need to get the MVP out fast, while we still have momentum
  • We pull in one direction
  • We are pragmatic
  • We aim for models that can (or could, with some effort) be run on consumer hardware
  • We rapidly validate our ML experiments at small scale before moving to a supercluster

Where to go from here?