1 of 22

Interactive Learning with Grounded Language Agents Utilizing World Models

Arjun V Sudhakar* 1,3

Sai Rajeswar* 2

1 - Mila - Quebec AI Institute

2 - ServiceNow Research

3 - Polytechnique Montreal

2 of 22

Goal

  • Performance- and data-efficient training using model-based RL

    • Leveraging past information

    • Effective planning

3 of 22

Background

4 of 22

Background:

Yao et al., 2022, NeurIPS

5 of 22

Background:

Yao et al., 2022, NeurIPS

6 of 22

Background:

Yao et al., 2022, NeurIPS

7 of 22

Background:

8 of 22

Motivation

9 of 22

Motivation

  • Recognize the user’s intent for the task they describe, towards solving it with LM+RL [Osborne et al., 2022]

  • Grounded models can enhance an agent’s generalization, scope, and sample efficiency [Wang et al., 2018]

  • Developing an agent that can learn, discover, and adapt efficiently is a promising approach to identifying the user’s intent and completing the task [Ha and Schmidhuber, 2018]

10 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.

11 of 22

Experimental Setup

Hypothesis of Experiment 1: Understand which modality of information the system relies on most to solve the task.

  • Setup 1: Image only
  • Setup 2: Image+Language

12 of 22

Experimental Setup

Hypothesis of Experiment 1: Understand which modality of information the system relies on most to solve the task.

  • Setup 1: Image only
  • Setup 2: Image+Language

Understanding/Learning:

  • Can we leverage the information in the image to solve the task better, for example with vision+language models? (See the ablation sketch below.)
  • How well are the current visual embeddings understood by the language encoder?
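As a sketch of how the two setups could be compared, the snippet below ablates one modality at a time by zeroing its embedding before fusion, keeping tensor shapes intact. It assumes a hypothetical agent with separate image and text encoders and a policy head; encode_image, encode_text, and policy are illustrative names, not the system's actual modules.

```python
import torch

def forward_ablated(image, instruction, encode_image, encode_text, policy,
                    use_image=True, use_language=True):
    """Run one policy step while masking out a modality.

    encode_image / encode_text / policy stand in for the agent's
    actual modules; zeroing an embedding approximates removing that
    modality without changing tensor shapes.
    """
    img_emb = encode_image(image)        # (batch, d_img)
    txt_emb = encode_text(instruction)   # (batch, d_txt)

    if not use_image:
        img_emb = torch.zeros_like(img_emb)
    if not use_language:
        txt_emb = torch.zeros_like(txt_emb)

    fused = torch.cat([img_emb, txt_emb], dim=-1)  # vanilla concatenation
    return policy(fused)

# Setup 1 (image only):        use_language=False
# Setup 2 (image + language):  defaults
```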

13 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.
  • How well does the system use language information?
    • Syntactic and semantic

14 of 22

Experimental Setup

Note: 5 random seeds

Hypothesis of Experiment 2:

Does randomly shuffling the instructions deteriorate performance?

Can the model preserve its results irrespective of the syntactic or semantic structure?

  • Setup 1: Shuffle the words in each instruction randomly and feed the result to the language model

15 of 22

Experimental Setup

Note: 5 random seeds

Hypothesis of Experiment 2:

Does randomly shuffling the instructions deteriorate performance?

Can the model preserve its results irrespective of the syntactic or semantic structure?

  • Setup 1: Shuffle the words in each instruction randomly and feed the result to the language model (see the sketch below)

Outcome:

  • This lets us ask further questions, such as: do we need a language model to both propose and choose actions?
  • What is the role of the retrieval system?
  • Can the retrieval system produce better action candidates than a large language model?
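The shuffling perturbation in Setup 1 can be sketched as below, assuming instructions arrive as plain strings; a naive whitespace split stands in for the system's actual tokenizer, and one seed per run matches the 5-seed protocol.

```python
import random

def shuffle_instruction(instruction: str, seed: int) -> str:
    """Randomly permute the words of an instruction.

    Destroys word order (syntax) while keeping the bag of words, so any
    remaining performance must come from lexical content rather than
    sentence structure.
    """
    rng = random.Random(seed)
    words = instruction.split()   # naive whitespace tokenization (assumption)
    rng.shuffle(words)
    return " ".join(words)

# Example:
# shuffle_instruction("put the red mug on the shelf", seed=0)
# -> e.g. "shelf the put on red the mug"  (order varies with the seed)
```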

16 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.
  • How well does the system use language information?
    • Syntactic and semantic
  • Learning a better joint representation?

17 of 22

Experimental Setup

Hypothesis of Experiment 3: Rather than simply concatenating the language and vision representations, can we fuse them with FiLM?

  • Setup: Is the vanilla concatenation a bottleneck for performance improvement?

18 of 22

Experimental Setup

Hypothesis of Experiment 3: Rather than simply concatenating the language and vision representations, can we fuse them with FiLM (sketched below)?

  • Setup: Is the vanilla concatenation a bottleneck for performance improvement?

Outcome: An improvement in performance would suggest that a more sophisticated fusion method is worth pursuing; otherwise, we can focus on other directions for improvement.
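For reference, a minimal sketch of FiLM-style fusion (Perez et al., 2018) as a drop-in alternative to vanilla concatenation: the instruction embedding predicts a per-channel scale and shift that modulate the visual features. Layer names and dimensions here are illustrative, not the system's actual architecture.

```python
import torch
import torch.nn as nn

class FiLMFusion(nn.Module):
    """Condition visual features on a language embedding via FiLM.

    Instead of concatenating the two embeddings, the instruction produces
    per-channel scale (gamma) and shift (beta) parameters that modulate
    the visual features.
    """

    def __init__(self, vis_dim: int, lang_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(lang_dim, vis_dim)
        self.to_beta = nn.Linear(lang_dim, vis_dim)

    def forward(self, vis_emb: torch.Tensor, lang_emb: torch.Tensor) -> torch.Tensor:
        gamma = self.to_gamma(lang_emb)   # (batch, vis_dim)
        beta = self.to_beta(lang_emb)     # (batch, vis_dim)
        return gamma * vis_emb + beta     # feature-wise modulation

# Usage (illustrative dimensions):
# fusion = FiLMFusion(vis_dim=512, lang_dim=768)
# fused = fusion(vis_emb, lang_emb)  # replaces torch.cat([vis_emb, lang_emb], -1)
```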

19 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.
  • How well does the system use language information?
    • Syntactic and semantic
  • Learning a better joint representation?
    • FiLM
  • Using model-based RL
    • Planning, history of information

20 of 22

Experimental Setup

Hypothesis of Experiment 4: A model-based approach that better utilizes historical information and plans ahead will yield better results.

  • Setup: A model-based agent evaluated against baselines using DDPG and SAC

21 of 22

Experimental Setup

Hypothesis of Experiment 4: A model-based approach that better utilizes historical information and plans ahead will yield better results.

  • Setup: A model-based agent evaluated against baselines using DDPG and SAC (see the world-model sketch below)

Outcome: Either the model-based agent improves on the existing baseline, or performance drops due to hyperparameter sensitivity or other difficulties in modeling the world.
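A minimal sketch of the kind of recurrent world model this setup could use, in the spirit of Ha and Schmidhuber (2018): a GRU summarizes the history of (observation, action) pairs, and small heads predict the next latent observation and reward so the agent can plan by rolling the model forward in imagination. All module names and sizes here are assumptions, not the proposed architecture.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Recurrent world model: h_t summarizes the (observation, action) history."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + act_dim, hidden_dim)
        self.next_obs_head = nn.Linear(hidden_dim, obs_dim)  # predicts next latent obs
        self.reward_head = nn.Linear(hidden_dim, 1)          # predicts reward

    def step(self, obs, act, h):
        """One latent transition given the current obs, action, and history state."""
        h = self.rnn(torch.cat([obs, act], dim=-1), h)
        return self.next_obs_head(h), self.reward_head(h), h

    def imagine(self, obs, h, policy, horizon: int = 5):
        """Roll the model forward with a policy to score an imagined trajectory."""
        total_reward = 0.0
        for _ in range(horizon):
            act = policy(torch.cat([obs, h], dim=-1))  # policy is a user-supplied module
            obs, reward, h = self.step(obs, act, h)
            total_reward = total_reward + reward
        return total_reward
```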

22 of 22

Timeline

  • Code setup
  • Establish the baseline
  • Start reproducing the experiments
  • Have the code for the first two research questions
  • Dive into the deeper research questions
  • Implement model-based RL
  • Improve the baseline results