1 of 22

Interactive Learning with Grounded Language Agents Utilizing World Models

Arjun V Sudhakar* 1,3

Sai Rajeswar* 2

1 - Mila - Quebec AI Institute

2 - ServiceNow Research

3 - Polytechnique Montreal

2 of 22

Goal

  • Performance- and data-efficient training using model-based RL

    • Leveraging past information

    • Effective planning

3 of 22

Background

4 of 22

Background:

Yao et al., 2022, NeurIPS

5 of 22

Background:

Yao et al., 2022, NeurIPS

6 of 22

Background:

Yao et al., 2022, NeurIPS

7 of 22

Background:

8 of 22

Motivation

9 of 22

Motivation

  • Recognize the user’s intent for the task they describe, towards solving it with LM+RL [Osborne et al., 2022]

  • Grounded models can enhance an agent’s generalization, scope, and sample efficiency [Wang et al., 2018]

  • Developing an agent that can learn, discover, and adapt efficiently is a promising approach to identifying the user’s intent and completing the task [Ha and Schmidhuber, 2018]

10 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.

11 of 22

Experimental Setup

Hypothesis of Experiment 1: Understand which modality of information the system relies on most to solve the task.

  • Setup 1: Image only
  • Setup 2: Image+Language

12 of 22

Experimental Setup

Hypothesis of Experiment 1: Understand which modality of information the system relies on most to solve the task.

  • Setup 1: Image only
  • Setup 2: Image+Language

Understanding/Learning:

  • Can we leverage the information in the image to solve the task better, for example with vision+language models? (See the ablation sketch below.)
  • How well are the current visual embeddings understood by the language encoder?
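As a sketch of how the two setups could be compared, the snippet below ablates one modality at a time by zeroing its embedding before fusion, keeping tensor shapes intact. It assumes a hypothetical agent with separate image and text encoders and a policy head; encode_image, encode_text, and policy are illustrative names, not the system's actual modules.

```python
import torch

def forward_ablated(image, instruction, encode_image, encode_text, policy,
                    use_image=True, use_language=True):
    """Run one policy step while masking out a modality.

    encode_image / encode_text / policy stand in for the agent's
    actual modules; zeroing an embedding approximates removing that
    modality without changing tensor shapes.
    """
    img_emb = encode_image(image)        # (batch, d_img)
    txt_emb = encode_text(instruction)   # (batch, d_txt)

    if not use_image:
        img_emb = torch.zeros_like(img_emb)
    if not use_language:
        txt_emb = torch.zeros_like(txt_emb)

    fused = torch.cat([img_emb, txt_emb], dim=-1)  # vanilla concatenation
    return policy(fused)

# Setup 1 (image only):        use_language=False
# Setup 2 (image + language):  defaults
```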

13 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.
  • How well does the system use language information?
    • Syntactic and semantic

14 of 22

Experimental Setup

Note: 5 random seeds

Hypothesis of Experiment 2:

Does randomly shuffling the instructions deteriorate performance?

Can the model preserve its results irrespective of the syntactic or semantic structure?

  • Setup 1: Shuffle the words in each instruction randomly and feed the result to the language model

15 of 22

Experimental Setup

Note: 5 random seeds

Hypothesis of Experiment 2:

Does randomly shuffling the instructions deteriorate performance?

Can the model preserve its results irrespective of the syntactic or semantic structure?

  • Setup 1: Shuffle the words in each instruction randomly and feed the result to the language model (see the sketch below)

Outcome:

  • This lets us ask further questions, such as: do we need a language model to both propose and choose actions?
  • What is the role of the retrieval system?
  • Can the retrieval system produce better action candidates than a large language model?
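The shuffling perturbation in Setup 1 can be sketched as below, assuming instructions arrive as plain strings; a naive whitespace split stands in for the system's actual tokenizer, and one seed per run matches the 5-seed protocol.

```python
import random

def shuffle_instruction(instruction: str, seed: int) -> str:
    """Randomly permute the words of an instruction.

    Destroys word order (syntax) while keeping the bag of words, so any
    remaining performance must come from lexical content rather than
    sentence structure.
    """
    rng = random.Random(seed)
    words = instruction.split()   # naive whitespace tokenization (assumption)
    rng.shuffle(words)
    return " ".join(words)

# Example:
# shuffle_instruction("put the red mug on the shelf", seed=0)
# -> e.g. "shelf the put on red the mug"  (order varies with the seed)
```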

16 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.
  • How well does the system use language information?
    • Syntactic and semantic
  • Learning a better joint representation?

17 of 22

Experimental Setup

Hypothesis of Experiment 3: Rather than simply concatenating the language and vision representations, can we fuse them with FiLM?

  • Setup: Is the vanilla concatenation a bottleneck for performance improvement?

18 of 22

Experimental Setup

Hypothesis of Experiment 3: Rather than simply concatenating the language and vision representations, can we fuse them with FiLM (sketched below)?

  • Setup: Is the vanilla concatenation a bottleneck for performance improvement?

Outcome: An improvement in performance would suggest that a more sophisticated fusion method is worth pursuing; otherwise, we can focus on other directions for improvement.
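For reference, a minimal sketch of FiLM-style fusion (Perez et al., 2018) as a drop-in alternative to vanilla concatenation: the instruction embedding predicts a per-channel scale and shift that modulate the visual features. Layer names and dimensions here are illustrative, not the system's actual architecture.

```python
import torch
import torch.nn as nn

class FiLMFusion(nn.Module):
    """Condition visual features on a language embedding via FiLM.

    Instead of concatenating the two embeddings, the instruction produces
    per-channel scale (gamma) and shift (beta) parameters that modulate
    the visual features.
    """

    def __init__(self, vis_dim: int, lang_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(lang_dim, vis_dim)
        self.to_beta = nn.Linear(lang_dim, vis_dim)

    def forward(self, vis_emb: torch.Tensor, lang_emb: torch.Tensor) -> torch.Tensor:
        gamma = self.to_gamma(lang_emb)   # (batch, vis_dim)
        beta = self.to_beta(lang_emb)     # (batch, vis_dim)
        return gamma * vis_emb + beta     # feature-wise modulation

# Usage (illustrative dimensions):
# fusion = FiLMFusion(vis_dim=512, lang_dim=768)
# fused = fusion(vis_emb, lang_emb)  # replaces torch.cat([vis_emb, lang_emb], -1)
```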

19 of 22

Research Question

  • Does the current system use visual information for task solving?
    • Leveraging visual cues.
  • How well does the system use language information?
    • Syntactic and semantic
  • Learning a better joint representation?
    • FiLM
  • Using model-based RL
    • Planning, history of information

20 of 22

Experimental Setup

Hypothesis of Experiment 4: A model-based approach that better utilizes historical information and plans ahead will yield better results.

  • Setup: A model-based agent evaluated against baselines using DDPG and SAC

21 of 22

Experimental Setup

Hypothesis of Experiment 4: A model-based approach that better utilizes historical information and plans ahead will yield better results.

  • Setup: A model-based agent evaluated against baselines using DDPG and SAC (see the world-model sketch below)

Outcome: Either the model-based agent improves on the existing baseline, or performance drops due to hyperparameter sensitivity or other difficulties in modeling the world.
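A minimal sketch of the kind of recurrent world model this setup could use, in the spirit of Ha and Schmidhuber (2018): a GRU summarizes the history of (observation, action) pairs, and small heads predict the next latent observation and reward so the agent can plan by rolling the model forward in imagination. All module names and sizes here are assumptions, not the proposed architecture.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Recurrent world model: h_t summarizes the (observation, action) history."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + act_dim, hidden_dim)
        self.next_obs_head = nn.Linear(hidden_dim, obs_dim)  # predicts next latent obs
        self.reward_head = nn.Linear(hidden_dim, 1)          # predicts reward

    def step(self, obs, act, h):
        """One latent transition given the current obs, action, and history state."""
        h = self.rnn(torch.cat([obs, act], dim=-1), h)
        return self.next_obs_head(h), self.reward_head(h), h

    def imagine(self, obs, h, policy, horizon: int = 5):
        """Roll the model forward with a policy to score an imagined trajectory."""
        total_reward = 0.0
        for _ in range(horizon):
            act = policy(torch.cat([obs, h], dim=-1))  # policy is a user-supplied module
            obs, reward, h = self.step(obs, act, h)
            total_reward = total_reward + reward
        return total_reward
```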

22 of 22

Timeline

  • Code setup
  • Establish the baseline
  • Start reproducing the experiments
  • Have the code for the first two research questions
  • Dive into the deeper research questions
  • Implement model-based RL
  • Improve the baseline results