1 of 14

Compositional Language Policy

Edwin Zhang, Linar Abdrazakov, Matthew Ho

eddie.win/IGLU22

2 of 14

Task

Objective: Complete long-horizon compositional tasks specified by natural-language instructions. The action space must be low-level.

https://www.iglu-contest.net/

3 of 14

Method: Multitask Hierarchical Baseline (MHB)

[Diagram: Prompt → Finetuned T5 → Voxel grid → Handcrafted Planner → Next block → RL (PPO) → Action; the RL policy also receives the State]

“Facing north, build a stack of 4 yellow blocks”

What about blocks that were already placed?
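How the Handcrafted Planner chooses the next block is not shown on the slide. A minimal sketch of one plausible rule, assuming the voxel grid is a NumPy integer array indexed (height, x, z), 0 means empty, and the build zone is 9×11×11; `next_block` and the color ids are hypothetical, not the MHB implementation. Because it compares the target against the current grid, blocks that are already in place are skipped.

```python
import numpy as np

EMPTY = 0  # assumption: 0 marks an empty voxel, 1..6 are block colors


def next_block(target: np.ndarray, current: np.ndarray):
    """Hypothetical planner: return (y, x, z, color) of the next block to place,
    or None when every target block is already in place.

    Grids are assumed to be indexed (y, x, z) with y as height. Cells holding a
    wrong color would need a removal first; this sketch does not handle that.
    """
    missing = (target != EMPTY) & (current != target)
    coords = np.argwhere(missing)  # rows (y, x, z) in lexicographic order
    if coords.size == 0:
        return None
    y, x, z = coords[0]  # smallest y first, so the build grows bottom-up
    return int(y), int(x), int(z), int(target[y, x, z])


# Usage: a 4-block stack (hypothetical color id 4) with its bottom block already placed.
target = np.zeros((9, 11, 11), dtype=int)  # assumed build-zone size
target[0:4, 5, 5] = 4
current = np.zeros_like(target)
current[0, 5, 5] = 4
print(next_block(target, current))  # (1, 5, 5, 4)
```

Filling the lowest missing layer first is just one simple ordering; a real planner might also need to schedule removals or respect the agent's reach.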

4 of 14

Baseline

5 of 14

Reformulate as text-to-image generation

[Diagram: Prompt → Diffusion Model → Voxel grid → Handcrafted Planner → Next block → RL (PPO) → Action; the RL policy also receives the State]

“Facing north, build a stack of 4 yellow blocks”

https://deeprender.ai/blog/discrete-denoising-diffusion-models

https://huggingface.co/blog/stable_diffusion
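Treating the target structure as an image mostly means giving the voxel grid a channel axis that an image-style diffusion model can denoise. A minimal sketch, assuming 7 voxel classes (empty plus 6 colors) and a 9×11×11 grid; the function names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 7           # assumption: 0 = empty, 1..6 = block colors
GRID_SHAPE = (9, 11, 11)  # assumption: (height, x, z) size of the build zone


def grid_to_channels(grid: np.ndarray) -> np.ndarray:
    """Integer voxel grid (D, H, W) -> one-hot float32 volume (C, D, H, W)."""
    onehot = np.eye(NUM_CLASSES, dtype=np.float32)[grid]  # (D, H, W, C)
    return np.moveaxis(onehot, -1, 0)                     # channels first, like an image


def channels_to_grid(volume: np.ndarray) -> np.ndarray:
    """(C, D, H, W) scores, e.g. a model's output -> integer voxel grid (D, H, W)."""
    return volume.argmax(axis=0)


grid = np.zeros(GRID_SHAPE, dtype=np.int64)
grid[0:4, 5, 5] = 4  # hypothetical 4-block stack
volume = grid_to_channels(grid)
print(volume.shape)  # (7, 9, 11, 11)
print(np.array_equal(channels_to_grid(volume), grid))  # True
```

The round trip is lossless, so any error in the final grid comes from the model's scores, not from the encoding.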

6 of 14

Original T5 Model Output

7 of 14

Original T5 Model Output

8 of 14

Original T5 Model Output

9 of 14

How to improve T5?

  • Discrete diffusion

  • Other techniques?
    • Prompting…
    • Use a different model…
    • Change the tokens…
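One reading of "Discrete Diffusion" here is to corrupt voxel classes directly instead of adding Gaussian noise to a continuous volume. The sketch shows only the forward (noising) process of uniform-transition discrete diffusion, as in D3PM (Austin et al., 2021); the class count, the cosine-shaped schedule, and all names are assumptions for illustration.

```python
import numpy as np

K = 7  # assumption: number of voxel classes (empty + 6 colors)


def alpha_bar(t: int, T: int) -> float:
    """Probability that a voxel is still uncorrupted after t of T steps.

    A cosine-shaped schedule chosen for illustration: 1 at t = 0, ~0 at t = T.
    """
    return float(np.cos(0.5 * np.pi * t / T) ** 2)


def q_sample(x0: np.ndarray, t: int, T: int, rng: np.random.Generator) -> np.ndarray:
    """Draw x_t ~ q(x_t | x_0) for uniform-transition discrete diffusion.

    q(x_t | x_0) = Cat(alpha_bar_t * onehot(x_0) + (1 - alpha_bar_t) / K):
    each voxel keeps its class with probability alpha_bar_t and is otherwise
    resampled uniformly from all K classes.
    """
    keep = rng.random(x0.shape) < alpha_bar(t, T)
    noise = rng.integers(0, K, size=x0.shape)
    return np.where(keep, x0, noise)


rng = np.random.default_rng(0)
x0 = np.zeros((9, 11, 11), dtype=np.int64)
x0[0:4, 5, 5] = 4
x_mid = q_sample(x0, t=50, T=100, rng=rng)   # each voxel resampled with probability 0.5
x_end = q_sample(x0, t=100, T=100, rng=rng)  # essentially uniform noise
```

A text-conditioned denoiser would then be trained to recover x_0 from x_t; that network is not sketched here.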

10 of 14

Improvement on baseline

Metric       Original (T5)    Ours (Imagen 3D)
Precision    37%              58.9%
Recall       49.0%            61.3%
F1           38%              58.5%
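The slides do not define the metric. A minimal sketch of block-level precision, recall and F1, assuming a prediction is scored by comparing its voxel grid with the target grid cell by cell; `block_prf` and the matching rule are assumptions. If the table's numbers are averaged over examples, the averaged F1 need not equal the harmonic mean of the averaged precision and recall.

```python
import numpy as np

EMPTY = 0  # assumption: 0 marks an empty voxel


def block_prf(pred: np.ndarray, target: np.ndarray) -> tuple[float, float, float]:
    """Precision, recall and F1 over non-empty voxels of two grids.

    Assumed matching rule: a predicted block is a true positive only if the
    target holds a block of the same color at the same position.
    """
    tp = int(np.sum((pred != EMPTY) & (pred == target)))
    n_pred = int(np.sum(pred != EMPTY))
    n_target = int(np.sum(target != EMPTY))
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_target if n_target else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


target = np.zeros((9, 11, 11), dtype=int)
target[0:4, 5, 5] = 4      # 4-block stack
pred = target.copy()
pred[3, 5, 5] = 2          # top block has the wrong color
pred[0, 5, 6] = 4          # one extra block
print(block_prf(pred, target))  # precision 0.6, recall 0.75, F1 ≈ 0.667
```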

11 of 14

Improvement on baseline

Improvement from diffusion model and context prompting

[Figure panels: T5 | Imagen | Target Grid]

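The slide does not spell out what "context prompting" means. One plausible reading, sketched here as an assumption: serialize the blocks already in the world into the text the model sees, so that it can account for earlier blocks. The prompt format, the color-name table, and `build_prompt` are hypothetical.

```python
import numpy as np

EMPTY = 0
# Hypothetical id -> name table; check it against the environment's actual palette.
COLOR_NAMES = {1: "blue", 2: "green", 3: "red", 4: "orange", 5: "purple", 6: "yellow"}


def build_prompt(instruction: str, current: np.ndarray, max_blocks: int = 50) -> str:
    """Prefix an instruction with the blocks already in the world.

    `current` is assumed to be indexed (y, x, z). Only the first `max_blocks`
    blocks (in index order) are listed, to keep the prompt short.
    """
    coords = np.argwhere(current != EMPTY)[:max_blocks]
    if len(coords) == 0:
        context = "none"
    else:
        context = "; ".join(
            f"{COLOR_NAMES[int(current[y, x, z])]} at x={x} y={y} z={z}"
            for y, x, z in coords
        )
    return f"existing blocks: {context} | instruction: {instruction}"


world = np.zeros((9, 11, 11), dtype=int)
world[0, 5, 5] = 6
print(build_prompt("Facing north, build a stack of 4 yellow blocks", world))
# existing blocks: yellow at x=5 y=0 z=5 | instruction: Facing north, build a stack of 4 yellow blocks
```

Capping the list with `max_blocks` keeps long builds from overflowing the model's input length, at the cost of dropping some context.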

12 of 14

Diffusion model training converges after roughly 230k gradient steps

Metric       Original (T5)    Ours (Imagen 3D)
Precision    37%              81.23%
Recall       49.0%            83.1%
F1           38%              81.15%

13 of 14

txt2act Improvement Directions

  • Concatenate the context directly as a hidden state from the diffusion encoder, instead of putting it in the T5 prompt
  • Use relative coordinates
  • Train longer
  • Use the finetuned T5 from the competition
  • Finetune T5 again with context. Is there a way to jointly optimize both objectives (conditional encoder and denoising)?
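With absolute coordinates, the same shape built in two places yields two different targets; expressed as offsets from a reference block, they coincide. A minimal sketch, assuming blocks are stored as an N×3 integer array of (x, y, z); the function names and the choice of reference block are hypothetical.

```python
import numpy as np


def to_relative(blocks: np.ndarray) -> np.ndarray:
    """Offsets of each block (rows of x, y, z) from the last block in the list.

    Using the most recent block as the origin is a hypothetical choice; the
    slide does not say which reference point to use.
    """
    return blocks - blocks[-1]


def to_absolute(offsets: np.ndarray, origin: np.ndarray) -> np.ndarray:
    """Inverse of to_relative, given the absolute position of the origin block."""
    return offsets + origin


stack_a = np.array([[2, 0, 3], [2, 1, 3], [2, 2, 3]])  # a 3-block stack
stack_b = stack_a + np.array([4, 0, -1])               # same stack, shifted
print(np.array_equal(to_relative(stack_a), to_relative(stack_b)))  # True
print(np.array_equal(to_absolute(to_relative(stack_b), stack_b[-1]), stack_b))  # True
```

The model then only has to learn the shape; placing it in the world reduces to adding back one known origin.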

14 of 14

Final result: 3rd place

Diffusion didn’t work well in the end, so we stuck with the baseline. The reconstruction algorithm still needs work.

https://eddie.win/IGLU23