Compositional Language Policy
Edwin Zhang, Linar Abdrazakov, Matthew Ho
eddie.win/IGLU22
Task
Objective: Complete long-horizon compositional tasks specified by natural-language instructions. The action space must be low-level.
https://www.iglu-contest.net/
Method - Multitask Hierarchical Baseline (MHB)
Pipeline (diagram): Prompt → Finetuned T5 → Voxel grid → Handcrafted Planner → Next block → RL (PPO) → Action, with State fed back from the environment
“Facing north, build a stack of 4 yellow blocks”
What about prior blocks?
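The Handcrafted Planner step in the pipeline can be illustrated with a minimal sketch. It assumes the target and current worlds are integer NumPy arrays indexed as (height, x, z) with 0 meaning empty. The grid shape, the color id 6 for yellow, and the function name `next_block` are hypothetical choices for illustration, not the contest's or the authors' actual code.

```python
import numpy as np

def next_block(target, current):
    """Return (y, x, z, color) of the first block still missing, or None.

    Assumption: both grids are int arrays of the same shape, indexed
    (height, x, z), where 0 is empty and other values are color ids.
    """
    # A cell is "missing" if the target wants a block there and the
    # current world does not already hold that same block.
    missing = (target != 0) & (current != target)
    # argwhere returns indices in row-major order, so the lowest
    # height layer comes first and the structure is built bottom-up.
    coords = np.argwhere(missing)
    if coords.size == 0:
        return None
    y, x, z = coords[0]
    return int(y), int(x), int(z), int(target[y, x, z])

# Usage: "build a stack of 4 yellow blocks" as a target grid.
target = np.zeros((9, 11, 11), dtype=int)
target[0:4, 5, 5] = 6          # hypothetical color id for yellow
current = np.zeros_like(target)
print(next_block(target, current))
```

Choosing the lowest layer first keeps every requested block supported by something already placed, which is one plausible reason to hand this step to a planner rather than learn it.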
Baseline
Reformulate as a text-to-image problem
Diffusion Model
https://deeprender.ai/blog/discrete-denoising-diffusion-models
https://huggingface.co/blog/stable_diffusion
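To make the text-to-image reformulation concrete, here is a minimal sketch of one way a voxel grid could be treated as a 3D "image": one-hot encode the color ids into channels, apply the standard Gaussian forward noising step used by DDPM-style diffusion models, and recover a grid by taking the argmax over channels. The encoding, the number of colors, and the function names are assumptions for illustration; the slides do not say how the authors represented the grid.

```python
import numpy as np

NUM_COLORS = 7  # assumption: 0 = empty plus six block colors

def grid_to_onehot(grid, num_colors=NUM_COLORS):
    """Encode an int grid (Y, X, Z) as a float tensor (C, Y, X, Z)."""
    onehot = np.eye(num_colors, dtype=np.float32)[grid]  # (Y, X, Z, C)
    return onehot.transpose(3, 0, 1, 2)

def onehot_to_grid(x):
    """Decode a (C, Y, X, Z) tensor back to color ids via argmax."""
    return x.argmax(axis=0)

def add_noise(x0, alpha_bar, rng):
    """Gaussian forward step: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Usage: noise a stack of four blocks, then decode the noisy tensor.
rng = np.random.default_rng(0)
grid = np.zeros((9, 11, 11), dtype=int)
grid[0:4, 5, 5] = 6
x0 = grid_to_onehot(grid)
noisy = add_noise(x0, alpha_bar=0.9, rng=rng)
decoded = onehot_to_grid(noisy)
print(x0.shape, decoded.shape)
```

The argmax decoding is the simplest possible reconstruction rule; a denoised tensor that is only slightly off can still decode to spurious or missing blocks, so this step matters for the final grid quality.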
Original T5 Model Output
How to improve T5?
Improvement on baseline
Metric      Original (T5)   Ours (Imagen 3D)
Precision   37%             58.9%
Recall      49.0%           61.3%
F1          38%             58.5%
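For readers unfamiliar with how such scores can be computed on voxel grids, the sketch below shows one plausible definition: a predicted block counts as correct only if the target has the same color at the same cell. This is an assumption for illustration, not the contest's official scoring code. Note also that the reported F1 values are not the harmonic mean of the reported precision and recall (for T5, 2 × 0.37 × 0.49 / 0.86 ≈ 0.42), which would be consistent with scores averaged per sample, although the slides do not say so.

```python
import numpy as np

def grid_scores(pred, target):
    """Return (precision, recall, f1) for one predicted voxel grid.

    Assumption: int grids of equal shape, 0 = empty, other values
    are color ids; a hit requires matching position and color.
    """
    tp = np.sum((pred == target) & (target != 0))
    n_pred = np.sum(pred != 0)
    n_target = np.sum(target != 0)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_target if n_target else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return float(precision), float(recall), float(f1)

# Usage: the model predicts only 2 of the 4 requested blocks.
target = np.zeros((9, 11, 11), dtype=int)
target[0:4, 5, 5] = 6
pred = np.zeros_like(target)
pred[0:2, 5, 5] = 6
print(grid_scores(pred, target))
```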
Improvement from diffusion model and context prompting
Figure panels: T5, Imagen, Target Grid
Diffusion model training converges at around 230k gradient steps
81.23%
83.1%
81.15%
txt2act Improvement Directions
Final result: 3rd place
Diffusion did not work well in the end, so we stuck with the baseline. The reconstruction algorithm still needs work.
https://eddie.win/IGLU23