1 of 24

ALICE: Aligning Language models with Interactive Code Execution in game engines

Yang Su

Student Researcher @ Cornell & Millennium

This work is supported by

2 of 24

ALICE

ALICE is …

  • A Meta-Agent Collaboration System

ALICE can …

  • Turn Voice -> Text -> Code -> Interactive 3D Scenes and Objects
  • Generate high-quality Data through Multi-turn Interactions and Feedback

ALICE can be used in …

  • Virtual Reality, Game Engines, Robotics Simulation
  • Complex Development Cycles, Human-Computer Collaboration
  • Data-limited Environments and Business Scenarios

4 of 24

Motivation & Contribution

  • Code generation is hard in virtual environments (robotics simulation, virtual reality, autonomous driving)
    • The number of tools and interaction categories is huge, because precise control over details is required
    • Existing approaches are not efficient enough for real-time interaction
  • We built a Large Language Model (LLM)-based system that generates code with
    • Low-cost function calls, by focusing only on the relevant tools
    • Parallelizable code execution for efficient interaction
    • Customizable interaction sets, i.e. user-defined functions
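
A minimal sketch of the "focus only on relevant tools" idea, assuming a simple word-overlap ranking. The slides do not say how ALICE selects tools, so the `Tool` class, the scoring rule, and every tool name below are hypothetical stand-ins. The point is only that the prompt lists a few matching tools instead of the whole tool set, which keeps each LLM call short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A user-defined function exposed to the model (hypothetical shape)."""
    name: str
    doc: str


def select_relevant_tools(instruction: str, tools: list[Tool], limit: int = 3) -> list[Tool]:
    """Rank tools by word overlap with the instruction and keep the top few."""
    words = set(instruction.lower().split())
    scored = []
    for tool in tools:
        overlap = len(words & set(tool.doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, tool))
    # Sort on the score only, so Tool objects never need to be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def build_prompt(instruction: str, tools: list[Tool]) -> str:
    """Put only the selected tools into the prompt."""
    lines = [f"- {t.name}: {t.doc}" for t in tools]
    return "Available tools:\n" + "\n".join(lines) + f"\nInstruction: {instruction}"


# Hypothetical interaction set; users could register their own functions here.
TOOLS = [
    Tool("select_by_height", "select buildings whose height in meters is within a range"),
    Tool("set_color", "make the selected objects a given color"),
    Tool("set_weather", "change the weather of the skybox"),
    Tool("spawn_tree", "paint trees of a given type on the terrain"),
]

instruction = "Select buildings from 6 to 10 meters and make them orange"
relevant = select_relevant_tools(instruction, TOOLS)
prompt = build_prompt(instruction, relevant)
print(prompt)
```

A real system would more likely use embeddings or a retrieval module than raw word overlap; the overlap rule is used here only to keep the sketch dependency-free.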

6 of 24

Preliminary Result

User: Select buildings from 6 to 10 meters and make them orange

Executable Code Block
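
The generated block itself is not reproduced in these slides. As an illustration only, here is a sketch of what an executable block for this instruction could look like, written in Python rather than the engine's own scripting language. The `Building` class, the two helper functions, and the sample scene are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Building:
    """Hypothetical scene object with a height in meters and a color."""
    name: str
    height_m: float
    color: str = "gray"


def select_by_height(buildings: list[Building], min_m: float, max_m: float) -> list[Building]:
    """Return the buildings whose height lies in [min_m, max_m]."""
    return [b for b in buildings if min_m <= b.height_m <= max_m]


def set_color(selection: list[Building], color: str) -> None:
    """Recolor every selected building in place."""
    for b in selection:
        b.color = color


# Made-up scene used only for this example.
scene = [Building("A", 4.0), Building("B", 6.0), Building("C", 8.5), Building("D", 12.0)]

# The kind of two-call block a model might emit for the instruction above.
selected = select_by_height(scene, 6, 10)
set_color(selected, "orange")

print([(b.name, b.color) for b in scene])
```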

We call this system Voice2Action

  • The Voice2Action pipeline is built in a Virtual Reality scene for highly efficient code generation
  • Reduces code generation cost by 30x compared with calling LLMs in the traditional way
    • How efficient?
    • 3k user instructions cost only 5 dollars with the OpenAI GPT-3.5-Turbo API
  • Core concepts include…
    • Code Interpreter
    • Code Reflection
    • Runtime Compiling via Assembly References
    • Multi-Threading
    • etc.
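
To make "Code Reflection" concrete, here is a small Python analogue, assuming the goal is to derive tool descriptions from the functions themselves instead of writing them by hand. The slides do not show how Voice2Action does this inside the engine, so `describe` and both sample functions are hypothetical.

```python
import inspect


def set_color(target: str, color: str) -> None:
    """Make the target objects a given color."""


def select_by_height(min_m: float, max_m: float) -> None:
    """Select buildings whose height lies within [min_m, max_m] meters."""


def describe(func) -> dict:
    """Use reflection to build a tool description from a function's signature."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "doc": inspect.getdoc(func) or "",
        # Map each parameter name to the name of its annotated type.
        "params": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
    }


tool_docs = [describe(f) for f in (set_color, select_by_height)]
for entry in tool_docs:
    print(entry)
```

Because the descriptions come from the code, adding a user-defined function makes it visible to the model without a separate documentation step.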

7 of 24

Voice2Action: code generation in virtual reality

8 of 24

Going Forward - The ALICE Project

ALICE (Aligning Language models with Interactive Code Execution in game engines)

  • Human instructions are vague and personalized
    • “Draw a building” -> what size? location? shape?
    • Current LLM (autonomous agent) systems often complete the task without asking the user for the missing specifics
  • How do we “prompt” users so that they know what can be done in the environment?
    • Robotics manipulation -> “Give me an apple” -> there is no apple nearby, do I search further or stop?
    • Just like human vs. human interactions!
  • We are building an efficient multi-agent system in game engines via code generation
    • Why multi-agent - personalization
    • Why game engines - fully controllable environment
    • Why interactive code generation - flexibility & versatility
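
The "Draw a building" example can be sketched as a check that runs before any code is generated: if required details are missing, the system asks instead of guessing. This is an assumption about one possible design, not the ALICE implementation; the `REQUIRED` table and the `next_action` function are hypothetical.

```python
# Hypothetical table of details each intent needs before it can run.
REQUIRED = {"draw_building": ["size", "location", "shape"]}


def next_action(intent: str, provided: dict) -> dict:
    """Ask the user for missing details, or execute when everything is known."""
    missing = [p for p in REQUIRED.get(intent, []) if p not in provided]
    if missing:
        return {"action": "ask_user", "question": "Please specify: " + ", ".join(missing)}
    return {"action": "execute", "call": {"name": intent, "args": provided}}


# First turn: the user only said how big the building should be.
first = next_action("draw_building", {"size": "10 m"})
print(first)

# Later turn: the remaining details arrived through the conversation.
second = next_action(
    "draw_building",
    {"size": "10 m", "location": "origin", "shape": "tower"},
)
print(second)
```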

9 of 24

ALICE - code generation in game engines

Unity Engine

Unreal Engine

ALICE

Urban Planning

Gaming

VR Workspace

10 of 24

Communication

Controller LLM

(Object Creator)

  1. Create / edit direct children controllers (objects)
  2. Set up (new) intention communication routes
  3. Call into intentions
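
The three controller responsibilities above can be sketched as a small tree of objects. This is an illustration under assumptions: the slides do not define the data structures, so the `Controller` class, its method names, and the example intention are hypothetical, and a plain Python callable stands in for an Intent LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Controller:
    """Hypothetical controller node that owns children, routes, and intentions."""
    name: str
    children: dict[str, "Controller"] = field(default_factory=dict)
    routes: dict[str, str] = field(default_factory=dict)  # intention -> child name
    intents: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def create_child(self, name: str) -> "Controller":
        """1. Create a direct child controller."""
        child = Controller(name)
        self.children[name] = child
        return child

    def set_route(self, intention: str, child_name: str) -> None:
        """2. Set up a communication route for an intention."""
        if child_name not in self.children:
            raise KeyError(f"{self.name} has no direct child named {child_name}")
        self.routes[intention] = child_name

    def call(self, intention: str, payload: str) -> Optional[str]:
        """3. Call into an intention, forwarding along the route if needed."""
        if intention in self.intents:
            return self.intents[intention](payload)
        if intention in self.routes:
            return self.children[self.routes[intention]].call(intention, payload)
        return None  # no handler and no route for this intention


root = Controller("root")
terrain = root.create_child("terrain")
terrain.intents["paint"] = lambda tree_type: f"painted {tree_type} trees"
root.set_route("paint", "terrain")

result = root.call("paint", "oak")
print(result)
```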

[Diagram: controller hierarchy]

  • Nodes: Controller LLM 1-1, 1-2, 1-3 and Controller LLM 2-1, 2-2, 2-3
  • Annotation: frozen weights, only the prompt is modified
  • Route labels: old, old, old, new, old

Intent LLM

(Script Creator)

11 of 24

Intent

Intent LLM

(Script Creator)

  • Create inherited scripts (intent LLMs) - MUST ask controller LLM for communication route
  • Set up (new) fields and methods (tools)

Fine-tunable weights

Shared weights per script

i.e. each field / function goes into the same model

Format (JSON)

Function Call

Augmented Tool

Documentation

Param Config

Function Call

Field Config

i.e. retrieval systems, vision module, etc.

Execution LLM

  1. No modification of existing fields / functions allowed
  2. Create new fields / functions
  3. Compilation and runtime feedback
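
The three Execution LLM rules above can be sketched in Python with the built-in `compile` and `exec`. This is a minimal illustration, assuming generated functions arrive as source strings; the `ScriptRegistry` class and the sample functions are hypothetical, and `exec` is not a security boundary, so a real system would need proper isolation.

```python
import traceback


class ScriptRegistry:
    """Holds a script's functions; existing ones are read-only (hypothetical)."""

    def __init__(self, existing: dict):
        self.namespace = dict(existing)
        self.protected = set(existing)

    def add_function(self, name: str, source: str) -> dict:
        # Rule 1: existing fields / functions may not be modified.
        if name in self.protected:
            return {"ok": False, "stage": "policy",
                    "error": f"'{name}' already exists and cannot be modified"}
        # Rule 3a: compilation feedback.
        try:
            code = compile(source, f"<generated:{name}>", "exec")
        except SyntaxError as exc:
            return {"ok": False, "stage": "compile",
                    "error": f"{exc.msg} (line {exc.lineno})"}
        # Run the definition in a copy so it cannot overwrite existing names.
        scratch = dict(self.namespace)
        try:
            exec(code, scratch)
        except Exception:
            return {"ok": False, "stage": "runtime",
                    "error": traceback.format_exc(limit=1)}
        if name not in scratch:
            return {"ok": False, "stage": "policy",
                    "error": f"source did not define '{name}'"}
        # Rule 2: new functions are allowed.
        self.namespace[name] = scratch[name]
        return {"ok": True}

    def call(self, name: str, *args) -> dict:
        # Rule 3b: runtime feedback, returned as data instead of crashing.
        try:
            return {"ok": True, "result": self.namespace[name](*args)}
        except Exception:
            return {"ok": False, "stage": "runtime",
                    "error": traceback.format_exc(limit=1)}


registry = ScriptRegistry({"render": lambda weather: f"sky:{weather}"})

overwrite = registry.add_function("render", "def render(w):\n    return 1")
broken = registry.add_function("broken", "def broken(:\n    pass")
added = registry.add_function("sunny", "def sunny():\n    return render('sunny')")
ran = registry.call("sunny")
registry.add_function("crash", "def crash():\n    return 1 / 0")
crashed = registry.call("crash")

print(overwrite["stage"], broken["stage"], ran, crashed["stage"])
```

Returning the error text rather than raising lets the next model call see what went wrong and try again.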

12 of 24

Interactive Code Execution

User: Build the Grand Canyon in Arizona, United States

User: 建造崇山峻岭 (Chinese: “Build many mountains with different styles”)

N / A

(Through Multi-Turn Conversation)

Ground Truth

(not used in training)

13 of 24

Before vs. After Alignment

14 of 24

[Diagram: ALICE pipeline for the instruction “draw a vibrant outdoor scene…”. Recoverable labels:]

  • Controller LLM (Agent Creator)
    • Environment Configuration (Long-Term Memory)
    • Sub-instructions: “A riverside village..”, “In a sunny day..” [Parallelized]
    • Scripts: Script A (terrain), Script E (skybox), . . . ; communicate
  • Intent LLM (Script Creator), on Script E
    • Augmented Tool, Documentation, Param Config, Function Call
    • Functions: E::Paint (tree type), E::Render (weather type) [Parallelized^2]
    • “No tree..” -> N/A
    • “Sunny..” -> Execute
  • Execution LLM, on Script E
    • Documentation, Format (JSON), Function Call, Field Config
    • E::Render() with “Sunny weather”
    • Execution Examples (Short-Term Memory): f0, f1, f2, f3; Pass / Fail; Error Trace
  • LVM for Feedback
    • Human vs. GPT4-V Agreement
  • Frame updates
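
The parallel routing in this diagram can be sketched with a thread pool: each function of a script gets the sub-instruction, a planner decides whether to call it ("Execute") or skip it ("N/A"), and the outcome is kept as a record for later turns. This is a sketch under assumptions: the keyword rule in `plan_call` stands in for the Intent LLM, and every function and field name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def render(weather: str) -> str:
    """Stand-in for E::Render (weather type)."""
    if weather not in {"sunny", "rainy"}:
        raise ValueError(f"unsupported weather: {weather}")
    return f"skybox rendered as {weather}"


def paint(tree_type: str) -> str:
    """Stand-in for E::Paint (tree type)."""
    return f"painted {tree_type} trees"


FUNCTIONS = {"render": render, "paint": paint}


def plan_call(function_name: str, sub_instruction: str):
    """Keyword rule standing in for the Intent LLM; returns None for N/A."""
    text = sub_instruction.lower()
    if function_name == "render" and "sunny" in text:
        return {"name": "render", "args": ["sunny"]}
    return None


def run_one(function_name: str, sub_instruction: str) -> dict:
    """Plan and execute one function, reporting pass, fail, or n/a."""
    call = plan_call(function_name, sub_instruction)
    if call is None:
        return {"function": function_name, "status": "n/a"}
    try:
        result = FUNCTIONS[call["name"]](*call["args"])
        return {"function": function_name, "status": "pass", "result": result}
    except Exception as exc:
        # The error text plays the role of the error trace fed back to the model.
        return {"function": function_name, "status": "fail", "error": repr(exc)}


def run_parallel(sub_instruction: str) -> list[dict]:
    """Evaluate every function of the script concurrently, in a stable order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_one, name, sub_instruction) for name in FUNCTIONS]
        return [future.result() for future in futures]


records = run_parallel("In a sunny day..")
# Keep executed calls as examples for later turns (short-term memory).
short_term_memory = [r for r in records if r["status"] != "n/a"]
print(records)
```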

15 of 24

Collaborations

Past Collaborations

Current Collaborations

  • Cornell NLP
  • Millennium - Cornell ICPC Initiative

Future Collaborations

  • BigCode
  • Alibaba Qwen Team

16 of 24

Conclusion & Broader Impact

Why is ALICE useful?

  • Data generation with active intervention (with respect to Sora)
  • Data generation in data-limited environments (virtual reality, robotics simulation, autonomous driving, etc.)

Advantages of ALICE?

  • Efficient, cost-friendly, and personalized
  • Orthogonal to popular agent strategies for each LLM component!
    • Chain-of-Thought, Tree-of-Thought (ToT)
    • ReAct, Reflexion
    • AutoGPT, MetaGPT
  • Easy to augment primary functions with tools
    • RAG (Retrieval-Augmented Generation)
    • Skill Library (Built-In Support!)
  • Online Self-Improving
  • Performance scales with LLM improvements

Limitations of ALICE?

  • Expensive Evaluation
  • Only tested in game engines
    • Might have drawbacks in other environments
    • As we depend on game components (script-based OOP) so far
  • (the unforeseen…)

17 of 24

Team Members

Yang Su (ICPC)

ys724@cornell.edu

Research Engineer

ALICE & Voice2Action Lead

Cheng Fei (ICPC)

cf482@cornell.edu

ML Engineer

ALICE - Feedback Strategy

Rohan Narayan

rn334@cornell.edu

ML Engineer

ALICE - Agent Strategy

Shangyi Geng

sg2323@cornell.edu

Research Engineer

ALICE - Co-Lead

Shou-Kai Cheng

sc2745@cornell.edu

Biz dev & ML Engineer

ALICE - Versatile Engineer

Joy Wang

zw673@cornell.edu

ML Engineer

ALICE - Fine Tuning

Young He

jh2795@cornell.edu

ML Engineer

ALICE - Deployment

18 of 24

Team Members

Grace Nho

en268@cornell.edu

VR & UX Designer

Voice2Action - Interaction

Jingze Xue

jx288@cornell.edu

Game & System Designer

Voice2Action - Interaction

Xinyue Cao

xc426@cornell.edu

ML Engineer

ALICE - Evaluation

Yunwei Zhao

yz2959@cornell.edu

ML Engineer

ALICE - Evaluation

19 of 24

Thank you!

Link to the ALICE Project: alicellm.github.io

  • Technical Report (In Progress)

Link to the Voice2Action Project

  • Technical Report (Arxiv)
  • Unity Package

Scan the QR Code for link to this presentation!

Connect with Us

20 of 24

Acknowledgement

Datasets

  • Unreal Engine data is from Palatial XR
  • Other data are publicly available (GitHub, Online Assets, etc.)

We thank the following advisors for their advice and suggestions throughout this work

We thank the following organizations for supporting this work

21 of 24

Ex. 1

22 of 24

Ex. 2

23 of 24

Ex. 3

Our Customer Palatial XR

24 of 24

Ex. 3 Video