Using Language Models To Convert Between Natural Language and Game Commands
Dataset
Abstract
Dungeons and Dragons is a popular tabletop role-playing game that has been adapted to online play. In this paper, we look at enhancing Avrae, a Discord bot developed by DnD Beyond to help with online play. Avrae lets users manage gameplay through Unix-like commands. We explore using language models to automatically translate player dialogue into Avrae's commands. Using GPT-3's few-shot learning and fine-tuning capabilities, we achieve 64% accuracy. We also explore the reverse direction, where commands are rendered as descriptive text, suggesting that it may eventually be possible to combine Avrae and LMs to create a system capable of role-playing alongside players.
Poster panels: Story Generation; Limitations; Translation Performance; Avrae Bot; Command Translation; Generation Performance
Stefan Papazov, Wesley Gill, Marta Garcia Ferreiro, Andrew Zhu, Lara J. Martin, Chris Callison-Burch
{spapazov, wesgill, martagf, laramar, ccb}@seas.upenn.edu
University of Pennsylvania, Avrae
GPT-3 Few Shot
We algorithmically generate prompts and use them to perform command translation in a few-shot setting.
Pipeline: Prompt Generator → Off-the-Shelf GPT-3 → Command Translation
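As a rough illustration of algorithmic prompt generation for the few-shot setting, the sketch below assembles solved examples followed by an unsolved query. The function name `build_few_shot_prompt`, the `Player:`/`Command:` labels, and the player utterances are hypothetical; only the two Avrae commands are taken from the poster, and the actual prompt format is not stated there.

```python
from typing import List, Tuple

# (player utterance, Avrae command) pairs. The commands come from the poster's
# examples; the utterances are hypothetical stand-ins for real play data.
EXAMPLES: List[Tuple[str, str]] = [
    ("I shoot my bow at goblin 1.", "!attack bow -t g1"),
    ("I cast magic missiles at goblin 2.", "!cast magic missiles -t g2"),
]


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], utterance: str, k: int = 2
) -> str:
    """Format up to k solved examples, then the unsolved query."""
    blocks = [
        f"Player: {text}\nCommand: {command}" for text, command in examples[:k]
    ]
    # The final block leaves the command empty for the model to complete.
    blocks.append(f"Player: {utterance}\nCommand:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(EXAMPLES, "I swing my sword at goblin 3.")
print(prompt)
```

The resulting string would be sent to the language model, whose completion is read as the translated command.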
GPT-3 Fine Tune
We fine-tune Davinci, OpenAI's largest GPT-3 model, and use it for command translation in a few-shot setting.
Pipeline: Prompt Generator → Fine-Tuned GPT-3 → Command Translation
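Fine-tuning needs the training pairs serialized in some machine-readable form. The sketch below writes one JSON object per line with `prompt` and `completion` fields, a layout modeled on the legacy OpenAI fine-tuning data format. That layout, the separator, and the stop string are all assumptions here; the poster does not describe how its training data was prepared.

```python
import json
from typing import Dict, List, Tuple


def to_finetune_records(
    pairs: List[Tuple[str, str]],
    separator: str = "\n\n###\n\n",  # hypothetical end-of-prompt marker
    stop: str = " END",              # hypothetical end-of-completion marker
) -> List[Dict[str, str]]:
    """Turn (utterance, command) pairs into prompt/completion records."""
    return [
        {"prompt": text + separator, "completion": " " + command + stop}
        for text, command in pairs
    ]


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical utterance paired with a command from the poster.
records = to_finetune_records([("I shoot my bow at goblin 1.", "!attack bow -t g1")])
print(to_jsonl(records))
```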
Task Decomposition
We break down command translation into three subproblems:
Classify Command Type
Translate Attack/Cast
Translate Check/Save
Pipeline: Prompt Generator → Off-the-Shelf GPT-3
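One way to picture the decomposition is a classify-then-route pipeline: a first model call decides the command type, and a second call uses a prompt specialized for that type. The sketch below is a minimal version under stated assumptions. The prompt wording, the function names, and the `fake_lm` stand-in (used so the code runs without an API) are hypothetical, not the poster's implementation.

```python
from typing import Callable

# Any function mapping a prompt string to a completion string.
LM = Callable[[str], str]


def classify_prompt(utterance: str) -> str:
    return (
        "Classify the command type (attack, cast, check, save).\n"
        f"Player: {utterance}\nType:"
    )


def attack_cast_prompt(utterance: str, command_type: str) -> str:
    return f"Translate into an Avrae !{command_type} command.\nPlayer: {utterance}\nCommand:"


def check_save_prompt(utterance: str, command_type: str) -> str:
    return f"Translate into an Avrae !{command_type} command.\nPlayer: {utterance}\nCommand:"


def translate(utterance: str, lm: LM) -> str:
    """Step 1: classify the command type. Step 2: route to a sub-translator."""
    command_type = lm(classify_prompt(utterance)).strip().lower()
    if command_type in ("attack", "cast"):
        prompt = attack_cast_prompt(utterance, command_type)
    elif command_type in ("check", "save"):
        prompt = check_save_prompt(utterance, command_type)
    else:
        raise ValueError(f"unrecognized command type: {command_type!r}")
    return lm(prompt).strip()


def fake_lm(prompt: str) -> str:
    """Offline stand-in for a GPT-3 call; returns canned completions."""
    if prompt.endswith("Type:"):
        return " attack"
    return " !attack bow -t g1"


print(translate("I shoot my bow at goblin 1.", fake_lm))
```

Splitting the task this way lets each sub-translator's prompt contain only examples of its own command family.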
We investigated if Avrae commands can be used as input to large language models to generate text as an aid for Dungeons and Dragons storytelling.
!attack bow -t g1
!cast magic missiles -t g2
“Fjolnir shoots an arrow at goblin 1; Dirk turns to goblin 2 and yells magic missiles striking the slimy booger!”
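The reverse direction can be sketched the same way: commands go into the prompt and the model is asked for narration. The instruction text and the `build_story_prompt` helper below are hypothetical. The character names and commands come from the example above, but the poster does not say how commands and characters were actually presented to the model.

```python
from typing import List, Tuple


def build_story_prompt(actions: List[Tuple[str, str]]) -> str:
    """Build a narration prompt from (character, Avrae command) pairs."""
    lines = ["Narrate the following Avrae commands as a short scene."]
    for character, command in actions:
        lines.append(f"{character}: {command}")
    # The model's completion after this label is taken as the story text.
    lines.append("Narration:")
    return "\n".join(lines)


story_prompt = build_story_prompt(
    [
        ("Fjolnir", "!attack bow -t g1"),
        ("Dirk", "!cast magic missiles -t g2"),
    ]
)
print(story_prompt)
```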
To obtain a baseline, we took the best-performing model in the IBM NLC2CMD competition, an ensemble of five separately trained Transformers with beam search, and re-trained it on our Play-By-Post data.
Story Generation Evaluation
We use three criteria to evaluate the quality of a DM summary:
Cohesion (whether the output is logical)
Interestingness (how engaging the output is for a reader as a fictional story)
Relevance (how closely the output pertains to the input commands)
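If each criterion is scored numerically by several annotators, the scores can be averaged per criterion. The sketch below assumes a 1 to 5 scale and dictionary-shaped ratings. Both are assumptions, since the poster does not state the rating scale or the aggregation method, and the numbers shown are placeholders, not results.

```python
from statistics import mean
from typing import Dict, List

CRITERIA = ("cohesion", "interestingness", "relevance")


def summarize_ratings(ratings: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each criterion over a list of per-annotator ratings."""
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}


# Placeholder ratings for one generated summary (not real data).
ratings = [
    {"cohesion": 4, "interestingness": 3, "relevance": 5},
    {"cohesion": 2, "interestingness": 5, "relevance": 4},
]
summary = summarize_ratings(ratings)
print(summary)
```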