Using Language Models To Convert Between Natural Language and Game Commands

Dataset

Abstract

Dungeons and Dragons is a popular tabletop role-playing game that has been adapted to online play. In this paper, we look at enhancing Avrae, a Discord bot developed by DnD Beyond to help with online play. Avrae enables users to manage gameplay through Unix-like commands. We explore using language models to automatically translate player dialogue into Avrae's commands. Using GPT-3's few-shot learning and fine-tuning capabilities, we achieve 64% accuracy. We also explore the reverse direction, where commands are rendered as descriptive text, suggesting that it may eventually be possible to combine Avrae and LMs into a system capable of role-playing alongside players.

Story Generation

Limitations

Translation Performance

Avrae Bot

Command Translation

Generation Performance

Stefan Papazov, Wesley Gill, Marta Garcia Ferreiro, Andrew Zhu, Lara J. Martin, Chris Callison-Burch

{spapazov, wesgill, martagf, laramar, ccb}@seas.upenn.edu

University of Pennsylvania, Avrae

  • Avrae is a Discord bot that allows players and DMs to perform dice rolls through a command API.

  • This helps to simplify some of the rule-based aspects of DnD gameplay, like tracking combat.
  • We manually annotated a set of game transcripts with their corresponding Avrae commands.

  • The game transcripts were collected from a Play-By-Post forum on DnD Beyond that did not use Avrae.

We algorithmically generate prompts and use them to perform command translation in a few-shot setting.

GPT-3 Few Shot

[Figure: pipeline with stages Prompt Generator, Off The Shelf GPT-3, Command Translation]
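As a rough illustration of algorithmic prompt generation for the few-shot setting, the sketch below formats annotated (text, command) pairs followed by an unanswered query. The function name `build_few_shot_prompt`, the `Text:`/`Command:` labels, and the example utterances are assumptions made for illustration; the poster does not specify the actual prompt format. Only the two commands are taken from the examples shown on the poster.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (text, command) pairs, then the query with an empty command slot.

    The "Text:"/"Command:" layout is a hypothetical format, not the one
    used by the authors.
    """
    blocks = [f"Text: {text}\nCommand: {command}" for text, command in examples]
    # The final block leaves the command blank for the language model to fill in.
    blocks.append(f"Text: {query}\nCommand:")
    return "\n\n".join(blocks)


# Illustrative pairs; the utterances are made up, the commands appear on the poster.
examples = [
    ("Fjolnir shoots an arrow at goblin 1.", "!attack bow -t g1"),
    ("Dirk casts magic missiles at goblin 2.", "!cast magic missiles -t g2"),
]
prompt = build_few_shot_prompt(examples, "Fjolnir fires his bow at goblin 2.")
print(prompt)
```

The resulting string would be sent to the language model as-is, and the text generated after the final `Command:` would be read as the predicted command.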

GPT-3 Fine Tune

We fine-tune Davinci, OpenAI's largest GPT-3 model, and use it for command translation in a few-shot setting.

[Figure: pipeline with stages Prompt Generator, Fine Tuned GPT-3, Command Translation]

Task Decomposition

We break down command translation into three subproblems:

  • Classify Command Type
  • Translate Attack/Cast
  • Translate Check/Save

[Figure: pipeline with stages Prompt Generator, Off The Shelf GPT-3]
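One way to picture the task decomposition is as a two-step routine: first classify the command type, then hand the text to the translator for that type. The sketch below assumes a generic `complete` callable standing in for a language-model call; the function names, the prompt wording, and the stub `fake_complete` are all hypothetical and are not the authors' implementation.

```python
from typing import Callable

Completer = Callable[[str], str]  # hypothetical stand-in for a language-model call


def classify_command_type(text: str, complete: Completer) -> str:
    """Subproblem 1: ask the model which command type the text describes."""
    answer = complete(
        f"Classify the command type (attack, cast, check, save).\nText: {text}\nType:"
    )
    return answer.strip().lower()


def translate(text: str, complete: Completer) -> str:
    """Route the text to the attack/cast or check/save translation prompt."""
    kind = classify_command_type(text, complete)
    if kind in ("attack", "cast"):
        prompt = f"Translate to an attack or cast command.\nText: {text}\nCommand:"
    elif kind in ("check", "save"):
        prompt = f"Translate to a check or save command.\nText: {text}\nCommand:"
    else:
        raise ValueError(f"unrecognized command type: {kind!r}")
    return complete(prompt).strip()


def fake_complete(prompt: str) -> str:
    """Canned responses so the sketch runs without any model access."""
    if prompt.endswith("Type:"):
        return " attack"
    return " !attack bow -t g1"


result = translate("Fjolnir shoots an arrow at goblin 1.", fake_complete)
print(result)
```

In practice `fake_complete` would be replaced by a real model call, and each translation prompt would carry its own few-shot examples.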

We investigated if Avrae commands can be used as input to large language models to generate text as an aid for Dungeons and Dragons storytelling.

!attack bow -t g1

!cast magic missiles -t g2

“Fjolnir shoots an arrow at goblin 1; Dirk turns to goblin 2 and yells magic missiles striking the slimy booger!”

To obtain a baseline, we considered the best-performing model in the IBM NLC2CMD competition, an ensemble of five separately trained Transformers with beam search, and retrained it on our Play-By-Post data.

Story Generation Evaluation

We use three criteria to evaluate the quality of a DM summary:

Cohesion (whether the output is logical)

Interestingness (how engaging the output is for a reader as a fictional story)

Relevance (how closely the output pertains to the input commands)
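A simple way to summarize ratings on these three criteria is to average each criterion across raters. The sketch below assumes a numeric rating per criterion (for example a 1 to 5 scale); the scale, the dictionary layout, and the sample values are placeholders chosen for illustration and are not data or results from the study.

```python
from statistics import mean
from typing import Dict, List

CRITERIA = ("cohesion", "interestingness", "relevance")


def average_ratings(ratings: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each criterion over all raters for one generated summary."""
    return {criterion: mean(r[criterion] for r in ratings) for criterion in CRITERIA}


# Placeholder ratings from two hypothetical raters (not real study data).
ratings = [
    {"cohesion": 4, "interestingness": 3, "relevance": 5},
    {"cohesion": 2, "interestingness": 3, "relevance": 4},
]
scores = average_ratings(ratings)
print(scores)
```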

  • Small Data Set: Our dataset has 500 annotated turns; fine-tuned models may benefit from a larger number of annotations.

  • Manual Annotation: Turns were manually annotated post-hoc. It would be better to instrument Avrae to capture commands directly (in progress).

  • Syntax Checks: Our models do not currently check whether the command is a valid command in the Avrae system.

  • Non Expert Evaluators: MTurk ratings may be unreliable. To have our task completed cheaply and quickly, we did not require that raters have played or DMed
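Regarding the Syntax Checks limitation, a lightweight filter could reject obviously malformed model output before it reaches the bot. The sketch below is not Avrae's actual grammar: the set of command names is assumed from the command types named on this poster (attack, cast, check, save), and the `-t` target rule is inferred only from the two example commands shown. A real check would need to follow Avrae's own command documentation.

```python
import shlex

# Assumption: only the four command types mentioned on the poster are accepted.
KNOWN_COMMANDS = {"attack", "cast", "check", "save"}


def is_plausible_command(line: str) -> bool:
    """Return True if the line loosely resembles the poster's example commands."""
    if not line.startswith("!"):
        return False
    try:
        tokens = shlex.split(line[1:])
    except ValueError:
        # shlex raises ValueError on unbalanced quotes.
        return False
    if not tokens or tokens[0] not in KNOWN_COMMANDS:
        return False
    # Assumed rule: every -t flag must be followed by a target name.
    for i, token in enumerate(tokens):
        if token == "-t":
            if i + 1 >= len(tokens) or tokens[i + 1].startswith("-"):
                return False
    return True


for candidate in ["!attack bow -t g1", "!cast magic missiles -t g2", "!attack bow -t"]:
    print(candidate, is_plausible_command(candidate))
```

A filter like this only catches surface errors; it cannot tell whether a weapon, spell, or target actually exists in the current game.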