1 of 18

Evaluating Artificial Social Intelligence in an Urban Search and Rescue Task Environment

AAAI Fall Symposium Series: Theory of Mind for Teams

4-5 Nov 2021

Jared Freeman¹, Lixiao Huang², Matt Wood¹, Stephen J. Cauffman²

Aptima Inc.¹, Arizona State University²

freeman@aptima.com, lixiao.huang@asu.edu, mwood@aptima.com, scauffma@asu.edu

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001119C0130. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency.

2 of 18

Overview

  • Training, Talk, Technology, and Theory of Mind
  • The ASIST Team Task Environment
  • Experimental Design
  • Artificial Social Intelligence
  • ASI Evaluation & Findings
  • Future Research

3 of 18

Training, Talk, Technology, and Theory of Mind

  • Teams coordinate through Training, Talk, Technology, and ToM
  • ToM
    • Builds from Training, Talk, & Technology…
      • The better the TTT, the better the ToM
    • Compensates for deficiencies of TTT…
      • The worse the TTT, the more critical the ToM
    • ToM is inevitable and error-prone

Figure: Training, Talk, Technology, and ToM

4 of 18

Training, Talk, Technology, and Theory of Mind

  • Human-built ToM
    • Infer the cognitive and affective states of others, their goals, and their needs
    • Predict others' actions
    • ...to develop guidance and actions that coordinate work
    • ToM inference is ineluctable and error-prone
  • Machine-built ToM
    • of artificial agents (Rabinowitz et al., 2018)
    • of humans & human teams (ASIST)

5 of 18

DARPA ASIST

Artificial Social Intelligence for Successful Teams

  • Objective
    • Model team members (MToM) and teams (MToT) well enough to offer reliably useful advice
    • Important when the necessity and difficulty of human-built ToM are high because:
      • members are highly varied in their capabilities and capacity
      • task synchronization is complex
      • the stakes of failure are high
      • preparation and information are incomplete
  • Interdisciplinary
    • Six teams of social scientists create theory and analytic agents
    • Six teams of computer scientists create ASI agents
    • One evaluation team (Aptima+ASU+COS)
  • Large: 320 individuals from 31 organizations
  • Long: Fall 2019 - Summer 2024

6 of 18

ASIST & Team Tasks

  • Objective: Create a task environment in which
    • Human team members must coordinate well to succeed
    • ASI must formulate MToM to make inferences & predictions
    • ASI agents generate interventions to improve coordination (in 2022)
    • The accuracy and utility of ASI can be evaluated
  • Characteristics of team tasks
    • Skills & roles are distinct
    • Synchronization benefits performance
    • Risk & reward trade off
    • Planning matters
    • Information is imperfect
    • Communication is necessary

7 of 18

The ASIST USAR Team Task Environment

Bird’s-eye view of three participants

  • Skills & roles: Three members in swappable roles
  • Reward: Rescue critical victims vs. regular victims (high vs. low reward)
  • Task synchronization: Asynchronous, sequential, and simultaneous teamwork
  • Communication: Audio & marker blocks
  • Risk: Hidden freeze plates in rooms

Figure panels: Zoom, picture in picture, first-person view, building layout and player locations

8 of 18

The ASIST Task from the Participant's View

  1. Map -- Shared, unique, & missing info about the locations of victims and rubble. Team member locations are not marked.
  2. Marker block legend -- Identical legend for 2 participants; for the third, the No Victim and Regular Victim meanings are swapped.
  3. Gamespace -- First-person view of self (arm), space, time, victim count by type, and score (RV + n*CV)

Figure panels: (1) info map, (2) marker block legend, (3) Minecraft world
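The conflicting-knowledge manipulation and the score display on this slide can be made concrete with a short sketch. It assumes hypothetical block identifiers (`block_a`, `block_b`), arbitrarily gives the swapped legend to the third member, and treats the critical-victim weight `n` as a free parameter; the source specifies only that two participants share a legend, that the third has the No Victim and Regular Victim meanings swapped, and that the score is RV + n*CV.

```python
# Hypothetical block identifiers; the source names only the meanings.
STANDARD_LEGEND = {"block_a": "No Victim", "block_b": "Regular Victim"}

# Meanings exchanged in the manipulated legend.
SWAP = {"No Victim": "Regular Victim", "Regular Victim": "No Victim"}


def swapped_legend(legend):
    """Return a copy of the legend with No Victim and Regular Victim exchanged."""
    return {block: SWAP.get(meaning, meaning) for block, meaning in legend.items()}


def team_legends(n_members=3, swapped_member=2):
    """Identical legends for all members except one (hypothetically the third)."""
    return [
        swapped_legend(STANDARD_LEGEND) if i == swapped_member else dict(STANDARD_LEGEND)
        for i in range(n_members)
    ]


def interpretations(legends, block):
    """What each member believes a given marker block means."""
    return [legend[block] for legend in legends]


def score(regular_victims, critical_victims, n):
    """Team score as RV + n*CV; n is a free parameter in this sketch."""
    return regular_victims + n * critical_victims


if __name__ == "__main__":
    print(interpretations(team_legends(), "block_a"))
    print(score(4, 2, n=5))
```

With these assumptions, `interpretations(team_legends(), "block_a")` returns `["No Victim", "No Victim", "Regular Victim"]`: the same physical block means different things to different members, which is the false-belief situation that M6 and M7 probe.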

9 of 18

The ASIST Experimental Design

Trial maps (within-group): SaturnA and SaturnB, one map in each of Trial 1 and Trial 2

Shared mental model manipulation (between-group):
  • Condition 1: Team planning -- 32 teams
  • Condition 2: No planning (math control task) -- 32 teams
  • No planning + human advisor -- 4 teams (sample data for Study 3)

10 of 18

Experimental Design

  • Participants (all remote)
    • 192 participants in 64 teams, recruited from Reddit, Discord, and ASU
    • 141 males, 49 females, and 2 other or no response
    • Mean age 22.04 (SD=5.22, ranging from 18 to 49)
    • Ethnicities were white/Caucasian (54.2%; 104), Asian (25.5%; 49), and Hispanic or Latino (13%; 25)
    • All participants had at least a high school level education
    • All claimed Minecraft expertise, which we tested
  • Procedure (3.5 hrs)
    • Software installation & surveys (60 min)
    • Consent, slide-based training, hands-on practice, competency test, trials, multiple surveys (150 min)
  • Data collection
    • Surveys: 469 items covering 22 constructs
    • Testbed messages about team, trial, experimental conditions, and events
    • Human observer measurements on test trials only
    • 2,079 files, 280 GB
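The counts on this slide can be cross-checked with a little arithmetic. This sketch assumes each ethnicity percentage uses all 192 participants as the denominator; under that assumption 104 and 25 reproduce 54.2% and 13%, and 49 gives 25.5%. It also confirms that 64 three-member teams yield 192 participants, that the gender counts sum to 192, and that the 60- and 150-minute segments total 3.5 hours.

```python
# Counts as reported on this slide.
N_PARTICIPANTS = 192
N_TEAMS = 64
TEAM_SIZE = 3
GENDER_COUNTS = {"male": 141, "female": 49, "other or no response": 2}
ETHNICITY_COUNTS = {"white/Caucasian": 104, "Asian": 49, "Hispanic or Latino": 25}
PROCEDURE_MINUTES = {"installation and surveys": 60, "main session": 150}


def percent(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def counts_consistent():
    """True if team sizes and gender counts both account for every participant."""
    return (
        N_PARTICIPANTS == TEAM_SIZE * N_TEAMS
        and sum(GENDER_COUNTS.values()) == N_PARTICIPANTS
    )


def procedure_hours(minutes):
    """Total procedure length in hours."""
    return sum(minutes.values()) / 60


if __name__ == "__main__":
    for group, count in ETHNICITY_COUNTS.items():
        print(f"{group}: {count} ({percent(count, N_PARTICIPANTS)}%)")
    print("counts consistent:", counts_consistent())
    print("procedure hours:", procedure_hours(PROCEDURE_MINUTES))
```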

11 of 18

Artificial Social Intelligence

  • University of Arizona -- Dynamic Bayesian networks (DBNs) model individuals & teams from behavior, NLP, & speech acts
  • SIFT -- Monte Carlo tree search over learnable action grammars
  • University of Southern California -- Recursive POMDPs constructed from an RDDL domain representation plus perturbations; Bayesian inference updates beliefs
  • DOLL/MIT -- Narratives from stories, inverse planning, probabilistic ToM, probabilistic conditional preference, story understanding (Genesis), and learned player capability
  • Carnegie Mellon University -- Modular neural networks model individuals; introspection resolves deviations between predicted and observed behaviors
  • Charles River Analytics -- Probabilistic programming to model goals & states. Strategic Coach selects interventions.
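Two of these approaches, the University of Arizona's dynamic Bayesian networks and USC's Bayesian belief updates within POMDPs, share a core step: recursively updating a probability distribution over hidden states as behavior is observed. The sketch below is a generic discrete Bayes filter (predict, then condition on an observation), not any team's actual model. The hidden states (what a participant believes a marker block means), the observation (whether the participant entered the marked room), and all probabilities are hypothetical, illustrative choices.

```python
# Hypothetical hidden states: what a participant believes a marker block means.
STATES = ["believes No Victim", "believes Regular Victim"]

# Illustrative transition model: TRANSITION[i][j] = P(next state j | state i).
# Beliefs are assumed to change rarely between observations.
TRANSITION = [[0.95, 0.05],
              [0.05, 0.95]]

# Illustrative observation model: OBSERVATION[s][o] = P(observation o | state s),
# where o = 0 means "skipped the marked room" and o = 1 means "entered it".
OBSERVATION = [[0.8, 0.2],
               [0.3, 0.7]]


def predict(belief):
    """Time update: push the belief through the transition model."""
    n = len(belief)
    return [sum(belief[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]


def update(belief, obs):
    """Measurement update: weight by the observation likelihood and renormalize."""
    unnormalized = [b * OBSERVATION[s][obs] for s, b in enumerate(belief)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def filter_beliefs(prior, observations):
    """Run predict/update over a sequence of observations; return each posterior."""
    belief = list(prior)
    history = []
    for obs in observations:
        belief = update(predict(belief), obs)
        history.append(belief)
    return history


if __name__ == "__main__":
    for step, posterior in enumerate(filter_beliefs([0.5, 0.5], [1, 1, 1]), start=1):
        print(step, dict(zip(STATES, (round(p, 3) for p in posterior))))
```

Starting from a uniform prior, repeated room entries shift probability toward "believes Regular Victim"; a full DBN or POMDP would factor many such variables and condition on much richer observations.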

12 of 18

ASI Evaluation & Findings

  • Human observers and ASI agents were evaluated on the same four metrics relevant to MToM
  • Limiting evaluation to four metrics left ASI developers spare capacity to create and test alternative MToM capabilities

Metric ID: Function -- What the ASI agent & human observer infer / predict -- Measure

  • M1: Prediction of effects of future interventions -- Team score (3x per trial at fixed times) -- Normalized RMSE
  • M3: Inference of member mental model / knowledge -- Given map information (3x) -- Mean accuracy
  • M6: Inference of member mental model / knowledge (conflicting knowledge) -- Given marker block meanings (3x) -- Mean accuracy
  • M7: Prediction of action given member beliefs (Sally-Anne) -- Room entry in response to another participant's marker block (many per trial) -- Mean accuracy
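The two measures in this table can be sketched as follows. The source does not say how the RMSE for M1 is normalized; this sketch assumes division by the range of the observed team scores (dividing by the mean or standard deviation is also common). "Mean accuracy" is taken to be the fraction of categorical inferences or predictions (M3, M6, M7) that match ground truth. The example inputs are illustrative, not study data.

```python
import math


def nrmse(predicted, observed):
    """Root-mean-square error divided by the range of the observed values.

    Range normalization is an assumption; the source does not specify it.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / spread


def mean_accuracy(predicted, actual):
    """Fraction of categorical predictions that exactly match the truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Illustrative predicted vs. observed team scores at one checkpoint.
    print(round(nrmse([100, 200, 300], [110, 190, 320]), 4))
    # Illustrative M7-style predictions: will the member enter the room?
    print(mean_accuracy(["enter", "skip", "enter", "enter"],
                        ["enter", "skip", "skip", "enter"]))
```

Lower NRMSE is better for M1, while higher mean accuracy is better for M3, M6, and M7.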

13 of 18

ASI Evaluation & Findings

Findings

  • All agents performed above chance
  • Humans outperformed ASI agents

Accuracy of ASI agents (yellow) & human observers (blue) on four tests of social intelligence.

14 of 18

ASI Evaluation & Findings

Findings

  • No one ASI agent reliably outperformed the others

Accuracy of human observers (triangles) and artificial agents (circles) on four tests of social intelligence.

15 of 18

ASI Evaluation & Findings

Finding

  • Accuracy of ASI agents and human observers improved quickly, though modestly, during the brief trials.

Accuracy predicting final score (M1), 3x per trial

Percent accuracy for inferring marker block semantics (M6), 3x per trial

16 of 18

Future Research

Support the claim that... (with quantitative measurements of)

  • social science constructs drive... (analytic agent use, influence, effect)
  • the design of ASI MToM/T to enable... (MToM/T existence, inference, prediction)
  • ASI interventions on... (intervention (non)existence, compliance, explanations, perceived utility of ASI, trust in ASI)
  • team processes that improve... (synchronization, error reduction, resilience, coordinative comms)
  • mission effects (mission score, weighted to team tasks)

17 of 18

Goal

  • Technology: MToM / MToT
  • Talks to human team members
  • In Training and missions
  • To
    • improve coordination & mission outcomes
    • enhance the accuracy of human ToM

Figure: Training, Talk, Technology, and ToM

18 of 18

Acknowledgement

Contact:

Jared Freeman <freeman@aptima.com>