1 of 17

Assessing novel C2 socio-technical AI-enabled planning concepts: What we learnt using the Wizard of Oz and other evaluation methods
Whether to WoZ or not to WoZ

Simon Attfield, Andrew Leggatt, Paddy Turner, Holly Roberts,

Rachel Asquith & Richard Ellis

2 of 17

Related Machine-Speed C2 briefs today


3 of 17

Outline of paper

  1. Introduction – ‘the Why’
  2. Key concepts – ‘what we needed to take account of’
  3. Evaluation methods – Expert review, Lab-based, Field-based
  4. Evaluation of two planning concepts using the Wizard of Oz method – Auto Piggery and Hey Socrates
  5. Discussion – ‘what worked and why’


4 of 17

Introduction

  • Evaluation methods used to assess conceptual designs for military Operational-level planning in a multi-disciplinary project
  • Evaluation – cornerstone of research and development
  • Investigation of the constraints in a design
    • Needs of the situation that require satisfying (e.g. goals, functions etc.)
    • Opportunities for change (e.g. technology, process, ways of working)


[Diagram: Needs, Opportunities, Innovation]

5 of 17

Evaluation enables…

  • Designers to see how well the solution addresses the needs
  • Insight into how the solution might perform in a context
  • Identification of how the design might do it better (improvements)
  • Understanding of the context of use


6 of 17

Key concepts

  • Socio-technical systems - Performance of a system depends upon technology design AND social context of use.
  • Formative vs summative aims - How could it be better vs How good is the system?
  • Low and high design maturity - Progression from low to high maturity design. Design representations embody commitments and enable evaluation; they are not preserved in aspic.
  • Evidence quality vs ecological validity - Typically a trade-off.


7 of 17

User-centred evaluation methods (a taxonomy)

Expert Review

Heuristic Evaluation

Experts review system or interface against established principles or heuristics (e.g. Nielsen, 1992).

Cognitive Walkthrough

Step-by-step task simulation with predefined questions. Evaluates ‘walk-up-and-use’ learnability (Wharton, Rieman, Lewis & Polson, 1994).

Feature Inspection

Examination of specific features rather than the overall user interface or experience. Determines how features support user tasks, integration and usability.


Lab-based methods

Pluralistic Walkthrough (participatory design review, storyboarding)

Facilitated walkthrough with stakeholders, based on a typical task flow.

User Testing (usability testing)

Users perform tasks using system. Interaction observed and recorded.

Post-task questionnaire or interview.

Wizard of Oz

Wizard of Oz is a user testing method proposed for AI systems because they are difficult to prototype. Responses are created by a human (the Wizard).

Field-based methods

Naturalistic Observation

Observing participants in natural environment without interference by researcher.

Interviews

Insights into people’s thoughts, experiences, perceptions, and motivations (not observable).

Contextual Interview

The researcher works in the natural working environment, observing and engaging directly with participants as they perform their routine tasks.

8 of 17

Evaluation of two military planning concepts using the Wizard of Oz method

  • Developed Concept Solutions for enhancing Military Operational-level Planning.
  • A Concept Solution is a conceptual design (i.e. a high-level, early-phase design description).
  • We discuss two concept evaluations:


Auto-piggery

Helps planners understand stakeholders within an operational environment.

Hey Socrates

Monitors ongoing planning process, and offers questions and prompts to highlight possible omissions or deviations from ‘good’ planning practice.

9 of 17

Auto-piggery

  • Interactive visualisation tool supported by AI. The user provides an operational objective (critical issue); the system displays information about potential interested parties, which the user explores.
  • Initial details of the design were developed by two expert military planners and an HCI researcher, who collaboratively drafted an interaction scenario (Carroll, 1997).

“Jane is a J5 planner and Secretary of the Joint Task Force Mission Analysis Group. The Joint Task Force Commander (JTFC) has been given a mission to deploy a force into <country x> as part of a coalition acting under an international mandate. The JTFC has a specified task to secure the city of <city y> and surrounding region and subsequently secure the southern flank. Internally, there are a number of threats and the JTFC needs to consider who the main players are to set up the Mission Analysis discussion.”


10 of 17

Auto-piggery wireframe


An interactive design representation for exploring the problem.

Brings the concept solution to life.

Each page represented an information display focusing on a given stakeholder group.

Interactivity simulated using linked objects to force page transitions.

11 of 17

Auto-piggery mock-up

  • Azulindi Scenario content created using ChatGPT.
  • ‘Canned’ interactive visualisations to mimic a dynamic system.
  • Multiple files were created, each representing a class of stakeholder in the scenario (e.g. Military Insurgency Organisations, Religious Organisations, NGOs) – see the sketch below.
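For illustration only, a minimal Python sketch of how canned material like this could be indexed by stakeholder class and retrieved from a planner’s chat request. The file names, paths and keyword matching are assumptions, not the project’s implementation; in the sessions the human Wizard selected and shared the files.

```python
from pathlib import Path
from typing import Optional

# Hypothetical index: one canned visualisation file per stakeholder class in the
# Azulindi scenario (file names are illustrative assumptions).
CANNED_VISUALISATIONS = {
    "military insurgency organisations": "azulindi/military_insurgency_orgs.html",
    "religious organisations": "azulindi/religious_orgs.html",
    "ngos": "azulindi/ngos.html",
}

def fetch_visualisation(request: str) -> Optional[Path]:
    """Return the canned file matching a planner's free-text chat request, if any."""
    request = request.lower()
    for stakeholder_class, filename in CANNED_VISUALISATIONS.items():
        # Crude keyword match; in the actual sessions this judgement was the Wizard's.
        if any(word in request for word in stakeholder_class.split()):
            return Path(filename)
    return None  # No match: the Wizard would ask the planners to rephrase.

if __name__ == "__main__":
    print(fetch_visualisation("Can we see the main religious organisations around the city?"))
```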


12 of 17

Auto Piggery evaluation (Wizard of Oz)

  • Conducted over recorded Teams meeting.
  • Two planners took part, communicating verbally over the call.
  • Wizard played by a researcher.
  • Interaction with Wizard over chat.
  • Planners could request visualisations for given stakeholder types and take control of visualisation interaction.


13 of 17

Hey Socrates

  • A system that gathers information from interactions between planners, the analytical tools they use and the artefacts they create.
  • Draws inferences about how the planning activity relates to a normative model.
  • Identifies deviations and loose ends as the basis for user questions and prompts – see the sketch below.
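To make the concept concrete, a minimal, hypothetical Python sketch of checking an evolving planning transcript against a normative model and generating Socratic prompts for apparently missing steps. The checklist entries and keyword heuristic are illustrative assumptions only; in the evaluation described below, a human Wizard performed this role against written guidelines.

```python
# Hypothetical fragment of a normative planning model: each step lists cues that
# suggest it has been addressed and a Socratic prompt to raise if it seems missing.
NORMATIVE_MODEL = [
    {
        "step": "Identify the Centre of Gravity",
        "cues": ["centre of gravity", "source of power"],
        "prompt": "What do you consider the actor's main source of power, and why?",
    },
    {
        "step": "Consider critical vulnerabilities",
        "cues": ["critical vulnerability", "vulnerabilities"],
        "prompt": "Which critical vulnerabilities follow from the capabilities you have listed?",
    },
]

def suggest_prompts(planning_transcript: str) -> list[str]:
    """Return prompts for normative steps with no evidence in the transcript so far."""
    text = planning_transcript.lower()
    return [
        item["prompt"]
        for item in NORMATIVE_MODEL
        if not any(cue in text for cue in item["cues"])
    ]

if __name__ == "__main__":
    for prompt in suggest_prompts("So far we have listed the adversary's key capabilities."):
        print(prompt)
```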


14 of 17

Hey Socrates evaluation (Wizard of Oz)

  • Users were given the task of conducting a ‘Centre of Gravity’ (CoG) analysis for an operational planning scenario.
  • A CoG identifies a critical aspect of an actor’s capabilities or source of power. It can be physical or abstract, and provides a focus or ‘key’ for operational planning.
  • Two-hour task. Wizards monitored the conversation and, guided by a set of guidelines, periodically provided questions through MS Teams chat to prompt planners’ reflections on the process.


Guidelines

15 of 17

Data Recording and Analysis

  • Video and chat logs were recorded for both sessions, with concurrent observation.
  • Follow-up debrief sessions to explore planners’ experiences.
  • Data analysed for…
    • Benefits of concept solution.
    • Level of change for C2.
    • Feasibility of AI implementation.
    • Data requirements.
    • Appropriateness of method.


16 of 17

Discussion


|                            | Auto Piggery                                | Hey Socrates                          |
|----------------------------|---------------------------------------------|---------------------------------------|
| Interaction modality       | linguistic and visual (agent plus artefact) | linguistic (agent only)               |
| Anthropomorphic?           | less so                                     | yes                                   |
| Naturalistic interaction?  | less so                                     | yes                                   |
| Critical path?             | critical path                               | optional                              |
| Materials preparation      | mock-ups with canned content                | guidelines                            |
| Preparation effort         | high                                        | medium                                |
| Prior design commitment    | high (evaluating more)                      | low                                   |
| Interaction constraints    | high                                        | low                                   |
| User experience            | rigid (potential source of frustration)     | flexible                              |
| Recommendation             | prior lower-fidelity evaluation strategy    | suitable for low-maturity evaluation  |

17 of 17

Conclusions

  • WoZ as simulated ‘system’ interaction is lower cost and suitable for early stages.
  • WoZ lends itself to more anthropomorphic, linguistic interaction.
  • Less suitable for AI-supported, artefact-mediated interaction.
  • Artefact-mediated interaction is ubiquitous in C2.
  • Need to be careful of bias towards an anthropomorphic interaction model.
