1 of 42

Addressing and Visualizing Misalignments in Human Task-Solving Trajectories

Sejin Kim

GIST AI Graduate School

2024. 12. 17

2 of 42

Outline

- Motivation

- Previous Works

1. O2ARC

2. ARCLE

- Misalignments in O2ARC Trajectories

- Trajectory Visualization

- Future Plan


3 of 42

Motivation


4 of 42

ARC (Abstraction and Reasoning Corpus)


5 of 42

Abstraction & Reasoning in Solving ARC Tasks


Abstraction

Reasoning

O2ARC Task-342

6 of 42

Research Goal

Developing a Human-like RL Agent through Conceptual Abstraction and Reasoning


7 of 42

Research Goal

Developing a Human-like RL Agent through Conceptual Abstraction and Reasoning

  • Analysis of LLMs’ Capabilities on Abstraction and Reasoning ⇦ Presented by Woochang Sim on Nov. 7
    • LLMs’ abstraction and reasoning capabilities grounded in LoTH


8 of 42

Research Goal

Developing a Human-like RL Agent through Conceptual Abstraction and Reasoning

  • Analysis of LLMs’ Capabilities on Abstraction and Reasoning
    • LLMs’ abstraction and reasoning capabilities grounded in LoTH
  • Learning Conceptual Abstraction based on User Intention ⇦ Today’s topic
    • Analysis of the differences between O2ARC Trajectories and user intentions
    • Importance of intention-level learning of task-solving trajectories


9 of 42

Research Goal

Developing a Human-like RL Agent through Conceptual Abstraction and Reasoning

  • Analysis of LLMs’ Capabilities on Abstraction and Reasoning
    • LLMs’ abstraction and reasoning capabilities grounded in LoTH
  • Learning Conceptual Abstraction based on User Intention
    • Analysis of the differences between O2ARC Trajectories and user intentions
    • Importance of intention-level learning of task-solving trajectories
  • Developing an RL Agent Capable of Human-like System-2 Reasoning ⇦ Future plan


10 of 42

Previous Works


11 of 42

Object-Oriented ARC (O2ARC 3.0)



- A tool for humans to solve ARC tasks

- About 13k trajectories collected so far

IJCAI 2024 Demo

12 of 42

Motivation

Lack of ARC Task-Solving Trajectory Data

  • ARC tasks consist of only a few input/output grid pairs
  • Data on the reasoning process to infer the output grid from the input grid is required
  • Collecting sufficient solution data from humans is too costly
  • Current AI models struggle to augment human-level solution data


13 of 42

Goal

Encouraging Voluntary User Participation

  • Gamified interface and competitive components (i.e., score, leaderboard)
  • Approximately 13k trajectories have been collected


14 of 42

Trajectories Collected by O2ARC

Trajectories Containing Rich Information

  • Grid-related information (grid size, color of each pixel)
  • Object-related information (location, pixels of the object, pixels of the background)
  • Action-related information (type of action, target pixels of the action)
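The three kinds of information above could be held in one record per trajectory step. The following is only a minimal sketch: the class name `Step` and all of its field names are hypothetical and do not reflect the actual O2ARC log schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[int]]    # one color index per pixel
Pixel = Tuple[int, int]   # (row, column)


@dataclass
class Step:
    """One step of a trajectory (hypothetical schema, for illustration only)."""
    grid: Grid                    # grid-related: size and color of each pixel
    object_origin: Pixel          # object-related: location of the object
    object_pixels: List[Pixel]    # object-related: pixels of the object
    action_type: str              # action-related: type of action
    action_targets: List[Pixel]   # action-related: target pixels of the action

    @property
    def grid_size(self) -> Tuple[int, int]:
        """Return (height, width) of the grid."""
        return (len(self.grid), len(self.grid[0]) if self.grid else 0)

    def background_pixels(self) -> List[Pixel]:
        """Every pixel that does not belong to the object."""
        selected = set(self.object_pixels)
        height, width = self.grid_size
        return [(r, c) for r in range(height) for c in range(width)
                if (r, c) not in selected]


# A 2x2 grid with a single one-pixel object in the bottom-right corner.
step = Step(
    grid=[[0, 0], [0, 3]],
    object_origin=(1, 1),
    object_pixels=[(1, 1)],
    action_type="Color",
    action_targets=[(1, 1)],
)
```

Deriving the background from the object pixels, rather than storing it, keeps the two from drifting apart; a real log format may well store both.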


15 of 42

Trajectories Collected by O2ARC

Trajectories Containing Efficient Processes

  • Collects object-centric trajectories by providing object-based actions
  • Encourages competition to gather data consisting of a minimal number of actions


16 of 42

ARC Learning Environment (ARCLE)



- An RL environment in which AI agents solve ARC tasks

- PPO handles a single task

CoLLAs 2024

17 of 42

Motivation

Lack of an RL Environment for Training on O2ARC Trajectories

  • RL agents learn through interaction with an RL environment
  • The ARC benchmark has not been explored using RL
  • An environment is needed to train on O2ARC trajectories


18 of 42

Goal

Providing an RL Environment for Learning ARC Task-Solving Processes

  • The MDP formulation of O2ARC Trajectories
  • Normalization of the object-based actions provided by O2ARC 3.0
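As a rough illustration of what an MDP view of a trajectory can look like, the sketch below turns a recorded sequence of grids and actions into (state, action, reward, next state, done) tuples. It is not ARCLE's API: `Transition`, `to_transitions`, and the sparse reward of 1.0 on reaching the target grid are all assumptions made for this example.

```python
from typing import List, NamedTuple, Tuple

Grid = Tuple[Tuple[int, ...], ...]   # immutable grid so states can be compared


class Transition(NamedTuple):
    state: Grid
    action: str
    reward: float
    next_state: Grid
    done: bool


def to_transitions(states: List[Grid], actions: List[str],
                   target: Grid) -> List[Transition]:
    """Convert one recorded trajectory into MDP transitions.

    Assumption: the reward is 1.0 only when the next grid equals the target
    (a sparse reward chosen for illustration), and the episode ends when the
    task is solved or the recorded trajectory runs out.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    transitions = []
    for i, action in enumerate(actions):
        next_state = states[i + 1]
        solved = next_state == target
        is_last = i == len(actions) - 1
        transitions.append(
            Transition(states[i], action, 1.0 if solved else 0.0,
                       next_state, solved or is_last))
    return transitions


# Toy 1x2 grid colored one pixel at a time.
toy_states = [((0, 0),), ((1, 0),), ((1, 1),)]
toy_actions = ["Color", "Color"]
toy_transitions = to_transitions(toy_states, toy_actions, target=((1, 1),))
```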


19 of 42

Training a PPO Agent through ARCLE

Performance of Agents with Various Policies

  • The agent influenced by previous actions outperformed the one that was not
  • The agent is learning the process of solving the given ARC tasks
  • The agent that emphasized only the meaning of color changes showed less improvement


20 of 42

Misalignments in O2ARC Trajectories


21 of 42

Motivation

The PPO Agent Trained through ARCLE Does Not Learn a Single ARC Task Well

  • We hypothesize this discrepancy stems from intention-trajectory misalignment
  • O2ARC actions consist of both pixel-level actions and object-level actions
  • Some user intentions do not correspond one-to-one with actions in the trajectories


22 of 42

Goal

Identify Misalignments with Activity Theory


[Diagram: Users, Tasks, Tools]

23 of 42

Goal

Categorize Misalignments into 3 Cases


24 of 42

Goal

Quantify Misalignments in Various Ways


25 of 42

Activity Theory

Human Activities through Interactions between Users, Tasks, and Tools

  • User (Subject) refers to the individual or group performing the activity to achieve a goal
  • Task (Object) is the goal of the activity, or the overall activity itself
  • Tool (Instrument) is the tool or means used by the subject to effectively achieve the object


[Diagram: Users, Tasks, Tools]

26 of 42

Misalignments via Activity Theory

Misalignments between Trajectories and Intentions Explained by Activity Theory

  • Misalignments between Tools and Tasks (Functional Inadequacies in Tools) occur when tools lack functionality to support the tasks


[Diagram: Users, Tasks, Tools]

27 of 42

Misalignments via Activity Theory

Misalignments between Trajectories and Intentions Explained by Activity Theory

  • Misalignments between Users and Tools (User Unfamiliarity with Tools) result from limited tool proficiency


[Diagram: Users, Tasks, Tools]

28 of 42

Misalignments via Activity Theory

Misalignments between Trajectories and Intentions Explained by Activity Theory

  • Misalignments between Users and Tasks (Cognitive Dissonance in Users) stem from misunderstandings of the task


[Diagram: Users, Tasks, Tools]

29 of 42

Example Case: O2ARC Task-49

30 of 42

Ideal Trajectories

O2ARC Task-49

31 of 42

Functional Inadequacies in Tools

O2ARC Task-49

32 of 42

User Unfamiliarity with Tools

O2ARC Task-49

33 of 42

Cognitive Dissonance in Users

O2ARC Task-49

34 of 42

Trajectory Visualization

A node represents a state

  • Radius: in-degree
  • Color: remark

An edge represents an action

  • Thickness: frequency
  • Color: type
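The quantities behind this visualization can be computed directly from the trajectories. The sketch below is one possible way to do it, under two assumptions: each step is given as a hypothetical `(state, action_type, next_state)` triple with hashable states, and a node's in-degree counts distinct incoming edges rather than total traversals.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, str, Hashable]   # (state, action_type, next_state)


def build_graph(trajectories: List[List[Edge]]
                ) -> Tuple[Dict[Hashable, int], Dict[Edge, int]]:
    """Return (in-degree per node, frequency per edge).

    In-degree would drive the node radius and edge frequency the edge
    thickness; the action type stored in each edge would drive its color.
    """
    # Count how often each (state, action, next_state) edge is traversed.
    edge_freq: Counter = Counter()
    for trajectory in trajectories:
        for edge in trajectory:
            edge_freq[edge] += 1

    # In-degree: number of distinct edges arriving at each node (assumption).
    in_degree: Dict[Hashable, int] = defaultdict(int)
    nodes = set()
    for state, _action, next_state in edge_freq:
        nodes.update((state, next_state))
        in_degree[next_state] += 1
    return {node: in_degree[node] for node in nodes}, dict(edge_freq)


# Toy data: two users take the same route, a third takes a different one.
toy_trajectories = [
    [("s0", "Move", "s1"), ("s1", "Color", "goal")],
    [("s0", "Move", "s1"), ("s1", "Color", "goal")],
    [("s0", "Color", "s2"), ("s2", "Move", "goal")],
]
toy_in_degree, toy_edge_freq = build_graph(toy_trajectories)
```

In this toy graph the shared goal state collects edges from both routes, so it would be drawn with the largest radius, and the route taken twice would be drawn with thicker edges.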

O2ARC Task-49

Task ID: 23b5c85d

35 of 42

Misalignment Analysis


36 of 42

Action-Level Analysis

Tasks with high and low average in-degrees exhibit different patterns

  • High: Users apply object-level actions in line with their intentions
  • Low: Users rely on irregular pixel-level actions, so misalignments occur frequently

Object-level Actions

Pixel-level Actions

37 of 42

Intention-Level Analysis

Some actions should be added to O2ARC to reflect users’ common intentions

  • The high share of aligned intentions (91.11%) indicates that O2ARC trajectories are reliable for strategy analysis
  • Functional Inadequacy has the highest actions-to-intentions ratio (6.64)
  • It highlights the need to improve O2ARC’s action set
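An actions-to-intentions ratio of this kind can be read as the average number of recorded actions a user needed to carry out one intention. The sketch below assumes that trajectories have already been segmented into hypothetical `(category, number_of_actions)` pairs, one per intention; the toy numbers are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def actions_to_intentions_ratio(
        segments: List[Tuple[str, int]]) -> Dict[str, float]:
    """Average number of actions per intention, grouped by category.

    Each segment is (category, n_actions): one intention and the number of
    actions the user needed to realize it.
    """
    action_totals: Dict[str, int] = defaultdict(int)
    intention_counts: Dict[str, int] = defaultdict(int)
    for category, n_actions in segments:
        action_totals[category] += n_actions
        intention_counts[category] += 1
    return {category: action_totals[category] / intention_counts[category]
            for category in action_totals}


# Invented toy segments, not results from the analysis above.
toy_segments = [
    ("aligned", 1),
    ("aligned", 1),
    ("functional_inadequacy", 6),
    ("functional_inadequacy", 8),
]
ratios = actions_to_intentions_ratio(toy_segments)
```

A ratio near 1 means one action per intention; a high ratio means a single intention had to be spelled out as many low-level actions, which is what a missing tool action would produce.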

38 of 42

Trajectory-Level Analysis

Trajectory analysis highlights user strategies, tool limits, and misalignment causes

  • 35% of trajectories show structured strategies, especially in simple tasks, aiding AI training
  • Over 30% of misalignments stem from tool limits in repetitive tasks like pixel edits
  • User Unfamiliarity and Cognitive Dissonance rarely overlap (only 3.45%)

39 of 42

Task-Level Analysis

The KDE plot shows the distribution of misalignments across tasks

  • User Unfamiliarity occurs frequently at the early stage with low intention proportions
  • Functional Inadequacy occurs consistently throughout
  • O2ARC requires improvements in its action set and better explanations for beginners
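For readers unfamiliar with KDE plots, the curve is a smoothed histogram: each observation contributes a small Gaussian bump, and the bumps are averaged. The sketch below is a generic one-dimensional Gaussian KDE in plain Python. The variable being smoothed (here, a hypothetical position between 0 and 1 at which a misalignment occurs) and the bandwidth are assumptions; the original plot's axes are not recoverable from the slide text.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float],
                 bandwidth: float) -> Callable[[float], float]:
    """Return a density function estimated from samples with a Gaussian kernel."""
    if not samples:
        raise ValueError("at least one sample is required")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Normalizing constant so that the estimated density integrates to 1.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


# Invented positions: three misalignments early on, one late.
toy_positions = [0.10, 0.15, 0.20, 0.80]
toy_density = gaussian_kde(toy_positions, bandwidth=0.1)
early, late = toy_density(0.15), toy_density(0.80)
```

With these toy positions the estimated density is higher around 0.15 than around 0.80, which is the kind of early-stage peak the first bullet describes for User Unfamiliarity.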

40 of 42

Future Plans


41 of 42

Future Plans

  1. General human intention detection via O2ARC Tasks
  2. Reward modeling based on O2ARC trajectories
  3. Program synthesis with intention-aligned trajectories


42 of 42

Q&A session
