1 of 18

Human-Swarm Control Using Vision, Speech & Multi-Agent LLMs

An HRI Project Presented by Moses Ebere & Joseph Adeola

3 of 18

Can Large Language Models be used to enhance performance in today’s robotic systems, specifically in the Human-Swarm Interaction domain?

Motivation

5 of 18

HRI

Pipeline

6 of 18

HRI

Vision System

7 of 18

Chatbox

AutoGen

8 of 18

Control Stack

Reynolds’ Arrival Behaviour with Omnidirectional Points in RViz

Reynolds’ Arrival Behaviour with Differential Drive Robots in Gazebo
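The slides name the arrival behaviour but do not show how it is computed. The block below is a minimal sketch of the standard arrival steering rule for a single holonomic point agent, comparable to the omnidirectional points case. It is not the project's implementation: the function names `arrive` and `simulate` and the parameters `max_speed`, `max_force`, and `slowing_radius` are assumptions chosen for illustration.

```python
import math


def arrive(position, velocity, target,
           max_speed=1.0, max_force=0.5, slowing_radius=2.0):
    """Return a steering force (fx, fy) that slows the agent as it nears the target.

    All parameter values are illustrative assumptions, not taken from the slides.
    """
    dx = target[0] - position[0]
    dy = target[1] - position[1]
    distance = math.hypot(dx, dy)

    if distance < 1e-9:
        # Already at the target: the desired velocity is zero.
        desired = (0.0, 0.0)
    else:
        # Full speed outside the slowing radius, linear ramp-down inside it.
        speed = max_speed * min(1.0, distance / slowing_radius)
        desired = (dx / distance * speed, dy / distance * speed)

    # Steering is the difference between desired and current velocity.
    sx = desired[0] - velocity[0]
    sy = desired[1] - velocity[1]

    # Clamp the steering magnitude to max_force.
    magnitude = math.hypot(sx, sy)
    if magnitude > max_force:
        sx = sx / magnitude * max_force
        sy = sy / magnitude * max_force
    return (sx, sy)


def simulate(start, target, steps=400, dt=0.05):
    """Integrate a unit-mass point agent with explicit Euler steps."""
    pos = [float(start[0]), float(start[1])]
    vel = [0.0, 0.0]
    for _ in range(steps):
        fx, fy = arrive(pos, vel, target)
        vel[0] += fx * dt
        vel[1] += fy * dt
        pos[0] += vel[0] * dt
        pos[1] += vel[1] * dt
    return tuple(pos), tuple(vel)


if __name__ == "__main__":
    final_pos, final_vel = simulate((0.0, 0.0), (5.0, 0.0))
    print("final position:", final_pos)
    print("final velocity:", final_vel)
```

For differential drive robots the same steering vector would still have to be converted into linear and angular velocity commands. That conversion is outside this sketch.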

9 of 18

Results

Gesture Formation

Letter Formation

10 of 18

Results

Text-based Control Command and Data Extraction

11 of 18

Results

Text-based Control Command and Data Extraction

12 of 18

Participants

13 of 18

Greta

French & Russian Languages

Matilda

Electrical Engineering

Matko

Economics

Adrien

Computer Science

Disclaimer: Images do not depict real people.

Some Participants

14 of 18

Experiment

Images from User Case Studies

15 of 18

Our Goals vs Observation

Pre-Experiment Goals

Preference Test

Recognition Check: Ability to identify formations.

Comfort & Expectations: User reactions and desired features.

AutoGen Interaction: Effect on understanding agent behavior.

16 of 18

Description

Word Cloud

Word cloud showing the most frequently mentioned expectations and functionalities.

17 of 18

Comments From Participants

HRI

Focus on just one component at a time

Matilda

I don't think the human should be replaced.

Adrien

Focus on just one component at a time

Matko

I prefer the text control method because it allows you to properly check the commands you’re sending to the robots. Also, some people with speech impediments can easily make use of it.

The human in the loop should not be removed too.

Greta

I like the fact that we can get information about the system from the language agents. It makes task management easy but also introduces a loophole due to a lack of control over them.

Adrien

Overall, the gesture response is the fastest. The latency in the speech command can also be improved. Overall, I think focusing on one control method rather than several methods is more intuitive.

Matilda

It's an intuitive idea. I prefer the text command mode.

Matko
