TaskSense: A Translation-like Approach for Tasking Heterogeneous Sensor Systems with LLMs

Kaiwei Liu¹, Bufang Yang¹, Lilin Xu¹, Yunqi Guo¹, Guoliang Xing¹, Xian Shuai², Xiaozhe Ren², Xin Jiang², Zhenyu Yan¹

¹The Chinese University of Hong Kong
²Noah’s Ark Lab, Huawei


Smart Sensor Systems are Everywhere…

Smart home as an example

  • ~15 connected devices per person
  • Expected to double in the next 10 years

[Figure: example systems: ambient monitoring, surveillance, and smoke alarms.]


…But Are They Really Smart?

  • Require manual effort
  • Support only fixed tasks

But in the real world…

  • Queries are diverse
  • Possible rule-based combinations across sensor systems are effectively unbounded (+∞)

[Figure: diverse queries ("How is the baby?", "Where are my keys?", "Turn on the light.") versus rule-based control, which relies on yes/no rules and needs APIs for collecting data from 1000+ brands and for plenty of functions/algorithms.]


Promising Power of LLMs

Capabilities: understanding user queries, summarizing, and reasoning

Example query: "The room is so dark that I cannot find my keys."

  • Turn on the light
  • Help find the keys

[1] ControlLLM: Augment Language Models with Tools by Searching on Graphs. arXiv’23

Here are the takeaways summarized from …

Can we utilize LLMs to coordinate sensor systems?


Existing LLM-based systems?

  • Limited understanding of tool capabilities
  • Cannot handle complex dependencies

[Figure: example queries ("How is my blood pressure?", "Did my son play chess?", "Who is speaking in the kitchen?") and the tool chains an LLM might compose from a tool set including Video Retrieval, Audio Retrieval, Face Recognition, Action Recognition (e.g., labels "Sitting", "Standing", "Walking"), Speaker Separation, and Speaker Recognition.]

LLM-based coordination systems

  • HuggingGPT [NeurIPS’23]: Uses LLMs to connect various AI models to solve AI tasks
  • Sasha [IMWUT’24]: Uses LLMs to control devices in response to user commands


What About Systems Factors?

Can we design an LLM-powered system that accurately understands user queries and correctly coordinates multiple sensor systems?

  • Data Missing
  • Data Noise
  • Other Issues

[Figure: a Video Retrieval → Object Detection pipeline run on Days 1 to 4 returns no outputs or wrong outputs because of light conditions, sensor inherent noise, occlusion, or the person not being at home.]


Our Key Idea: Sensor Language System

Use LLMs to dynamically coordinate sensor systems through a Sensor Language definition and runtime plan adaptation.

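[Figure: TaskSense workflow. A user query (e.g., "How is the baby?") goes to plan generation, which retrieves examples from an Example Library and checks the plan against the Sensor Language (Vocabulary Set and Grammar Rules). The plan then runs under Dynamic Plan Adaptation (Pre-check, Execute, Post-check, and Find Alternative Paths), and Results Formatting turns the formatted results into the response (e.g., "The baby is crying.").]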
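The workflow retrieves examples from an Example Library before generating a plan. The slides do not say how retrieval is scored, so the sketch below is a hypothetical stand-in that ranks library entries by word-overlap (Jaccard) similarity to the query. The library contents, the plans in it (including "Sound Event Detection"), and the function names are illustrative assumptions, not TaskSense's implementation.

```python
import re

# Hypothetical example library: (past query, tool-calling plan) pairs
LIBRARY = [
    ("Is the baby crying?", "Audio Retrieval -> Sound Event Detection"),
    ("Did my son play chess?", "Video Retrieval -> Face Recognition -> Action Recognition"),
    ("Who is speaking in the kitchen?", "Audio Retrieval -> Speaker Separation -> Speaker Recognition"),
]


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic word set of a query."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(query: str, library, k: int = 2):
    """Return the k library entries most similar to the query (stand-in retriever)."""
    q = tokens(query)
    ranked = sorted(library, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:k]


# Assumption: retrieved pairs serve as few-shot examples in the planning prompt
for past_query, plan in retrieve_examples("How is the baby?", LIBRARY, k=1):
    print(past_query, "=>", plan)
```

A learned embedding retriever would be the more robust choice in practice; word overlap just keeps the sketch dependency-free.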


Vocabulary Set and Solvability Check

Vocabulary Set

Learning tool capabilities from structured tool documents

E.g., human activity recognition:

  • Hardware Device
  • Function Description
  • Input data
  • Output data
  • Label Set

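Solvability Check

[Figure: for the query "When did Bob lie on the bed last night?", the check consults the vocabulary set and asks whether the query is open-ended, whether the required tools exist, and whether they provide the necessary labels; each query is classified as solvable or unsolvable.]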
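A minimal sketch of a vocabulary-set entry and the tool/label part of the solvability check, assuming the query has already been mapped (e.g., by the LLM) to the tools and labels it needs. The `ToolDoc` fields mirror the tool-document fields listed above; the tool names, label set, and `check_solvability` helper are hypothetical, and the open-ended test is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDoc:
    """One vocabulary-set entry, mirroring the structured tool-document fields."""
    name: str
    hardware_device: str
    function_description: str
    input_data: str
    output_data: str
    label_set: set[str] = field(default_factory=set)


# Hypothetical vocabulary set with a single HAR tool
VOCABULARY = [
    ToolDoc(
        name="human_activity_recognition",
        hardware_device="RGB camera",
        function_description="Recognizes human activities in video clips",
        input_data="RGB video clip",
        output_data="activity label",
        label_set={"sitting", "standing", "walking"},
    ),
]


def check_solvability(required: dict[str, set[str]], vocabulary: list[ToolDoc]) -> tuple[bool, str]:
    """required maps each tool the query needs to the labels it needs from that tool."""
    by_name = {tool.name: tool for tool in vocabulary}
    for tool_name, labels in required.items():
        tool = by_name.get(tool_name)
        if tool is None:  # "Have required tools?"
            return False, f"no tool named {tool_name}"
        missing = labels - tool.label_set  # "Have necessary labels?"
        if missing:
            return False, f"{tool_name} lacks labels {sorted(missing)}"
    return True, "solvable"


# "When did Bob lie on the bed last night?" needs a 'lying' label this HAR tool lacks
print(check_solvability({"human_activity_recognition": {"lying"}}, VOCABULARY))
```

Rejecting the query before planning avoids executing a tool chain that could never produce the requested label.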


Grammar Rules and Grammar Check

Grammar Rules

Defining data dependencies among the tools; the rules are constructed from the vocabulary set.

[Figure: grammar-rule graph over tools such as Video Retrieval, Human Detection, Face Detection, and Expression Recognition.]

Grammar Check

[Figure: a tool-calling plan is converted into a graph and matched against the graph of grammar rules; because the plan graph is a subgraph of the grammar-rule graph, the check passes.]
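Because plan nodes are identified by tool name, checking that a plan graph is a subgraph of the grammar-rule graph reduces to set inclusion over nodes and directed edges. The sketch below assumes edges point from a data producer to its consumer; `GRAMMAR_EDGES` is a hypothetical rule set, not TaskSense's actual grammar.

```python
# Hypothetical grammar rules: directed edges (producer tool -> consumer tool)
GRAMMAR_EDGES = {
    ("Video Retrieval", "Human Detection"),
    ("Video Retrieval", "Face Detection"),
    ("Human Detection", "Face Detection"),
    ("Face Detection", "Expression Recognition"),
}


def grammar_check(plan_edges, grammar_edges):
    """Pass iff every tool and every data dependency in the plan appears in the rules.

    Nodes are identified by tool name, so the subgraph test is plain set inclusion.
    Returns (passed, sorted list of offending edges).
    """
    grammar_nodes = {tool for edge in grammar_edges for tool in edge}
    plan_nodes = {tool for edge in plan_edges for tool in edge}
    bad_edges = set(plan_edges) - set(grammar_edges)
    passed = plan_nodes <= grammar_nodes and not bad_edges
    return passed, sorted(bad_edges)


plan = [("Video Retrieval", "Face Detection"), ("Face Detection", "Expression Recognition")]
print(grammar_check(plan, GRAMMAR_EDGES))  # plan graph is a subgraph, so it passes
```

Returning the offending edges, not just a boolean, gives the planner something concrete to repair, e.g. a plan that feeds raw video straight into Expression Recognition.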


Runtime Plan Adaptation

Switching between alternative paths to resist interference from various environmental factors.

[Figure: three candidate execution paths, each annotated with the conditions it handles:

  • Original Path: RGB Video Retrieval (Living Room) → RGB HAR; well-lit, person in living room
  • Alternative Path 1: RGB Video Retrieval (Bedroom) → RGB HAR; well-lit, person in bedroom
  • Alternative Path 2: Depth Video Retrieval (Bedroom) → Depth HAR; poorly-lit, person in bedroom

Response generation (with Human Detection): associate and format the results into the response.]
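A sketch of the path-switching idea above, assuming each path carries a pre-check on run-time context (lighting, which room the person is in) and that a post-check rejects empty outputs. `Context`, `Path`, `post_check`, and `run_with_adaptation` are hypothetical names, and `execute` is a stub standing in for video retrieval plus HAR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical run-time observations available before a path executes."""
    well_lit: bool
    person_room: Optional[str]  # room where a person was last detected, if any


@dataclass
class Path:
    name: str
    room: str
    modality: str  # "RGB" or "Depth"

    def pre_check(self, ctx: Context) -> bool:
        # The person must be in this path's room; RGB additionally needs good light
        if ctx.person_room != self.room:
            return False
        return self.modality == "Depth" or ctx.well_lit

    def execute(self, ctx: Context) -> list[str]:
        # Stub for "<modality> video retrieval -> <modality> HAR"
        return [f"{self.modality} HAR output ({self.room})"]


def post_check(output: list[str]) -> bool:
    # Hypothetical post-check: an empty output triggers a switch
    return len(output) > 0


PATHS = [
    Path("Original Path", "Living Room", "RGB"),
    Path("Alternative Path 1", "Bedroom", "RGB"),
    Path("Alternative Path 2", "Bedroom", "Depth"),
]


def run_with_adaptation(paths: list[Path], ctx: Context):
    """Try paths in order; move to the next one if a pre- or post-check fails."""
    for path in paths:
        if not path.pre_check(ctx):
            continue
        output = path.execute(ctx)
        if post_check(output):
            return path.name, output
    return None, []


print(run_with_adaptation(PATHS, Context(well_lit=False, person_room="Bedroom")))
```

Ordering the list from preferred to fallback keeps the original plan first and only spends effort on alternatives when a check fails.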


An End-to-end Showcase

Deployed in a bedroom

Sensors used: RGB & depth camera (Vzense DCAM560C), microphone, thermal camera

Query: "How long has Bob been working?"

[Figure: execution flow from Start:

  • Query RGB Video → Human Detection → RGB HAR
  • Query Thermal Video → Sanitary Behavior Monitoring
  • Alternative Path: Query Depth Video → Human Detection → Depth HAR]

Response: "It appears that Bob was indeed working for a significant amount of time in the afternoon."


Evaluation

Overall performance of different methods (using GPT-4 as the base LLM).

  • Planning accuracy: 64%
  • Execution accuracy: 29%↑
  • Response accuracy: 30%↑

  • Assesses query solvability more accurately
  • Identifies wrong dependencies
  • Adapts to variable environments


Evaluation

Balancing latency and computation overhead by adjusting the number of cached tools.


Thanks!

  • TaskSense: A Translation-like Approach for Tasking Heterogeneous Sensor Systems with LLMs
  • Kaiwei Liu, Bufang Yang, Lilin Xu, Yunqi Guo, Guoliang Xing, Xian Shuai, Xiaozhe Ren, Xin Jiang, Zhenyu Yan

Visit CUHK AIoT Lab