TaskSense: A Translation-like Approach for Tasking Heterogeneous Sensor Systems with LLMs
Kaiwei Liu¹, Bufang Yang¹, Lilin Xu¹, Yunqi Guo¹, Guoliang Xing¹,
Xian Shuai², Xiaozhe Ren², Xin Jiang², Zhenyu Yan¹
¹The Chinese University of Hong Kong
²Noah’s Ark Lab, Huawei
Smart Sensor Systems are Everywhere…
Smart home as an example
Ambient Monitoring
Surveillance
Smoke Alarm
…But Are They Really Smart?
But in the real world…
How is the baby?
Where are my keys?
Turn on the light.
[Figure: rule-based control via fixed APIs, which return Yes / No answers]
Promising Power of LLMs
Understand User Query
Summarizing
Reasoning
The room is too dark for me to find my keys.
[1] ControlLLM: Augment Language Models with Tools by Searching on Graphs. arXiv’23
Here are the takeaways summarized from …
Can we utilize LLMs to coordinate sensor systems?
Existing LLM-based systems?
[Figure: LLM-based coordination systems. Example queries: "How is my blood pressure?", "Did my son play chess?", "Who is speaking in the kitchen?". Tool set: Video Retrieval, Face Recognition, Action Recognition, Audio Retrieval, Speaker Separation, Speaker Recognition. Example action-recognition outputs: "Sitting", "Standing", "Walking".]
What About System Factors?
Can we design a system powered by LLMs that accurately understands
and correctly coordinates multiple sensor systems?
[Figure: the same plan using Video Retrieval and Object Detection, run on Day 1 to Day 4, produces no outputs or wrong outputs because of system factors such as light condition, sensor inherent noise, occlusion, and the person not being at home]
Our Key Idea: Sensor Language System
Utilize LLMs to dynamically coordinate sensor systems through a Sensor Language definition and runtime plan adaptation.
Sensor Language: Vocabulary Set + Grammar Rules
Stages: Retrieval (from the Example Library), Checking, Generate Plan, Dynamic Plan Adaptation (Pre-check → Execute → Post-check; Find Alternative Paths), Results Formatting → Formatted Results → Response
Example: User Query "How is the baby?" → Response "The baby is crying."
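The slide names Retrieval from an Example Library but does not say how examples are chosen. As a minimal sketch, assuming a simple token-overlap (Jaccard) similarity in place of whatever retriever TaskSense actually uses, and with hypothetical library entries and plan strings, few-shot example retrieval could look like:

```python
from __future__ import annotations

import re

# Hypothetical example library: (past query, tool-calling plan) pairs.
EXAMPLE_LIBRARY = [
    ("Is the baby crying?", "audio_retrieval -> sound_event_detection"),
    ("Where are my keys?", "video_retrieval -> object_detection"),
    ("Who is speaking in the kitchen?",
     "audio_retrieval -> speaker_separation -> speaker_recognition"),
]


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens of a query."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(query: str, library, k: int = 2):
    """Return the k library entries whose queries overlap most with `query`."""
    q = tokens(query)
    ranked = sorted(library, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, library, k: int = 2) -> str:
    """Format retrieved examples as few-shot demonstrations before the new query."""
    shots = "\n".join(f"Query: {q}\nPlan: {p}" for q, p in retrieve_examples(query, library, k))
    return f"{shots}\nQuery: {query}\nPlan:"
```

For "How is the baby?", the closest entry is "Is the baby crying?" (similarity 0.6), followed by the kitchen query (0.25). An embedding-based retriever would be a natural drop-in replacement for `jaccard`.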
Vocabulary Set and Solvability Check
Vocabulary Set
Learning tool capabilities from structured tool documents (e.g., human activity recognition)
Solvability Check
Example query: "When did Bob lie on the bed last night?"
Checked against the vocabulary set: Open-ended? Have required tools? Have necessary labels?
Outcome: Solvable or Unsolvable
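The tool and label checks above can be sketched as set lookups against the vocabulary set. This is a minimal sketch under stated assumptions: an LLM is assumed to have already parsed the query into required tools and labels (here `ParsedQuery` is filled in by hand), the tool and label names are hypothetical, and the open-ended judgment is left to the LLM and not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical vocabulary set: tool name -> labels the tool can output,
# assumed to be extracted from structured tool documents.
VOCABULARY = {
    "video_retrieval": set(),
    "human_detection": {"person"},
    "rgb_har": {"sitting", "standing", "walking", "lying"},
}


@dataclass
class ParsedQuery:
    # Assumed output of an LLM parsing step (not shown here).
    tools: list[str]
    labels: dict[str, set[str]] = field(default_factory=dict)


def check_solvability(q: ParsedQuery, vocab: dict) -> tuple[bool, str]:
    """Unsolvable if a required tool is missing or a tool lacks a needed label."""
    for tool in q.tools:
        if tool not in vocab:
            return False, f"missing tool: {tool}"
    for tool, needed in q.labels.items():
        missing = needed - vocab.get(tool, set())
        if missing:
            return False, f"{tool} lacks labels: {sorted(missing)}"
    return True, "solvable"
```

For "When did Bob lie on the bed last night?", a parse such as `ParsedQuery(["video_retrieval", "rgb_har"], {"rgb_har": {"lying"}})` is solvable, while asking the same tool for a "cooking" label is not.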
Grammar Rules and Grammar Check
Grammar Rules
Defining data dependencies among the tools (e.g., Video Retrieval, Human Detection, Face Detection, Expression Recognition, …)
Grammar Rules Construction: vocabulary set → grammar rules
Grammar Check
The tool-calling plan is converted into a graph and matched against the graph of grammar rules; if the plan graph is a subgraph, the check passes.
E.g., Video Retrieval → Human Detection: is a subgraph, PASS!
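Grammar rules construction and the subgraph check can be sketched with plain sets of directed edges. The sketch assumes, hypothetically, that grammar rules are derived by matching each tool's output type to other tools' accepted input types, and it simplifies the subgraph match to edge containment with tools identified by name.

```python
from __future__ import annotations

# Hypothetical tool documents reduced to (accepted input types, output type).
# Tool and type names are illustrative, not TaskSense's actual vocabulary.
TOOLS = {
    "video_retrieval": (set(), "video"),
    "human_detection": ({"video"}, "person_boxes"),
    "face_detection": ({"video", "person_boxes"}, "face_crops"),
    "expression_recognition": ({"face_crops"}, "expression"),
}


def build_grammar(tools: dict) -> set[tuple[str, str]]:
    """Grammar rules as allowed edges producer -> consumer (assumed: type matching)."""
    return {
        (src, dst)
        for src, (_, out_type) in tools.items()
        for dst, (in_types, _) in tools.items()
        if src != dst and out_type in in_types
    }


def grammar_check(plan_edges, grammar) -> list[tuple[str, str]]:
    """Plan edges absent from the grammar graph; an empty list means the
    plan graph is a subgraph of the grammar graph (PASS)."""
    return [edge for edge in plan_edges if edge not in grammar]


GRAMMAR = build_grammar(TOOLS)
```

`grammar_check([("video_retrieval", "human_detection")], GRAMMAR)` returns an empty list (PASS), whereas a plan that feeds video retrieval directly into expression recognition returns that edge as a wrong dependency.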
Runtime Plan Adaptation
Switching between alternative paths to resist interference from various environmental factors.
Original Path: RGB Video Retrieval (Living Room) → RGB HAR, used when the living room is well-lit and the person is in the living room
Alternative Path 1: RGB Video Retrieval (Bedroom) → RGB HAR, used when the bedroom is well-lit and the person is in the bedroom
Alternative Path 2: Depth Video Retrieval (Bedroom) → Depth HAR, used when the bedroom is poorly lit and the person is in the bedroom
Response Generation: results from the executed tools (e.g., Human Detection) are associated and formatted into the response
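Switching between the three paths can be sketched as an ordered fallback loop with a pre-check before each path and a post-check on its output. This is a simulation under stated assumptions: the environment dictionary, the room names, and the stubbed tool chains (which always report "lying" when they can see the person) are hypothetical stand-ins for real sensor calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Path:
    name: str
    precheck: Callable[[dict], bool]        # can this path run in the current environment?
    run: Callable[[dict], Optional[str]]    # executes the tool chain; None means no output


def rgb_path(name: str, room: str) -> Path:
    # Simulated RGB retrieval + HAR: needs a well-lit room, sees only its own room.
    return Path(
        name,
        precheck=lambda env: env["lit"][room],
        run=lambda env: "lying" if env["person_room"] == room else None,
    )


def depth_path(name: str, room: str) -> Path:
    # Simulated depth retrieval + HAR: works in the dark, sees only its own room.
    return Path(
        name,
        precheck=lambda env: True,
        run=lambda env: "lying" if env["person_room"] == room else None,
    )


PATHS = [
    rgb_path("original", "living_room"),
    rgb_path("alternative_1", "bedroom"),
    depth_path("alternative_2", "bedroom"),
]


def execute_with_adaptation(paths, env):
    """Try paths in order; skip failed pre-checks, fall through on failed post-checks."""
    for path in paths:
        if not path.precheck(env):
            continue                    # pre-check failed, e.g. room too dark for RGB
        result = path.run(env)
        if result is not None:          # post-check: accept only a non-empty output
            return path.name, result
    return None, None                   # no path produced a usable result
```

With the person in a poorly lit bedroom, the original path runs but returns nothing, Alternative Path 1 fails its pre-check, and Alternative Path 2 answers, so the call returns `("alternative_2", "lying")`.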
An End-to-end Showcase
Deployed in a bedroom
Sensors Used: RGB & Depth Camera (Vzense DCAM560C), Microphone, Thermal Camera
How long has Bob been working?
Start → Query RGB Video → Human Detection → RGB HAR
Query Thermal Video → Sanitary Behavior Monitoring
Alternative Path: Query Depth Video → Human Detection → Depth HAR
It appears that Bob was indeed working for a significant amount of time in the afternoon.
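Answering "How long has Bob been working?" requires associating activity outputs from more than one path. A minimal sketch, assuming each HAR path emits timestamped `(start, end, label)` segments (the segments and timestamps below are made up for illustration), merges overlapping segments of the requested label so that time observed by both the RGB and depth paths is not counted twice:

```python
from datetime import datetime, timedelta


def total_duration(segments, label):
    """Sum the time covered by `label`, merging overlapping segments first."""
    spans = sorted((start, end) for start, end, lab in segments if lab == label)
    total = timedelta()
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # Gap before this segment: close the current merged span.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged span.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def at(hour, minute=0):
    # Made-up timestamps on an arbitrary day, for illustration only.
    return datetime(2024, 1, 1, hour, minute)


SEGMENTS = [
    (at(13), at(14), "working"),         # e.g. from the RGB HAR path
    (at(13, 45), at(15), "working"),     # e.g. from the depth HAR path (overlaps)
    (at(15), at(15, 10), "walking"),
]
```

Here `total_duration(SEGMENTS, "working")` gives two hours rather than the 2 h 15 min a naive sum would report; the LLM would then phrase that figure in the final response.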
Evaluation
Overall performance of different methods (using GPT-4 as the base LLM).
✅ Assesses query solvability more accurately
✅ Identifies wrong dependencies
✅ Adapts to variable environments
Evaluation
Balancing latency and computation overhead via the number of cached tools.
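The trade-off on this slide can be illustrated with a bounded tool cache: more cached tools mean fewer cold model loads (lower latency) but more resident models (higher overhead). The sketch assumes least-recently-used eviction and a stub loader, neither of which is confirmed by the slide, and counts cold loads as a stand-in for latency.

```python
from collections import OrderedDict


class ToolCache:
    """Keep at most `capacity` tools loaded; evict the least recently used one."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # hypothetical function that loads a tool model
        self.loaded = OrderedDict()
        self.loads = 0                # number of cold loads (latency proxy)

    def get(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)   # mark as most recently used
            return self.loaded[name]
        tool = self.loader(name)
        self.loads += 1
        self.loaded[name] = tool
        if len(self.loaded) > self.capacity:
            self.loaded.popitem(last=False)  # evict least recently used
        return tool


def count_loads(capacity, calls):
    """Replay a sequence of tool calls and report how many cold loads occurred."""
    cache = ToolCache(capacity, loader=lambda name: f"<model {name}>")
    for name in calls:
        cache.get(name)
    return cache.loads


CALLS = ["video_retrieval", "human_detection", "rgb_har"] * 3
```

Replaying the nine calls above, a cache of three tools loads each model once (3 loads), while a cache of two thrashes under LRU and reloads on every call (9 loads).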
Thanks!
Visit CUHK AIoT Lab