Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding
Lilin Xu*, Kaiyuan Hou*, Xiaofan Jiang (*Co-first authors)
FMSys 2025
LLMs in Sensor-based Applications
Application areas: Healthcare, Assistive Systems, Monitoring
Monitoring: "What day did I spend the longest eating last week?" / "You spent the most time exercising last week."
Assistive Systems: "Where are my keys?" / "According to the cameras and detection results, your keys are in the bedroom."
Healthcare: "My heart hurts when I take a deep breath after COVID-19..." / "Have you noticed any other symptoms, such as a cough or headache?"
IMU-based Human Activity Recognition
Wearable devices with IMU sensors: accelerometer and gyroscope
Advantages: portability and ubiquity, low power consumption, privacy protection
How can we take advantage of LLMs' contextual understanding in IMU-based HAR?
Current Solutions
Text-based: feed textual descriptions of activities (e.g., "C moves the pen on the table") to a pretrained LLM
Problem: a semantic gap between raw IMU data and text
Alignment-based: train an IMU encoder whose embeddings are aligned with those of a text encoder over textual descriptions, then pass them to a pretrained LLM (a loss sketch follows)
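To make the alignment concrete: a common way to train an IMU encoder against a text encoder is a symmetric contrastive (InfoNCE) objective. Below is a minimal sketch assuming hypothetical (batch, dim) encoder outputs; the slide does not name the exact loss, so this is illustrative rather than the paper's method.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(imu_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: pull each IMU window toward its paired textual
    description and push apart mismatched pairs.
    imu_emb, text_emb: (batch, dim) outputs of the two encoders."""
    imu_emb = F.normalize(imu_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = imu_emb @ text_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0))          # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```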
Current Solutions
These pipelines handle coarse-grained activities, but not fine-grained activities
Extending a pretrained LLM to fine-grained activities would require a costly pre-training or post-training process
Preliminary
Data collection setup and process
Data visualization
Preliminary - Microbenchmark
Zero-shot: prompt LLMs using domain knowledge and Chain-of-Thought (CoT) reasoning (a prompt sketch follows)
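A minimal sketch of what such a zero-shot prompt could look like, combining domain knowledge about the signal with a CoT instruction. The wording, the `DOMAIN_KNOWLEDGE` text, and the `imu_window` format are assumptions for illustration, not the paper's actual prompt.

```python
# Illustrative zero-shot prompt builder; all wording here is hypothetical.
DOMAIN_KNOWLEDGE = (
    "The input is a window of 6-axis IMU data (3-axis accelerometer in m/s^2, "
    "3-axis gyroscope in rad/s) recorded while a user draws a single letter."
)

def build_zero_shot_prompt(imu_window):
    """imu_window: iterable of (acc_xyz, gyro_xyz) tuples, one per timestep."""
    readings = "\n".join(
        f"t={i}: acc={acc}, gyro={gyro}" for i, (acc, gyro) in enumerate(imu_window)
    )
    return (
        f"{DOMAIN_KNOWLEDGE}\n\n"
        f"IMU readings:\n{readings}\n\n"
        "Think step by step: describe the motion trajectory implied by the "
        "readings, then decide which letter (a-z) was drawn. "
        "End with 'Answer: <letter>'."
    )
```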
Preliminary - Microbenchmark
Few-shot: include 'label-data' pairs as examples in the prompt; otherwise the same as the zero-shot prompt (see the sketch below)
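Following the slide, the few-shot variant prepends 'label-data' pairs and otherwise reuses the zero-shot prompt. A sketch, reusing the hypothetical `build_zero_shot_prompt` helper from above:

```python
def build_few_shot_prompt(imu_window, examples):
    """examples: list of (label, imu_window) pairs included as demonstrations."""
    shots = "\n\n".join(
        f"Example (letter '{label}'):\n" +
        "\n".join(f"t={i}: acc={acc}, gyro={gyro}"
                  for i, (acc, gyro) in enumerate(win))
        for label, win in examples
    )
    # The query part is identical to the zero-shot prompt.
    return shots + "\n\n" + build_zero_shot_prompt(imu_window)
```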
Preliminary - Microbenchmark
Zero-shot results (2D and 3D): LLMs perform poorly, with accuracies falling below random guessing (about 1/26, i.e., roughly 3.8%, for a 26-letter task)
[Figures: few-shot results for the 2D case and the 3D case]
Takeaway: pretrained LLMs cannot directly handle fine-grained HAR tasks; expert knowledge must be combined with pretrained LLMs
Experiment - Single Letter Prediction
Constructing reasoning-style training data in three steps (a sketch follows):
Step 1: [data and prompt preparation, shown in the figure]
Step 2: Reconstruct the reasoning answer
Step 3: Convert the phrasing style
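One plausible reading of Steps 2-3 is assembling supervised records whose targets contain a reconstructed reasoning chain rephrased into a consistent style. A hedged sketch, with hypothetical field names and reusing the earlier illustrative prompt builder:

```python
import json

def make_finetune_record(imu_window, label, reasoning_text):
    """Build one fine-tuning example: prompt plus reasoning-style completion."""
    prompt = build_zero_shot_prompt(imu_window)  # Step 1 stand-in: prompt prep
    # Step 2: reconstruct a reasoning answer that leads to the label.
    # Step 3: convert the phrasing into the consistent style the model imitates.
    completion = f"{reasoning_text}\nAnswer: {label}"
    return json.dumps({"prompt": prompt, "completion": completion})
```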
Experiment - Single Letter Prediction
Pipeline: IMU data → Reasoning → Classification Result
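End to end, the pipeline amounts to prompting the (fine-tuned) model with the IMU window, letting it reason, and parsing out the classification. A sketch, assuming a hypothetical `llm` callable that wraps the model:

```python
import re

def classify_letter(imu_window, llm):
    """IMU data -> Reasoning -> Classification Result."""
    response = llm(build_zero_shot_prompt(imu_window))  # reasoning text + answer
    match = re.search(r"Answer:\s*([a-z])", response)
    return match.group(1) if match else None
```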
Experiment - Single Letter Prediction
[Figures: recognition accuracy (2D) and recognition accuracy (3D); before fine-tuning, accuracies are near zero, at most 0.38%]
Fine-tuning improves performance across both models
Few-shot learning substantially improves accuracy
Performance on 3D data remains poor
Experiment - Mid-Air Contextual Letter Series
Mid-air gestures instead of gestures drawn on a flat surface
Contextual letter series instead of single letters
Experiment - Mid-Air Contextual Letter Series
Mapping Stage: maps 3D IMU data to 2D representations that LLMs can interpret (an illustrative sketch follows)
Classification Stage: uses the fine-tuned LLMs to recognize contextual letter series instead of single letters
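As one illustration of what a 3D-to-2D mapping can look like, the sketch below projects a mid-air trace onto its plane of greatest variance with PCA. This is a stand-in under stated assumptions: the paper's actual Mapping Stage is built around the similarity estimator on the next slide, not necessarily PCA.

```python
import numpy as np

def project_trace_to_2d(points_3d):
    """points_3d: (n, 3) array of positions recovered from the IMU trace."""
    centered = points_3d - points_3d.mean(axis=0)
    # Principal axes of the trace; the top two span the drawing plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T   # (n, 2) coordinates in the drawing plane
```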
Experiment - Mid-Air Contextual Letter Series
Framework of the Similarity Estimator
Mapping accuracy: 93.08%
Experiment - Mid-Air Contextual Letter Series
From single letters to contextual letter series: per-letter predictions such as 's', 'd', 'v', 'e' are resolved in context to the word 'save' (a correction sketch follows)
Evaluation setup: 1500 common English nouns; word lengths of 3 to 6; each letter is drawn k times, where k ∈ [2, 5]
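To illustrate why context helps, here is a minimal classical baseline: pick the vocabulary word closest to the per-letter predictions. In the paper the fine-tuned LLM exploits context itself; the tiny `VOCAB` list below is a stand-in for the 1500 common nouns.

```python
from difflib import SequenceMatcher

VOCAB = ["save", "wave", "cave", "sale"]  # stand-in for the 1500 common nouns

def correct_with_context(letter_preds, vocab=VOCAB):
    """letter_preds: per-letter outputs, e.g. ['s', 'd', 'v', 'e']."""
    guess = "".join(letter_preds)
    return max(vocab, key=lambda w: SequenceMatcher(None, guess, w).ratio())

print(correct_with_context(["s", "d", "v", "e"]))  # -> 'save'
```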
Experiment - Mid-Air Contextual Letter Series
[Figure: recognition example]
[Figure: accuracy on contextual letter series]
Conclusion
Fine-tuning helps LLMs understand the specific task
In-context examples further improve performance
Their contextual capabilities make LLMs promising for practical applications
Thank you!
lx2331@columbia.edu
Lilin Xu
Columbia University