1 of 44

Lilin Xu*, Kaiyuan Hou*, Xiaofan Jiang

Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding

FMSys 2025

*Co-first author

2 of 44

LLMs in Sensor-based Applications

  • LLMs show exceptional generalization and reasoning capabilities

LLM

Healthcare

Assistive Systems

Monitoring

What day did I spend the longest eating last week?

You spent the most time exercising last week.

Where are my keys?

According to the cameras and detection results, your keys are in the bedroom.

My chest hurts when I take a deep breath after COVID-19…

Have you noticed any other symptoms, such as a cough or headache?

3 of 44

IMU-based Human Activity Recognition

  • IMU sensors are widely available and cost-effective, making them ideal for human activity recognition

Wearable Device with IMU Sensor

Accelerometer

Gyroscope

Portability and Ubiquity

Low Power Consumption

Privacy Protection

4 of 44

IMU-based Human Activity Recognition

  • IMU sensors are widely available and cost-effective, making them ideal for human activity recognition

Wearable Device with IMU Sensor

Accelerometer

Gyroscope

Portability and Ubiquity

Low Power Consumption

Privacy Protection

How to take advantage of LLMs in IMU-based HAR?

Contextual understanding

5 of 44

Current Solutions

  • Knowledge-Driven HAR with Pre-Trained LLMs
  • [FMSys’24] HARGPT
  • [SenSys-ML’24] LLMSense

Text

Textual descriptions of activities

Pretrained LLM

6 of 44

Current Solutions

  • Knowledge-Driven HAR with Pre-Trained LLMs
  • Modality Alignment between IMU and Text
  • [FMSys’24] HARGPT
  • [SenSys-ML’24] LLMSense

Text

Textual descriptions of activities

Semantic Gap

IMU Encoder

C moves the pen on the table

Text Encoder

Alignment

  • LLaSA
  • SensorLLM

Pretrained LLM

Pretrained LLM
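For intuition, here is a minimal sketch of the kind of contrastive alignment objective such systems use to close the semantic gap between IMU and text embeddings (a CLIP-style InfoNCE loss); the shapes, temperature, and loss form are generic assumptions, not details taken from LLaSA or SensorLLM.

```python
# Minimal sketch of CLIP-style contrastive alignment between paired IMU
# and text embeddings. Shapes, temperature, and loss form are generic
# assumptions, not details from LLaSA or SensorLLM.
import torch
import torch.nn.functional as F

def alignment_loss(imu_emb: torch.Tensor, text_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of (IMU, text) pairs."""
    imu_emb = F.normalize(imu_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = imu_emb @ text_emb.t() / temperature        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal matches
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```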

7 of 44

Current Solutions

  • Focus on coarse-grained activities (e.g., walking, sit-stand, jogging)

Coarse-grained Activity

Fine-grained Activity

8 of 44

  • Treat IMU data as text so that pretrained LLMs can be used (a post-training process)

Current Solutions

  • Focus on coarse-grained activities (e.g., walking, sit-stand, jogging)

Coarse-grained Activity

Fine-grained Activity

Pretrained LLM

Post-training Process

Pre-training Process

9 of 44

Preliminary

  • Data collection of handwritten letters
      • 26 letters (from ‘A’ to ‘Z’), two settings (flat-surface [2D] and mid-air [3D]), 10 repetitions
      • Data from one participant is used as the training set, while data from the other participant serves as the test set

Data Collection Setup and Process

Data Visualization

10 of 44

Preliminary - Microbenchmark

  • The recognition performance of LLMs with in-context learning
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models

Zero-shot: prompt LLMs using domain knowledge and Chain-of-Thought (CoT) reasoning
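A hypothetical zero-shot prompt in the spirit of this slide is sketched below; the wording is illustrative, not the paper's exact prompt.

```python
# Hypothetical zero-shot prompt: domain knowledge about the sensing setup
# plus a Chain-of-Thought instruction. Wording is illustrative, not the
# paper's exact prompt.
def build_zero_shot_prompt(imu_readings: str) -> str:
    return (
        "You are an expert in IMU-based human activity recognition.\n"
        "The data below comes from a wearable device with a 3-axis "
        "accelerometer and a 3-axis gyroscope, recorded while a user "
        "wrote one uppercase English letter (A-Z).\n"
        f"IMU readings (ax, ay, az, gx, gy, gz per row):\n{imu_readings}\n"
        "Think step by step about the stroke trajectory implied by the "
        "signal, then answer with exactly one letter."
    )
```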

11 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models

Zero-shot: prompt LLMs using domain knowledge and Chain-of-Thought (CoT) reasoning

12 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models

Zero-shot: prompt LLMs using domain knowledge and Chain-of-Thought (CoT) reasoning

13 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models

Few-shot: include ‘label-data’ pairs as examples in the prompt
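A hypothetical few-shot prompt builder along these lines follows; function names and wording are illustrative assumptions.

```python
# Hypothetical few-shot prompt: 'label-data' example pairs are prepended
# to the same instructions as the zero-shot prompt. Names and wording are
# illustrative assumptions.
def build_few_shot_prompt(examples: list[tuple[str, str]],
                          query_readings: str) -> str:
    shots = "\n\n".join(
        f"Example IMU readings:\n{data}\nLetter: {label}"
        for label, data in examples
    )
    return (f"{shots}\n\n"
            "Now classify the following recording. Think step by step, "
            "then answer with exactly one letter (A-Z).\n"
            f"IMU readings:\n{query_readings}")
```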

14 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models

Few-shot: include ‘label-data’ pairs as examples in the prompt

Same as the zero-shot prompt

15 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models

Few-shot: include ‘label-data’ pairs as examples in the prompt

16 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models


17 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models


  • All LLMs perform poorly, with accuracies falling below random guessing

Zero-shot

18 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models


  • GPT-4o and DeepSeek-R1 can benefit from provided examples

Few-shot (2D Case)

19 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models


  • GPT-4o and DeepSeek-R1 can benefit from provided examples

  • The small LLM (LLaMA-3-8B) fails to interpret this time-series classification task

Few-shot (2D Case)

20 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models


  • All LLMs perform poorly on 3D letter recognition since mid-air gestures present additional complexities

Few-shot (3D Case)

21 of 44

Preliminary - Microbenchmark

  • The recognition performance of pretrained LLMs
      • In-context Learning: zero-shot and few-shot settings for prompts
      • Compare with traditional small classification models


  • All LLMs perform poorly on 3D letter recognition, since mid-air gestures present additional complexities

Few-shot (3D Case)

Pretrained LLMs cannot directly handle fine-grained HAR tasks

Expert Knowledge

Pretrained LLMs

22 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • Step 1: Generate Correct Reasoning
      • Step 2: Rephrase for “Discovery” Mode
      • Step 3: Prompt Versatility

23 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • Step 1: Generate Correct Reasoning
      • Step 2: Rephrase for “Discovery” Mode
      • Step 3: Prompt Versatility

Step 1

24 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • Step 1: Generate Correct Reasoning
      • Step 2: Rephrase for “Discovery” Mode
      • Step 3: Prompt Versatility

Reconstruct the reasoning answer

Step 1

Step 2

25 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • Step 1: Generate Correct Reasoning
      • Step 2: Rephrase for “Discovery” Mode
      • Step 3: Prompt Versatility

Reconstruct the reasoning answer

Convert the phrasing style

Step 1

Step 2

Step 3

26 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • Step 1: Generate Correct Reasoning
      • Step 2: Rephrase for “Discovery” Mode
      • Step 3: Prompt Versatility

Reconstruct the reasoning answer

Convert the phrasing style

Step 1

Step 2

Step 3
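A compact sketch of how these three steps could be chained is shown below; `llm` is a hypothetical text-completion helper and all prompt wording is assumed, so only the three-step structure mirrors the slides.

```python
# Sketch of the three-step instruction-response generation pipeline.
# `llm` is a hypothetical text-completion callable; the prompt wording is
# assumed. Only the three-step structure mirrors the slides.
def generate_pair(imu_text: str, label: str, llm) -> dict:
    # Step 1: generate correct reasoning, conditioning on the true label.
    reasoning = llm(
        f"Explain step by step why this IMU trace corresponds to the "
        f"letter '{label}':\n{imu_text}"
    )
    # Step 2: rephrase into "discovery" mode, so the response reads as if
    # the model inferred the letter instead of being told it.
    discovery = llm(
        "Rewrite this explanation so it derives the letter from the "
        f"signal rather than assuming it:\n{reasoning}"
    )
    # Step 3: vary the instruction phrasing for prompt versatility.
    instruction = llm(
        "Paraphrase: 'Which letter was written, given this IMU recording?'"
    )
    return {"instruction": f"{instruction}\n{imu_text}", "response": discovery}
```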

27 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • 1,560 instruction-response pairs

IMU data

28 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • 1,560 instruction-response pairs

IMU data

Reasoning

29 of 44

Experiment - Single Letter Prediction

  • We first generate an instruction-response dataset to fine-tune LLMs
      • 1,560 instruction-response pairs

IMU data

Classification Result
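For illustration, one instruction-response pair might look like the following; the field names and all values are assumptions, not samples from the actual dataset.

```python
# Illustrative shape of one instruction-response pair: IMU data in the
# instruction, reasoning plus the final letter in the response. All
# values are invented for illustration.
example_pair = {
    "instruction": "Which letter was written? IMU readings "
                   "(ax, ay, az, gx, gy, gz per row): ...",
    "response": "The trace shows two downward strokes joined by a "
                "horizontal segment, which matches the letter 'A'. "
                "Answer: A",
}
```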

30 of 44

Experiment - Single Letter Prediction

  • We fine-tune LLaMA-3-8B and GPT-4o with LoRA

Recognition accuracy (2D)

Recognition accuracy (3D)

[Bar charts: accuracies before fine-tuning are essentially zero (0%, with one setting at 0.38%)]
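A minimal sketch of the LLaMA-3-8B side of this setup using Hugging Face peft follows; the rank, alpha, and target modules are common defaults, not the paper's reported settings (GPT-4o fine-tuning would go through a hosted service and is not shown).

```python
# Minimal LoRA setup for LLaMA-3-8B with Hugging Face peft. Rank, alpha,
# and target modules are common defaults, not the paper's settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # freezes base weights
model.print_trainable_parameters()          # only low-rank adapters train
```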

31 of 44

Experiment - Single Letter Prediction

  • We fine-tune LLaMA-3-8B and GPT-4o with LoRA

Recognition accuracy (2D)

Recognition accuracy (3D)

Fine-tuning improves performance across both models

[Bar charts: accuracies before fine-tuning are essentially zero (0%, with one setting at 0.38%)]

32 of 44

Experiment - Single Letter Prediction

  • We fine-tune LLaMA-3-8B and GPT-4o with LoRA

Recognition accuracy (2D)

Recognition accuracy (3D)

Fine-tuning improves performance across both models

[Bar charts: ‘Before Fine-tuning’ bars are essentially zero (0%, with one setting at 0.38%)]

33 of 44

Experiment - Single Letter Prediction

  • We fine-tune LLaMA-3-8B and GPT-4o with LoRA

Recognition accuracy (2D)

Recognition accuracy (3D)

Fine-tuning improves performance across both models

[Bar charts: ‘Before Fine-tuning’ bars are essentially zero (0%, with one setting at 0.38%)]

34 of 44

Experiment - Single Letter Prediction

  • We fine-tune LLaMA-3-8B and GPT-4o with LoRA

Recognition accuracy (2D)

Recognition accuracy (3D)

Few-shot learning substantially improves accuracy

Fine-tuning improves performance across both models

[Bar charts: baselines before fine-tuning at 0% to 0.38%]

35 of 44

Experiment - Single Letter Prediction

  • We fine-tune LLaMA-3-8B and GPT-4o with LoRA

Recognition accuracy (2D)

Recognition accuracy (3D)

Performance on 3D data remains poor

Few-shot learning substantially improves accuracy

Fine-tuning improves performance across both models

[Bar charts: baselines before fine-tuning at 0% to 0.38%]

36 of 44

Experiment - Mid-Air Contextual Letter Series

  • An end-to-end mid-air gesture understanding pipeline based on LLMs

Mid-air gestures instead of flat-surface gestures

Contextual letter series instead of single letters

37 of 44

Experiment - Mid-Air Contextual Letter Series

  • An end-to-end mid-air gesture understanding pipeline based on LLMs
      • Includes a mapping stage and a classification stage

Mapping Stage

Classification Stage

Maps 3D IMU data to 2D representations that can be interpreted by LLMs

Uses the fine-tuned LLMs to recognize contextual letter series instead of single letters

38 of 44

  • An end-to-end mid-air gesture understanding pipeline based on LLMs
      • Mapping stage: map mid-air gestures to flat-surface gestures through deep metric learning

Framework of Similarity Estimator

Mapping Accuracy: 93.08%

Experiment - Mid-Air Contextual Letter Series
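A sketch of the deep-metric-learning idea behind the similarity estimator follows; the slide states only that deep metric learning is used, so the triplet formulation, shared encoder, and margin here are assumptions. At inference, a mid-air gesture would then be mapped to the flat-surface template with the highest embedding similarity.

```python
# Sketch of deep metric learning for the mapping stage: a shared encoder
# is trained so a mid-air (3D) gesture embeds close to the flat-surface
# (2D) recording of the same letter and far from other letters. The
# triplet formulation and margin are assumptions.
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)

def metric_step(encoder: nn.Module,
                anchor_3d: torch.Tensor,    # mid-air gesture
                positive_2d: torch.Tensor,  # same letter, flat surface
                negative_2d: torch.Tensor   # different letter, flat surface
                ) -> torch.Tensor:
    return triplet_loss(encoder(anchor_3d),
                        encoder(positive_2d),
                        encoder(negative_2d))
```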

39 of 44

  • An end-to-end mid-air gesture understanding pipeline based on LLMs
      • Classification stage: identify contextual letter series (words) instead of single letters
      • Experiment with fine-tuned LLaMA-3-8B

Experiment - Mid-Air Contextual Letter Series

‘s’

‘d’

‘v’

‘e’

‘save’

Single Letter

Contextual Letter Series

40 of 44

  • An end-to-end mid-air gesture understanding pipeline based on LLMs
      • Classification stage: identify contextual letter series (words) instead of single letters
      • Experiment with fine-tuned LLaMA-3-8B

Experiment - Mid-Air Contextual Letter Series

‘s’

‘d’

‘v’

‘e’

‘save’

Single Letter

Contextual Letter Series

1,500 common English nouns

Each letter is drawn 𝑘 times, where 𝑘 ∈ [2, 5]

Word length: 3–6 letters
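For intuition, one simple illustrative way word-level context can disambiguate noisy per-letter predictions is to intersect per-position letter candidates with the noun vocabulary, as sketched below; the pipeline itself relies on the fine-tuned LLM rather than this lookup.

```python
# Illustrative baseline for exploiting word-level context: intersect
# per-position letter candidates with a known vocabulary. The actual
# pipeline lets the fine-tuned LLM resolve the series directly.
from itertools import product

def decode_word(candidates: list[list[str]], vocab: set[str]) -> str | None:
    """candidates[i] holds the top letter predictions for position i."""
    for letters in product(*candidates):
        word = "".join(letters)
        if word in vocab:
            return word
    return None

# e.g. decode_word([["s", "g"], ["a", "o"], ["v", "u"], ["e", "c"]],
#                  {"save", "gate"}) returns "save"
```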

41 of 44

  • An end-to-end mid-air gesture understanding pipeline based on LLMs
      • Classification stage: identify contextual letter series (words) instead of single letters
      • Experiment with fine-tuned LLaMA-3-8B
        • 1,500 common English nouns
        • Each letter is drawn 𝑘 times, where 𝑘 ∈ [2, 5]

Recognition Example

Experiment - Mid-Air Contextual Letter Series

42 of 44

  • An end-to-end mid-air gesture understanding pipeline based on LLMs
      • Classification stage: identify contextual letter series (words) instead of single letters
      • Experiment with fine-tuned LLaMA-3-8B
        • 1,500 common English nouns
        • Each letter is drawn 𝑘 times, where 𝑘 ∈ [2, 5]

Recognition Example

Experiment - Mid-Air Contextual Letter Series

Accuracy on Contextual Letter Series

43 of 44

Conclusion

  • We explore the capabilities of LLMs for IMU-based fine-grained HAR through handwritten letter recognition

      • Measure LLMs’ capabilities in fine-grained HAR under different settings

      • Provide insights into how LLMs understand IMU data

      • Propose an end-to-end pipeline for mid-air gesture understanding

Fine-tuning helps LLMs understand the specific task

In-context examples improve LLM performance

Their contextual capabilities make LLMs promising for practical applications

44 of 44

Thank you!

lx2331@columbia.edu

Lilin Xu

Columbia University