Virtual Data Generation for Complex Industrial Activity Recognition
12 December 2024
Outline
Background of the Challenge
Dataset Overview
Challenge Overview
Sample Notebook Walkthrough
Sample Submission File
Evaluation Criteria
Questions and Answers
Background of the Challenge
Emergence of Virtual Data:
Virtual data generation is especially important for factory activity recognition with wearable sensors, where real-world data is often limited and the activities are complex. By generating additional virtual data, we can reduce the effort of collecting real data across many activities and scenarios.
Key Technologies Enabling Virtual Data Generation:
Data Augmentation (a minimal sketch follows this list)
Generative Adversarial Networks (GANs)
Variational Autoencoders (VAEs)
Diffusion Models
Cross-domain generation (e.g., IMUTube, IMUGPT, …)
etc.
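As a minimal illustration of the first technique, classical data augmentation, the sketch below jitters and rescales a window of 3-axis accelerometer samples with NumPy. The window shape and noise parameters are arbitrary assumptions for the example, not values prescribed by the challenge.

```python
import numpy as np

def augment_accel_window(window, jitter_std=0.05, scale_range=(0.9, 1.1), seed=None):
    """Create a virtual copy of an accelerometer window (shape: [time, 3])
    by adding Gaussian jitter and a random per-axis scaling factor."""
    rng = np.random.default_rng(seed)
    jitter = rng.normal(0.0, jitter_std, size=window.shape)
    scale = rng.uniform(*scale_range, size=(1, window.shape[1]))
    return window * scale + jitter

# Example: turn one (placeholder) real window into three virtual variants.
real_window = np.random.randn(128, 3)
virtual_windows = [augment_accel_window(real_window, seed=i) for i in range(3)]
```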
Challenges in Generating High-Quality Virtual Data:
Poor source data quality
Generated data distributions that differ from the real data distribution
Dataset Overview
In this challenge, we use the acceleration data recorded at both wrists of the subjects in Scenario 1 of the OpenPack dataset.
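As a rough sketch of what loading this data might look like, the snippet below reads the two wrist streams with pandas and aligns them on a shared timestamp. The file paths and column names are illustrative assumptions, not the official OpenPack file layout; refer to the sample notebook for the actual loader.

```python
import pandas as pd

# Hypothetical paths and column names; replace with the actual OpenPack files for Scenario 1.
left = pd.read_csv("data/U0101-S0100/left_wrist_acc.csv")    # assumed: timestamp, acc_x, acc_y, acc_z
right = pd.read_csv("data/U0101-S0100/right_wrist_acc.csv")

# Align the two wrists on the shared timestamp so each row carries 6 acceleration channels.
both = pd.merge(left, right, on="timestamp", suffixes=("_left", "_right"))
print(both.head())
```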
Challenge Overview
Key Objective: Develop virtual data generation methods to improve Human Activity Recognition (HAR) using the OpenPack dataset.
Dataset Features: Acceleration data from both wrists of the subjects in Scenario 1.
Evaluation Metric: F1 score calculated on unseen test data using trained HAR models.
[Pipeline diagram: Raw data → Virtual data generation algorithm (the only part participants need to implement) → Virtual data → HAR model. A pseudocode sketch of this flow follows.]
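In pseudocode, that pipeline looks roughly like the function below. All three callables are hypothetical placeholders for the challenge components; only the first, the virtual data generator, is written by participants, while HAR training and F1 evaluation are fixed by the organizers.

```python
def run_challenge_pipeline(raw_data, test_data,
                           generate_virtual_data, train_har_model, evaluate_f1):
    """Hypothetical sketch of the challenge flow; the three callables are placeholders."""
    virtual_data = generate_virtual_data(raw_data)   # participants' contribution
    model = train_har_model(virtual_data)            # fixed HAR model training
    return evaluate_f1(model, test_data)             # F1 score on unseen test data
```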
Sample Notebook Walkthrough
Code Availability: Pre-configured Jupyter notebook on Google Colab for quick setup and execution.
Functionality Demonstrated:
Preparation
Use real data to generate virtual data
Use the generated data to improve HAR model performance
Submission code
Ease of Use: Intuitive notebook design allows participants to modify code and test their ideas.
[Notebook annotations: "Design your code here" marks the cell participants modify, and "Check the size of generated data" verifies the output size; a sketch of that check follows.]
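The size check can be reproduced with a few lines of standard-library Python. The `virtual` directory name and the 500 MB limit follow the submission rules described below; treat this as a rough sketch rather than the notebook's exact code.

```python
from pathlib import Path

def total_size_mb(directory="virtual"):
    """Sum the sizes of all files under the given directory, in megabytes."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file()) / 1e6

size_mb = total_size_mb("virtual")
print(f"Generated data: {size_mb:.1f} MB")
assert size_mb <= 500, "Generated data exceeds the 500 MB submission limit."
```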
Sample Submission File
Submission Format: Participants must submit (1) a `.py` file containing their virtual data generation code, including the `custom_virtual_data_generation` function, and (2) the generated virtual data.
Required Details:
Keep the input and output of the `custom_virtual_data_generation` function unchanged (see the skeleton sketch below).
Save the virtual data in the correct format (see the format example below).
The total size of the generated data in the `virtual` directory must not exceed 500 MB.
Compatibility: The code must be executable in Google Colab, with output saved to the designated paths; participants may also develop and run their code on their own computers.
Don’t change the input
Example of the virtual data format (.csv file): each row holds the left-wrist acceleration, the right-wrist acceleration, and the activity label.
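A hedged sketch of writing one such CSV with pandas. The column names below (three acceleration axes per wrist plus an integer label) are assumptions inferred from the slide; match them to the sample files shipped with the starter notebook before submitting.

```python
from pathlib import Path

import numpy as np
import pandas as pd

n = 1000  # number of generated samples (arbitrary for this example)
virtual = pd.DataFrame({
    # Assumed 3-axis acceleration columns for each wrist.
    "acc_x_left": np.random.randn(n),
    "acc_y_left": np.random.randn(n),
    "acc_z_left": np.random.randn(n),
    "acc_x_right": np.random.randn(n),
    "acc_y_right": np.random.randn(n),
    "acc_z_right": np.random.randn(n),
    # Assumed integer operation label per sample.
    "label": np.random.randint(0, 10, size=n),
})

Path("virtual").mkdir(exist_ok=True)
virtual.to_csv("virtual/example_virtual_data.csv", index=False)
```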
Evaluation and Judging Criteria
F1 Score as Core Metric: Virtual data quality evaluated by improvements in HAR model performance on test data.
Testing Setup: The HAR model is trained on the generated data and tested with different random seeds on different test splits of the OpenPack dataset.
Fairness Measures: All algorithms evaluated under the same conditions to ensure comparability.
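For reference, an averaged F1 evaluation over several seeds can be sketched with scikit-learn as below. `train_har_model` and the test splits are hypothetical placeholders standing in for the organizers' fixed evaluation code, and the macro averaging is an assumption; the official scoring script defines the exact protocol.

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate_over_seeds(train_har_model, virtual_data, test_splits, seeds=(0, 1, 2)):
    """Train on the generated data once per seed and average the macro F1 score
    over the given test splits. `train_har_model(data, seed)` must return a fitted
    model exposing `.predict(X)`; this mirrors, but is not, the official protocol."""
    scores = []
    for seed, (X_test, y_test) in zip(seeds, test_splits):
        model = train_har_model(virtual_data, seed=seed)
        y_pred = model.predict(X_test)
        scores.append(f1_score(y_test, y_pred, average="macro"))
    return float(np.mean(scores))
```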