A ML-LLM PAIRING FOR BETTER CODE COMMENT CLASSIFICATION

HANNA ABI AKL

DATA SCIENCETECH INSTITUTE

UNIVERSITÉ CÔTE D’AZUR, INRIA, CNRS, I3S

DATASET

  • Seed data provided by the task organizers:
    • 11452 total rows of code-comment pairs with 2 labels (Useful and Not Useful)
    • Useful label: 7063 rows
    • Not Useful label: 4389 rows
  • Augmented data (see the prompting sketch below):
    • 421 total rows of code-comment pairs generated and labeled by ChatGPT
    • Useful label: 411 rows
    • Not Useful label: 10 rows
  • Synthetic data generated by SMOTE to over-sample the Not Useful label and restore class parity (see the SMOTE sketch below)
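
The ChatGPT generation step is not spelled out on the slide; the following is a minimal sketch of how such labeled code-comment pairs could be requested, assuming the official openai Python client. The model name, prompt wording, and output format are illustrative assumptions, not the exact setup used in this work.

# Minimal sketch of LLM-based augmentation. Assumptions: model name,
# prompt wording, and output format are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Write a short code snippet with an accompanying comment, then label "
    "the comment as 'Useful' or 'Not Useful'.\n"
    "Answer strictly in the form:\nCODE: ...\nCOMMENT: ...\nLABEL: ..."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)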
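
Similarly, a minimal sketch of the SMOTE over-sampling step with imbalanced-learn; the randomly generated features below stand in for the real vectorized code-comment pairs and are an assumption for illustration only.

# SMOTE sketch: over-sample the minority (Not Useful) class to parity.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Stand-in for the vectorized code-comment pairs (illustrative only).
X, y = make_classification(
    n_samples=11873,        # seed (11452) + ChatGPT (421) rows
    weights=[0.37, 0.63],   # roughly the Not Useful / Useful imbalance
    random_state=42,
)

print("before:", Counter(y))
X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_balanced))  # both classes now have equal counts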

EXPERIMENTS

  • 2 experiments:
    • Experiment 1: training on seed data only
    • Experiment 2: training on augmented data (seed + ChatGPT synthetic data)
  • 3 models (see the sketch after this list):
    • Random Forest
    • Voting Classifier combining the Random Forest, the Neural Network below, and a Linear SVC
    • Neural Network with 2 hidden layers (20 and 10 neurons)
  • Cross-validation: repeated 10-fold cross-validation with 3 repetitions
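
A minimal sketch of the three models and the cross-validation setup, assuming scikit-learn; hyper-parameters not stated on the slide are left at their defaults, and the generated data stands in for the real features.

# Sketch of the three classifiers and repeated 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# Stand-in data for the vectorized code-comment pairs (illustrative only).
X, y = make_classification(n_samples=2000, random_state=42)

nn = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=500, random_state=42)
rf = RandomForestClassifier(random_state=42)
voter = VotingClassifier(
    estimators=[("rf", rf), ("nn", nn), ("svc", LinearSVC(random_state=42))],
    voting="hard",  # hard voting: LinearSVC exposes no predict_proba
)

# 10 folds, 3 repetitions, as on the slide.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
for name, model in [("Random Forest", rf), ("Voting", voter), ("NN", nn)]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")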

RESULTS

  • Classification Metrics:
    • Accuracy
    • F1
    • Precision
    • Recall
  • 1.5% overall increase in scores with the ChatGPT-augmented data (a sketch of the metric computation follows)
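
A minimal sketch of computing the four reported metrics with scikit-learn; the label arrays below are placeholders standing in for real test labels and model predictions.

# Computing the four reported classification metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 1]  # 1 = Useful, 0 = Not Useful (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # placeholder model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))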

CONCLUSION

  • Prompt engineering can leverage the capabilities of Large Language Models (LLMs) for data augmentation tasks
  • Controlling the diversity of the generated data is still an open question
  • Future work will target integrating other data generation mechanisms into the LLM prompting technique to build an enhanced data augmentation pipeline
