A ML-LLM PAIRING FOR BETTER CODE COMMENT CLASSIFICATION

HANNA ABI AKL

DATA SCIENCETECH INSTITUTE

UNIVERSITÉ CÔTE D’AZUR, INRIA, CNRS, I3S

DATASET

  • Seed data provided by the task organizers:
    • 11452 total rows of code-comment pairs with 2 labels (Useful and Not Useful)
    • Useful label: 7063 rows
    • Not Useful label: 4389 rows
  • Augmented data (see the prompting sketch below):
    • 421 total rows of code-comment pairs generated and labeled by ChatGPT
    • Useful label: 411 rows
    • Not Useful label: 10 rows
  • Synthetic data generated by SMOTE to over-sample the Not Useful label and restore class parity (see the SMOTE sketch below)
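
The ChatGPT generation step is not spelled out on the slide; the following is a minimal sketch of how such labeled code-comment pairs could be requested, assuming the official openai Python client. The model name, prompt wording, and output format are illustrative assumptions, not the exact setup used in this work.

# Minimal sketch of LLM-based augmentation. Assumptions: model name,
# prompt wording, and output format are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Write a short code snippet with an accompanying comment, then label "
    "the comment as 'Useful' or 'Not Useful'.\n"
    "Answer strictly in the form:\nCODE: ...\nCOMMENT: ...\nLABEL: ..."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)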
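
Similarly, a minimal sketch of the SMOTE over-sampling step with imbalanced-learn; the randomly generated features below stand in for the real vectorized code-comment pairs and are an assumption for illustration only.

# SMOTE sketch: over-sample the minority (Not Useful) class to parity.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Stand-in for the vectorized code-comment pairs (illustrative only).
X, y = make_classification(
    n_samples=11873,        # seed (11452) + ChatGPT (421) rows
    weights=[0.37, 0.63],   # roughly the Not Useful / Useful imbalance
    random_state=42,
)

print("before:", Counter(y))
X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_balanced))  # both classes now have equal counts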

EXPERIMENTS

  • 2 experiments:
    • Experiment 1: training on seed data only
    • Experiment 2: training on augmented data (seed + ChatGPT synthetic data)
  • 3 models (see the sketch after this list):
    • Random Forest
    • Voting Classifier combining the Random Forest, the Neural Network below, and a Linear SVC
    • Neural Network with 2 hidden layers (20 and 10 neurons)
  • Cross-validation: repeated 10-fold cross-validation with 3 repetitions
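
A minimal sketch of the three models and the cross-validation setup, assuming scikit-learn; hyper-parameters not stated on the slide are left at their defaults, and the generated data stands in for the real features.

# Sketch of the three classifiers and repeated 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# Stand-in data for the vectorized code-comment pairs (illustrative only).
X, y = make_classification(n_samples=2000, random_state=42)

nn = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=500, random_state=42)
rf = RandomForestClassifier(random_state=42)
voter = VotingClassifier(
    estimators=[("rf", rf), ("nn", nn), ("svc", LinearSVC(random_state=42))],
    voting="hard",  # hard voting: LinearSVC exposes no predict_proba
)

# 10 folds, 3 repetitions, as on the slide.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
for name, model in [("Random Forest", rf), ("Voting", voter), ("NN", nn)]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")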

RESULTS

  • Classification Metrics:
    • Accuracy
    • F1
    • Precision
    • Recall
  • 1.5% overall increase in scores with the ChatGPT-augmented data (a sketch of the metric computation follows)
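
A minimal sketch of computing the four reported metrics with scikit-learn; the label arrays below are placeholders standing in for real test labels and model predictions.

# Computing the four reported classification metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 1]  # 1 = Useful, 0 = Not Useful (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # placeholder model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))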

CONCLUSION

  • Prompt engineering can leverage the capabilities of Large Language Models (LLMs) for data augmentation tasks
  • Controlling the diversity of the generated data is still an open question
  • Future work will target integrating other data generation mechanisms into the LLM prompting technique to build an enhanced data augmentation pipeline
