1 of 8

Assessing the Utility of C Comments with SVM and Naïve Bayes Classifier

BY ANAMITRA MUKHOPADHYAY

COMPUTER SCIENCE AND ENGINEERING, IIT KHARAGPUR

2 of 8

Task Description

  • Binary Classification: Classify comments into ‘Useful’ and ‘Not useful’

  • Training general-purpose models for this task
    • SVM
    • Naïve Bayes Classifier

  • Improving performance by augmenting the dataset with a Large Language Model
    • GPT-3.5-turbo

3 of 8

Methods Deployed

  • Support Vector Machine (SVM)
    • Based on finding the hyperplane that best divides the data into classes.
    • Effective in high-dimensional spaces; handles non-linear data through kernel functions.

  • Naïve Bayes Classifier
    • Probabilistic classifier based on Bayes' theorem.
    • Assumes independence between features.

  • GPT-3.5-turbo
    • Large Language Model
    • Useful for generating text data to augment the training set
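
To make the Naïve Bayes bullets concrete, here is a minimal from-scratch sketch of a multinomial Naïve Bayes classifier. It is not the implementation used in this work. The toy count features, the smoothing value `alpha`, and the function names are all assumptions chosen for illustration.

```python
import math

def train_multinomial_nb(X, y, alpha=1.0):
    """Fit class priors and smoothed per-feature likelihoods.

    X: list of non-negative count vectors, y: list of class labels.
    alpha is a Laplace smoothing constant (assumed value, not from the slides).
    """
    n_features = len(X[0])
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        log_likelihood[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_likelihood

def predict_multinomial_nb(model, x):
    """Pick the class with the highest log posterior.

    Summing per-feature log likelihoods is where the feature
    independence assumption enters.
    """
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(xi * lw for xi, lw in zip(x, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: [significant-word count, filler-word count]
X_toy = [[5, 1], [4, 0], [6, 2], [0, 3], [1, 4], [0, 5]]
y_toy = ["Useful"] * 3 + ["Not useful"] * 3
model = train_multinomial_nb(X_toy, y_toy)
pred_a = predict_multinomial_nb(model, [5, 1])
pred_b = predict_multinomial_nb(model, [0, 4])
```

In practice a library implementation would normally replace this hand-written version; the sketch only shows how Bayes' theorem and the independence assumption combine into a decision rule.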

4 of 8

Results

5 of 8

Observations

  • Both models exhibit improved performance with augmentation

  • Improved F1 scores indicate better identification of useful comments

  • The absence of qualitative features may limit overall accuracy
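
For reference, the F1 score mentioned above is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts, treating 'Useful' as the positive class. The function name and the example counts are hypothetical and are not results from this work.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
```

Because F1 ignores true negatives, an increase suggests that more useful comments are being found without a matching rise in false alarms.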

6 of 8

Scope and Limitations

  • Scope and Methodology
    • Addressed binary classification of source code comments
    • Used Multinomial Naïve Bayes and SVM models
    • Extracted structural features: comment length, position, significant word ratio

  • Limitations
    • Reliance solely on structural features for classification
    • Potential insufficiency for qualitative analysis
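
The three structural features listed above could be extracted roughly as follows. The slides do not define them precisely, so this sketch makes its own assumptions: length is a word count, position is the comment's line number divided by the file's line count, and a "significant" word is any word outside a small hypothetical stopword list.

```python
import re

# Hypothetical stopword list; the one used in this work is not given.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on",
             "for", "and", "or", "this", "that", "it"}

def structural_features(comment, line_number, total_lines):
    """Return assumed structural features for one C comment."""
    # Strip leading /* or // and a trailing */ delimiter.
    text = re.sub(r"^\s*(/\*+|//+)|\*+/\s*$", "", comment.strip()).strip()
    words = re.findall(r"[A-Za-z_]+", text.lower())
    significant = [w for w in words if w not in STOPWORDS]
    return {
        "length": len(words),
        "position": line_number / total_lines if total_lines else 0.0,
        "significant_ratio": len(significant) / len(words) if words else 0.0,
    }

feats = structural_features("/* Compute the checksum of the buffer */", 10, 100)
```

Features of this kind describe the shape of a comment rather than its meaning, which is the limitation noted above.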

7 of 8

Conclusion

  • Achieved commendable performance in comment classification

  • Acknowledged limitations pave the way for future enhancements

  • Balancing structural and qualitative features could refine classification accuracy

8 of 8

Thank You