
Assessing the Utility of C Comments with SVM and Naïve Bayes Classifier

BY ANAMITRA MUKHOPADHYAY

COMPUTER SCIENCE AND ENGINEERING, IIT KHARAGPUR


Task Description

Binary Classification: Classify comments into ‘Useful’ and ‘Not useful’

Training general-purpose models for this task

SVM
Naïve Bayes Classifier

Improving performance by augmenting the dataset with a Large Language Model

GPT-3.5-turbo


Methods Deployed

Support Vector Machine (SVM)

Based on finding the hyperplane that best divides the data into classes.
Effective in high-dimensional spaces; with kernel functions, it can also handle data that is not linearly separable.
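
The hyperplane idea can be illustrated with a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is not the implementation used in this work: the function names, toy data, and hyperparameters (lam, lr, epochs) are assumptions chosen for illustration, and the sketch covers only the linear case (no kernels).

```python
# Minimal linear SVM sketch: sub-gradient descent on the regularized hinge loss.
# Toy data and hyperparameters are illustrative assumptions, not from the slides.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: list of feature vectors, y: labels in {-1, +1}. Returns (w, b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin:
                # step along the hinge-loss sub-gradient plus the L2 penalty.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: only the L2 penalty applies.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data (hypothetical two-feature vectors).
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [predict(w, b, x) for x in X]
```

In practice a library implementation would normally be preferred; the sketch only shows how the separating hyperplane (w, b) is adjusted whenever a point violates the margin.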

Naïve Bayes Classifier

Probabilistic classifier based on Bayes' theorem.
Assumes independence between features.
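
The independence assumption can be made concrete with a small sketch of a multinomial Naïve Bayes classifier with Laplace smoothing. It is a toy illustration only: it works on word counts rather than the structural features used in this work, and the example comments, their labels, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log-priors and Laplace-smoothed log-likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"priors": {}, "likelihoods": {}, "vocab": vocab}
    for c in class_counts:
        model["priors"][c] = math.log(class_counts[c] / len(labels))
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["likelihoods"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model

def predict_nb(model, tokens):
    """Pick the class with the highest posterior log-score."""
    scores = {}
    for c, prior in model["priors"].items():
        # Independence assumption: per-token log-likelihoods simply add up.
        # Tokens never seen in training are ignored.
        scores[c] = prior + sum(
            model["likelihoods"][c][w] for w in tokens if w in model["vocab"]
        )
    return max(scores, key=scores.get)

# Hypothetical toy comments; the labels are made up for illustration.
docs = [
    "returns the index of the first match".split(),
    "frees the buffer allocated by init".split(),
    "increment i".split(),
    "todo fix later".split(),
]
labels = ["Useful", "Useful", "Not useful", "Not useful"]
model = train_multinomial_nb(docs, labels)
prediction = predict_nb(model, "frees the buffer".split())
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied together.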

GPT-3.5-turbo

Large Language Model
Useful for generating additional text data to augment the dataset


Results


Observations

Both models exhibit improved performance with augmentation

Improved F1 scores indicate better identification of useful comments

The lack of qualitative features may limit overall accuracy
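
For reference, the F1 score mentioned above is the harmonic mean of precision and recall on the 'Useful' class. The sketch below shows the computation; the label lists are made-up examples, not results from this work, and the function name is an assumption.

```python
def f1_score(y_true, y_pred, positive="Useful"):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels for illustration only.
y_true = ["Useful", "Useful", "Useful", "Not useful", "Not useful"]
y_pred = ["Useful", "Useful", "Not useful", "Useful", "Not useful"]
score = f1_score(y_true, y_pred)
```

Because F1 rewards only correct identification of the positive class, a higher F1 after augmentation means fewer useful comments are missed or wrongly flagged.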


Scope and Limitations

Scope and Methodology

Addressed binary classification of source code comments
Used Multinomial Naïve Bayes and SVM models
Extracted structural features: comment length, position, significant word ratio
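
A minimal sketch of how such structural features might be computed is given below. The slides do not define the features precisely, so the definitions here are assumptions: length is the number of words, position is the comment's line number relative to the file length, and the significant word ratio is the share of words not in a small, hypothetical stop-word list.

```python
# Hypothetical stop-word list; the actual list used in this work is not given.
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "in", "this", "that", "and", "or", "for"}

def extract_features(comment, line_number, total_lines):
    """Compute three structural features for one C comment (assumed definitions)."""
    # Strip comment delimiters and punctuation, then lowercase each word.
    words = [w.strip(".,;:!?()*/").lower() for w in comment.split()]
    words = [w for w in words if w]
    significant = [w for w in words if w not in STOP_WORDS]
    return {
        "length": len(words),
        "position": line_number / total_lines if total_lines else 0.0,
        "significant_word_ratio": len(significant) / len(words) if words else 0.0,
    }

# Hypothetical comment on line 10 of a 40-line file.
features = extract_features("/* Frees the buffer allocated in init() */", 10, 40)
```

The resulting dictionary can be turned into a numeric vector and passed to either classifier.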

Limitations

Reliance solely on structural features for classification
Structural features alone may be insufficient for judging comment quality


Conclusion

Achieved commendable performance in comment classification

Acknowledged limitations pave the way for future enhancements

Balancing structural and qualitative features could refine classification accuracy


Thank You