TRACEABILITY TRANSFORMED: GENERATING MORE ACCURATE LINKS WITH PRE-TRAINED BERT MODELS
Jinfeng Lin, Yalin Liu, Qingkai Zeng, Meng Jiang, Jane Cleland-Huang
TRACES, WHAT ARE THEY?
ARTIFACTS
NATURAL LANGUAGE ARTIFACTS
PROGRAMMING LANGUAGE ARTIFACTS
PAST WORK
THEIR SOLUTION/MODEL
T-BERT
Based on Google’s BERT (Bidirectional Encoder Representations from Transformers)
Used to generate trace links between natural language artifacts (NLA) and programming language artifacts (PLA)
TRAINING (3 PHASES)
Pre-training
Intermediate training
Finetuning
PRE-TRAINING
INTERMEDIATE TRAINING
THREE VARIANTS (INTERMEDIATE TRAINING)
Single
Twin
Siamese
TWIN
SIAMESE
SINGLE
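The structural difference between the three variants can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a toy deterministic encoder stands in for BERT, and the seed parameter stands in for a set of encoder weights.

```python
def toy_encoder(tokens, seed):
    """Stand-in for a BERT encoder: maps a token list to a fixed-size vector.

    The seed plays the role of the encoder's weights, so two calls with the
    same seed simulate weight sharing between encoders.
    """
    dim = 4
    vec = [0.0] * dim
    for i, tok in enumerate(tokens):
        val = sum(ord(c) for c in tok) * (seed + 1)
        vec[i % dim] += (val % 97) / 97
    return vec

def single(nl_tokens, pl_tokens):
    # SINGLE: NL and PL are concatenated into one sequence and passed
    # through a single encoder, so self-attention spans both artifacts.
    return toy_encoder(["[CLS]"] + nl_tokens + ["[SEP]"] + pl_tokens, seed=0)

def twin(nl_tokens, pl_tokens):
    # TWIN: two encoders with separate weights (different seeds),
    # one specialized for NL and one for PL.
    return toy_encoder(nl_tokens, seed=1), toy_encoder(pl_tokens, seed=2)

def siamese(nl_tokens, pl_tokens):
    # SIAMESE: one shared encoder (same seed = same weights) applied
    # to both artifacts independently.
    return toy_encoder(nl_tokens, seed=1), toy_encoder(pl_tokens, seed=1)
```

In the Twin and Siamese variants the two resulting vectors are then combined by a classification head to produce a link score; in Single the score comes directly from the joint encoding.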
FINETUNING
FINAL PIPELINE
The T-BERT model was implemented with PyTorch v1.6.0 and the HuggingFace Transformers library v2.8.0.
DATASETS
CodeSearchNet
THEIR OWN NOVEL DATASET
ONLINE NEGATIVE SAMPLING (ONS)
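The idea behind online negative sampling can be sketched as follows: instead of sampling negatives randomly up front, each training batch is augmented with "hard" negatives chosen by the current model. This is a hedged sketch of that idea; the exact selection strategy in T-BERT may differ.

```python
def online_negative_sampling(batch, score_fn, k=1):
    """Augment a batch of positive (nl, pl) pairs with in-batch negatives.

    For each NL artifact, the k non-matching PL artifacts that the current
    model scores highest ("hard" negatives) are added with label 0, so the
    model trains on the examples it currently confuses most.
    """
    samples = []
    for i, (nl, pos_pl) in enumerate(batch):
        samples.append((nl, pos_pl, 1))  # the true link, label 1
        # All other PL artifacts in the batch are candidate negatives.
        candidates = [other_pl for j, (_, other_pl) in enumerate(batch) if j != i]
        # Rank candidates by the current model's score, hardest first.
        candidates.sort(key=lambda pl: score_fn(nl, pl), reverse=True)
        for neg_pl in candidates[:k]:
            samples.append((nl, neg_pl, 0))  # hard negative, label 0
    return samples
```

Here `score_fn` would be the current model's link-prediction score; because negatives are re-ranked every batch, the negatives get harder as the model improves.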
TRAINING THE MODEL
Mean Average Precision (MAP@3)
Mean reciprocal rank (MRR)
Precision@k
F-Score
The F-1 score assigns equal weight to precision and recall, while the F-2 score favors recall over precision.
MRR and Precision@K ignore recall and focus on whether a search returns useful results for a user.
A trace model with high Precision@K means users are more likely to find at least one related target artifact in the top K results.
MRR focuses on the rank of the first relevant result for a query, ignoring the rest of the ranking.
MAP evaluates the ranking of all relevant artifacts among the retrieved ones.
Quick word on the metrics
EVALUATION ON CODESEARCHNET
EVALUATION ON TRACEABILITY
RESEARCH QUESTIONS
RQ1: TWIN, SINGLE OR SIAMESE?
Single seems to do better, but…
RQ2: TRAINING TECHNIQUES TO BREAK THE GLASS CEILING? (ONS)
RQ3: TRANSFER KNOWLEDGE?
EVALUATION ON TRACEABILITY
LIMITATIONS