Systematic Literature Review
1
Conference / Link
Number of Papers Scanned
Number of Papers Containing the Term "Natural Language"
Paper's Google Link
Mentioned/Used Tool
Text / Justification / Comment / Scanned
Threats to validity?
2
ICSE 2016
http://dblp.uni-trier.de/db/conf/icse/icse2016.html
100 / 17
Automatic Model Generation from Documentation for Java API Functions
Stanford Parser
…programming rather than an adjective. But the Stanford parser would identify it as an adjective. In addition, the Stanford parser may incorrectly identify some words as nouns while in fact they should be marked as verbs. For example, "Returns" in the mentioned description is a verb, but it is incorrectly identified as a noun by the Stanford parser. Therefore, a POS restricting component is added to the Stanford parser to force it to use our pre-defined tags for some programming-specific words. Some pre-defined tags are as follows:
• noun: true/false/null
• verb: returns/sets/maps/copies
• adjective: reverse/next/more/empty
we leverage the state-of-the-art Stanford Parser with domain specific tags.
The researchers selected the Stanford Parser without justification, but they added a POS-restricting component that forces it to use pre-defined tags for some programming-specific words (a minimal sketch of this tag-override idea follows below).
Yes / Not available
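To make the quoted tag-override idea concrete: the sketch below post-corrects an off-the-shelf POS tagger with the paper's pre-defined tags. This is our illustration, not the paper's implementation (which modifies the Stanford Parser itself); the Penn Treebank tag names and the NLTK tagger are our assumptions.

    import nltk  # assumes the 'punkt' and 'averaged_perceptron_tagger' data are installed

    # Override table mirroring the paper's pre-defined tags; the concrete
    # Penn Treebank tags (NN, VBZ, JJ) are our choice, not the paper's.
    FORCED_TAGS = {
        "true": "NN", "false": "NN", "null": "NN",                        # nouns
        "returns": "VBZ", "sets": "VBZ", "maps": "VBZ", "copies": "VBZ",  # verbs
        "reverse": "JJ", "next": "JJ", "more": "JJ", "empty": "JJ",       # adjectives
    }

    def pos_tag_restricted(sentence):
        """Tag a sentence, then force pre-defined tags for programming-specific words."""
        tokens = nltk.word_tokenize(sentence)
        return [(w, FORCED_TAGS.get(w.lower(), t)) for w, t in nltk.pos_tag(tokens)]

    # "Returns" is forced to a verb tag instead of being misread as a noun.
    print(pos_tag_restricted("Returns true if this list is empty."))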
3
Automatically Learning Semantic Features for Defect Prediction
None / None / None
Mentioned only in references
Yes
4
Nomen est Omen: Exploring and Exploiting Similarities between Argument and Parameter Names
None
We believe that the second reason for false positives can be addressed by more sophisticated processing of names, such as Butler et al.'s method for tokenizing identifier names [7] or techniques borrowed from the natural language processing community
None
They only suggest using general NLP techniques; NLP is not used in the research itself.
Yes
5
The impact of test case summaries on bug fixing performance
None
We present TestDescriber, a novel approach to generate natural language summaries of JUnit tests. TestDescriber is designed to automatically generate summaries of the portion of code exercised by each individual test case to provide a dynamic view of the CUT
None
Mentions NL without naming a specific tool. The idea of the work is to introduce a new natural language generator (TestDescriber) for JUnit tests.
Yes
6
On the naturalness of buggy code
None
Our work begins with the observation by Hindle et al [22], that "natural" code in repositories is highly repetitive, and that this repetition can be usefully captured by language models originally developed in the field of statistical natural language processing (NLP). Following this work, language models have been used to good effect in code suggestion [22, 48, 53, 15], cross-language porting [38, 37, 39, 24], coding standards [2], idiom mining [3], and code de-obfuscation [47].

First, we converted each commit message to a bag-of-words and then stemmed the bag-of-words using standard natural language processing (NLP) techniques.
None / Yes
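The bag-of-words-plus-stemming preprocessing quoted above is standard; a minimal sketch with NLTK (our illustration of the generic technique, not the authors' code):

    from collections import Counter
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    stemmer = PorterStemmer()

    def commit_message_to_bag(message):
        """Convert a commit message to a stemmed bag-of-words."""
        tokens = word_tokenize(message.lower())
        return Counter(stemmer.stem(t) for t in tokens if t.isalpha())

    # 'fixed', 'fixing', and 'fixes' all collapse onto the stem 'fix'.
    print(commit_message_to_bag("Fixed the fixing of fixes in the parser"))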
7
From Word Embeddings To Document Similarities for Improved Information Retrieval in Software Engineering
None / None / None
The idea of this work was to bridge the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space. In the proposed architecture, word embeddings are first trained on API documents, tutorials, and reference documents, and then aggregated in order to estimate semantic similarities between documents.

They do not mention any parsing tool; they created their own HTML-based parser.
Yes
8
Augmenting API Documentation with Insights from Stack Overflow
Stanford Parser
We further configure the Stanford NLP parser [19] to automatically tag all code elements as nouns. In addition to code elements tagged with tt or code tags in the original source, all words that match one of about 30 hand-crafted regular expressions are treated as code elements. The resulting sentences are then parsed using the Stanford NLP toolkit.
None / Yes
No threats mentioned that are related to the selected NLP tool.
9
Program synthesis using natural language
Stanford Parser
We use machine learning techniques to obtain a classifier Cmap based on the part-of-speech (POS) tag provided for the word by the Stanford NLP engine [26]. Cmap.Predict function of the classifier predicts the probability of each word-to-terminal mapping being correct.
None / Yes / Not available
10
SWIM: Synthesizing What I Mean
C# parser from Roslyn
We then use the C# parser from Roslyn to parse the text (the text has been preprocessed before parsing, to correct obvious syntax errors), and determine whether it is a fragment of C# code. Finally, API names are extracted from the parsed code fragments. Besides code fragments, we also collect API names that are mentioned in the text.
None / Yes / Not available
11
AntMiner: Mining More Bugs by Reducing Noise Interference
None
The natural language processing (NLP) technique is employed when extracting rules from comments and documentations. NLP is also helpful for methods mining rules from source code [15, 49]. We will leverage NLP to discover the semantic information behind the names of program elements (e.g. variables, functions). The information can be used to further improve…
None
Only a general mention of NLP.
Yes
12
The challenges of staying together while moving fast: An exploratory study
NoneNoneNone
Mentioned only in references
Yes
13
Are "non-functional" requirements really non-functional?
None
we presented a research proposal with the goal of analyzing natural language NFRs taken from industrial requirements specifications to better understand their nature.
None
Uses the term "natural language" to mean the actual human language, not the processing of it.
Yes
14
Learning API Usages from Bytecode: A Statistical Approach
None
Hindle et al. [14] showed that source code is repetitive and predictable like natural language, and they adopted an n-gram model on lexical tokens to suggest the next token.
None
Mentioned only as part of the related work; NLP is not used for processing in the paper itself (a bigram sketch of the quoted idea follows below).
Yes
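The n-gram next-token suggestion attributed to Hindle et al. can be sketched with a toy bigram model over lexical tokens; the corpus and names below are ours, not from either paper:

    from collections import Counter, defaultdict

    def train_bigram_model(token_streams):
        """For each token, count which tokens follow it."""
        model = defaultdict(Counter)
        for tokens in token_streams:
            for prev, nxt in zip(tokens, tokens[1:]):
                model[prev][nxt] += 1
        return model

    # Toy 'code' corpus of lexical tokens.
    corpus = [["open", "(", "file", ")", ";"],
              ["open", "(", "socket", ")", ";"]]
    model = train_bigram_model(corpus)

    # Suggest the most likely next token after 'open'.
    print(model["open"].most_common(1))  # [('(', 2)]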
15
Too long; didn't watch!: extracting relevant fragments from software development video tutorials
None
Then, we use an island parser [3,24] on the extracted text to cope with the noise, the imperfections of the OCR, and the incomplete code fragments. The island parser separates invalid code or natural language (water) from matching constructs (islands), and produces a Heterogenous Abstract Syntax Tree (H-AST) [34].
None
Not related to NLP
Yes
16
Work Practices and Challenges in Pull-Based Development: The Contributor's Perspective
None
A contributor expresses the proposed change in natural language and the system searches i) the code base and ii) open or recently closed contributions for similar changes.
None
Not related to NLP
Yes
17
Finding and analyzing compiler warning defects
None
To tackle this challenge, for each compiler, we design specific parsers to extract computer-recognizable warning records from its natural language warning descriptions.
None
They used their own parsers.
Yes
18
Toward a Framework for Detecting Privacy Policy Violation in Android Application Code
Light-weight NLP techniques
A second variant (referred to as "Keyword Search") was used to study the effectiveness of using light-weight NLP techniques, such as lemmatization, to filter out irrelevant paragraphs in privacy policies
None / Yes
No threats mentioned that are related to the selected NLP tool.
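For reference, lemmatization as a light-weight keyword filter, as described in the quote above, could look like this with NLTK's WordNet lemmatizer (our illustration, not the paper's pipeline):

    from nltk.stem import WordNetLemmatizer  # assumes the 'wordnet' data is installed
    from nltk.tokenize import word_tokenize

    lemmatizer = WordNetLemmatizer()

    def paragraph_matches(paragraph, keyword):
        """Keep a paragraph only if one of its lemmatized tokens matches the keyword."""
        lemmas = {lemmatizer.lemmatize(t.lower()) for t in word_tokenize(paragraph)}
        return keyword in lemmas

    # 'identifiers' lemmatizes to 'identifier', so this paragraph is kept.
    print(paragraph_matches("We collect device identifiers.", "identifier"))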
19
ICSE 2015
http://dblp.uni-trier.de/db/conf/icse/icse2015-1.html
82 / 9
DASE: Document-assisted symbolic execution for improving automated software testing
*Stanford
We propose a new technique that combines natural language processing (NLP) techniques, i.e., grammar relationships and heuristics, to automatically extract input constraints from documents. The technique is general and should be able to extract input constraints for purposes other than symbolic execution such as program comprehension and constraint verification. We study two types of documents, i.e., manual pages and code comments, and extract input constraints

We propose to use Stanford typed dependency [27] to analyze the dependencies and grammatical relations among words and phrases in a sentence to handle these variants.
None
It is about natural language processing techniques. They mention the use of Stanford typed dependencies to analyze the dependencies and grammatical relations among words and phrases in a sentence to handle these variants.
Yes
Mentions threats about the documentation type, but none related to the selected NLP tool.
20
An Empirical Study on Real Bug Fixes
None
The two most common modified non-source files are configuration files and natural language documents.
None
Not related to NLP; it mentions natural language documents without applying any processing to them.
Yes
21
Discovering information explaining API types using text classification
Stanford Parser
For calculating the wordNum feature, we first extract the lemmas of all words in the section using the Stanford toolkit [8].
None / Yes
No threats mentioned that are related to the selected NLP tool.
22
Graph-Based Statistical Language Model for Code
None
N-gram is popularly used for text analysis in natural language processing (NLP).
None
This paper presents GraLan, a graph-based statistical language model, and its application in code suggestion.
Yes
23
A comparative study of programming languages in rosetta code
None
Rosetta Code is organized in 745 tasks. Each task is a natural language description of a computational problem or theme, such as the bubble sort algorithm or reading the JSON data format.
None
Not related to NLP
Yes
24
Learning to log: Helping developers make informed logging decisions
None
it is challenging to effectively extract such context information, because the target code snippet is usually short and linguistically sparse compared to natural language text. To address this issue, we propose a novel feature extraction framework, as illustrated in Fig. 3, which involves three types of features: structural features, textual features, and syntactic
None / Yes
25
An information retrieval approach for regression test prioritization based on program changes
None
Since we are dealing with program source code rather than natural language written in English, similar to other IR systems for software engineering, our tokenization is different than that of standard IR task.
None
They mention tokenization but do not name the tool used to perform it (see the identifier-tokenization sketch below).
Yes
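A common way such code-aware tokenization differs from standard IR tokenization is splitting identifiers on underscores and camelCase boundaries; a sketch of that idea (ours, since the paper does not name its tool):

    import re

    def tokenize_identifier(identifier):
        """Split a code identifier on underscores and camelCase boundaries."""
        tokens = []
        for part in re.split(r"[_$]+", identifier):
            tokens += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
        return [t.lower() for t in tokens if t]

    print(tokenize_identifier("getHTTPResponseCode_v2"))
    # ['get', 'http', 'response', 'code', 'v', '2']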
26
Cascade: A universal programmer-assisted type qualifier inference tool
None
The participants worked in a variety of areas such as high performance computing, natural language processing, security, mobile computing, compilers, and computer architecture.
None
The term "natural language" is mentioned only as part of an example of where their work could be applied.
Yes
27
Gray computing: An analysis of computing with background javascript tasks
None
Sentiment analysis [17] refers to using techniques from natural language processing and text analysis to identify the attitude of a writer with respect to some source materials.
None / Yes
28
ICSE 2014
http://dblp.uni-trier.de/db/conf/icse/icse2014.html
99 / 11
Checking app behavior against app descriptions
None
Before subjecting our descriptions to topic analysis, we applied standard techniques of natural language processing (NLP) for filtering and stemming.
None
They used NLP techniques but did not mention the processing tool.
Yes
29
Effects of using examples on structural model comprehension: a controlled experiment
None
The constraints were written in natural language because we expected that most model receptors would not be familiar with the object constraint language (OCL) [38].
None
Just mentioning the term natural language in the context of the English language and not for processing purposes
Yes
30
Mind the gap: assessing the conformance of software traceability to relevant guidelines
None
The closest work is that by Dinesh et al., who used natural language processing (NLP) techniques to extract formal process representations from regulatory documents. They used these process representations to analyze an organization's conformance to the regulation [17].
None
Mentioned as part of the related work.
Yes
31
Spotting working code examples
None
McMillan et al. [22] approach (Portfolio) supports programmers in finding relevant functions and their usage scenarios. Portfolio search is based on natural language processing and network analysis algorithms such as PageRank on call graphs.
None
Not related to NLP
Yes
32
CodeHint: dynamic and interactive synthesis of code snippets
NoneNoneNone
The term "Natural Language" appears only in the references.
Yes
33
Interpolated n-grams for model based testing
None
In turn, word prediction is a key component used to address several NLP tasks, such as speech recognition, handwriting recognition, machine translation, spell correction, natural language generation, etc [14, 10]. In fact, NLP algorithms usually admit multiple sentence derivations and N-gram statistics can be used to select the most likely among the possible derivations.
None
Not related to NLP
Yes
34
Comparing static bug finders and statistical prediction
None
The contrast between SBF and DP has parallels in the Chomsky-vs-Norvig debate about statistical vs. first principles approaches to natural language processing. Chomsky tends to favor first-principles approaches to NLP, whereas Norvig argues that NLP should be based on the statistics of human linguistic behavior.
None
Not related to NLP
Yes
35
Improving automated source code summarization via an eye-tracking study of programmers
None
Different source code summarization tools have generated natural language summaries, instead of keyword lists
None
It is about natural language summarization.
Yes
36
Understanding understanding source code with functional magnetic resonance imaging
None
We also envision studies to explore the impact of natural programming languages [49] on comprehension, and how comprehension of natural languages…
None
Not related to NLP
Yes
37
Verifying component and connector models against crosscutting structural views
None
A unique feature of our work is the generation of small model witnesses and short natural language texts that formally justify and explain the verification results to the engineer.
None
Not related to NLP
Yes
38
The dimensions of software engineering success.
NoneNoneNone
The term "Natural Language" appears only in the references.
Yes
39
ICSE 2013
http://dblp.uni-trier.de/db/conf/icse/icse2013.html
98 / 14
Automatic query reformulations for text retrieval in software engineering
None
These measures have been shown to correlate with the performance of queries in the field of natural language document retrieval [7] and in SE applications [8]. We selected four reformulation strategies proposed in the field of natural language document retrieval (see Section II for details), which perform best in that field and have never been used in SE, yet they are appropriate for SE data
None
It is about natural language document retrieval.
Yes
40
How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms
None
In all these approaches, LDA and LSI have been used on software artifacts in a similar manner as they were used on natural language documents (i.e., using the same settings, configurations and parameters) because the underlying assumption was that source code (or other software artifacts) and natural language documents exhibit similar properties
None
It is about natural language document retrieval (see the LDA sketch below).
Yes
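The kind of LDA configuration the paper argues should be tuned for software artifacts, rather than copied from natural-language defaults, can be seen in a minimal gensim run; the documents and parameter values below are illustrative assumptions:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Toy documents standing in for software artifacts (identifiers, comments).
    docs = [["parse", "token", "grammar", "syntax"],
            ["button", "click", "window", "render"],
            ["parse", "syntax", "error", "token"]]

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    # num_topics and passes are exactly the kind of settings the paper's
    # GA-based approach tunes instead of reusing natural-language defaults.
    lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
    print(lda.print_topics())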
41
Categorizing Bugs with Social Networks: A Case Study on Four Open Source Software Communities
None
Due to the importance for practical software engineering, a number of different approaches for the automated classification of bug reports have been studied, among them approaches based on the automated assessment of information provided by bug reports [1–4], natural language processing [5–7], the temporal dynamics of bug handling processes [8], coordination patterns [9], or the reputation of bug reporters [10–12]
None
Uses NLP to classify bug reports.
Yes
42
Efficient and change-resilient test automation: An industrial case study
None
A manual test case is a sequence of test steps written in natural language—each test step is essentially an instruction for the tester to perform an action on the application user interface or verify some visible state of the user interface.
None
Not related to NLP
Yes
43
User involvement in software evolution practice: A case study
None
Our interviews show that especially user feedback written in natural language is a problem for developers.
None
Not related to NLP
Yes
44
Discovering essential code elements in informal documentation
ACE
ACE is not designed to capture conceptual references to code elements. Identifying indirectly referenced code elements is left to future work. One possibility would be to capture conceptual knowledge by analyzing the natural language text surrounding code elements.
None
They used a parser that is part of a tool called ACE.
Yes / Not available
45
Improving feature location practice with multi-faceted interactive exploration
None
In the future, we will apply more sophisticated techniques such as natural language summaries [23] and tag cloud [29] to generate more comprehensible descriptions for multi-faceted categories.
None
NL is mentioned as part of the related work and as part of future work.
Yes
46
Why don't software developers use static analysis tools to find bugs?
None
Ryan told us during his Interactive Interview that a start would be using “real words,” or a more natural language, to explain the problem.
None
Not related to NLP
Yes
47
Analysis of User Comments: An Approach for Software Requirements Evolution
None
As user comments are in natural language and users do not use the same exact wording to express common ideas, the recorded topics are consolidated and separated based on semantics determined by the second author
None
They mention POS tagging as future work, but do not use it in the current work: "We have already filtered out some of the irrelevant words using a stop words list, but there are more domain specific words such as 'good' or 'needs', that occur in many comments, but should not be used as classifiers. One approach could be to use Part-Of-Speech Tagging (POST) to identify nouns and verbs, and give them a higher weight in the classification than for example adjectives."
Yes
48
Beyond Boolean product-line model checking: Dealing with feature attributes and multi-features
None
One cannot refer to a precise element of a multiset in natural language. Our SPL behaviour specification language requires this capability; the definition of Michel et al. is thus inappropriate for our purpose.
None
The term "natural language" is mentioned as part of the background information but is not used in the work itself.
Yes
49
It's not a bug, it's a feature: How misclassification impacts bug prediction
NoneNoneNone
Mentioned only in references
Yes
50
Transfer Defect Learning
None
In terms of applications, transfer learning has been widely studied to address cross-domain problems in text classification [15], [53], [54], natural language processing [25], [50], WiFi-based indoor localization [55], and computer vision [56]. In this study, we adapted transfer learning for cross-project defect prediction.
None
Not related to NLP
Yes
51
PorchLight: A tag-based approach to bug triaging
None
Duplicate detection techniques address this issue by identifying similar bug reports and, in some cases, merging them into a single bug report. Some of the early approaches use natural language processing over bug text descriptions to identify possible duplicates [22,23]. More recent approaches attempt to improve over pure natural language processing by including analyses over runtime execution traces [24] or by incorporating clustering techniques [25].
None
Mentioned as part of the related work.
Yes
52
Inferring Likely Mappings between APIs
None
We interpreted Rosetta's results by consulting API documentation. Such natural language documentation is inherently ambiguous and prone to misinterpretation.
None / Yes
53
ICSE 2012
http://dblp.uni-trier.de/db/conf/icse/icse2012.html
87 / 17
Inferring method specifications from natural language API descriptions
Stanford Parser
In particular, for our evaluation, we used the Stanford Parser [17], which is a natural language parser to work out the grammatical structure of sentences. The Stanford Parser parses a natural language sentence and determines POS tags associated with different words/phrases.
None / Yes
No threats mentioned that are related to the selected NLP tool.
54
Semi-automatically extracting FAQs to improve accessibility of software development knowledge
None
Frequently asked questions (FAQs) are a popular way to document software development knowledge. As creating such documents is expensive, this paper presents an approach for automatically extracting FAQs from sources of software development discussion, such as mailing lists and Internet forums, by combining techniques of text mining and natural language processing. We apply the approach to popular mailing lists and carry out a survey among software developers to show that it is able to extract high-quality FAQs that may be further improved by experts
None / Yes
55
On the Naturalness of Software
* NLTK
Our natural language studies were based on two very widely used corpora: the Brown corpus and the Gutenberg corpus. For code, we used two corpora, a collection of Java projects and a collection of applications from Ubuntu, broken up into application domain.
None
They mention corpora taken from NLTK but do not discuss the tool itself as part of the processing.
Yes / Not available
56
Recovering traceability links between an API and its learning resources
None / None / None
Part of the related work
Yes
57
Statically checking API protocol conformance with mined multi-object specifications
None / None / None
part of the references
Yes
58
Automating Test Automation
Stanford Parser
This can be achieved by parts-of-speech (POS) tagging, a basic operation in a natural language parser, such as the Stanford Natural Language Parser [2]. Verbs have been highlighted in the preliminary segments shown in column 2.
None / Yes / Not available
59
Recommending Source Code for Use in Rapid Software Prototypes
None
Source code search engines have been developed to locate implementations that are highly-relevant to a feature specified by a programmer (e.g., via a natural-language query) [20], [24].
None
Note that the term is written as "natural-language" and not as natural language
Yes
60
Integrated impact analysis for managing software changes
None
In several realistic settings, change requests are typically specified in natural language
None
The approach used in this paper was a scenario-driven combination of information retrieval (IR), dynamic analysis, and mining software repositories (MSR) techniques.
Yes
61
Identifying Linux bug fixing patches
None
Wang et al. extract paraphrases of technical terms from bug reports [36]. There are also a number of techniques that trace requirements expressed in natural language, including the work of Port et al. [26], Sultanov et al. [31], etc.
None / Yes
62
Content Classification of Development Emails
None
Natural language text is often not well-formed and is interleaved with languages with other syntaxes, such as code or stack traces.
None
It is about natural language text.
Yes
63
Detecting Similar Software Applications
None
A major contribution of our approach is that CLAN uses complete software applications as input, not only natural language queries. This feature is useful when a developer needs to find similar applications to a known software…
None
It is about natural language queries.
Yes
64
A History-Based Matching Approach to Identification of Framework Evolution
None
To achieve this goal, we use a natural language parser to identify the verbs in the sentence and all the words associated with each verb.
Another paragraph
First, we split the entire comment into sentences. In natural language processing, delimiter punctuation marks (e.g., '?', '!', and '.') serve as separations of sentences. We also rely on these delimiters, but we need to handle the following two special cases.
None / Yes
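Splitting comments into sentences on delimiter punctuation, with special cases, is what off-the-shelf sentence tokenizers approximate; a sketch with NLTK (our illustration, not the paper's custom handling):

    from nltk.tokenize import sent_tokenize  # assumes the 'punkt' data is installed

    comment = ("Returns the element at the given index. "
               "Throws IndexOutOfBoundsException if the index is invalid! "
               "See e.g. the class documentation.")

    # A naive split on '.', '!', '?' would break on abbreviations like 'e.g.'.
    for sentence in sent_tokenize(comment):
        print(sentence)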
65
Leveraging Test Generation and Specification Mining for Automated Bug Detection Without False Positives
None
We inspect all protocol violations manually to verify that no false positives are reported. Indeed, all reported violations show an illegal API usage that can be triggered via the public methods of a class in the program. Furthermore, we manually check whether the comments of the buggy methods contain any natural language warnings about the exceptions that may result from these methods.
None
None / Yes
66
Characterizing logging practices in open-source software
None
Developers should pay more attention to update the log messages as code changes. Tools combining natural language processing and static code analysis can be designed to detect such inconsistencies.
None
Not related to NLP
Yes
67
WhoseFault: Automatic Developer-to-Fault Assignment through Fault Localization
None
Existing work in the area of automatically determining developer expertise generally falls into two categories: (1) those that leverage the natural-language bug reports in a bug-tracking system to assign a developer, and (2) those that can identify the most knowledgeable developer given a location in…
None
NL is mentioned as part of the related work and as part of future work.
Yes
68
Developer Prioritization in Bug Repositories
None / None / None
part of the references
Yes
69
Where Should the Bugs Be Fixed?
None / None / None
part of the references
Yes
70
ICSME2015
http://dblp.uni-trier.de/db/conf/icsm/icsme2015.html
32 / 6
How can i improve my app? Classifying user reviews for software maintenance and evolution
Stanford Parser
we used the Stanford Typed Dependencies (STD) parser [13] which is a tool able to represent dependencies between individual words contained in sentences and to label each of them with a specific grammatical relation. It uses the Stanford Dependencies (SD) representation, which was successfully used in a range of downstream tasks, including Textual Entailments [12] and BioNLP [17], thus, becoming a de-facto standard for parser evaluation in English [7] [30].
No clear justification.
The aim of this work is to present an approach which uses Natural Language Processing, Sentiment Analysis and Text Analysis in order to detect and classify sentences in app user reviews that could guide and help app developers in accomplishing software maintenance and evolution tasks (see the dependency-parsing sketch below).
Yes
No threats mentioned that are related to the selected NLP tool.
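The Stanford Typed Dependencies parser used in the paper can be reached from Python through NLTK's CoreNLP client; a sketch assuming a CoreNLP server is already running locally (the URL, port, and sentence are our assumptions):

    from nltk.parse.corenlp import CoreNLPDependencyParser

    # Assumes a Stanford CoreNLP server was started separately, e.g.:
    #   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
    parser = CoreNLPDependencyParser(url="http://localhost:9000")

    # Parse one app-review-like sentence and print its typed dependencies.
    parse, = parser.raw_parse("The app crashes when I rotate the screen.")
    for governor, relation, dependent in parse.triples():
        print(governor, relation, dependent)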
71
What are the characteristics of high-rated apps? A case study on free Android Applications
None
while rating an app. However, many users rate without giving comments, and accurate automated identification of sentiments from comments is still beyond the reach of state-of-the-art natural language processing tools…
None
Not related to NLP
Yes
72
Modeling changeset topics for feature location
In-house tool
For our document extraction from a snapshot, we first parse each Java file using our tool, Teaser, which is a text extractor implemented in Java using an open source Java 1.5 grammar and ANTLR v3. The tool extracts documents from the chosen source code entity type, either methods or classes. We consider interfaces, enumerations, and annotation types to also be a class.
None / Yes
Threats to internal validity include possible defects in our tool chain and possible errors in our execution of the study procedure, the presence of which might affect the accuracy of our results and the conclusions we draw from them. We controlled for these threats by testing our tool chain and by assessing the quality of our data. Because we applied the same tool chain to all subject systems, any errors are systematic and are unlikely to affect our results substantially.
73
Developing a model of loop actions by mining loop characteristics from a large code corpus
None
Thus, our first task was to automatically find loops for each feature vector that have descriptive comments associated with them. We call these comment-loop associations. As is known from natural language processing, we found that the verbs in the comments are not sufficient to characterize…
None
This is what they used, i.e., automatic generation of natural language summaries for Java classes.
Yes
74
Investigating Naming Convention Adherence in Java References
Stanford Parser
The Stanford Parser [15] was used to identify the name’s phrasal structure using the technique we applied in previous work [13].
None / Yes
No threats mentioned that are related to the selected NLP tool.
75
To fix or to learn? How production bias affects developers' information foraging during debugging
None
Research has also investigated how to map from concerns (i.e., domain requirements such as jEdit's text-folding functionality) to specific locations in the code [24], as well as how to analyze code and automatically generate natural language text describing the corresponding concerns [33]. Our results reiterate the potential value of such tools and emphasize the need for getting them into everyday practice by developers.
None
Not related to NLP
Yes
76
ICSME 2014 excluding Clones Refactoring managing Change 2
http://dblp.uni-trier.de/db/conf/icsm/icsme2014.html
40 / 9
Compositional vector space models for improved bug localization
None
As bug reports are often expressed in natural language, many Information Retrieval (IR) techniques have been used to perform bug localization. The vector space model (VSM) with the standard tf-idf weighting scheme has been shown to outperform many other non-VSM information retrieval techniques (e.g., LDA, LSA, etc.) [15]. However, there are many well-known variants of VSM with different weighting schemes [16]. In this paper, we leverage this fact by composing VSMs with different tf-idf weighting schemes into a more effective composite model.
None
It is related to VSM more than to NLP (see the tf-idf sketch below).
Yes
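The tf-idf vector space model the paper builds on can be sketched in a few lines with scikit-learn; the toy corpus and query below are ours, not from the paper:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical corpus: source files as documents, a bug report as the query.
    source_files = ["parse the configuration file and report errors",
                    "render the main window and handle resize events"]
    bug_report = ["crash when the configuration file has a syntax error"]

    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(source_files)
    query_vec = vectorizer.transform(bug_report)

    # Rank source files by cosine similarity to the bug report.
    print(cosine_similarity(query_vec, doc_matrix))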
77
Clonepedia: Summarizing code clones by common syntactic context for software maintenance
None / None / None
Mentioned only in references
Yes
78
Recommending clones for refactoring using design, context, and history
None
Natural Language Processing of Commit Logs: Developers may mention refactoring of clones in commit logs of version control systems. In our study, we have also tried to use keywords matching to identify source code commits with clone refactoring, a method used in other studies of clone evolution [13], [45]. However, when we tried it, we found that this approach missed the majority of clone refactorings in our sample; this is because developers often fail to provide textual descriptions that are precise and complete for each commit in commit logs. This phenomenon has also been observed by Parnin et al. [34] for code refactoring
None
This research is about clone refactoring.
Yes
79
Combining text mining and data mining for bug report classification
None
The contents of description (including discussions) and summary parts are indeed unstructured free texts written in natural languages, or even computer-generated stack traces. Nevertheless, the other fields usually have finite or well-defined values, for example, severity, priority, component, keywords, etc., and these fields form the structural aspect of the documents. Therefore, the reports exhibit a mixed characteristic…
None
The research is about text mining, but the parser tool used is not mentioned; they only state: "The classification is pursued via certain text mining algorithms. In our study, we use Multinomial Naive Bayes Classifier and define three levels of possibilities, i.e., {high, middle, low}. The output levels are regarded as the extracted features from these unstructured texts." (A Naive Bayes sketch follows below.)
Yes
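A minimal version of the Multinomial Naive Bayes text classification quoted above, using scikit-learn; the toy reports and labels are ours:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical bug-report summaries with made-up labels.
    texts = ["crash on startup with null pointer",
             "please add dark mode support",
             "segfault when saving file",
             "feature request: export to csv"]
    labels = ["bug", "non-bug", "bug", "non-bug"]

    # Bag-of-words counts feeding a Multinomial Naive Bayes classifier.
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(texts, labels)
    print(clf.predict(["null pointer crash when saving"]))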
80
Mining API Usage Examples from Test Code
None / None / None
Mentioned only in references
Yes
81
EnTagRec: An enhanced tag recommendation system for software information sites
Stanford Parser
In this paper, we use the Stanford Log-linear Part-Of-Speech Tagger. To illustrate this extension, consider the words that appear in the software object shown in Figure 1. After this step, only the words "tutorial", "eclipse", "plugin", "interface", "function", "java", "app" remain.
None / Yes
No threats mentioned that are related to the selected NLP tool.
82
Boosting Bug-Report-Oriented Fault Localization with Segmentation and Stack-Trace Analysis
None / None / None
Mentioned only in references
Yes
83
An Exploratory Study on Self-Admitted Technical Debt
None
The majority of the aforementioned prior work used historical development data and source-code metrics to perform their studies. More recently, researchers leveraged natural language to help identify potentially problematic areas of the software. For example, work by Tan et al. [6] developed natural language processing tools to find comment-bug inconsistencies. Other work identified the coevolutionary relationship between source code and its associated comments (e.g., [7], [8]) and used task annotations to manage productivity…
None / Yes
84
An Empirical Study of the Effects of Expert Knowledge on Bug Reports
None
Note also that we do not use Latent Dirichlet Allocation (LDA) in our study – the reason is that even though LDA is widely-used for software artifacts [27], both STASIS and LSS have been shown to outperform LDA for computing the similarity of short natural language documents [24], [25], such as bug reports.
None / Yes
85
ICSME2013
http://dblp.uni-trier.de/db/conf/icsm/icsm2013.html
33 / 7
Content categorization of API discussions
None
Text categorization, automatically labeling natural language text with pre-defined semantic categories, is an essential task for managing the abundant online data. An example of such data in Software Engineering is the large, ever-growing volume of forum discussions on how to use particular APIs.
None / None / Yes
86
Enhancing software traceability by automatically expanding corpora with relevant documentation
None
Unlike rest of the source code, comments are written in natural language by the programmers and as such should contain relevant dictionary words and phrases expressing high-level intents. So in RQ3, we inspect how significant is the impact of source code comments on traceability.
None
Not related to NLP
Yes
87
Improving feature location by enhancing source code with stereotypes
None
This is attractive because queries (to find particular) features can be made in the language of the documents (i.e., programming language terms, identifiers, and natural language of comments).
None / None / Yes
88
Mining software profile across multiple repositories for hierarchical categorization
None
The software profiles are given in natural language which is not specific to any programming language. Thus, the profile extraction and pre-processing is simple. We deploy the crawler and extractor in a server (8*2.13G CPUs, 16GB RAM and 2TB storage) which is connected to the Internet with network bandwidth of 100M. Totally, it takes less than 3 days to get all the software profiles in Ohloh with a total size of about 100 MB for the extracted profile attributes
None / None / Yes
89
Exploring the limits of domain model recovery
None
For future work, we can investigate whether more automated natural language processing can help; however, remember our motivations for excluding automatic approaches in our current research method.
None
Mentioned as a future plan, without much detail.
Yes
90
An automation-assisted empirical study on lock usage for concurrent programs
None
Lots of researches have been done on studying bug characteristics in real world applications. Some of them concern about the common reasons and patterns of bugs. For example, a study [14] uses natural language text classification techniques to study around 29,000 bugs and shows the recent trends of bug characteristics.
None
Mentioned as part of the related work.
Yes
91
Mining logical clones in software: Revealing high-level business and programming rules
None
Topics and Natural Language Summary: For each logical clone, our approach generates a set of topics from source code of all its instances using topic mining techniques. Furthermore, it generates a natural language summary based on a template defined according to the logical clone metamodel. This natural language summary provides an overview of the logical clone. For example, our approach generates a natural language summary for the logical clone shown in Figure 1 as follows.
None / None / Yes
92
ICSME2012
http://dblp.uni-trier.de/db/conf/icsm/icsm2012.html
47 / 8
Feature-gathering dependency-based software clustering using Dedication and Modularity
None
Semantic information such as identifiers and comments in source code are also used for more sophisticated natural language processing techniques [15][16]
None
Mentioned as part of the related work.
Yes
93
Triaging Incoming Change Requests: Bug or Commit History, or Code Authorship?
None
We parsed the source code of ArgoUML using the class-level granularity (i.e., each document is a class). After indexing with LSI, we obtained a corpus consisting of 1,449 documents and containing 5,488 unique words.
Change requests are typically specified in a free-form textual description using natural language (e.g., a bug reported to the Bugzilla system of a software project).
None / None / Yes
No threats mentioned that are related to the selected NLP tool.
94
Modelling the 'Hurried' Bug Report Reading Process to Summarize Bug Reports
None
To improve ℓtp, one could study the use of LDA, PLSA, or other natural language processing technique in which similarity is measured using topics
None
Mentioned as part of the future work.
Yes
95
When Would This Bug Get Reported?
None / None / None
Mentioned only in references
Yes
96
Search-based refactoring: Towards semantics preservation
None
Approximating the domain semantics with vocabulary is widely used in several areas, e.g., information retrieval, natural language processing, and other related areas. In all these techniques, the semantic similarity between two entities indicates if they share many common elements (properties, words, etc.).
None / None / Yes
97
The impact of bug management patterns on bug fixing: A case study of Eclipse projects
None
For instance, Anvik et al. [3] proposed an approach to assign a bug to an appropriate developer based on past bug reports data using natural language processing techniques.
None
The term "natural language processing" is mentioned only as an example of work done by other authors.
Yes
98
Relating requirements to implementation via topic analysis: Do topics extracted from requirements make sense to managers and developers?
None
Others within the RE community have leveraged natural language processing (NLP) to produce UML models [21]. NLP is also used in topic analysis
None
Mentioned as example work, not as an approach used in this work.
Yes
99
Testing C++ generic libraries
None
The prevailing method of writing specifications for C++ generic algorithms and data structures today is structured natural language. Concepts are the prime component of specifications since they describe requirements on the interface, the behavior, and the invariants of the types that they constrain. In effect, concepts are reusable specifications that aggregate syntactic and semantic requirements usable in template constraints. Concepts are traditionally documented using tables of valid expressions annotated with their preconditions, postconditions, results, and other effects [14]. Such structured natural language description of concepts would be largely deprecated by language-supported concepts and axioms like those in the previous section.
None / None / Yes
100
ICSME2011
http://dblp.uni-trier.de/db/conf/icsm/icsm2011.html
35 / 5
Generating natural language summaries for crosscutting source code concerns
None
This paper makes three contributions. First, it introduces the concept of using generated natural language summaries of concern code to aid software evolution tasks. Second, it introduces a technique to produce such natural language summaries automatically from a description of the code that contributes to a concern. Third, it shows that such natural language summaries can help a programmer find pertinent code
None
The steps of the work: "First, we extract structural and natural language information from the source code. Second, we apply a set of heuristics to the extracted information to find patterns and identify salient code elements in the concern code. Finally, using both the extracted information and the content produced from the heuristics, we generate the sentences that form the summary."
Yes