Suggested Datasets, Resources, Collaborations
Title | Brief Description | Suggester Name (optional) | Resource Locations
Code Completion Corpus | It would be nice to have a corpus of actual code completion queries, rather than guessing that any token could be queried. Might require instrumentation of an IDE to collect. | |
Rosetta Code | A repository for multilingual code snippets solving the same problem, with natural language descriptions. | | http://rosettacode.org/wiki/Rosetta_Code, https://github.com/mono/shootout-benchmarks
Coding competition | Students/hobbyists submit solutions to problems in different languages. | | https://code.google.com/codejam, http://www.spoj.com/, https://uva.onlinejudge.org, https://www.jutge.org/
Non-English corpora | Projects that aren't written in English as the primary language, with artifacts that are not primarily English. Sonia had a student look at this on SourceForge and found 2-3 projects with non-English identifiers. Are there others? | |
Single natural language sentence to single line of code | Parallel natural language to imperative code (Java, Python, C, etc.) | Nate | Pseudogen Python-English Corpus (18,000 lines, manually created: http://ahclab.naist.jp/pseudogen/)
Single natural language sentence to 4-7 lines of code | Parallel natural language to imperative code (Java, Python, C, etc.) | Nate |
Single natural language sentence to full code function | Parallel natural language to imperative code (Java, Python, C, etc.); average function size of more than 10 lines | Nate |
Paragraph of text to single line of code | Parallel natural language to imperative code (Java, Python, C, etc.); 4-7 sentence paragraphs. This probably refers to API-type function calls that do something complex. | Nate |
Paragraph of text to 4-7 lines of code | Parallel natural language to imperative code (Java, Python, C, etc.); 4-7 sentence paragraphs | Nate | AutoComment Java-English Corpus (120,000 automatically extracted and filtered comment/snippets from StackOverflow: http://asset.uwaterloo.ca/AutoComment/)
Paragraph of text to full code function | Parallel natural language to imperative code (Java, Python, C, etc.); average function size of more than 10 lines | Nate |
Full natural language spec to full program (class final project size) | Parallel natural language to imperative code (Java, Python, C, etc.); approximately a few hundred lines of code | Nate |
Full natural language spec to full program (production size) | Parallel natural language to imperative code (Java, Python, C, etc.); thousands of lines of code | Nate |
Input specifications | Input format specification ==> parsing code. A paper by Regina Barzilay takes ACM programming contests and looks at the code; most problems are really difficult... perhaps too much for current synthesis? The input parser may be easy enough, though. ~10K such pairs. | Percy | ACM programming contests (http://people.csail.mit.edu/regina/my_papers/prog13.pdf)
Dataset of NL contracts and code | NL contracts in Java API doc and the corresponding method body with preconditions and assertions | Martin Monperrus | http://www.monperrus.net/martin/api-directives
Comment classification and scope | Label comments in code according to some rough categories (e.g., a literal description of what is happening, a high-level description of what the code does, a code contract, a justification for design decisions), together with the subset of code that each comment refers to. | Danny |
Well-commented codebases | Google projects: Angular; React from Facebook; Hack from Facebook | Patrick Wagstrom |
Dataset of commits | 1.7 million commits of Apache projects: message and changed files | Martin Monperrus | https://github.com/monperrus/apache-svn-commits
Aligned C#/Java projects | [Tien Nguyen] | |
Domain documents/Requirements | Industrial collaborators have requested ways to analyze large sets of domain documents and to build models that would help them understand the domain and provide foundations for identifying requirements. (I have to think more about this.) Many are unfortunately proprietary. Regulatory codes are freely available and might be a starting point. | Jane |
SQLite Reqs | | Abram | SQLite http://www.sqlite.org/requirements.html
iTrust Reqs | | | iTrust http://agile.csc.ncsu.edu/iTrust/wiki/doku.php
eTour Reqs | | | eTour http://www.cs.wm.edu/semeru/tefse2011/Challenge.htm
CM-1 Reqs | | | CM-1 http://promise.site.uottawa.ca/SERepository/datasets-page.html and zip
Traceability | | | More traceability data http://www.coest.org/index.php/for-researches1
Really well commented projects | Code listings of instructional operating systems such as PINTOS are used along with other course materials in undergraduate/graduate OS courses at major universities, so the code is really well commented. | Gagan | http://www.cse.iitd.ernet.in/~sbansal/os/xv6-rev8.pdf
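
Several of the requests above ask for parallel natural language/code data at different granularities (sentence or paragraph on one side; single line, 4-7 lines, full function, or full program on the other). A minimal sketch of how a candidate pair might be sorted into those buckets is given below. Everything in it is an assumption for illustration: the names `ParallelExample`, `description_bucket`, `code_bucket`, and `bucket` are hypothetical, the sentence counter is deliberately crude, and the exact cut-offs (for example, treating 2-7 lines as the short-snippet bucket and 100 lines as the start of a "program") are guesses, since the rows above only give rough sizes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParallelExample:
    """One natural-language description paired with the code it describes."""
    description: str
    code: str


def count_sentences(text: str) -> int:
    """Rough sentence count: split on '.', '!' and '?' and skip empty pieces."""
    for mark in "!?":
        text = text.replace(mark, ".")
    return sum(1 for piece in text.split(".") if piece.strip())


def count_code_lines(code: str) -> int:
    """Count non-blank lines of code."""
    return sum(1 for line in code.splitlines() if line.strip())


def description_bucket(text: str) -> str:
    """Assumed cut-offs: 1 sentence, up to 7 sentences, anything longer."""
    n = count_sentences(text)
    if n <= 1:
        return "sentence"
    if n <= 7:
        return "paragraph"
    return "spec"


def code_bucket(code: str) -> str:
    """Assumed cut-offs loosely following the sizes named in the rows above."""
    n = count_code_lines(code)
    if n <= 1:
        return "single line"
    if n <= 7:
        return "short snippet"
    if n < 100:
        return "function"
    if n < 1000:
        return "class-project program"
    return "production program"


def bucket(example: ParallelExample) -> Tuple[str, str]:
    """Return (description bucket, code bucket) for one pair."""
    return description_bucket(example.description), code_bucket(example.code)
```

For instance, `bucket(ParallelExample("Reverse the list.", "xs.reverse()"))` falls in the sentence to single-line bucket under these assumed thresholds. A real corpus would likely need a proper sentence splitter and language-aware line counting.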