1
d | extension | language | count | low_alphanum_count | long_lines_count | non_lexable_count | XML_detected | Data_detected | Name | Overall quality | Alphanum filter | Long line filter | Lexer filter | Other comments | Include | Alphanum_threshold | Long_line_threshold | XML filter | Alpha filter | Near-dedup settings
2
1adsada10001220012HarmLGTMFalse positiveTBD; breaks space10.25100010.25
3
2adaada1000043102HarmLGTMLGTMLGTM10.2510001
4
0adbada10000185754HarmLGTMLGTMFalse positiveMostly xml; few false positives10.2510001
5
3agdaagda1000034901HarmLGTMLGTMLGTM10.2510001
6
4alsalloy1000013902HarmLGTMLGTMFalse positive10.2510001
7
5g4antlr10003044308HarmLGTM
https://docs.google.com/spreadsheets/d/1Lk-pTk_rXI__fCgixr7ZWSi8wR09Zzd2j_G90J80r00/edit?usp=sharing; lower to 0.2
LGTM10.2510001
8
144markdownmarkdown244032200Urvashisome false positives, should be filtered0 file remaining23 files. All look good10.25removeAdd language filter?1
9
7applescriptapplescript100003811309HarmLGTMLGTMLGTM10.2510001
10
145mkdmarkdown500000Urvashilooks good0 files remaining0 files remain10.25remove1
11
147mkdnmarkdown100000Urvashionly 1 file, looks good0 files remaining0 files remain10.25remove1
12
146ronmarkdown200000Urvashionly 2 examples, can be excluded0 files remaining0 files remain10.25remove1
13
6scptapplescript10003225703HarmLGTMLGTMLGTM10.2510001
14
8asmassembly1000211500318EvgeniiLGTMSome false positives
Many similar false positives with a long comment line containing only numbers
10.25
Custom! Can we remove comments for files with long lines?
1remove
15
12awkawk100052255227HarmLGTMLGTMLGTM10.2510001
16
18batbatchfile10001371302EvgeniiLGTMSome are false positives10.2510001
17
270bbxtex1500001EvgeniiLGTM10.25remove1
18
17cmdbatchfile100001631403EvgeniiLGTMSome are false positives10.2510001
19
266instex44120010EvgeniiQuite a few misclassifications (data, assembly), can be filtered10.25remove1
20
271lbxtex2h0000EvgeniiLGTM10.25remove1
21
269mkiitex1200000EvgeniiLGTM10.25remove1
22
273mkivtex2700000EvgeniiLGTM10.25remove1
23
274mkvitex300000EvgeniiLGTM10.25remove1
24
272cbxtex600010EvgeniiLGTM10.25remove1
25
20bsvbluespec100037006HarmLGTMLGTMFalse positives, remove filter10.25remove1
26
264stytex61106020EvgeniiLGTM, few misclassificationsA few false positives10.25remove1
27
267dtxtex17401230EvgeniiLGTM, few misclassificationscan be increased10.25remove1
28
48cmakebluespec10000294100HarmLGTMLGTMFalse positives; Remove filter10.25remove1
29
21cc10005411015HarmLGTMFalse positive; but let's keep filterLGTM10.2510001
30
22hc10001319205EvgeniiLGTM
Non-lexable files have @property and similar in them, can be used to filter Objective-C headers
10.2510001
31
38csc-sharp1000023101QianLGTMLGTMLGTMFalse positive10.2510001
32
27ccc++100001300Zhihan ZhangLGTMLGTMLGTM10.2510001
33
25cppc++100023305Zhihan ZhangLGTM2 cases in total, they are false positives10.2510001
34
11
35
98augaugeas255000196EvgeniiLGTM, some minor misclassifications: xml or data10.25100010.25
36
28hppc++100017300Zhihan ZhangLGTMLGTMLGTM10.2510001
37
152wlmathematica686113503016EvgeniiLGTM
Small amount of non-passing legitimate examples, maybe should be tuned
10.25100010.25
38
42cljclojure10005109011HarmLGTMLGTMTBD; breaks space10.2510001
39
216spsscheme44629876425EvgeniiMany xmls, sql, and other non-scheme data10.25100010.25
40
44cljcclojure10000211104HarmLGTMLGTMLGTM10.2510001
41
233prcsql2302902Evgenii
LGTM, a couple of very large and likely autogenerated files, maybe worth filtering
10.25100010.25
42
43cljsclojure100022304HarmLGTMLGTMLGTM10.2510001
43
49coffeecoffeescript10002204717HarmLGTMLGTMLGTM10.2510001
44
50csoncoffeescript1000313014HarmLGTMLGTMLGTM10.2510001
45
54lispcommon-lisp10002213207HarmLGTMLGTMLGTM; removes auto-generated10.2510001
46
56lspcommon-lisp1000002803HarmLGTM10.2510001
47
55asdcommon-lisp100001660HarmLGTMLGTMLGTM;10.2510001
48
59csscss1000015427300Nour Fahmy25% not lexableLGTM
Many examples are a single line of code; recommend removing the filter, since CSS code is often compressed
10.2510001
49
61cucuda100004201EvgeniiLGTMA small amount of legit code is filtered10.2510001
50
60cuhcuda100013003EvgeniiLGTM10.2510001
51
62dartdart1000031700EvgeniiLGTM10.2510001
52
63dockerfile1000002900EvgeniiLGTM10.2510001
53
10a51assembly2800000EvgeniiLGTM10.2510001
54
9nasmassembly15900000EvgeniiLGTM10.2510001
55
16aukawk300300HarmLGTM10.2510001
56
13gawkawk2250110300HarmLGTMLGTMLGTM10.2510001
57
14mawkawk22001300HarmLGTMLGTMLGTM10.2510001
58
15nawkawk800100HarmLGTM10.2510001
59
68exelixir10000737901RaymondLGTMLGTMFalse positives mostly10.2510001
60
69exselixir1000025701RaymondLGTMLGTMLGTM10.2510001
61
70elmelm100021082016RaymondLGTMLGTMLGTM10.2510001
62
19bisonbison17601000HarmExclude (too many parse error file logs)00.2510001
63
71elemacs-lisp100013110903Marco ZoccaLGTMLGTM10.2510001
64
73erlerlang1000633509EvgeniiLGTM10.2510001
65
40cakec-sharp900801QianLGTMLGTMLGTM
Cake extensions cannot be well recognized. False positive
10.2510001
66
74hrlerlang10002813113EvgeniiLGTM10.2510001
67
39cshtmlc-sharp5850942900QianLGTMLGTMLGTMMostly html pages with some C# lex. False positive10.2510001
68
36c++c++1500000HarmLGTM10.2510001
69
78fsf-sharp10003133908Claire SchlesingerLGTMLGTMLGTMLGTM but most are false positives
Quite a few files seem to only set up types or other parameters; this may be unhelpful for writing functions but useful for other tasks such as type inference.
10.2510001
70
31cpc++1000000Zhihan ZhangLGTMLGTMLGTM10.2510001
71
79fsxf-sharp10001323104Claire SchlesingerLGTMLGTMLGTMLGTM but most are false positives10.2510001
72
30cxxc++38902000Zhihan ZhangLGTMLGTMLGTM10.2510001
73
37h++c++100001HarmLGTM10.2510001
74
32hhc++19700502Zhihan ZhangLGTMLGTMLGTM10.2510001
75
81ffortran1000638559025Manan DeyLGTMLGTMLGTM10.2510001
76
34hxxc++14001000HarmLGTMFalse positive but let's keep as is10.2510001
77
29inlc++9100000Zhihan ZhangLGTMLGTMLGTM10.2510001
78
26ippc++2002000Zhihan ZhangLGTMLGTMLGTM10.2510001
79
33tccc++1900001EjiroLGTM10.2510001
80
35tppc++300000HarmLGTM10.2510001
81
45bootclojure121012300HarmLGTMLGTMLGTM10.2510001
82
82f90fortran10009114016Manan DeyLGTMLGTMLGTM10.2510001
83
89glslglsl1000222119011Manan DeyLGTMLGTMLGTM10.2510001
84
92shaderglsl100004590800EvgeniiRequires filteringMany are autogenerated but long lines are only comments
I've looked at a few; they are not only GLSL, e.g., Unity shaders
10.2510001
85
47cljxclojure800000HarmLGTMLGTMLGTM10.2510001
86
53_coffeecoffeescript400000HarmLGTM10.2510001
87
51cjsxcoffeescript18501800HarmLGTM10.2510001
88
52icedcoffeescript9200900HarmLGTM10.2510001
89
91vertglsl1000014402Manan DeyLGTMLGTMLGTM10.2510001
90
90fragglsl100012120214Manan DeyLGTMLGTMLGTM10.2510001
91
103gogo1000013011EvgeniiLGTM, small amount of autogenFalse positives: long comments or string constants10.2510001
92
104groovygroovy1000031602Zhihan ZhangLGTMLGTMfalse positives (long strings)10.2510001
93
58nycommon-lisp5205600HarmLGTMLGTM10.2510001
94
108hshaskell1000133963Zhihan ZhangLGTM
false positive (only 1 case in total, the script used too many indents)
false positives (only 2 cases in total)10.2510001
95
110htmlhtml10000240131143Zhihan ZhangLGTMLGTM
Many false positives: HTML does not require line breaks and is often compressed, so I suggest removing the length limit for HTML
10.2510001
96
114idridris10001219501EvgeniiLGTM10.2510001
97
116thyisabelle100002039922EvgeniiLGTMSome false positives, some autogenerated10.2510001
98
117javajava1000031000Nour FahmyLGTMLGTMdata quality improves when max_length <= 10010.2510001
99
641dockerfile100000EvgeniiLGTM, only one file10.2510001
100
663dockerfile100000EvgeniiLGTM, only one file10.2510001
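
For readers unfamiliar with the Alphanum_threshold and Long_line_threshold columns, the following is a minimal sketch of how such per-file filters might work. It is not the pipeline's actual code. The function names are hypothetical, and the semantics are assumptions: a file is dropped when its fraction of alphanumeric characters falls below the threshold (0.25 in most rows) or when any line exceeds the length threshold (1000 in most rows), and a "remove" entry is modeled as disabling the long-line check.

```python
from typing import Optional


def alphanum_fraction(text: str) -> float:
    """Fraction of characters in `text` that are alphanumeric (assumed metric)."""
    if not text:
        return 0.0
    return sum(ch.isalnum() for ch in text) / len(text)


def max_line_length(text: str) -> int:
    """Length of the longest line in `text`, or 0 for empty input."""
    return max((len(line) for line in text.splitlines()), default=0)


def keep_file(
    text: str,
    alphanum_threshold: float = 0.25,
    long_line_threshold: Optional[int] = 1000,
) -> bool:
    """Return True if the file passes both (hypothetical) filters.

    Passing `long_line_threshold=None` models a "remove" entry in the
    Long_line_threshold column, i.e. the long-line filter is disabled.
    """
    if alphanum_fraction(text) < alphanum_threshold:
        return False
    if long_line_threshold is not None and max_line_length(text) > long_line_threshold:
        return False
    return True
```

Under these assumptions, a minified CSS or HTML file with a single very long line would be rejected by the default settings and accepted once the long-line check is disabled, which matches the false positives reported for those extensions.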