Corpus ID | Name | Link | GitHub/Repository | Corpus Size | Time Period | Text Sources | Annotations | Licences | Metadata | Relevant Subcorpus | Additional link 1 | Additional link 2 | Additional link 3 | Notes

Corpus ID: 1
Name: Open Greek and Latin Perseus Digital Library 5.0 (ongoing)
Link: https://www.opengreekandlatin.org/
GitHub/Repository: https://github.com/OpenGreekAndLatin?page=1
Corpus Size: 2,223 works; 3,008 editions and translations (1,471 in Greek and 621 in Latin); 67.4 million words (29.9 million in Greek, 16.4 million in Latin)
Annotations: P5 TEI XML; compliant with Canonical Text Services (CTS)
Additional link 1: https://scaife.perseus.org/

Corpus ID: 2
Name: Perseus Digital Library 4.0
Link: http://www.perseus.tufts.edu/hopper/
GitHub/Repository: https://github.com/PerseusDL
Corpus Size: 163,851,126 words
Time Period: VII cent. B.C. - XXI cent. A.D.
Licences: OA
Metadata: author, work title, URN, abbreviated title, work original language, edition or translation year published, edition or translation language, editor, translator, series, subjects
Relevant Subcorpus: Classics: 68,925,971 words; Arabic: 5,646,735 words
Additional link 1: http://www.perseus.tufts.edu/hopper/collections
Additional link 2: http://www.perseus.tufts.edu/hopper/opensource/download

Corpus ID: 3
Name: First1KGreek Corpus
Link: https://opengreekandlatin.github.io/First1KGreek/
GitHub/Repository: https://github.com/OpenGreekAndLatin/First1KGreek
Corpus Size: 23,366,087 words
Time Period: VIII B.C. - 250 A.D.

Corpus ID: 4
Name: LatinISE
Link: https://app.sketchengine.eu/#dashboard?corpname=preloaded%2Flatinwac2_2
Corpus Size: 13,180,571 tokens; 11,036,900 words
Time Period: 2nd century B.C. - 21st century A.D.
Text Sources: Texts from the LacusCurtius, Intratext and Musisque Deoque websites
Annotations: Encoded in UTF-8, cleaned, deduplicated. Tagged by TreeTagger trained on the Index Thomisticus Treebank, the Latin Dependency Treebank and the Latin treebank of the PROIEL Project.
Metadata: author, title, genre, era, date, century, book, section, paragraph and line of verses
Relevant Subcorpus: Romana Antiqua (VII-II cent. B.C.), Romana Classica (I cent. B.C.), Romana Postclassica (I-VI cent. A.D.), Mediaevalis (VII-XIV cent. A.D.); 10,818,446 tokens
Additional link 1: https://www.researchgate.net/profile/Barbara_Mcgillivray/publication/236857134_Tools_for_historical_corpus_research_and_a_corpus_of_Latin/links/00b7d53c6306f2ab03000000/Tools-for-historical-corpus-research-and-a-corpus-of-Latin.pdf

Corpus ID: 5
Name: Past Masters
Link: http://library.nlx.com.proxy.uba.uva.nl:2048/xtf/search?browse-collections=true
Time Period: 6th century B.C. - 21st century A.D.
Relevant Subcorpus:
Anselm, Opera Omnia
Aquinas, Collected Works
Aristotle, Complete Works (Greek and/or English)
Augustin, Opera Omnia CAG
The Latin Background 1100-1550 (Medieval England: Becket, John of Salisbury, Wyclif)
Ockham, Opera philosophica et theologica
Plato, Collected Works (Greek and/or English)
The Presocratic Writings
Scotus, Opera philosophica, Opera miscellania

Corpus ID: 6
Name: The Digital Corpus for Graeco-Arabic Studies
Link: https://www.graeco-arabic-studies.org/home.html
GitHub/Repository: https://github.com/Arithmeticus/graeco-arabic
Corpus Size: 180 works by 28 authors; 4.5 million (1.2 million in Arabic and 3.3 million in Greek)
Annotations: XML tagged
Metadata: author, language, subject/domain, text type, date
Relevant Subcorpus: Complete editions of Galen by Karl Gottlob Kühn (1821–1833), of Hippocrates by Émile Littré (1839–1861) and of Aristotle by Immanuel Bekker (1831).
Notes: This is the corpus of the project Greek into Arabic: http://www.greekintoarabic.eu/ . The project also has a research unit, Glossarium Graeco-Arabicum: http://telota.bbaw.de/glossga/ .

Corpus ID: 7
Name: Corpus Corporum
Link: http://www.mlat.uzh.ch/MLS/index.php?lang=0
Corpus Size: 160 million words
Time Period: 2nd cent. B.C. - 19th cent. A.D.
Text Sources: Includes texts from the entire Patrologia Latina, the Vulgate and the Corpus Thomisticum
Annotations: TEI XML
Relevant Subcorpus: Texts divided into subcorpora on specific topics

Corpus ID: 8
Name: The Diorisis Ancient Greek Corpus
Link: https://brill.com/view/journals/rdj/3/1/article-p55_55.xml?language=en#:~:text=The%20Diorisis%20Ancient%20Greek%20Corpus%20is%20a%20digital%20collection%20of,semantic%20change%20in%20Ancient%20Greek.
GitHub/Repository: https://figshare.com/articles/dataset/The_Diorisis_Ancient_Greek_Corpus/6187256
Corpus Size: 820 texts, 10,206,421 word tokens
Time Period: ca. 7th century B.C. - 5th century A.D.
Text Sources: Texts from (i) the Perseus Canonical Greek Literature repository, (ii) "The Little Sailing" digital library, (iii) the Bibliotheca Augustana digital library.
Annotations: XML TEI, lemmatized, PoS tagged
Metadata: date, text type (literary genre and sub-genre), URL of the source files, the identifiers of Ancient Greek authors and works from the TLG canon, names and roles of the persons involved in the preparation of the corpus