| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Multi layer sentiment analysis | http://thedatahub.org/en/dataset/mlsa | http://iggsa.sentimental.li/index.php/contact/ | CKAN level 2 | http://ckan.net/storage/f/file/12be3509-4cd1-4fc5-8c95-d0d3a7121766 | http://mlode-sparql.nlp2rdf.org/sparql | select distinct * where { ?s ?p ?o. ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://asv.informatik.uni-leipzig.de/mlsa/PositiveWord>.} | corpus | 21000 | R | Sentiment analysis on German sentences | 1 | Wi | CC-BY-SA | @dropdown | |||||||||||||
2 | Multext-East | http://thedatahub.org/dataset/multext-east | http://nl.ijs.si/ME/V4/ | http://mlode-sparql.nlp2rdf.org/sparql | data (corpus) | other (TEI), L (scheme in OWL, Chiarcos) | parallel corpus of 15 Eastern European languages + English and Farsi | 15 | academic | |||||||||||||||||||
3 | Wiktionary | http://ckan.net/package/wiktionary | http://wiktionary.org | no RDF | http://en.wiktionary.org/wiki/house | http://wiktionary.dbpedia.org/sparql | Dict | 200 Mio | in progress, Sebastian | Dictionary | 170 | Wortschatz, DBpedia | cc-by-sa | |||||||||||||||
4 | DBpedia | http://ckan.net/package/dbpedia | http://dbpedia.org/About | stable, from LOD | http://wiki.dbpedia.org/Downloads38 | http://dbpedia.org/resource/House | http://dbpedia.org/snorql | http://dbpedia.org | select distinct ?Concept where {[] a ?Concept} LIMIT 100 | http://graves.cl/visualRDF/?url=http%3A%2F%2Fdbpedia.org%2Fdata%2FBerlin.rdf | 1200000000 | LOD | 170 | 2000-us-census-rdf, dbtune-musicbrainz, education-data-gov-uk, eunis, flickr-wrappr, freebase, fu-berlin-dailymed, fu-berlin-dblp, fu-berlin-diseasome, fu-berlin-drugbank, fu-berlin-eurostat, fu-berlin-project-gutenberg, fu-berlin-sider, geonames-semantic-web, geospecies, italian-public-schools-linkedopendata-it, linkedgeodata, linkedmdb, nytimes-linked-open-data, opencyc, rdf-book-mashup, reference-data-gov-uk, revyu, tcmgenedit_dataset, transport-data-gov-uk, uk-legislation-api, w3c-wordnet, wikicompany, world-factbook-fu-berlin, yago | ||||||||||||||
5 | SFB632, QUIS-corpora | NOT FOUND | data (glosses) | other (PAULA) | Questionnaire for Information Structure | 10/20/2011 | OLiA | open/t.b.a | ||||||||||||||||||||
6 | PropBank | NOT FOUND | corpus | other | (Palmer et al, 2005) approximately 113,000 annotated verb tokens. These verb tokens include all those occurring in over one million words of the Wall Street Journal section of the Penn Treebank | PennTreebank (WSJ) | closed, LDC-licensed | |||||||||||||||||||||
7 | Lexvo.org | http://ckan.net/package/lexvo | http://www.lexvo.org/ | CKAN level 2 | http://www.lexvo.org/page/term/eng/house | http://lod.openlinksw.com/sparql | Schema | LOD | language metadata (orthography, etc.) | ? | CC-BY-SA | |||||||||||||||||
8 | lingvoj.org | http://ckan.net/package/lingvoj | http://lingvoj.org | CKAN level 2 | http://mlode.nlp2rdf.org/downloads/mlsa.nt.gz | http://www.lingvoj.org/lang/fr | http://graves.cl/visualRDF/?url=http%3A%2F%2Fwww.lingvoj.org%2Flang%2Ffr | ? | LOD | ? | ? | |||||||||||||||||
9 | OPUS | http://opus.lingfil.uu.se/ | No RDF | data (corpus) | other | collection of parallel open source corpora | > 20 | OLiA | open (partly GPL, LGPL and others) | |||||||||||||||||||
10 | JRC-Acquis | NOT FOUND | http://langtech.jrc.ec.europa.eu/JRC-Acquis.html | data (corpus) | other | JRC-Acquis (European legislation text) | 21 EU languages | |||||||||||||||||||||
11 | PanLex | http://thedatahub.org/dataset/panlex | http://panlex.org | http://panlex.org/cgi-bin/plxl.cgi?lv=2&ex=438882 | other (lexical database) | 450 Mio | other (db) | database of translations among lexemes | 6,900 languages | links to about 3,600 other URLs | freely available | |||||||||||||||||
12 | Autotyp | NOT FOUND | Other | 200 Mio | proprietary | Typological Data from languages | All | closed | ||||||||||||||||||||
13 | Corpus of Historical American English (1810-2009) | NOT FOUND | http://corpus.byu.edu/coha/ | select distinct * where {<http://mlode.nlp2rdf.org/jrc-names/Muammar_Gaddafi> ?p ?o} | data (corpus) | other | Corpus of Historical American English (1810-2009) | American English | free | |||||||||||||||||||
14 | PROIEL | NOT FOUND | http://foni.uio.no:3000/ | data (corpus) | other (TIGER XML) | historical translations of the New Testament | Ancient Greek, Latin, Gothic, Old Church Slavic | OLiA, bible translations | cc attribution noncommercial sa | |||||||||||||||||||
15 | Arabic Corpora | NOT FOUND | http://aracorpus.e3rab.com/index.php?content=english | data (corpus) | other | Arabic Corpora | Arabic | free access | ||||||||||||||||||||
16 | SUSANNE, CHRISTINE, LUCY | NOT FOUND | data (corpus) | other | corpora by Geoffrey Sampson | British English | OLiA | no licence specified, downloadable | ||||||||||||||||||||
17 | Catalan WordNet | http://ckan.net/package/catalan-wordnet | http://nlp.lsi.upc.edu/web/index.php?option=com_docman&Itemid=135 | no RDF | ---- | ---- | ---- | ---- | ---- | ---- | LSR | ---- | other | Catalan WordNet | Catalan | ---- | GPL | |||||||||||
18 | Resnik's Bible corpora | NOT FOUND | http://www.umiacs.umd.edu/~resnik/parallel/bible.html | data (corpus) | other (CES) | several (mostly modern) translations of the Bible | Cebuano, Chinese, Danish, Early Modern English, Finnish, French, Greek (Koine), Indonesian, Latin, Spanish, Swahili, Swedish, Vietnamese | all bible translations | unclear (downloadable) | |||||||||||||||||||
19 | Danish Wordnet | NOT FOUND | http://www.wordnet.dk/ | LSR | other | Danish WordNet | Danish | open | ||||||||||||||||||||
20 | Cornetto | http://ckan.net/package/cornetto | http://www2.let.vu.nl/oz/cltl/cornetto/ | no RDF Dump | ---- | ---- | ---- | ---- | http://graves.cl/visualRDF/?url=http%3A%2F%2Fpurl.org%2Fvocabularies%2Fcornetto%2Fsynset-iets-2-noun.rdf | LSR | LOD | Dutch Wordnet | Dutch | Links to package:vu-wordnet and package:w3c-wordnet. | ||||||||||||||
21 | Wordnet (Princeton) | http://ckan.net/package/wordnet | http://semanticweb.cs.vu.nl/lod/wn30/ | http://wordnetweb.princeton.edu/perl/webwn?s=house | LSR | LOD | English | |||||||||||||||||||||
22 | ConceptNet | http://ckan.net/package/conceptnet | http://csc.media.mit.edu/conceptnet/get | http://conceptnet5.media.mit.edu/ no RDF | ---- | http://conceptnet5.media.mit.edu/web/c/en/house | ---- | ---- | ---- | ---- | LSR | ---- | other (db) | WordNet-like concept database | English | DBpedia, WordNet | GPL / CC-by | |||||||||||
23 | Cornell Movie-Dialogs Corpus | NOT FOUND | http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html | corpus | 304713 utterances | other | metadata-rich collection of fictional conversations extracted from raw movie scripts | English | LSRs | unclear, downloadable | ||||||||||||||||||
24 | Corpora of misspellings | NOT FOUND | http://www.dcs.bbk.ac.uk/~ROGER/corpora.html | other (orthographic DB) | other (DB) | word lists of misspelled words, can be used to learn orthographic rules or for spelling correction | English | with English LSRs or corpora | unclear, downloadable | |||||||||||||||||||
25 | LCS Database | NOT FOUND | http://www.umiacs.umd.edu/~bonnie/LCS_Database_Documentation.html | LSR | other (lexical conceptual structures, LCS) | verbal semantics | English | WordNet | distributable with attribution | |||||||||||||||||||
26 | Manually Annotated Sub-Corpus (MASC) | http://thedatahub.org/en/dataset/masc/ | http://www.anc.org/MASC/ | http://www.anc.org/MASC/download/MASC-1.0.3.zip | corpus | other | Manually annotated American corpus | English | wiktionary, dbpedia | other (open) | ||||||||||||||||||
27 | Name List | NOT FOUND | http://nlp.cs.qc.cuny.edu/ngram_genderanimacy.zip | LSR | other | name lists with gender and animacy information discovered from Google n-grams (version II) (Ji and Lin, 2009) | English | OLiA ? | ||||||||||||||||||||
28 | OCAS | NOT FOUND | http://idocument.opendfki.de/wiki/Evaluation/Corpus/OlympicGames2004 | data (corpus) | R | semantically annotated corpus | English | |||||||||||||||||||||
29 | SemCor Corpus | NOT FOUND | http://multisemcor.fbk.eu/semcor.php | Corpus | other | Sense-Tagged Corpora | English | ? | ||||||||||||||||||||
30 | Sentiment-annotated quotation corpus | NOT FOUND | http://langtech.jrc.ec.europa.eu/JRC_Resources.html | data (corpus) | other (Excel) | Sentiment-annotated quotation corpus | English | OLiA ? | free + attribution | |||||||||||||||||||
31 | Verb Semantics Ontology | NOT FOUND | http://www-csli.stanford.edu/~arunm/ | LSR | other (Prolog, CSV) | another ontology of verb semantics, includes Roget's Thesaurus | English | no license statement, downloadable | ||||||||||||||||||||
32 | Wordnet ( W3C ) | http://ckan.net/package/w3c-wordnet | http://www.w3.org/TR/wordnet-rdf/ | http://graves.cl/visualRDF/?url=http%3A%2F%2Fwww.w3.org%2F2006%2F03%2Fwn%2Fwn20%2Finstances%2Fwordsense-entity-noun-1 | http://graves.cl/visualRDF/?url=http%3A%2F%2Fwww.w3.org%2F2006%2F03%2Fwn%2Fwn20%2Finstances%2Fwordsense-entity-noun-1 | LSR | LOD | English | ||||||||||||||||||||
33 | Link Grammar (Parser) | none yet | http://www.link.cs.cmu.edu/link/ | parser for English, includes a dictionary; here, we are only considering the dictionary | morpho-syntactic dictionary | 60K word forms | other | English | English corpora/LSRs | GPL-compatible | ||||||||||||||||||
34 | Wikicorpus | NOT FOUND | http://www.lsi.upc.edu/~nlp/wikicorpus | data (corpus) | other | Wikicorpus, v. 1.0 | English, Catalan, Spanish | Wordnet, OLiA (POS) | cc | |||||||||||||||||||
35 | English-Persian Parallel Corpus | NOT FOUND | http://ece.ut.ac.ir/NLP/resources.htm | data (corpus) | other | Mohammad Taher Pilevar, NLP Lab, University of Tehran, Iran. | English, Farsi | (no annotations yet) | free (no licence specified) | |||||||||||||||||||
36 | NunavutHansard | NOT FOUND | http://www.inuktitutcomputing.ca/NunavutHansard/en/index.html | data (corpus) | other | English-Inuktitut parallel corpus (morphological annotations can be obtained with http://www.inuktitutcomputing.ca/Uqailaut/en/IMA.html) | English, Inuktitut | downloadable, academic | ||||||||||||||||||||
37 | GeoWordNet | http://ckan.net/package/geowordnet | http://geowordnet.semanticmatching.org/ | no RDF Dump | http://geowordnet.semanticmatching.org/synset-dependent_political_entity-noun-1.rdf | ---- | ---- | http://graves.cl/visualRDF/?url=http%3A%2F%2Fgeowordnet.semanticmatching.org%2Fsynset-dependent_political_entity-noun-1.rdf | dict | 53390969 | R | GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. | English, Italian | geonames-semantic-web, vu-wordnet | CC-BY http://creativecommons.org/licenses/by/3.0/ | |||||||||||||
38 | TEP: Tehran English-Persian Parallel Corpus | not yet | http://ece.ut.ac.ir/NLP/resources.htm | English-Persian parallel corpus, subtitles | corpus (unannotated) | 4 mio tokens per language | other | English, Persian (Farsi) | Farsi LSR, English LSR | Usage of this package for any research or non-commercial purposes requires the precondition that you cite the following paper: M. T. Pilevar, H. Faili, and A. H. Pilevar, “TEP: Tehran English-Persian Parallel Corpus”, in proceedings of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2011). | ||||||||||||||||||
39 | French TimeBank | http://ckan.net/package/french-timebank | https://gforge.inria.fr/projects/fr-timebank/ | no RDF | ---- | ---- | ---- | ---- | ---- | ---- | corpus | other (XML) | The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language. Eventualities (events and states) and temporal expressions (dates, durations, frequencies, quantified intervals) are marked up with in-line annotation. | French | LGPL (LGPL-LR) | |||||||||||||
40 | Le Petit Prince | NOT FOUND | http://www.unlweb.net/unlarium | data (corpus) | other (UNL) | Le Petit Prince | French | cc-by-sa | ||||||||||||||||||||
41 | Perceo Corpus | NOT FOUND | http://cnrtl.fr/corpus/perceo/ | corpus | 100,000 tokens | other | a collection of lemmatized and POS-tagged spoken French transcriptions. The data contains over 100,000 tokens automatically tagged and manually checked. | French | French LSRs | freely downloadable | ||||||||||||||||||
42 | WOLF | NOT FOUND | http://alpage.inria.fr/~sagot/wolf-en.html | LSR | other | French WordNet | French | Cecill-C license (LGPL compatible) | ||||||||||||||||||||
43 | Wiktionary RDF dump | NOT FOUND | http://kaiko.getalp.org/dbnary/static/ | LSR | R | yet another Wiktionary RDF dump | German, English, Finnish, French, Italian, Polish | Wi | as wiktionary(no explicit statement) | |||||||||||||||||||
44 | Deutsches Morphologie-Lexikon | none yet | http://www.danielnaber.de/morphologie/ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | LSR (morphology only) | other | German morphological lexicon | German | with German LSRs | CC-BY | ||||||||||||
45 | SentiWS | http://thedatahub.org/dataset/sentiws | http://asv.informatik.uni-leipzig.de/download/sentiws.html | CKAN level 2 (minimal) | http://mlode.nlp2rdf.org/downloads/sentiws.ttl.gz | http://mlode-sparql.nlp2rdf.org/sparql?default-graph-uri=http%3A%2F%2Fmlode.nlp2rdf.org&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Fmlode.nlp2rdf.org%2Fsentiws%2Fword%2FAbd%C3%A4mpfung%3E+%3Fp+%3Fo%7D+LIMIT+100&format=text%2Fhtml&timeout=0&debug=on | http://mlode-sparql.nlp2rdf.org/sparql | ---- | select distinct * where {<http://mlode.nlp2rdf.org/sentiws/word/Abdämpfung> ?p ?o} LIMIT 100 | http://graves.cl/visualRDF/?url=http%3A%2F%2Fmlode.nlp2rdf.org%2Fsentiws%2Fword%2FAbd%C3%A4mpfung | Corpus | 30339 | other | SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining etc. | German | Wiktionary 4233 | downloadable, cc-by-nc-sa | |||||||||||
46 | GermaNet | NOT FOUND | LSR | other | German wordnet | German | academic | |||||||||||||||||||||
47 | Grammis | NOT FOUND | http://hypermedia.ids-mannheim.de/pls/public/ontologie.html | Schema | other (SQL) | grammis ontologie, ontology of linguistic terminology, German | German | closed, browseable | ||||||||||||||||||||
48 | NEGRA corpus | NOT FOUND | data (corpus) | R (chiarcos) | German newspaper corpus | German | OLiA | annotations: academic, source text: proprietary | ||||||||||||||||||||
49 | Open Thesaurus | NOT FOUND | http://www.openthesaurus.de/export/OpenThesaurus-Textversion.zip | LSR | other (txt) | German thesaurus | German | LGPL | ||||||||||||||||||||
50 | Salsa | NOT FOUND | data (corpus) | other | framenet-annotations for TIGER | German | FrameNet, TIGER | annotations: academic, source text: proprietary | ||||||||||||||||||||
51 | TIGER corpus | NOT FOUND | data (corpus) | R (hellmann) | German newspaper corpus | German | OLiA | annotations: academic, source text: proprietary | ||||||||||||||||||||
52 | TüBa-D/Z corpus | NOT FOUND | http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml | data (corpus) | other | German newspaper corpus | German | OLiA, GermaNet | annotations: academic, source text: proprietary | |||||||||||||||||||
53 | Linguee German-English dictionary | NOT FOUND | http://www.linguee.com/downloads/completeDict-latin9.txt | dictionary | other | German-English dictionary | German, English | German and English LSRs | GPL | |||||||||||||||||||
54 | SMULTRON | NOT FOUND | http://www.cl.uzh.ch/kitt/smultron/ | data (corpus) | other | SMULTRON (Stockholm MULtilingual TReebank) is a parallel treebank that contains around 1000 sentences in English, German and Swedish | German, English, Swedish | OLiA | academic | |||||||||||||||||||
55 | Anatolian word lists | NOT FOUND | http://ferheng.org/?Daxistin | dict | other | pair-wise word-lists of the languages in column H, partly with parts of speech | German, Turkish, English, Soranî (Kurdish), Kurmanci (Kurdish), Kurdi (Kurdish), Swedish, Czech | GPL | ||||||||||||||||||||
56 | Haitian Creole Lang Data, Carnegie Mellon | NOT FOUND | http://www.speech.cs.cmu.edu/haitian/ | data (corpus) | other | Haitian Creole spoken and text data | Haitian | (no annotations yet) | minimal restrictions | |||||||||||||||||||
57 | Hebrew WordNet | NOT FOUND | http://cl.haifa.ac.il/projects/mwn/index.shtml | LSR | other | Hebrew WordNet | Hebrew | "free download" | ||||||||||||||||||||
58 | Hindi WordNet | NOT FOUND | http://www.cfilt.iitb.ac.in/wordnet/webhwn/ | LSR | other | Hindi WordNet | Hindi | open source (GNU FDL) | ||||||||||||||||||||
59 | IcePaHC | NOT FOUND | http://www.linguist.is/icelandic_treebank/Download | No RDF | data (corpus) | other | Icelandic Parsed Historical Corpus | Icelandic | OLiA | open (LGPL) | ||||||||||||||||||
60 | Inuktitut - A Multi-dialectal Outline Dictionary | NOT FOUND | http://www.inuktitutcomputing.ca/Spalding/en/spalding.shtml | dict | other (HTML) | Inuktitut dictionary | Inuktitut, English | downloadable, no licence statement | ||||||||||||||||||||
61 | LSG ("Líonra Séimeantach na Gaeilge") | NOT FOUND | http://borel.slu.edu/lsg/ | LSR | other | Irish WordNet | Irish Gaelic | "free download" | ||||||||||||||||||||
62 | Japanese WordNet | NOT FOUND | http://nlpwww.nict.go.jp/wn-ja/index.en.html | LSR | other | Japanese WordNet | Japanese | open | ||||||||||||||||||||
63 | Multi-Lingual Semantic Network | NOT FOUND | http://two.dcook.org/software/mlsn/about/download.html | LSR | other | Japanese/Chinese/German/English WordNet | Japanese/Chinese/German/English | free download | ||||||||||||||||||||
64 | Arabic Online Commentary Dataset v1.1 | NOT FOUND | http://www.cs.jhu.edu/~ozaidan/AOC/AOC_readme.txt | corpus | 52 mio words | other | Arabic newswire, April-Oct 2010 | Jordanian, Saudi and Egyptian Arabic | LSRs (if any for Arabic available, Wiktionary ?) | unclear, downloadable | ||||||||||||||||||
65 | Datahub/ CKAN URL | URL | Comment | RDF Dump | Example URL (Linked Data?) use "house" if possible | sparql url | graph | demo query | visualisations | Type | Triple size (guessed) | Status: R = RDF, LOD = in official LOD Cloud, L = some Links exist | Domain | Languages | Possible links (Wi = Wiktionary, Wo = Wortschatz, D = DBpedia) | Licence | ||||||||||||
66 | Macedonian WordNet | NOT FOUND | http://www.time.mk/trajkovski/papers/is2010.pdf | LSR | other | Macedonian WordNet | Macedonian | Creative Commons, Attribution-NonCommercial 3.0 Unported license | ||||||||||||||||||||
67 | ODIN | NOT FOUND | http://www.csufresno.edu/odin/odin-overview.html | corpus (glosses) | other | glosses; there were once plans to link this with GOLD | many | "open", no licence specified | ||||||||||||||||||||
68 | BabelNet | NOT FOUND | http://lcl.uniroma1.it/babelnet/ | LSR | other | alignment of WordNet with multilingual Wikipedia categories | multilingual | DBpedia, WordNet | downloadable | |||||||||||||||||||
69 | TMC: Tehran Monolingual Corpus | none yet | http://ece.ut.ac.ir/NLP/resources.htm | Persian corpus | corpus (unannotated) | 250M words | other | Persian (Farsi) | Persian LSR | "freely available" (research/NC only ?), To have a copy of this corpus contact us at: t.pilevar {at} ece.ut.ac.ir or nlp {at} ece.ut.ac.ir | ||||||||||||||||||
70 | Bijankhan corpus | none yet | http://ece.ut.ac.ir/DBRG/Bijankhan/ | morphosyntactically annotated Persian corpus, dependency syntax to be released (http://stp.lingfil.uu.se/~mojgan/persian_dependency_treebank.pdf) | corpus | 2.6M tokens | Persian (Farsi) | Farsi LSRs | All rights of this corpus and the tools that are included in this package are reserved for University of Tehran - Database Research Group. Usage of this package for any research or non-commercial purposes is free with the precondition that you cite the related papers below. | |||||||||||||||||||
71 | Persian Link Grammar | none yet | http://www.ling.ohio-state.edu/~jonsafari/persianlg/ | see English Link Grammar | Persian (Farsi) | Persian corpora, LSRs | unclear, same as English Link Grammar ? | |||||||||||||||||||||
72 | Hamshahri CLEF corpus | none yet | http://ece.ut.ac.ir/dbrg/hamshahri/ | Hamshahri collection is a standard reliable Persian text collection that was used at Cross Language Evaluation Forum (CLEF) during years 2008 and 2009 for evaluation of Persian information retrieval systems. | corpus (not annotated) | 318K documents | Persian (Farsi) | Persian LSRs and dictionary | All rights of the corpus' news are reserved for Hamshahri newspaper. All rights of the corpus' data and the tools that are included in this website are reserved for University of Tehran - Database Research Group. Usage of this package for any research or non-commercial purposes is free with the precondition that you cite paper number [1] of publications section. | |||||||||||||||||||
73 | Persian Dependency Treebank (PerDT) | none yet | http://dadegan.ir/en/perdt | corpus (dep-parsed) | 30K sentences | Persian (Farsi) | Persian LSRs and dictionary | to be checked, research only (?, http://aclweb.org/aclwiki/index.php?title=Resources_for_Persian) | ||||||||||||||||||||
74 | Persian Treebank (PerTreeBank) | none yet | http://hpsg.fu-berlin.de/~ghayoomi/PTB.html | corpus (HPSG-parsed) | 1000 sentences | Persian (Farsi) | Persian LSRs and dictionary | "freely available" (pers. comm., Masood Ghayoomi, Jan 17, 2013) [= research only?] | ||||||||||||||||||||
75 | diverse VOA corpora | none yet | http://www.ling.ohio-state.edu/~jonsafari/corpora/ | harvested from Voice of America | corpora (unannotated) | Persian (Farsi), Urdu, Pashto, Dari | Farsi morphosyntactic annotations, Farsi LSR | public domain (see www.voanews.com) | ||||||||||||||||||||
76 | plWordNet | NOT FOUND | http://www.plwordnet.pwr.wroc.pl/main/?lang=en | LSR | other | Polish WordNet | Polish | academic | ||||||||||||||||||||
77 | Punjabi Morphology, corpus and lexicon | NOT FOUND | http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html | data (corpus), dict | other | Punjabi Morphology, corpus and lexicon | Punjabi | free | ||||||||||||||||||||
78 | Russian WordNet | NOT FOUND | http://wordnet.ru/ | LSR | other | Russian WordNet | Russian | free download | ||||||||||||||||||||
79 | The Manually Annotated Sub-Corpus | NOT FOUND | http://www.anc.org/MASC | corpus | other (XCES, GrAF) | subcorpus of the ANC | s | |||||||||||||||||||||
80 | Corpus of Modern Scottish Writing (CMSW) | NOT FOUND | http://www.scottishcorpus.ac.uk/cmsw/ | data (corpus) | other | Corpus of Modern Scottish Writing (CMSW) | Scots | freely available | ||||||||||||||||||||
81 | African Bibles | NOT FOUND | http://visionneuse.free.fr/index.htm?version=BIB | data (corpus) | other (XML) | various Bibles, mostly from African languages, represent a parallel corpus (although not announced as such) | sentisenti | any other Bible corpus | unclear, downloadable | |||||||||||||||||||
82 | Apertium project lexicons | NOT FOUND | http://sourceforge.net/projects/apertium/ | dict | other | Apertium project lexicons | several | GPL | ||||||||||||||||||||
83 | sloWNet | NOT FOUND | http://lojze.lugos.si/~darja/slownet.html | LSR | other | Slovene WordNet | Slovene | Creative Commons License (attribution, non-commercial, share-alike) | ||||||||||||||||||||
84 | SPLLOC | NOT FOUND | www.splloc.soton.ac.uk, www.talkbank.org | data (corpus) | other (CHILDES) | corpus of oral L2 Spanish, universities of Southampton, Newcastle, and York in the UK. | Spanish | OLiA (POS tags) | academic | |||||||||||||||||||
85 | TamilWordNet | NOT FOUND | http://www.nrcfosshelpline.in/code/wiki/TamilWordnet | LSR | other | Tamil WordNet | Tamil | open source | ||||||||||||||||||||
86 | Asian Wordnet | NOT FOUND | http://www.asianwordnet.org/ | LSR | other | Asian WordNet | Thai, Korean, Japanese, Indonesian, Myanmar, Vietnamese, Mongolian, Bengali | open (BSD) | ||||||||||||||||||||
87 | JRC-Names | http://thedatahub.org/dataset/jrc-names | http://langtech.jrc.it/JRC-Names.html | CKAN level 2 (minimal) | http://mlode.nlp2rdf.org/downloads/jrc-names.ttl.gz http://mlode.nlp2rdf.org/downloads/jrc-names-links.nt.gz | http://mlode.nlp2rdf.org/jrc-names/Muammar_Gaddafi | http://mlode-sparql.nlp2rdf.org/sparql | http://thedatahub.org/dataset/jrc-names | select distinct * where {<http://mlode.nlp2rdf.org/jrc-names/Muammar_Gaddafi> ?p ?o} | http://graves.cl/visualRDF/?url=http%3A%2F%2Fmlode.nlp2rdf.org%2Fjrc-names%2FMuammar_Gaddafi | 1458828 | R | highly multilingual named entity resource for person and organisation names | Too many | DBpedia | http://langtech.jrc.ec.europa.eu/Resources/LICENCE-EULA_JRC-Names_2011.pdf | ||||||||||||
88 | baby names | NOT FOUND | http://www.nyc.gov/html/doh/downloads/pdf/public/press09/pr076-09-babynames.pdf | word list | ethnicity- and gender-classified first names | US | corpora | unclear, downloadable | ||||||||||||||||||||
89 | US Census name lists | NOT FOUND | http://www.census.gov/genealogy/www/data/1990surnames/index.html | word list | 5500 first names, several thousand last names | other (CSV) | 1990 first names (female and male), 1990 last names, 2000 last names, 2000 last names (Spanish); can be used for gender detection and NER | US | corpora | unclear, downloadable | ||||||||||||||||||
90 | Printed Book Auction Catalogues | http://thedatahub.org/en/dataset/printed-book-auction-catalogues | http://keithalexander.co.uk/pbac | ?? | http://keithalexander.co.uk/pbac/identified/agents/405.rdf | ??? CKAN url not working, but might have one | --- | --- | ||||||||||||||||||||
91 | Intercontinental Dictionary Series | http://thedatahub.org/dataset/ids | http://lingweb.eva.mpg.de/ids/ | CKAN level 2 | http://mlode.nlp2rdf.org/downloads/ids.nt.gz | http://mlode-sparql.nlp2rdf.org/sparql | ||||||||||||||||||||||
92 | Lemon Wiktionary | http://thedatahub.org/en/dataset/lemonwiktionary | http://monnetproject.deri.ie/lemonsource/wiktionary__en (down) | CKAN level 2 | http://monnetproject.deri.ie/lemonsource/Special:Dump/wiktionary.tar.bz2 | |||||||||||||||||||||||
93 | Lemon Wordnet | http://thedatahub.org/dataset/lemonwordnet | http://monnetproject.deri.ie/lemonsource/wordnet | CKAN level 2 | http://monnetproject.deri.ie/lemonsource/Special:Dump/wordnet.zip | http://monnetproject.deri.ie/lemonsource/wordnet/house-noun | http://monnetproject.deri.ie/lemonsource_query/ | http://graves.cl/visualRDF/?url=http%3A%2F%2Fmonnetproject.deri.ie%2Flemonsource%2Fwordnet%2Fcat-noun.rdf | dict | R | WordNet | |||||||||||||||||
94 | Open Data Thesaurus | http://thedatahub.org/dataset/open-data-thesaurus | http://vocabulary.semantic-web.at/PoolParty/wiki/OpenData | http://vocabulary.semantic-web.at/PoolParty/sparql/OpenData | LSR | RDF | Thesaurus | |||||||||||||||||||||
95 | VU WordNet | http://thedatahub.org/en/dataset/vu-wordnet | http://semanticweb.cs.vu.nl/lod/wn30/ | CKAN level 2 | http://eculture.cs.vu.nl/git/public/?p=vocs/wordnet.git;a=tree;f=rdf;hb=HEAD | http://semanticweb.cs.vu.nl/europeana/lod/purl/vocabularies/princeton/wn30/synset-house-noun-1.rdf | ?? | |||||||||||||||||||||
96 | WALS | http://thedatahub.org/dataset/wals | www.wals.info | CKAN level 2 | http://mlode.nlp2rdf.org/downloads/wals.nt.gz | http://wals.info/languoid/lect/wals_code_hau | http://mlode-sparql.nlp2rdf.org/sparql | Other | typological database | |||||||||||||||||||
97 | Zhishi-me | http://thedatahub.org/en/dataset/zhishi-me | CKAN Level 2 | n/a | http://zhishi.me/data/zhwiki/resource/Shanghai | http://zhishi.me/sparql | ||||||||||||||||||||||
98 | RKB Explorer Wordnet | http://thedatahub.org/en/dataset/rkb-explorer-wordnet | http://wordnet.rkbexplorer.com/ | CKAN level 3 | http://wordnet.rkbexplorer.com/models/dump.tgz | http://wordnet.rkbexplorer.com/id/synset-odd-toed_ungulate-noun-1 | http://wordnet.rkbexplorer.com/sparql/ | -- | ||||||||||||||||||||
99 | Sanskrit English Lexicon | http://thedatahub.org/en/dataset/sanskrit-english-lexicon | http://blog.kasabi.com/about/ | Data available but not online; huge project; unsure what to do with this | ||||||||||||||||||||||||
100 | SIMPLE Ontology, Lexicon | http://thedatahub.org/en/dataset/simple-ontology-lexicon | In RDF, Not hosted | http://www.languagelibrary.eu/owl/simple/simpleindividuals.owl |
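
Several rows above pair a "sparql url" with a "demo query", and the SentiWS row shows the two combined into a single GET URL with `default-graph-uri`, `query`, and `format` parameters. The following is a minimal sketch of how such a URL could be assembled for any row. The helper name `build_sparql_url` is hypothetical, the parameter names are taken from the SentiWS example URL, and whether a given endpoint is still online or accepts these parameters is an assumption. No network request is made.

```python
from urllib.parse import urlencode


def build_sparql_url(endpoint, query, default_graph=None, fmt="text/html"):
    """Return a GET URL that submits `query` to a SPARQL endpoint.

    Hypothetical helper: the parameter names mirror the SentiWS example
    URL in the table and are not guaranteed to work on every endpoint.
    """
    params = {}
    if default_graph:
        params["default-graph-uri"] = default_graph
    params["query"] = query
    params["format"] = fmt
    # urlencode percent-encodes the query text and joins the parameters.
    return endpoint + "?" + urlencode(params)


# Values copied from the JRC-Names row of the table.
ENDPOINT = "http://mlode-sparql.nlp2rdf.org/sparql"
DEMO_QUERY = (
    "select distinct * where "
    "{<http://mlode.nlp2rdf.org/jrc-names/Muammar_Gaddafi> ?p ?o}"
)

if __name__ == "__main__":
    print(build_sparql_url(ENDPOINT, DEMO_QUERY))
```

Because the function only builds a string, it can be used to pre-generate clickable demo links for each row that has both an endpoint and a query, without contacting the endpoints.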