Newspaper corpora
Corpus name | Language | Newspaper | Size | Period | Annotation | Availability | License | Found | Publication
The Norwegian Newspaper Corpus | Norwegian | 24 Norwegian newspapers | 1,000,000,000 | 1998-present | Unclear; "multitagged" | Concordancer | ??? | VLO | Andersen (2011)
SYN2006PUB: corpus of Czech newspapers | Czech | ??? | 300,000,000 | 1989 and 2004 | tokenised, lemmatised, PoS-tagged | Download | ??? | VLO |
SYN2013PUB: corpus of written Czech newspapers | Czech | ??? | 935,000,000 | ??? | tokenised, lemmatised, PoS-tagged | Download | ??? | VLO |
Romanian corpus of newspaper articles | Romanian | ??? | 50,000,000 | ??? | no annotation | Link broken | ??? | VLO |
Corpora of Newspaper Texts | English, Finnish, Swedish | ??? | ??? | ??? | ??? | ??? | ??? | VLO |
Tübingen Treebank of Written German / Newspaper Corpus | German | die tageszeitung | 1,787,801 | ??? | MSD, lemmatisation, syntactic constituency and dependencies, named entities, anaphora and coreference relations | Concordancer (institutional account required) | Restricted | VLO |
De Standaard Corpus | Dutch | De Standaard | ??? | 2002-2003 | ??? | ??? | ??? | VLO |
TIGER Corpus | German | Frankfurter Rundschau | 900,000 | ??? | PoS-tagged, annotated with syntactic structure, lemmatisation | Download | ??? | VLO |
The Karjalainen Corpus | Finnish | Karjalainen, Joensuu | ??? | 1990s | ??? | ??? | ??? | VLO |
Corpora of Newspaper Texts | Swedish, English, Finnish | Computer corpora in Finnish, Swedish and English languages | 435,700,000 | ??? | ??? | Under negotiation | Under negotiation | FIN-CLARIN |
An-Nahar Newspaper Text Corpus | Standard Arabic | ??? | ??? | 1995 to 2000 |  | Unavailable | ELRA END USER | META-SHARE |
BREF-80 | French | Le Monde | ??? (Note: speech corpus) | ??? | ??? | Unavailable | ELRA END USER | META-SHARE |
Corpus of Contemporary Serbian Newspapers and Magazines (broken link) | Serbian | Various | 915,772,708 | 2004-2012 | Lemmatisation | Unavailable | CC-BY-NC | META-SHARE |
CRIPCO (broken link) | Italian | L'Adige | ??? | 1999-2006 | ??? | Unavailable | Proprietary | META-SHARE |
MLCC Multilingual and Parallel Corpora | Various | Various | ??? | 1986-1994 | ??? | Unavailable | ELRA END USER | META-SHARE |
MTP Annotated German corpus - tagged version | German | "Die Frankfurter Allgemeine Zeitung" and "Die Zeit" | 500,000 | 1992 | MSD | Unavailable | ELRA END USER | META-SHARE |
The Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version | Finnish and Swedish | Various | 8,728,581,153 | 1770-2011 | ??? | Concordancer | CC-BY | FIN-CLARIN |
The Newspaper and Periodical OCR Corpus of the National Library of Finland (1771-1874) | Finnish and Swedish | Various | ??? | 1771-1874 | ??? | Concordancer | CC-BY | FIN-CLARIN |
The Swedish N-grams 1770-1940 of the Newspaper and Periodical Corpus of the National Library of Finland | Swedish, English, Finnish | Various | ??? | 1770-1940 | N-grams | Concordancer | CC-BY | FIN-CLARIN |
The Karelian Finnish Newspaper Corpus | Finnish | Karjalan Sanomat | 500,000 | 2012-2014 | ??? | In development | Under negotiation | FIN-CLARIN |
Zurich English Newspaper Corpus | English | Various | 1,600,000 | 1661–1791 | ??? | Seemingly available; need to contact author | ??? | Google |
deu_newscrawl_2011 | German | Various | 425,703,278 | 2011 |  | Concordancer | ??? | Google |
Mannheim Corpus of Historical Newspapers and Magazines | German | Various | ??? | Historical various | ??? | Download | ??? | LRE map |
Trove Newspaper Corpus | English | Various | ??? | ??? | Named entity | Concordancer | ??? | LRE map |
Europeana Newspapers NER Corpora | Dutch, French, German | Europeana newspapers | ??? | ??? | Named entity | Download | ??? | LRE map |
DN 1987 | Swedish | Dagens Nyheter | 5,122,590 | 1987 | POS, semantic, dependency relations, compounds | Online & download | CC-BY, other | spraakbanken.gu.se |
GP XXXX | Swedish | Göteborgsposten | 271,924,622 tokens | 1994, 2001-2011 | POS, semantic, compounding, dependency relations | Online & download | CC-BY, other | spraakbanken.gu.se |
Kubhist (historical newspaper corpus) | Swedish | Various | 1,080,000,000 tokens | 1740s-1920s | POS, semantic, dependency relations | Online | CC-BY, other | spraakbanken.gu.se |
(Finland Swedish newspaper corpus) | Finland Swedish | Various | 59,625,262 tokens | 1990-2014 | POS, semantic, dependency relations, compounds | Online | CC-BY, other | spraakbanken.gu.se |
8 sidor | Swedish (easy Swedish) | 8 sidor | 678,738 tokens | 2003-2012 | POS, semantic, dependency relations, compounds | Online & download | CC-BY, other | spraakbanken.gu.se |
Webbnyheter | Swedish | Various | 271,806,921 tokens | 2001-2013 | POS, semantic, dependency relations, compounds | Online | CC-BY, other | spraakbanken.gu.se |
ChronoPress Corpus of Polish Press Texts | Polish | various | 20,000,000 tokens | 1945-1962 | Metadata: title(s), author, publication date, genre, support, circulation (official, "underground"); Text: POS, named entities | Online | CC-BY, other | CLARIN-PL |
Delpher open newspaper archive | Dutch (primarily) | various | 351.104 newspapers | 1618-1876 | - | download | CC-BY | www.delpher.nl |
Croatian Historic Newspapers | Croatian, German, Italian | ?? | ?? | 1789-1920 | ??? | open access |  |  |
Archivio La Stampa | Italian | La Stampa | ?? | ??? |  |  |  |  |
Europeana Historic Newspapers | multiple | multiple | ?? |  |  | Europeana Historic Newspapers Portal offers links (open access) |  | Europeana |
British Library Nineteenth Century Newspapers Online | English | multiple | ?? | 19th cent. |  | Restricted (institutional subscription) |  | http://www.bl.uk/reshelp/findhelprestype/news/newspdigproj/database/ |
Dagblad Vooruit | Dutch | Vooruit | ?? | 1884-1918 |  | Open access |  | AMSAB ftp://digital.amsab.be/pubs_serials/Vooruit_1884-1918/ |
The Belgian War Press | Dutch |  |  | 1914-1918 and 1940-1945 |  | Online access via interface |  | Algemeen Rijksarchief https://warpress.cegesoma.be/nl/node/8940 |
AMSAB collection | Dutch, French | several |  | 1800-now |  |  |  | https://www.amsab.be/collectie/digitale-bronnen |
BelgicaPress | Dutch, French | several |  | 1831-1970 |  |  |  | Koninklijke Bibliotheek van België http://opac.kbr.be/belgicapress.php |
Nederlab | Dutch | several |  | pre-1900 |  |  |  | CLARIAH http://www.nederlab.nl/ |
Event Registry | multiple |  |  | ?-present |  | Online service for retrieving and visualizing events and concepts etc. across news media |  | Jozef Stefan Institute http://www.eventregistry.org/ |
JPRESS - Historic Jewish Press | multiple | multiple | ?? |  |  | open access |  | National Library of Israel and Tel-Aviv University http://web.nli.org.il/sites/JPress/English/Pages/default.aspx |
Archivio storico Corriere della Sera | Italian |  |  | 1876-2016 |  | Restricted, paid subscription required |  | http://archivio.corriere.it/Archivio/interface/landing.html |
Archivio storico de l'Unità | Italian |  |  | 1924-2008 |  | Online access via interface to individual documents (pdf) |  | http://archivio.unita.it/ |
Archivio La Repubblica | Italian |  |  | 1984-2016 |  | Online access via interface to individual documents |  | http://ricerca.repubblica.it/ |
I giornali del Piemonte | Italian |  |  | 1846-2016 |  | Online access via interface to individual documents |  | http://www.giornalidelpiemonte.it/ |
Corpus journalistique issu de l'Est Républicain | French | l'Est Républicain |  | 1999-2003 |  | Download | CC-BY | VLO |