From Democracy to Data and Back
ParlaMint
June 27, 2024 | CLaDA-BG, Sofia
Darja Fišer
CLARIN ERIC
Parliamentary corpora
What is CLARIN?
What is ParlaMint?
CLARIN flagship project:
Main deliverable:
ParlaMint I (2020-2021)
ParlaMint II (2022-2023)
WP1: Documentation, interoperability, metadata (Lead: Tomaž Erjavec (IJS), Matyáš Kopp (UFAL))
WP2: Corpus expansion (Lead: Tomaž Erjavec (IJS))
WP3: Corpus enrichment (Lead: Nikola Ljubešić (IJS) // Taja Kuzman (IJS), Paul Rayson (UCREL))
WP4: Engagement activities (Lead: Darja Fišer (INZ), Cagri Coltekin (TUB))
WP5: Coordination (Lead: Maciej Ogrodniczuk (IPI-PAN), Petya Osenova (IICT-BAS))
Harmonization of encoding
Element documentation
Git management
Added taxonomies
Added metadata
Adding new corpora (17 + 12 = 29 countries and autonomous regions)
Austria
Basque Country
Bosnia and Herzegovina
Belgium
Bulgaria
Catalonia
Croatia
Czech Republic
Denmark
Estonia
Finland
France
Galicia
Greece
Hungary
Iceland
Italy
Latvia
Netherlands
Norway
Poland
Portugal
Serbia
Slovenia
Spain
Sweden
Turkey
UK
Ukraine
Extending existing corpora
Preparing the corpora by partners
Validation and deployment
Validation was taken very seriously:
Deployment pipeline:
Data distribution
TEITOK - Corpus description
TEITOK - People view
TEITOK - Organizations view
TEITOK - Transcriptions view
TEITOK - Corpus search
Machine translation
USAS Semantic tagging
Key semantic categories used by female speakers in all parliaments
Multimodality
Four pilot speech corpora
Complicated:
Engagement activities
SHOWCASE 1: Networks of power
SHOWCASE 2: Emotions running high
Current work
V4.1 just out:
Submitting LREV paper:
Future directions
Thank you and see you at