Welcome to the Language Bank of Finland!
This document is licensed under the Creative Commons Attribution 4.0 International license. Contents were produced by members of the Language Bank team in FIN-CLARIN (Mietta Lennes, Krister Lindén, Tero Aalto, Sam Hardwick).
Mietta Lennes, FIN-CLARIN, mietta.lennes@helsinki.fi
Today’s topic:
www.kielipankki.fi
9.6.2023
Program
Introducing CLARIN
Video: https://youtu.be/lfDWBaaAcIw
https://www.kielipankki.fi
CLARIN ERIC
[ … ]
International cooperation and sharing of resources for Humanities and Social Sciences
European Research Infrastructure Consortium
founded on February 29, 2012
Member countries (21):
Austria
Bulgaria
Croatia
Cyprus
Czech Republic
Denmark
Estonia
Finland
Germany
Greece
Hungary
Iceland
Italy
Latvia
Lithuania
The Netherlands
Norway
Poland
Portugal
Slovenia
Sweden
Observers (3):
France
South Africa
United Kingdom
Third party (1):
CMU (USA)
Updated: 24.3.2021
Member countries (22):
Austria
Belgium
Bulgaria
Croatia
Cyprus
Czech Republic
Denmark
Estonia
Finland
Germany
Greece
Hungary
Iceland
Italy
Latvia
Lithuania
The Netherlands
Norway
Poland
Portugal
Slovenia
Sweden
Observers (2):
South Africa
United Kingdom
Third party (1):
CMU (USA)
Updated: 14.4.2022
CLARIN centres
Kielipankki – The Language Bank of Finland
B-centre
FIN-CLARIN
Users of the Language Bank
Researchers of the Month
FAIR data
Findable
Interoperable
Accessible
Re-usable
https://vlo.clarin.eu
https://www.clarin.eu/resource-families
How to locate and access resources
https://www.kielipankki.fi/corpora
https://www.kielipankki.fi/corpora
link to metadata
link to license
link to location
link to resource group page
citation instructions
Citation instructions
Remember to check the version of the resource!
Details about, e.g., Suomi24 versions can be found via the page of the resource group.
Resource group page (Wanca)
Multiple versions of the same resource, published via different channels
Resource-specific metadata
Access location (e.g., Korp, download service, or similar)
Persistent identifier (use this in citations!)
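The persistent identifiers cited in these slides are URNs resolved through urn.fi. The following is a minimal sketch of how a URN could be turned into a resolvable link and a citation string. The function names and the URN pattern are assumptions inferred from the examples in this deck, not an official specification; always follow the citation instructions given on the resource's own page.

```python
import re

# Pattern inferred from the URNs shown in these slides (an assumption, not a spec).
LB_URN = re.compile(r"^urn:nbn:fi:lb-\d+$")


def resolver_url(urn: str) -> str:
    """Return a resolvable link for a Language Bank URN (hypothetical helper)."""
    urn = urn.strip()
    if not LB_URN.match(urn):
        raise ValueError(f"Not a Language Bank URN: {urn!r}")
    return "http://urn.fi/" + urn


def format_citation(author: str, year: int, title: str,
                    resource_type: str, urn: str) -> str:
    """Build a citation in the style of the CFinSL example in this deck."""
    return (f"{author} ({year}). {title} [{resource_type}]. "
            f"Kielipankki. Retrieved from {resolver_url(urn)}")


if __name__ == "__main__":
    print(format_citation(
        "University of Jyväskylä, Sign Language Centre", 2019,
        "Corpus of Finnish Sign Language", "sign language corpus",
        "urn:nbn:fi:lb-2019012321"))
```

Because each version of a resource has its own identifier, citing the URN also records which version was used.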
korp.csc.fi: Suomi24 (2001-2017)
Download service, http://www.kielipankki.fi/download
Downloadable VRT format, extracted from Korp
structural elements
structural attributes
positional attributes
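To make the three VRT terms concrete, here is a minimal reading sketch. It assumes the usual layout of verticalized text: structural elements appear as XML-like tags on their own lines, structural attributes are the attributes on those tags, and each token line holds tab-separated positional attributes. The sample data and the column names (word, lemma, pos) are hypothetical; the actual columns differ per corpus and are described in the documentation of each resource.

```python
import re

ATTR = re.compile(r'(\w+)="([^"]*)"')
OPEN_TAG = re.compile(r'^<(\w+)((?:\s+\w+="[^"]*")*)\s*>$')

# Hypothetical sample; real corpora have more columns and attributes.
SAMPLE = '''<text title="Example" year="2017">
<sentence id="1">
Minä\tminä\tPRON
rakastan\trakastaa\tVERB
kahvia\tkahvi\tNOUN
</sentence>
</text>
'''


def parse_vrt(lines, columns):
    """Yield one dict per token, including the structures that enclose it.

    Assumes that literal '<' in tokens is escaped (e.g. as &lt;), so any
    line starting with '<' is a structural tag or a comment.
    """
    open_structs = {}  # structural element name -> its structural attributes
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("<!--"):
            continue  # skip blank lines and comment lines
        if line.startswith("</"):
            open_structs.pop(line[2:-1].strip(), None)  # element closes
            continue
        match = OPEN_TAG.match(line)
        if match:
            open_structs[match.group(1)] = dict(ATTR.findall(match.group(2)))
            continue
        # Token line: tab-separated positional attributes.
        token = dict(zip(columns, line.split("\t")))
        token["structs"] = {name: dict(attrs)
                            for name, attrs in open_structs.items()}
        yield token


if __name__ == "__main__":
    for tok in parse_vrt(SAMPLE.splitlines(), ["word", "lemma", "pos"]):
        print(tok["word"], tok["lemma"], tok["pos"],
              tok["structs"]["text"]["year"])
```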
Downloadable speech corpora: local use in Praat
Downloadable speech corpora: local use in ELAN
Virtual Language Observatory (VLO), vlo.clarin.eu
CLARIN licence categories
Publicly available
Available to academic users who are logged in
Personal permission is required for access
More detailed licence conditions
+BY author must be cited
+NC non-commercial use only
+ID login is required
+PLAN research plan is required
+PRIV contains personal data, users must follow the resource-specific data protection terms and conditions
+NORED redistribution is not allowed
+DEP modified versions can be deposited for reuse via CLARIN services
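License labels combine one category with any number of "+" conditions. The sketch below expands such a label into plain-language terms. The category codes PUB, ACA and RES are the short names used in the CLARIN scheme for the three categories above; they do not appear on these slides, and the function name and lookup tables are illustrative only. The license text of each resource remains the authoritative source.

```python
import re

# Short codes of the three CLARIN categories (codes are not shown on the slides).
CATEGORIES = {
    "PUB": "publicly available",
    "ACA": "available to academic users who are logged in",
    "RES": "personal permission is required for access",
}

# Conditions as listed on the slide.
CONDITIONS = {
    "BY": "author must be cited",
    "NC": "non-commercial use only",
    "ID": "login is required",
    "PLAN": "research plan is required",
    "PRIV": "contains personal data",
    "NORED": "redistribution is not allowed",
    "DEP": "modified versions can be deposited for reuse via CLARIN services",
}


def describe_license(label: str) -> list:
    """Expand a label such as 'CLARIN ACA +NC +BY' into readable terms."""
    codes = [code.upper() for code in re.findall(r"[A-Za-z]+", label)]
    if codes and codes[0] == "CLARIN":
        codes = codes[1:]  # the scheme name is optional in the label
    if not codes or codes[0] not in CATEGORIES:
        raise ValueError(f"No known category in label: {label!r}")
    terms = [CATEGORIES[codes[0]]]
    for code in codes[1:]:
        if code not in CONDITIONS:
            raise ValueError(f"Unknown condition: +{code}")
        terms.append(CONDITIONS[code])
    return terms


if __name__ == "__main__":
    for term in describe_license("CLARIN ACA +NC +BY"):
        print("-", term)
```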
The Suomi 24 Sentences Corpus 2001-2020, Korp version: Simple search: rakastaa (’to love’, verb)
Word picture
Extended search:
rakastaa, followed by a direct object
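Extended searches of this kind correspond to a sequence of token expressions in the CQP query language. The sketch below only assembles such a query string; it does not call any service. The attribute names lemma and deprel and the value "dobj" are assumptions for illustration, since the available attributes and dependency labels depend on how each corpus was annotated.

```python
def _escape(value: str) -> str:
    """Escape backslashes and double quotes inside a CQP string value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def cqp_token(**conditions: str) -> str:
    """Build one token expression, e.g. [lemma = "rakastaa"].

    With no conditions, returns [] which matches any token.
    """
    if not conditions:
        return "[]"
    parts = [f'{attr} = "{_escape(value)}"'
             for attr, value in conditions.items()]
    return "[" + " & ".join(parts) + "]"


def cqp_sequence(*tokens: str) -> str:
    """Join token expressions into a query for consecutive tokens."""
    return " ".join(tokens)


if __name__ == "__main__":
    # 'rakastaa' immediately followed by a token tagged as a direct object
    # (attribute name and label are hypothetical).
    query = cqp_sequence(cqp_token(lemma="rakastaa"),
                         cqp_token(deprel="dobj"))
    print(query)
```

Note that values in CQP are regular expressions, so special characters in a search word may need additional escaping.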
Statistics: rakastaa + direct object
The Suomi 24 Sentences Corpus 2001-2020, Korp version:
Trend diagram: sairaus (’illness’)
The Suomi 24 Sentences Corpus 2001-2020, Korp version:
sairaus (’illness’) vs. korona (’corona’)
(showing approx. the years 2017-2020)
Downloading a concordance from Korp
Plenary Sessions of the Parliament of Finland
Plenary Sessions of the Parliament of Finland: maahanmuuttaja ’immigrant’ (all forms)
Plenary Sessions of the Parliament of Finland: maahanmuuttaja ’immigrant’
Click on link to show video!
Plenary Sessions of the Parliament of Finland, Korp version 1.5: Link to video
Compile statistics from search results in Korp
Statistics: any form of the word maahanmuuttaja (’immigrant’) vs. speaker’s role and parliamentary group
Researcher of the Month 3/2017
Tommi Jantunen
Professor of Finnish Sign Language, University of Jyväskylä
Research interests:
Corpus of Finnish Sign Language (CFinSL)
University of Jyväskylä, Sign Language Centre (2019). Corpus of Finnish Sign Language [sign language corpus]. Kielipankki. Retrieved from http://urn.fi/urn:nbn:fi:lb-2019012321
The CFinSL corpus: elicited narratives
What kind of data can be deposited?
Contact FIN-CLARIN for details
fin-clarin@helsinki.fi
CLARIN license categories
Publicly available
Available to academic users who are logged in
Personal permission is required for access
Language Bank Rights, https://lbr.csc.fi
More detailed license conditions
+BY author must be cited
+NC non-commercial use only
+ID login is required
+PLAN research plan is required
+PRIV contains personal data
+NORED redistribution is not allowed
+DEP modified versions can be redistributed via CLARIN
and other resource-specific conditions, if required
Start collecting metadata
Storage and backups
Intellectual Property Rights, e.g., copyright and related rights
Ask for permission
Personal data
Identify the Data Controller
Legal basis in scientific research
Personal data in special categories
Minimize the amount of personal data!
Further processing (e.g., for secondary research purposes)
The e-form for describing a new resource for the Language Bank of Finland:
http://urn.fi/urn:nbn:fi:lb-2021121422
Two types of agreements are usually needed:
Obtain restricted access via Language Bank Rights, https://lbr.csc.fi
Log in at https://lbr.csc.fi
Select resources
Add to basket
Fill in the application
In case the corpus contains personal data, the license may include specific personal data protection terms and conditions.
License
Processing and approval
Guidelines for processing personal data
Publish a link to the Privacy Notice regarding your processing of the personal data and inform the Language Bank:
https://urn.fi/urn:nbn:fi:lb-2022052522
List of tools available via the Language Bank: www.kielipankki.fi/tools
Tools: Aalto-ASR
Automatic speech recognition and alignment
A demo tool is also available!
Demo tools: https://www.kielipankki.fi/tools/demo/
FAIR data
Findable
Interoperable
Accessible
Re-usable
HRT / VRT
Common formats
Download
Virtual Language Observatory
Instructions for resource creators
Support for versions and variants
Long-term archiving (?)
Common processing tools
Deposition agreements
Language Bank Rights
Consistent metadata + access location
PID
Online courses
Corpus Linguistics and Statistical Methods (5 cr)
Introduction to Speech Analysis (5 cr)
Data Clinic (5 cr)
The courses are open to all students and researchers within and outside the University of Helsinki, even abroad.
Kiitos! Tack! Thank you!
www.kielipankki.fi
General support
fin-clarin@helsinki.fi
Technical support
kielipankki@csc.fi