Being a CLARIN member

LINDAT/CLARIN

Jozef Mišutka

LINDAT/CLARIN

Institute of Formal and Applied Linguistics

Charles University in Prague, Czech Republic

Contents

  • CLARIN
  • LINDAT/CLARIN
  • Repository
  • Reasons for submitting/publishing your data

CLARIN

  • Common Language Resources and Technology Infrastructure
  • The ultimate objective is to advance research by giving researchers unified access to a platform which integrates language-based resources and advanced tools at a European level.

CLARIN centres

  • plus one centre in the USA

LINDAT/CLARIN

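  • LINDAT/CLARIN is a CLARIN B centre in the Czech Republic
  • offers deposit of linguistic data and tools
  • serves as a language resource inventory of CLARIN
  • active in AAI (authentication and authorisation infrastructure), PIDs (persistent identifiers), FCS (Federated Content Search), and legal issues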

LINDAT/CLARIN repository

  • repository system based on DSpace 5
  • easy to find - support for VLO, OLAC, Data Citation Index by Thomson Reuters, OpenAIRE
  • easy to access - AAI
  • easy to share/cite - persistent identifiers, known citation format, submission workflow, licensing framework - traceable users, autocomplete
  • attractive for submitters - powerful statistics (admin only at the moment)
  • sustainable - powerful administration UI

Repository

Easy To Find

  • harvestable due to simple yet powerful metadata representation
  • input once, conversion done automatically
  • VLO - CLARIN Virtual Language Observatory
  • OLAC - Open Language Archives Community
  • Thomson Reuters Data Citation Index
  • and others
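
Harvesting of this kind is commonly done over OAI-PMH. The sketch below is illustrative only: it builds a standard ListRecords request and pulls Dublin Core titles out of a response. The base URL and the sample record are assumptions made up for the example, not the repository's real endpoint or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core standards.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(oai_xml):
    """Return the dc:title of every record in an OAI-PMH response."""
    root = ET.fromstring(oai_xml)
    titles = []
    for record in root.iter(OAI_NS + "record"):
        for title in record.iter(DC_NS + "title"):
            titles.append(title.text)
    return titles


# Hypothetical response, trimmed to a single record for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:repo.example.org:11356/1025</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Example corpus</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Hypothetical endpoint; substitute the real OAI-PMH base URL.
    print(list_records_url("https://repo.example.org/oai/request"))
    print(extract_titles(SAMPLE))
```

Because the metadata is entered once and converted automatically, a harvester such as the VLO or OLAC would only change the `metadataPrefix` to request the format it needs.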

Easy to access

  • users from around the world through federations
    • e.g., CLARIN.SI part of ARNES
  • and inter-federations
    • SPF
    • eduGAIN
    • CLARIN “homeless” IdP
  • local users are available too

Easy to share/cite

  • one persistent identifier (handle) also resolvable by URL
    • e.g., http://hdl.handle.net/11356/1025
    • tracking references
    • moving around
    • not changing the content
  • citation format according to RDA
  • simple step-by-step submission workflow (progress saved after each action)
  • autocomplete
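
As a small illustration of how a handle maps to a resolvable URL, the sketch below converts between the two forms. It assumes the `hdl.handle.net` resolver shown in the example above; the function names are hypothetical.

```python
from urllib.parse import urlparse

# Public resolver used in the slide's example URL.
RESOLVER = "http://hdl.handle.net/"


def handle_to_url(handle):
    """Turn a handle such as '11356/1025' into a resolvable URL."""
    return RESOLVER + handle


def url_to_handle(url):
    """Split a resolver URL back into its (prefix, suffix) handle parts."""
    path = urlparse(url).path.lstrip("/")
    prefix, _, suffix = path.partition("/")
    if not prefix or not suffix:
        raise ValueError("not a handle URL: " + url)
    return prefix, suffix


if __name__ == "__main__":
    url = handle_to_url("11356/1025")
    print(url)
    print(url_to_handle(url))
```

The point of the indirection is that the handle stays fixed even if the item moves: only the resolver's target changes, so citations keep working.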

Easy to share/cite

  • licensing framework
  • data/software should have licenses
  • question/answer based selection tool
  • strongly prefer CC licenses
    • version 4.0 is applicable to data
    • it also covers the sui generis database right
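
A question/answer based selection can be pictured roughly as follows. This is a minimal sketch, not the repository's actual tool: the questions and the function name are assumptions, and only the standard Creative Commons 4.0 licence names (plus CC0) are real.

```python
def choose_cc_license(allow_commercial, allow_derivatives,
                      share_alike=False, waive_all_rights=False):
    """Map yes/no answers to a Creative Commons licence name.

    Hypothetical helper for illustration; a real selector would ask
    more questions (e.g. about software versus data).
    """
    if waive_all_rights:
        return "CC0 1.0"
    parts = ["BY"]  # every CC 4.0 licence requires attribution
    if not allow_commercial:
        parts.append("NC")
    if not allow_derivatives:
        parts.append("ND")
    elif share_alike:
        # Share-alike only makes sense when derivatives are allowed.
        parts.append("SA")
    return "CC " + "-".join(parts) + " 4.0"


if __name__ == "__main__":
    print(choose_cc_license(allow_commercial=True, allow_derivatives=True))
    print(choose_cc_license(allow_commercial=False, allow_derivatives=True,
                            share_alike=True))
```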

Licenses

  • licenses can require users to sign (accept) them
    • authentication
    • one/every time
    • stores user id
    • additional information
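
The options above could be modelled along these lines. This is a sketch under stated assumptions: the class names, fields, and in-memory log are hypothetical and do not describe how the repository stores signatures.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confirmation(Enum):
    ONCE = "once"              # sign on first download only
    EVERY_TIME = "every_time"  # sign on each download


@dataclass
class LicensePolicy:
    name: str
    requires_login: bool
    confirmation: Confirmation
    extra_fields: tuple = ()   # additional information the user must give


@dataclass
class AcceptanceLog:
    entries: list = field(default_factory=list)

    def needs_signature(self, policy, user_id):
        """Decide whether this user must sign before accessing the data."""
        if policy.confirmation is Confirmation.EVERY_TIME:
            return True
        return not any(e["user_id"] == user_id and e["license"] == policy.name
                       for e in self.entries)

    def sign(self, policy, user_id, extra=None):
        """Record the signature together with the user id and extra info."""
        extra = extra or {}
        missing = [f for f in policy.extra_fields if f not in extra]
        if missing:
            raise ValueError("missing required information: " + ", ".join(missing))
        self.entries.append({"user_id": user_id, "license": policy.name,
                             "extra": dict(extra)})


if __name__ == "__main__":
    policy = LicensePolicy("Example restricted licence", True,
                           Confirmation.ONCE, extra_fields=("organisation",))
    log = AcceptanceLog()
    print(log.needs_signature(policy, "user-1"))
    log.sign(policy, "user-1", {"organisation": "Example University"})
    print(log.needs_signature(policy, "user-1"))
```

Storing the user id with each signature is what makes users traceable, which in turn depends on authentication.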

Attractive

  • assets
    • findable
    • citeable
    • long term preservation
    • safer than local
    • curated
    • statistics

Repository Future

  • what to expect
    • localisation
    • even better statistics
    • better searching
    • integration with external services
      • ORCID, GitHub

CLARIN

CLARIN project

  • Federated Content Search
    • searching in corpora
    • new version, new functionality
  • WebLicht-like services
    • OAI-PMH interface
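
Federated Content Search builds on the SRU protocol, so a search against one endpoint can be sketched as a plain SRU searchRetrieve request. The endpoint URL below is a hypothetical placeholder, and the sketch assumes SRU 1.2 with a simple CQL query; it does not cover the aggregation across centres.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def fcs_search_url(endpoint, cql_query, maximum_records=10):
    """Build an SRU searchRetrieve URL for a single (assumed) FCS endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint; each centre publishes its own.
    url = fcs_search_url("https://fcs.example.org/sru", '"language resource"')
    print(url)
    # Decode the query string again to check what was sent.
    print(parse_qs(urlparse(url).query)["query"])
```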

LINDAT/CLARIN project

  • KonText corpus manager UI
    • more user friendly
    • conversion pipeline
  • services offered
    • unified
  • integrated into repository
    • corpora in repository -> corpora in search