1 of 16

UniDive WG3

Subgroup - Multilingual Tool and Resource Documentation

Co-leaders:

A. Seza Doğruöz (Ghent University)

Maria Giagkou (Athena RC)

Teresa Lynn (Mohamed bin Zayed University of Artificial Intelligence)

2 of 16

Task Overview

Task 1

  • Assess the “discoverability” of NLP tools and resources
  • Who can participate?
    • Everyone

Task 2

  • Analyse the NLP tool availability in the ELG catalogue
  • Who can participate?
    • Excel or Tableau enthusiasts
    • Those with skills in data visualisation

3 of 16

Task 1: Assessing the “discoverability” of NLP tools

  • Choose your language(s) and NLP task(s) of interest
  • Search for the relevant tools across a number of platforms
  • Report on the discoverability of the desired tool/ resource (Could you find it easily or not? What challenges did you face?)
  • Report on the metadata information available (was it sufficient and accurate?)
  • What metadata do you recommend should be provided for a similar search?
  • Is there a tool/ resource you are aware of that you can’t find on these platforms?
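One possible way to record the answers to the prompts above so they can be compared across platforms is sketched here. This is not the template that will be provided; the class name, field names and the `found_rate` helper are all hypothetical and only illustrate the kind of information each search could capture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscoverabilityReport:
    """One search attempt on one platform (all field names are hypothetical)."""
    language: str
    nlp_task: str
    platform: str
    found: bool  # could the desired tool/resource be located?
    challenges: List[str] = field(default_factory=list)
    metadata_sufficient: Optional[bool] = None  # None means "not assessed"
    recommended_metadata: List[str] = field(default_factory=list)


def found_rate(reports):
    """Return, per platform, the share of searches that found the tool."""
    totals, hits = {}, {}
    for r in reports:
        totals[r.platform] = totals.get(r.platform, 0) + 1
        hits[r.platform] = hits.get(r.platform, 0) + int(r.found)
    return {platform: hits[platform] / totals[platform] for platform in totals}
```

Keeping one record per language, task and platform makes it straightforward to aggregate findings later, for example by platform or by language.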

4 of 16

E.g. Search for Albanian Tools - ELRA Catalogue

5 of 16

E.g. Search for Albanian Tools - CLARIN-SI Catalogue

6 of 16

E.g. Search for Albanian Tools - ELG Catalogue

7 of 16

E.g. Search for Albanian Tools - Hugging Face

8 of 16

Task 3.1.1: Assessing the “discoverability” of NLP tools

Process

  • A template will be provided with prompt questions
  • Additional input also desired

Outcome

  • An increased awareness and understanding of language technology platforms
  • Insight into limitations of current schemas
  • Honed research skills in searching for NLP tools/ resources
  • Recommendations for improving discoverability of tools/ resources

9 of 16

Task 3.1.1: Assessing the “discoverability” of NLP tools

10 of 16

11 of 16

Task 2: Tool Availability Analysis

  • Seeking volunteers with strong Excel/ Tableau skills
  • Analysis of the ELG catalogue export is required; something similar to Kristina’s report is a good starting point
  • The prompts below can be the start of the investigation - let’s see what else emerges:
    • Which tools are certain languages missing? (e.g. Irish doesn’t have NER, Sentiment Analyser, etc.)
    • Which multilingual tool types are lacking across languages? (e.g. NER is only available for X, Y, Z languages)
    • Which languages tend to be left out of “multilingual” tools?
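For volunteers who prefer scripting to Excel or Tableau, the three prompts above could be approached roughly as sketched below. This is only an illustration under stated assumptions: the row layout (tool name, tool type, list of languages) is hypothetical and will need to be mapped to whatever columns the real ELG catalogue export contains, and the rows themselves are made-up placeholders, not catalogue data.

```python
from collections import defaultdict

# Placeholder rows: (tool name, tool type, languages covered).
# These are NOT real catalogue entries; the real export has its own columns.
ROWS = [
    ("tool-1", "NER", ["lang-a", "lang-b"]),
    ("tool-2", "Sentiment Analysis", ["lang-a"]),
    ("tool-3", "NER", ["lang-a", "lang-c"]),
    ("tool-4", "Lemmatiser", ["lang-b", "lang-c"]),
]


def coverage(rows):
    """Map each tool type to the set of languages with at least one such tool."""
    by_type = defaultdict(set)
    for _name, tool_type, languages in rows:
        by_type[tool_type].update(languages)
    return dict(by_type)


def missing_by_language(rows):
    """Map each language to the sorted tool types for which it has no tool."""
    cov = coverage(rows)
    all_languages = {lang for langs in cov.values() for lang in langs}
    return {
        lang: sorted(t for t, langs in cov.items() if lang not in langs)
        for lang in sorted(all_languages)
    }


def left_out_counts(rows):
    """Count, per language, how many multilingual tools (2+ languages) omit it."""
    all_languages = {lang for _n, _t, langs in rows for lang in langs}
    counts = {lang: 0 for lang in sorted(all_languages)}
    for _name, _tool_type, languages in rows:
        if len(languages) >= 2:
            for lang in all_languages - set(languages):
                counts[lang] += 1
    return counts
```

`missing_by_language` addresses the first prompt, `coverage` the second, and `left_out_counts` the third. Note that a language only appears in the results if at least one tool in the export lists it, so languages with no tools at all would have to be added from a separate reference list.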

12 of 16

ELG Catalogue Export

13 of 16

14 of 16

Task 2: Tool Availability Analysis

Expected Outcomes

  • A better insight into current NLP tool availability
  • A better insight into existing gaps and digital language inequality
  • A basis for improved reporting on language support or tool availability status (visual/ written reports)

15 of 16

Questions/ Ceisteanna

  • What if I want to add a missing entry to a platform?
  • When does this task need to be completed by?
  • How do I deliver my results?
  • How will you know who the doc is from?

16 of 16

Thank you for your attention!

Go raibh maith agaibh!