1 of 17

Opening Sources in Breton Language:

Offering the “Minoritized” Language to the Majority

Dr. Tristan LOARER

CELTIC BLM

University Rennes 2 (Brittany)

2 of 17

Presentation plan

  • DEVRI: an illustration of an open-source tool in Breton
    • Martial Ménard [1951-2016]
    • DEVRI: The diachronic dictionary
  • WIKImammenn: challenging the open sources
    • Digitize the corpora
    • Some fresh good news
  • Without open sources: no A.I.!
    • Feeding the Beast
    • Political issues for digital survival

Minoritized language:

the result of a deliberate process, a product of linguistic colonialism, and not a “minority” as a state of affairs, by essence or by the nature of the languages themselves.

3 of 17

I) DEVRI:

an illustration of an open-source tool in Breton

4 of 17

Martial MÉNARD [1951-2016]

  • Breton activist
  • political prisoner
  • editor of An Here/The Harvest [1983-2003]
  • lexicologist

« We must take what we are entitled to, not hold out our hands. »

M. Ménard

5 of 17

Some elements of the lexicographer’s work:

    • Monolingual dictionary (1995), 1,232 p.
    • Breton dictionary of eroticism (1995), 528 p.
    • French-Breton dictionary (2012), 1,463 p.
    • Monolingual dictionary (2001), 1,436 p.
    • Breton chronicles (2021), 960 p.

S.A.B.

Stourm ar brezhoneg (Struggle for the Breton Language) is a collective demanding official status for Breton (1980s)

6 of 17

Roparz HEMON [1900-1978]

  • founder of Gwalarn (1925-1944)
  • novelist, short story writer, poet
  • grammarian, lexicologist
  • translator (Andersen, Marlowe, etc.)
  • author of A Historical Morphology and Syntax of Breton

GWALARN (Northwest)

The first monolingual literary review, and later the literary movement that grew out of it

7 of 17

69,333 entries

40 years of work

words from 1100 to 2010

8 of 17

II) WIKIMAMMENN:

Challenging the open sources

9 of 17

WIKImammenn (WIKIsource)

  • Already 10,000 monolingual pages in Breton
    • 191 books
    • 552 indexes
    • 333 songs
    • 186 stories
    • 811 poems
    • 50 plays
    • 1,118 music scores that can be read and listened to
    • written by 180 authors

10 of 17

Some fresh good news: Transkribus!

OCR: Optical Character Recognition

HTR: Handwritten Text Recognition

11 of 17

Some fresh good news: Transkribus!

12 of 17

III) Without open sources: no A.I.

13 of 17

Feeding the Beast

14 of 17

    • unified language and spelling
    • grammar / dictionary
    • monolingual dictionary
    • oral historical corpus
    • written historical corpus
    • spelling and grammar checker
    • international translation service (Google Translate)

    • Written-text recognition (on the way!)
    • LLM (Large Language Model)
    • speech recognition
    • GPS
    • automatic subtitling
    • chatbots
    • and more …

Practical tools for the Breton language TODAY

> still much room for improvement!

15 of 17

Political issues for digital survival

  • “Meta has already created speech recognition systems for a thousand languages”
  • “It would be enough to fund it”
  • “It is not the job of the Public Office of the Breton Language to work for a private company that earns billions every year”
  • “We need free corpora that comply with the FAIR principles […] to make this online distribution work”
  • NLP
    • rich corpora
    • make them accessible / visible through the LDC
    • fight for “digital survival”

No official status

= no strong linguistic policy

= no financial means

16 of 17

SOURCES

INTERVIEWS:

interview with Hervé Le Bihan, 18/07/2024

interview with Cédric Choplin, 17/09/2024

interview with Kristian Hamon, 18/09/2024

interview with Fañch Jestin, 18/09/2024

interview with Hervé Baudry, 18-20/09/2024

WEBSITES:

ARBRES:

https://arbres.iker.cnrs.fr

Digital resources for Breton, u.t.d.:

https://entrelangues.modyco.fr/index.php/Breton#Ressources_num%C3%A9riques

Natural language processing (French: TAL), u.t.d.:

https://arbres.iker.cnrs.fr/index.php?title=Traitement_automatique_des_langues_-_Breton

First Universal Dependencies corpus (2018):

https://github.com/UniversalDependencies/UD_Breton-KEB

Platform for classifying oral structures: COllections de COrpus Oraux Numériques, u.t.d.:

https://cocoon.huma-num.fr/exist/crdo/

Meta data-collection initiative:

https://ai.meta.com/research/no-language-left-behind/

Spell checker and translator, Korvigelloù an Drouizig:

https://drouizig.org/br/

Kristian Hamon - Hervé Baudry:

https://www.transkribus.org/model/breton-prints-17th-19th-centuries

Alan Enterm / Brendan-Budok Durand-Le Ludec: local translator (OPAB)

https://niverel.brezhoneg.bzh/fr/troer/

17 of 17

Thanks for your attention!

Get in touch with me:

Credits

For the sources and licensing of the illustrations above, see the complete bibliography and webography that follows (3 pp.)

Dr. Tristan LOARER

CELTIC BLM

University Rennes 2 (Brittany)

https://perso.univ-rennes2.fr/tristan.loarer