1 of 65

SIL Converters

There’s more than meets the eye!


2 of 65

What will be covered

  • What is it, and where did it come from? Brief history
  • Why is it even useful? “What can it do for me?”
  • Overview of the Converters (Transduction Engines)
    • Setting up a Converter
  • Demo of SIL Converters used in various tools
    • Installation tips
  • What else is included with SIL Converters?
  • Finding help and getting support


3 of 65


4 of 65

Brief History


5 of 65

Where it all began

Humble beginnings back in May 2002

“What we need is Consistent Changes within Microsoft Word…”

Done!


6 of 65

The end of the road for legacy encodings

Driven by a sense of urgency as legacy, non-standard encoding issues started to bite us…

“…stops working with XP, so we need Unicode-based solutions NOW!”


7 of 65

How it grew over the years

  • CC table (Consistent Changes)
  • TECkit maps (more powerful than CC and even bidirectional)
  • Data Conversion (Word) Macro - very popular as it helped convert complex documents into Unicode without losing the formatting.
  • Bulk Word Document Converter - great for churning through lots of docs
  • Bulk SFM Converter - great for multiple scripture files (in Paratext)
  • ICU Converters and Transliterators
  • Daisy-Chained converter - for multiple sub-steps in the conversion process
  • Perl ● Python ● Regular Expressions


8 of 65

Conceptual Framework

A very simple concept.

Programs send data to any one of the converters, which processes it and sends it back.

Programs:

  • Data Conversion Macro
  • Bulk Word Doc Converter
  • Bulk SFM Converter
  • etc.

Converters:

  • CC (Consistent Changes)
  • TECkit
  • ICU Converters
  • ICU Transliterators
  • Perl
  • Python
  • Regular Expressions
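
The framework above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SIL Converters API: the ConverterRepository class and the converter names "UpperCase" and "Reverse" are made up for this example. Each converter is treated as a function that takes text and returns text, and any client program looks one up by name.

```python
from typing import Callable, Dict

# A converter is modelled as "text in, text out".
Converter = Callable[[str], str]


class ConverterRepository:
    """Hypothetical stand-in for a shared repository of named converters."""

    def __init__(self) -> None:
        self._converters: Dict[str, Converter] = {}

    def add(self, name: str, converter: Converter) -> None:
        """Register a converter under a name that client programs can use."""
        self._converters[name] = converter

    def convert(self, name: str, text: str) -> str:
        """Send text to the named converter and return whatever it sends back."""
        return self._converters[name](text)


repo = ConverterRepository()
repo.add("UpperCase", str.upper)            # illustrative converter
repo.add("Reverse", lambda s: s[::-1])      # illustrative converter

# Any client (a Word macro, a bulk converter, ...) would make the same kind of call.
print(repo.convert("UpperCase", "abc"))
```

The point of the design is that client programs never need to know how a converter works internally, only its name.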

9 of 65


10 of 65

What can it do for me?


11 of 65

What can it actually do?

  • Integrated in other software:
    • Paratext
    • FLEx
    • AdaptIt
    • OneStory Editor
    • MS Office (Word, Excel, Publisher, Access)
    • Open/Libre Office (Writer, Calc)
  • Bulk Word Doc Converter
  • Bulk SFM Converter
  • XML Document Converter
  • Clipboard Converter
  • Data Conversion Macro
  • + some other tools

  • CC Table
  • TECkit map
  • AdaptIt Knowledge Base
  • ICU Transliterator
  • Perl Expression
  • Python Script
  • RegEx (find & replace)
  • Online converters
    • Bing Translator
    • DeepL Translator
    • Google Translate
  • …and a few more


12 of 65

Example #1 - rather complex, but good fun!


13 of 65

Example #2 - spellings

“I need a tool that switches between British English spelling conventions and American English spellings.”

AND

“A single tool that works in multiple places with minimum effort.”


Paratext, FLEx, Word, Excel, e-mails.
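
As a rough sketch of what such a spelling converter might do internally, the Python below swaps whole words using a lookup table and keeps the capitalisation of the original word. The four-word list, the function names, and the case-handling rule are all assumptions made for illustration; a real converter would need a far larger list and more careful rules.

```python
import re

# Tiny illustrative word list (assumption); a real table would be much larger.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "centre": "center",
    "honour": "honor",
    "neighbour": "neighbor",
}
# Inverting the table gives the opposite direction.
AMERICAN_TO_BRITISH = {us: uk for uk, us in BRITISH_TO_AMERICAN.items()}

_WORD = re.compile(r"[A-Za-z]+")


def _match_case(source: str, target: str) -> str:
    """Give the replacement the same capitalisation as the original word."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def convert_spelling(text: str, table: dict) -> str:
    """Replace every whole word found in the table; leave all other text alone."""
    def swap(match: "re.Match") -> str:
        word = match.group(0)
        replacement = table.get(word.lower())
        return _match_case(word, replacement) if replacement else word

    return _WORD.sub(swap, text)


print(convert_spelling("The Colour of the centre", BRITISH_TO_AMERICAN))
```

Because the logic lives in one function, the same converter could in principle be called from every program that talks to the repository.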

14 of 65

Example #2 - spellings (now available in Paratext)

One of the Project Types in Paratext is: Transliteration (using Encoding Converter)

It takes data from one project, converts it, and places the converted data in another project!

GNTUK → British2American → GNTUS

There’s a lot more to it than that, but this is just to illustrate HOW converters get used in Paratext.


15 of 65


16 of 65

Or… Tidy up a Back Translation


17 of 65


Or use a Converter to transliterate scripts that you otherwise would not be able to read.

(It is a common need in South Asia to publish the same scripture text in more than one script.)

18 of 65


19 of 65

Setting up a Converter


20 of 65

Adding a Converter to the Repository

SIL Converters has some standard converters, but allows you to add custom converters to its repository.

When you click on Add Converter, it needs to know what kind of converter (Transduction Engine) you’re going to add.


21 of 65

Adding a Converter to the Repository


22 of 65

Adding a Converter to the Repository



23 of 65

Overview of the 16 different types of Converters

24 of 65

SIL-specific Converters

CC Table

  • bulk find-&-replace tool (‘a’ > ‘b’)

TECkit map

  • for more complex rule-based transformations; TECkit maps are often also reversible

AdaptIt Knowledge Base Converter

  • leverage work that was done in AdaptIt for adapting whole words or phrases between two languages or two scripts in an AdaptIt project


25 of 65

AdaptIt Knowledge Base Converter with OSE

OneStory Editor’s glossing tool being used to disambiguate where a one-to-many relationship needs human interaction


26 of 65

Programming Languages

Perl Expression

  • one or more expressions, usually substitutions

Python Script

  • with this converter, you can make SIL Converters do almost anything you want (using common libs)

Regular Expression Find and Replace (ICU)

  • extremely powerful, & quick to configure if you already have a good handle on RegEx (patterns & behavior are based on Perl’s regular expressions)
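
To give a flavour of the kind of logic a Python Script converter could hold, here is a minimal sketch that uses two standard-library modules to normalise Unicode and tidy whitespace. The function name convert is an assumption for this example; check the built-in documentation for the entry point and signature that SIL Converters actually expects. Note also that Python's re module is not identical to the ICU regular expressions used by the RegEx converter.

```python
import re
import unicodedata


def convert(text: str) -> str:
    """Normalise to NFC, collapse runs of spaces/tabs, and trim both ends."""
    # Compose base letters with their combining marks where possible.
    text = unicodedata.normalize("NFC", text)
    # Collapse any run of spaces or tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


# "e" + combining acute accent becomes the single precomposed letter.
print(convert("e\u0301  cole "))
```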


27 of 65

Perl Expression to strip Paratext markup

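# Strip Paratext (USFM) markup, leaving plain text.
$strInOut =~ s/^\\c \d+//g;                  # chapter marker and number
$strInOut =~ s/^\\(pc|pi1|pi) ?//g;          # \pc, \pi1, \pi at the start of the text
$strInOut =~ s/^\\[psm] ?//g;                # \p, \s, \m at the start of the text
$strInOut =~ s/\\v \d+-?\d{0,3} //g;         # verse numbers, including ranges such as 1-3
$strInOut =~ s/\\[pm] //g;                   # remaining \p and \m markers
$strInOut =~ s/\\q\d //g;                    # poetry markers \q1, \q2, ...
$strInOut =~ s/ *\\rq .*?\\rq\*//g;          # inline quotation references
$strInOut =~ s/\\qt (.*?)\\qt\*/$1/g;        # keep quoted text, drop the \qt markers
$strInOut =~ s/\\fig .*?\\fig\*//g;          # figures
$strInOut =~ s/\\[fx] .*?\\[fx]\*//g;        # footnotes and cross-references
$strInOut =~ s/\\v[ap] .*?\\v[ap]\* *//g;    # alternate and published verse numbers
$strInOut =~ s/[\x{2020}\x{2021}\x{200c}\x{200d}\x{230a}\x{230b}]//g;    # daggers, ZWNJ/ZWJ, floor brackets
$strInOut =~ s/\*//g;                        # any leftover asterisks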
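
For comparison, a subset of the same idea can be written as a Python function: an ordered list of (pattern, replacement) pairs applied one after another. This is a sketch only; the names USFM_RULES and strip_usfm are invented here, only six of the rules are carried over, and the closing asterisk of the end markers is escaped explicitly.

```python
import re

# (pattern, replacement) pairs, applied in order. Subset of the Perl rules.
USFM_RULES = [
    (r"^\\c \d+", ""),                 # chapter marker at the start of the text
    (r"\\v \d+-?\d{0,3} ", ""),        # verse numbers, including ranges
    (r"\\[pm] ", ""),                  # \p and \m paragraph markers
    (r"\\q\d ", ""),                   # poetry markers \q1, \q2, ...
    (r"\\[fx] .*?\\[fx]\*", ""),       # footnotes and cross-references
    (r"\\qt (.*?)\\qt\*", r"\1"),      # keep quoted text, drop the markers
]


def strip_usfm(text: str) -> str:
    """Apply each rule in turn and return the remaining plain text."""
    for pattern, replacement in USFM_RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(strip_usfm(r"\v 1 In the beginning\f + \ft note\f* God"))
```

Keeping the rules in a list makes the order explicit, which matters because later rules see the output of earlier ones.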

28 of 65

ICU-based Converters

ICU Transliterator

  • great for doing off-the-shelf transliteration with a very impressive list of mostly complex scripts

ICU Converter

  • not much use any more (cf. Code Page Converter)

ICU Boundary Analysis/Break Iterator

  • for languages (like Thai and Chinese) that don't otherwise use spaces between words; can be used to split them based on ICU dictionaries


29 of 65


30 of 65

Combining Converters

Primary-Fallback Converter

  • try this (primary) converter first, but if nothing changes, then use this other (fallback) converter

Compound (daisy-chained) Converter

  • combines 2 or more existing converters to make a new converter which does all the steps at once
  • extremely useful, and worth learning about

Code Page Converter

  • not much use any more
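
The two combining strategies described above can be sketched as small Python helpers. This is an illustration under the assumption that each converter behaves like a function from text to text; primary_fallback, compound, and the two stand-in converters are hypothetical names, not part of SIL Converters.

```python
from typing import Callable

Converter = Callable[[str], str]


def primary_fallback(primary: Converter, fallback: Converter) -> Converter:
    """Try the primary converter; if it changes nothing, use the fallback."""
    def convert(text: str) -> str:
        result = primary(text)
        return result if result != text else fallback(text)
    return convert


def compound(*steps: Converter) -> Converter:
    """Daisy-chain converters: the output of each step feeds the next."""
    def convert(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return convert


# Illustrative stand-in converters (assumptions for this example).
def known_words(text: str) -> str:
    return {"colour": "color"}.get(text, text)


def shout(text: str) -> str:
    return text.upper()


lookup_then_shout = primary_fallback(known_words, shout)
chained = compound(known_words, shout)
print(lookup_then_shout("colour"), lookup_then_shout("tea"), chained("colour"))
```

In the primary-fallback case only one converter's result is kept; in the compound case every step contributes to the final result.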


31 of 65

Primary-Fallback Converter calling an AdaptIt Knowledge Base Converter

32 of 65

Web-based Converters (Translators)

Bing Translator

  • multilingual translations with broad language support and continuous improvements (Microsoft)

DeepL Translator

  • provides highly accurate and natural-sounding translations using advanced neural networks

Google Translate

  • automated translation for numerous languages with varying degrees of accuracy and naturalness


33 of 65

Paratext Plugin: Back Translation Helper


Panels shown: Bing, DeepL, Paratext, Source BT

34 of 65

Subtle differences of meaning!


35 of 65

The principle of 2 or 3 witnesses…


36 of 65

Balancing accuracy with naturalness


37 of 65

Demo of various tools


38 of 65

Live demonstration

  • Installation Tips (enabling the right features)
  • Using Translation Helper dialog in Word
  • Doing RegEx Find and Replace operations in Word
  • Using SIL Converters in Excel (to obfuscate email addresses) with:
    • selected cells
    • ConvertString function
  • XML converter calling AI translators to localize Transcelerator questions into an LWC other than English (major regional languages)


39 of 65


40 of 65

Using a converter with OneStory Editor


41 of 65

Using a converter with OneStory Editor


42 of 65

Using a converter with OneStory Editor


43 of 65

Installation Tips: Enabling the right features


44 of 65

Overview of some other tools

  • Clipboard EncConverter
  • FLEx
  • SFM
  • AdaptIt
  • Bulk Word Document Converter
  • Technical Hindi Google Group converter


45 of 65

Clipboard EncConverter

Copy source text → Right-click to Convert → Select the appropriate Converter → Paste converted text


46 of 65

Install (or Uninstall) Converters as Needed


47 of 65

Accessing SIL Converters in FLEx

When in Bulk Edit mode, you can call a “Process” (i.e. an SIL Converter)


48 of 65

Accessing SIL Converters in FLEx


49 of 65

Accessing SIL Converters in FLEx


50 of 65

Converting SFM databases


51 of 65

Converting SFM databases


52 of 65

XML Document Converter

Easy to use with click and select options.

But also supports XPath Expressions for advanced users.
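
The XPath idea can be illustrated with Python's standard library: select only the nodes an expression matches, run their text through a converter, and leave everything else untouched. This is a sketch, not how the XML Document Converter is implemented; the sample XML, the function name convert_xml_text, and the use of str.upper as a stand-in converter are assumptions, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def convert_xml_text(xml_source: str, xpath: str, converter) -> str:
    """Apply converter to the text of every element matched by xpath."""
    root = ET.fromstring(xml_source)
    for element in root.findall(xpath):
        if element.text:
            element.text = converter(element.text)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data: only the English question should be converted.
sample = "<questions><q lang='en'>who?</q><q lang='fr'>qui?</q></questions>"
result = convert_xml_text(sample, ".//q[@lang='en']", str.upper)
print(result)
```

Restricting the conversion by attribute, as in this expression, is the sort of selection that click-and-select options alone may not cover.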


53 of 65

XML Document Converter (sample data)


54 of 65

Bulk Word Document Converter

Select several Docs

(let it scan them…)

Select the Converter(s) for each font/row

Optionally set the new Font to apply

Click on the Save icon


55 of 65

Linguistic Tools in LibreOffice: software.sil.org/oolt/


56 of 65

SIL Converters in Word: Multiple tools


57 of 65

SIL Converters in Word: RegEx Find/Replace


58 of 65

Using a converter with AdaptIt


59 of 65

What else is included?

  • TECkit map editor (helps you build and test your mapping)
  • Discourse Chart Helper (another very useful utility for analysis)
  • Consistent Spelling Checker (now redundant as Paratext does a better job)


60 of 65

TECkit map editor


61 of 65

Discourse Chart Helper

Helps prepare discourse analysis charts to study discourse features of a vernacular language text


62 of 65


63 of 65

Finding Help and Getting Support

The built-in documentation is very comprehensive (please refer to it before asking for support)

Also check out the FAQs: software.sil.org/silconverters/silconverters-faq/

If all else fails, e-mail silconverters_support@sil.org for limited support.


64 of 65


Hunger is the best cook


65 of 65

SIL Converters

There’s always more …


Thanks for attending!