1 of 22

PRESENTATION

KATS 2024 Conference

Jaehoon Noh

A practical LLM application for translation in CAT system

jnoh@wisest.co.kr

CEO, WISE ST Global Inc.

🕜 WiseTranslate demos

🕝 Solution introduction

🕞 Q & A

A practical application of LLM for translation in the CAT framework.

PREMIUM Partner, SOLUTION certified by

2 of 22

Demo #1, Detection of Content Type

DEMO

As soon as you upload files for translation, the system automatically detects the document type and field of expertise. The categories of content type and content domain are pre-defined, and the AI examines the documents to identify the most suitable ones.

Understanding different content types, such as marketing or legal, is crucial in human translation because it helps in selecting the appropriate translators for each type of content. This principle also applies to AI translation. Recognizing the content type from the beginning can significantly improve the output quality.
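A minimal sketch of how detection against pre-defined categories might be wired up. The category lists, the prompt wording, and the helper names `build_detection_prompt` and `parse_detection` are assumptions for illustration; the actual categories and prompts used by WiseTranslate are not given in these slides.

```python
import json

# Hypothetical category lists; the real pre-defined categories are not listed here.
CONTENT_TYPES = ["marketing", "legal", "technical", "general"]
CONTENT_DOMAINS = ["finance", "medical", "software", "other"]


def build_detection_prompt(sample_text: str) -> str:
    """Ask the model to choose exactly one label from each pre-defined list."""
    return (
        "Classify the document excerpt below.\n"
        f"Allowed content types: {', '.join(CONTENT_TYPES)}\n"
        f"Allowed content domains: {', '.join(CONTENT_DOMAINS)}\n"
        'Reply with JSON only: {"content_type": "...", "content_domain": "..."}\n\n'
        f"Excerpt:\n{sample_text}"
    )


def parse_detection(raw_reply: str) -> dict:
    """Accept the reply only if both labels come from the pre-defined lists."""
    data = json.loads(raw_reply)
    if data.get("content_type") not in CONTENT_TYPES:
        raise ValueError("unknown content type")
    if data.get("content_domain") not in CONTENT_DOMAINS:
        raise ValueError("unknown content domain")
    return {
        "content_type": data["content_type"],
        "content_domain": data["content_domain"],
    }
```

Restricting the reply to a closed list keeps the detected labels usable as keys for later steps, such as choosing a review prompt per content type.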

3 of 22

Demo #2, Review by Content Type

DEMO

Based on the type of document and the field of expertise, AI Review performs translation review tasks such as identifying the correct context, applying terms that fit the context, correcting mistranslations, and adjusting style. The type of document and field of expertise are also detected automatically.

Fixed inaccuracy

Source: before review → after review

정 관: Jeonggwan → Articles of Incorporation

제3조 (본점 사업장의 소재지): Article 3 (the head office) → Article 3 (Location of the Head Office)

부동산임대: Property → Real Estate Leasing

주금납입의 지체: deferred payment → Delay in Payment of Share Capital

Fixed style

Fixed context

Fixed terminology

4 of 22

Demo #3, Apply TermBase

DEMO

The AI not only checks for TermBase terminology errors but also corrects them automatically when the TermBase term fits the context. More importantly, it analyzes the morphemes of the terms so that corrections follow grammatical rules, including singular or plural forms, articles, and connective words.

Contextual application

Skips terms that do not fit the context

Articles, particles, singular/plural forms

In seg# 1, the term “collaboration” is registered only as a noun, but it is applied with the appropriate connective ending.

In seg# 14, “drug” was found in the TermBase, but the TermBase translation was “narcotic drug”, which does not fit the context, so it was not applied.

In seg# 17, the term “AI” is replaced with “인공지능” according to the TermBase. The postpositional particles (은/는/이/가) are adjusted accordingly.
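The particle adjustment in seg# 17 can be illustrated with a small rule-based sketch. This is not how the Wise AI solution does it (the slides attribute it to the AI's morpheme analysis); it only shows the underlying grammar rule, using the standard Unicode layout of precomposed Hangul syllables. The function names are hypothetical.

```python
HANGUL_FIRST = 0xAC00   # "가", first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3    # "힣", last precomposed Hangul syllable
FINALS_PER_BLOCK = 28   # 27 final consonants plus "no final consonant"


def has_final_consonant(word: str) -> bool:
    """True if the last syllable of a Hangul word ends in a consonant."""
    last = ord(word[-1])
    if not HANGUL_FIRST <= last <= HANGUL_LAST:
        raise ValueError("last character is not a Hangul syllable")
    return (last - HANGUL_FIRST) % FINALS_PER_BLOCK != 0


def with_particle(word: str, after_consonant: str, after_vowel: str) -> str:
    """Attach the particle variant that matches the word's final sound."""
    return word + (after_consonant if has_final_consonant(word) else after_vowel)


print(with_particle("인공지능", "은", "는"))  # 인공지능은
print(with_particle("인공지능", "이", "가"))  # 인공지능이
```

Because “인공지능” ends in a consonant, it takes 은 or 이, which is why a particle chosen for the original term may need to change when the TermBase term is substituted.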

5 of 22

Demo #4, Apply Translation Memory

DEMO

Traditionally, even when Translation Memories were available for reference, they were used separately from machine translation. The Wise AI solution, however, can refer to and incorporate the Translation Memory within the AI translation process.

Typography styles from TM

Translation styles from TM

Terminology referenced from TM

In seg# 8, the translation for “four-way” is changed to follow similar styles referenced from the TM.

In seg# 4, the translation is changed to a concise noun phrase instead of a descriptive sentence, following the style of high fuzzy matches found in the TM.

In seg# 6, terminology found in high fuzzy matches from the TM is applied.
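One way to picture "incorporating the TM within the AI translation process" is to retrieve high fuzzy matches and pass them to the model as reference material. The sketch below is an assumption-laden illustration: it scores similarity with Python's `difflib.SequenceMatcher` rather than a CAT tool's fuzzy-match algorithm, and the 0.75 threshold, the function names, and the wording of the context block are all hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_matches(source, tm, threshold=0.75):
    """Return (score, tm_source, tm_target) tuples at or above the threshold, best first.

    tm is a list of (source_segment, target_segment) pairs.
    """
    scored = []
    for tm_source, tm_target in tm:
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, tm_source, tm_target))
    return sorted(scored, reverse=True)


def build_tm_context(source, tm, threshold=0.75):
    """Format the matches as a reference block that can be added to a prompt."""
    matches = fuzzy_matches(source, tm, threshold)
    if not matches:
        return ""
    lines = [f"- {s} => {t} ({score:.0%} match)" for score, s, t in matches]
    return "Reference translations from the Translation Memory:\n" + "\n".join(lines)
```

With references like these in the prompt, the model can be instructed to follow the typography, style, and terminology of the matched segments.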

6 of 22

Demo #5, Extract Bilingual Terms & Fix Inconsistencies

DEMO

Automatically generates a TermBase by extracting bilingual terms directly from the document. It also detects and fixes inconsistency errors where the same term has been translated differently.

Extracts Bilingual Terms

Filters Inconsistent Translations

Pick Preferred & Fix Inconsistencies

Extracts terms from the source as well as corresponding terms from the target and creates a complete bilingual terminology list.

Filters terms that are translated inconsistently. Shows how they are translated in actual segments.

Pick the preferred terms yourself or let the AI pick the best one for you. When the AI fixes inconsistencies, it also adjusts the text surrounding the changed terms to reflect any resulting grammatical changes.
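The filtering step can be sketched as grouping extracted term pairs by source term and keeping those with more than one distinct translation. The input shape and function names are assumptions, and choosing the most frequent translation is only a simple stand-in for the "let the AI pick" option described above.

```python
from collections import Counter, defaultdict


def find_inconsistencies(term_pairs):
    """Map each inconsistently translated source term to its translation counts.

    term_pairs is an iterable of (segment_no, source_term, target_term).
    """
    by_source = defaultdict(Counter)
    for _segment_no, source_term, target_term in term_pairs:
        by_source[source_term][target_term] += 1
    return {src: counts for src, counts in by_source.items() if len(counts) > 1}


def pick_preferred(counts):
    """Stand-in heuristic: prefer the translation used most often."""
    return counts.most_common(1)[0][0]


pairs = [
    (1, "주주", "shareholder"),
    (2, "주주", "stockholder"),
    (3, "주주", "shareholder"),
    (4, "정관", "Articles of Incorporation"),
]
for term, counts in find_inconsistencies(pairs).items():
    print(term, dict(counts), "->", pick_preferred(counts))
```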

7 of 22

Demo #6, Subtitle Translation

DEMO

Subtitles often have a single sentence split across multiple time codes, and machine translation fails to properly reflect this sentence structure, resulting in awkward translations. Wise AI solves such problems and excels in translating video subtitles, demonstrating distinct quality.

Natural flow

Tuned for genre

Subtitle formatting

Preserves the original conversational expressions and connects broken segments, divided by time codes, to flow naturally.

Fine-tuned for different genres such as movies, interviews, presentations, and entertainment. Custom genres can be added quickly.

Line breaks, commentary such as gestures in parentheses, directions, and any special typographical annotations are preserved.
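To illustrate the "broken segments" problem, the sketch below groups consecutive subtitle cues into whole sentences before translation while remembering which cues each sentence came from. It is a simplified, punctuation-based illustration with hypothetical names; how Wise AI actually reconnects and re-splits segments is not described in these slides.

```python
SENTENCE_END = (".", "?", "!", "…")


def group_cues_into_sentences(cues):
    """Group cues into sentences.

    cues is a list of (timecode, text). Returns a list of
    (cue_indices, sentence) so a translation can later be mapped
    back onto the original time codes.
    """
    groups, indices, parts = [], [], []
    for i, (_timecode, text) in enumerate(cues):
        indices.append(i)
        parts.append(text.strip())
        if text.strip().endswith(SENTENCE_END):
            groups.append((indices, " ".join(parts)))
            indices, parts = [], []
    if parts:  # trailing cue without sentence-final punctuation
        groups.append((indices, " ".join(parts)))
    return groups


cues = [
    ("00:00:01,000 --> 00:00:03,000", "When I first started"),
    ("00:00:03,000 --> 00:00:05,000", "working on this project,"),
    ("00:00:05,000 --> 00:00:07,000", "I had no idea."),
    ("00:00:07,000 --> 00:00:08,000", "Really?"),
]
for indices, sentence in group_cues_into_sentences(cues):
    print(indices, sentence)
```

Translating the joined sentence as a unit avoids the awkward output that results from translating each time-coded fragment in isolation.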

8 of 22

Demo #7, Pivot Translation

DEMO

When translating between less common language pairs such as Slovak to Korean, the quality of machine translation may not be satisfactory, and it is challenging to secure qualified resources. Adding English as a pivot language, so that the translation goes from Slovak to English to Korean, can produce better results.

Switch Languages

Editing in one process

Sanity check

Switch to source-to-pivot, pivot-to-target, and source-to-target instantly and perform any linguistic tasks for the selected language pair.

You can perform all editing tasks for each language pair including AI Operations, without creating separate processes.

Target-native proofreaders may incorporate English as a pivot language to conduct sanity checks for validation purposes.
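The pivot flow itself is simple to express: translate source to pivot, then pivot to target, and keep the intermediate text so that a proofreader can use it for a sanity check. In this sketch the `translate` callable is a placeholder for whatever engine is used; its signature and the function name `pivot_translate` are assumptions.

```python
from typing import Callable, Dict

# (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]


def pivot_translate(
    text: str,
    source_lang: str,
    pivot_lang: str,
    target_lang: str,
    translate: Translate,
) -> Dict[str, str]:
    """Translate via a pivot language and keep all three versions."""
    pivot_text = translate(text, source_lang, pivot_lang)
    target_text = translate(pivot_text, pivot_lang, target_lang)
    return {"source": text, "pivot": pivot_text, "target": target_text}
```

Returning the pivot text alongside the target is what makes the source-to-pivot, pivot-to-target, and source-to-target views possible from a single process.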

9 of 22

Demo #8, Translation with customized style guide

DEMO

One of our clients requested the incorporation of a specific style guide for translating video subtitles of a church sermon, with the most important guideline being Bible citation. The customized AI translation was developed quickly, in just one day, and the result was highly successful.

Bible citation

Citation typography

Abbreviated book title

Instructed to use the New International Version of the Bible for English translation. The system retrieves exact verses from this specified version of the English Bible.

The Korean citation format [book chapter:verse] has been changed to (book chapter:verse) according to the specified style guide.

The citation uses an abbreviated book title. The English translation maintains the standard abbreviated book title in accordance with the instructed style guide.
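The citation typography rule is mechanical enough to show as a small sketch. The regular expression below is an assumption about what a citation looks like (a book name, a space, then chapter:verse with an optional verse range); in the demo the change was made by the customized AI translation, not by a regex.

```python
import re

# Matches "[<book> <chapter>:<verse>]" with an optional "-<verse>" range.
CITATION = re.compile(r"\[([^\[\]]+\s\d+:\d+(?:-\d+)?)\]")


def restyle_citations(text: str) -> str:
    """Rewrite [book chapter:verse] as (book chapter:verse)."""
    return CITATION.sub(r"(\1)", text)


print(restyle_citations("For God so loved the world [John 3:16]."))
# For God so loved the world (John 3:16).
```

Other bracketed annotations that do not look like citations, such as a bracketed sound cue, are left unchanged by this pattern.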

10 of 22

Demo #9, End-to-End Automated Process

DEMO

WiseTMS offers comprehensive CAT features and AI operations, allowing users to perform specific AI tasks and to edit translations. WisePronto provides an end-to-end automated process that integrates essential AI tasks to enhance both quality and speed.

Content type detection

Automated MTPE

Ready for quick proofread

Detects the most relevant content types and fields of expertise instantly when a user uploads documents.

Runs a fully automated process, including pre-processing the source content, pre-translating, editing the translation, and fixing inconsistencies.

Ready for human proofreaders to quickly validate the translation output by focusing on specific areas such as pronouns, short phrases, broken segments, and readability.

11 of 22

Solution Components

SOLUTION

Language Technologies

Large Language Model

Cloud CAT Machines

Human Proofreaders

Automated Workflow Management

Language Tech

  • Preprocess NLP
  • Pretranslate NMT
  • Parse contents CAT

Large Language Model

  • Classification
  • Context extraction
  • Contextual rebuilding

Cloud CAT Machines

  • AI Ops foundation
  • Quality assurance
  • Automated workflows

Human Proofreaders

  • Quality assurance
  • Auto assignment
  • State alert

12 of 22

Language Tech - NLP, NMT, CAT

SOLUTION

Preprocess (NLP)

  • Completeness
  • Morpheme
  • Spellcheck

Pretranslate (NMT)

  • Pretranslate
  • Non-translatable
  • Formatting

Parse content (CAT)

  • Text extraction
  • Segmentation
  • Regeneration

‘상’으로서의 신체는 남성과 여성 사이에 큰 차이를 지니는데, 남녀 간에 계속 일어나는

다양한 신체 현상과 경험을 통해 강화되고 형성된다. 이처럼 남녀의 ‘상’으로서의 신체 차

이에 따라 복식의 스타일도 변화할 수 있다.

그러나 복식의 젠더 차이가 큰 현대에서 하체를 기준으로 살펴보면 남성은 바지, 여성은

치마라는 엄격한 분류 체계가 존재하기 때문에 여성은 바지를 입을 수 있지만, 남성은 치마를 입을 수 없다.

‘상’으로서의 신체는 남성과 여성 사이에 큰 차이를 지니는데, 남녀 간에 계속 일어나는 다양한 신체 현상과 경험을 통해 강화되고 형성된다. 이처럼 남녀의 ‘상’으로서의 신체 차이에 따라 복식의 스타일도 변화할 수 있다.

그러나 복식의 젠더 차이가 큰 현대에서 하체를 기준으로 살펴보면 남성은 바지, 여성은 치마라는 엄격한 분류 체계가 존재하기 때문에 여성은 바지를 입을 수 있지만, 남성은 치마를 입을 수 없다.

13 of 22

Language Tech - LLM (Large Language Model)

SOLUTION

Classification

  • Content type
  • Content domain
  • Tone and style

Context

  • Summary
  • Core context
  • Specific context

Rebuilding

  • Terminology
  • Writing style
  • Consistency

The text is about the regulations of a company named WISE WITH, LTD, established on January 30, 2023. It highlights the company's objectives, such as management consulting, software development and supply, real estate leasing, investment, and related businesses. It also discusses the rights and obligations of shareholders.

14 of 22

Cloud Machines for AI Operations

SOLUTION

Cloud Machines

  • LLM, NLP, NMT platform
  • Built with Phrase API
  • TransMemory, TermBase connection

Automated Workflow

  • Preprocess source content
  • Pretranslate and edit
  • Extract terms and fix inconsistencies

15 of 22

SOLUTION

Data Exchanges with LLM

Review Content

Correct following Target text to standard {target_lang}.

Target text: {target_text}

Termi Check

Correct following Target text to standard {target_lang}.

Target text: {target_text}

Proofread

You are going to proofread the translation into the {target_lang} language. ...

Source: {source_text}

Translation: {target_text}

{
  "source_lang": "Korean",
  "target_lang": "English",
  "target_text": "Jeonggwan",
  "source_text": "정 관",
  "template": "Proofread"
}

Wise Editor

Wise Server

Proofread

You are going to proofread the translation into the “English” language. ...

Source: “정 관”

Translation: “Jeonggwan”

Gen AI

DB

Filter AI Responses

  • Hallucination
  • Invalid response
  • Wrong language

Build complete prompt by populating pre-built templates with user selections

{
  "answer": "Articles of Incorporation"
}
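The exchange on this slide can be sketched in a few lines: the server populates a pre-built template with the editor's request, sends it to the model, and filters the reply. The template strings are taken from the slide, with the elided "..." left as-is. The function names, the error handling, and the wrong-language check (which only detects Hangul in output that should be English) are assumptions, and hallucination filtering is not sketched here.

```python
import json
import re

# Templates as shown on the slide; "..." is elided in the source and kept verbatim.
TEMPLATES = {
    "Review Content": (
        "Correct following Target text to standard {target_lang}.\n"
        "Target text: {target_text}"
    ),
    "Proofread": (
        "You are going to proofread the translation into the {target_lang} language. ...\n"
        "Source: {source_text}\n"
        "Translation: {target_text}"
    ),
}

HANGUL = re.compile(r"[\uac00-\ud7a3]")


def build_prompt(request: dict) -> str:
    """Populate the selected template with the fields of the editor request."""
    return TEMPLATES[request["template"]].format(**request)


def filter_response(raw: str, target_lang: str) -> str:
    """Return the answer text, or raise ValueError for an unusable reply."""
    try:
        answer = json.loads(raw)["answer"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError("invalid response") from exc
    if not isinstance(answer, str) or not answer.strip():
        raise ValueError("invalid response")
    # Crude wrong-language check (assumption): Hangul in an English answer.
    if target_lang == "English" and HANGUL.search(answer):
        raise ValueError("wrong language")
    return answer


request = {
    "source_lang": "Korean",
    "target_lang": "English",
    "target_text": "Jeonggwan",
    "source_text": "정 관",
    "template": "Proofread",
}
print(build_prompt(request))
print(filter_response('{"answer": "Articles of Incorporation"}', "English"))
```

Keeping templates on the server means the editor only sends user selections and segment text, so prompts can be revised without changing the client.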

16 of 22

SOLUTION

LLM Prompt Engineering

Crafting Prompt Templates

  1. Write clear instructions with valid context in English
  2. Formulate the template with placeholders for user inputs
  3. Test the template in a chat window
  4. Repeat editing and testing the template

17 of 22

SOLUTION

LLM Prompt Engineering

18 of 22

SOLUTION

LLM Prompt Engineering - Sample (Transcreation)

Act as a professional copywriter of [TARGET_LANGUAGE] language. You're going to transcreate the source text into the [TARGET_LANGUAGE] language.

The transcreation is the process of adapting a message both linguistically and culturally from one language to another requiring a high level of cultural adaptation, while maintaining its intent, style, tone and context, to make it culturally relevant for the targeted local audience. This means that you are free to detach from the wording of the source text to provide a message that would be eye-catching and natural to the reader. You can add, omit or change words in the target text, but you should keep all the concepts from the source text which means you should not add any completely new concepts to the target text (or omit key concepts from the source text either). Creativity and fluency are the primary focus point. The text requires focus on accuracy and appropriateness for the given culture of locale, selection of language, puns, connotations, and emotions that the content evokes. Use wordplays, idioms and references that are unique to [TARGET_LANGUAGE] language and the [TARGET_LOCALE] locale. Literal translations should be avoided.

The source text is written for [CONTENT_TYPE] content of [PRODUCT] product.

[INSTRUCTION]

Using the provided instructions and context above, please transcreate the given source text into [TARGET_LANGUAGE] for the [TARGET_LOCALE]. Additionally, provide a general translation into [TARGET_LANGUAGE] that is not derived from the transcreation. Ensure that both the transcreation and translation are similar in length to the original text. If the source text is a short phrase, keep the transcreation brief. Finally, in 30 words, explain how your transcreation differs from the general translation in English.

Source Text: "{INSERT_PROMPT}"

Provide your responses in exactly three bullet points structured as follows:

1) Transcreation into [TARGET_LANGUAGE]:

2) General translation into [TARGET_LANGUAGE]:

3) Explanation:
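A sample like this is used by filling its placeholders and then reading back the three-part reply. The sketch below shows one possible way to do both; the helper names and the returned dictionary keys are hypothetical, and the parser assumes the model follows the requested "1) / 2) / 3)" structure exactly.

```python
import re


def fill_placeholders(template: str, values: dict) -> str:
    """Replace [NAME] placeholders, e.g. [TARGET_LANGUAGE], with given values."""
    for name, value in values.items():
        template = template.replace(f"[{name}]", value)
    return template


def parse_transcreation_reply(reply: str) -> dict:
    """Split a reply structured as '1) ...', '2) ...', '3) ...' into three fields."""
    parts = re.split(r"(?m)^\s*([123])\)\s*", reply)
    # parts looks like [preamble, "1", body1, "2", body2, "3", body3]
    numbers, bodies = parts[1::2], parts[2::2]
    if numbers != ["1", "2", "3"]:
        raise ValueError("expected exactly three bullet points numbered 1) to 3)")
    keys = ["transcreation", "translation", "explanation"]
    # Drop each label up to the first colon and keep the content after it.
    return {key: body.split(":", 1)[-1].strip() for key, body in zip(keys, bodies)}
```

Rejecting replies that do not contain exactly three numbered points gives the caller a clear signal to retry instead of passing a malformed answer on to the editor.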

19 of 22

APPENDIX

Source

Translation by WISE

Inconsistent

Inaccurate

Compare the quality of translation by WISE vs the most popular MTs

20 of 22

APPENDIX

Translation by MT B

Translation by MT A

Inconsistent

Inaccurate

Compare the quality of translation by WISE vs the most popular MTs

21 of 22

OUTRO

SAVING.

BOOSTING.

INNOVATING.

EMPOWERING.

ALL ABOUT TRANSLATION with TECHNOLOGIES…

WiseTranslate.net

powered by

22 of 22

jnoh@wisest.co.kr, +82 10 8898 5823

Jaehoon Noh, CEO

  • WISE ST Global, CEO (’15 - present)

  • SDL Korea, Country Manager (’05 - ’11)

  • LionBridge Korea, Director (’02 - ’03)

  • L&H Korea, Country Manager (’99 - ’02)

  • Microsoft Korea, Software Localization Engineer (’94 - ’97)
  • 10+ years of executive management in major global LSPs.

  • Professor at SookMyoung Univ, Translation & English Dept.

  • Localized Windows and Internet Explorer at Microsoft

  • Introduced CAT system for the first time in Korea

  • Developed WiseTranslate, AI translation solution

30 years in Localization

Translation & Tech Expert