PRESENTATION
KATS 2024 Conference
Jaehoon Noh
A practical LLM application for translation in CAT system
jnoh@wisest.co.kr
CEO, WISE ST Global Inc.
🕜 WiseTranslate demos
🕝 Solution introduction
🕞 Q & A
A practical application of LLM for translation in the CAT framework.
PREMIUM Partner, SOLUTION certified by
Demo #1, Detection of Content Type
DEMO
As soon as you upload files for translation, the system automatically detects the document type and the field of expertise. The categories of content type and content domain are pre-defined as shown below. The AI examines the documents to identify the suitable categories.
Understanding different content types, such as marketing or legal, is crucial in human translation because it helps in selecting the appropriate translators for each type of content. This principle also applies to AI translation. Recognizing the content type from the beginning can significantly improve the output quality.
Demo #2, Review by Content Type
Based on the type of document and the field of expertise, AI Review performs translation reviews such as identifying the correct context, applying terms that fit the context, correcting mistranslations, and style adjustments. The type of document and field of expertise are also automatically detected.
Fixes shown: inaccuracy, style, context, terminology
Source | Before review | After AI Review |
정 관 | Jeonggwan | Articles of Incorporation |
제3조 (본점 사업장의 소재지) | Article 3 (the head office) | Article 3 (Location of the Head Office) |
부동산임대 | Property | Real Estate Leasing |
주금납입의 지체 | deferred payment | Delay in Payment of Share Capital |
Demo #3, Apply TermBase
The system not only checks for TermBase terminology errors but also automatically corrects them when the TermBase entry fits the context. More importantly, it analyzes the morphemes of the terms to correct them according to grammatical rules, including singular or plural forms, articles, and connective words.
Contextual application
Skips terms that do not fit the context
Articles, particles, singular/plural forms
In seg# 1, the term “collaboration” is registered only as a noun, but it is applied with the appropriate connective ending.
In seg# 14, “drug” was found in the TermBase, but the TermBase translation was “narcotic drug”, which does not fit the context, so it was skipped.
In seg# 17, the term “AI” is replaced with “인공지능” according to the TermBase. The postpositional particles (은/는/이/가) are adjusted to match as well.
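The particle adjustment in seg# 17 can be illustrated with a small sketch. It is not the WiseTranslate implementation: the function names are hypothetical, and it assumes the replacement term ends in a Hangul syllable and that the particle directly follows the term. Korean particles such as 은/는 and 이/가 depend on whether the preceding syllable ends in a final consonant, which can be read from the Unicode code point.

```python
import re

# Particle pairs: (form after a final consonant, form after a vowel).
PARTICLE_PAIRS = [("은", "는"), ("이", "가"), ("을", "를"), ("과", "와")]


def has_final_consonant(syllable: str) -> bool:
    """Return True if a precomposed Hangul syllable ends in a final consonant."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset <= 11171:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    # Each block of 28 code points shares a lead consonant and vowel;
    # index 0 within the block means "no final consonant".
    return offset % 28 != 0


def fix_particle(word: str, particle: str) -> str:
    """Pick the particle variant that agrees with the last syllable of word."""
    for after_consonant, after_vowel in PARTICLE_PAIRS:
        if particle in (after_consonant, after_vowel):
            return after_consonant if has_final_consonant(word[-1]) else after_vowel
    return particle


def apply_term(text: str, source_term: str, target_term: str) -> str:
    """Replace source_term with target_term and adjust an attached particle."""
    particles = "".join(a + b for a, b in PARTICLE_PAIRS)
    # Only treat the next character as a particle if it ends the word.
    pattern = re.compile(
        re.escape(source_term) + rf"(?:([{particles}])(?=\s|$|[.,!?]))?"
    )

    def replace(match: re.Match) -> str:
        particle = match.group(1)
        suffix = fix_particle(target_term, particle) if particle else ""
        return target_term + suffix

    return pattern.sub(replace, text)


print(apply_term("AI는 번역을 돕는다.", "AI", "인공지능"))
```

“AI” is read as ending in a vowel, so it takes “는”, while “인공지능” ends in a final consonant and takes “은”. A production system would need morphological analysis to separate real particles from look-alike syllables.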
Demo #4, Apply Translation Memory
Even when Translation Memories are available for reference, they have traditionally been used separately from machine translation. The Wise AI solution, however, can refer to and incorporate the Translation Memory within the AI translation process.
Typography styles from TM
Translation styles from TM
Terminology referenced from TM
In seg# 8, the translation for “four-way” is changed to follow similar styles referenced from the TM.
In seg# 4, the translation is changed to a concise noun phrase instead of a descriptive sentence, following the style of high fuzzy matches found in the TM.
In seg# 6, the terminology found in high fuzzy matches from the TM is applied.
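As a rough illustration of how high fuzzy matches could be collected from a TM before being passed to the AI as references, here is a minimal sketch. It uses Python's standard `difflib` similarity ratio as a stand-in; the scoring used by the actual product and by commercial CAT tools is not described in the slides, and the 0.75 threshold and function name are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_matches(source, translation_memory, threshold=0.75, limit=3):
    """Return up to `limit` TM entries whose source text is similar to `source`.

    translation_memory is a list of (tm_source, tm_target) pairs.
    Each result is (score, tm_source, tm_target), best match first.
    """
    scored = []
    for tm_source, tm_target in translation_memory:
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, tm_source, tm_target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Illustrative TM entries, not taken from the demo.
tm = [
    ("Four-way valve", "4방 밸브"),
    ("Safety instructions", "안전 지침"),
]
for score, tm_source, tm_target in fuzzy_matches("Four-way valves", tm):
    print(f"{score:.2f}  {tm_source} -> {tm_target}")
```

The matches returned here could then be included in the translation prompt so that typography, style, and terminology follow the TM.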
Demo #5, Extract Bilingual Terms & Fix Inconsistencies
Automatically generates a TermBase by extracting bilingual terms directly from the document. Automatically detects and fixes inconsistency errors where the same term has been translated differently.
Extracts Bilingual Terms
Filters Inconsistent Translations
Pick Preferred & Fix Inconsistencies
Extracts terms from the source as well as corresponding terms from the target and creates a complete bilingual terminology list.
Filters terms that are translated inconsistently. Shows how they are translated in actual segments.
Pick the preferred terms yourself or let the AI pick the best one for you. When the AI fixes inconsistencies, it also adjusts the words surrounding the changed terms to reflect any grammatical changes.
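The filtering step can be sketched as follows, assuming the bilingual terms have already been extracted as (source term, target term) pairs, one per occurrence. The function names are hypothetical, and picking the most frequent translation is only one simple heuristic for "preferred"; the slides do not say how the AI chooses.

```python
from collections import Counter, defaultdict


def find_inconsistencies(term_pairs):
    """Group target terms by source term and keep sources with more than one translation."""
    by_source = defaultdict(Counter)
    for source_term, target_term in term_pairs:
        by_source[source_term][target_term] += 1
    return {src: counts for src, counts in by_source.items() if len(counts) > 1}


def pick_preferred(counts):
    """Pick the most frequently used translation as the preferred term."""
    return counts.most_common(1)[0][0]


# Illustrative extracted pairs, one entry per occurrence in the document.
pairs = [
    ("계약", "contract"),
    ("계약", "agreement"),
    ("계약", "contract"),
    ("주주", "shareholder"),
]
for source_term, counts in find_inconsistencies(pairs).items():
    print(source_term, dict(counts), "->", pick_preferred(counts))
```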
Demo #6, Subtitle Translation
Subtitles often have a single sentence split across multiple time codes, and machine translation fails to properly reflect this sentence structure, resulting in awkward translations. Wise AI solves such problems and excels in translating video subtitles, demonstrating distinct quality.
Natural flow
Tuned for genre
Subtitle formatting
Preserves the original conversational expressions and connects broken segments, divided by time codes, to flow naturally.
Fine-tuned for different genres such as movies, interviews, presentations, and entertainment. Custom genres can be added quickly.
Line breaks, commentary such as gestures in parentheses, directions, and any special typographical annotations are preserved.
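One way to picture the "broken segments" problem is to group subtitle cues into whole sentences before translation, so that each sentence is translated with its full structure. The sketch below shows only that grouping step under a simple assumption (a cue that ends in sentence-final punctuation closes the sentence); the names are hypothetical and the product's actual approach is not described in the slides.

```python
import re

# A cue closes a sentence if it ends in ., !, ? or … (optionally followed by a closing quote or bracket).
SENTENCE_END = re.compile(r"[.!?…][\"')\]]?$")


def group_cues(cues):
    """Group consecutive cue indexes so that each group covers one full sentence."""
    groups, current = [], []
    for index, text in enumerate(cues):
        current.append(index)
        if SENTENCE_END.search(text.strip()):
            groups.append(current)
            current = []
    if current:  # trailing cues without sentence-final punctuation
        groups.append(current)
    return groups


cues = [
    "When I first came here,",
    "I didn't know anyone.",
    "(laughs) Really?",
]
print(group_cues(cues))
```

After translation, the sentence still has to be redistributed over the original time codes, which is the harder part and is not covered here.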
Demo #7, Pivot Translation
When translating less common language pairs such as Slovak to Korean, the quality of machine translation may be unsatisfactory, and it is difficult to secure qualified linguists. Chaining Slovak to English to Korean, with English as a pivot language, can produce better results.
Switch Languages
Editing in one process
Sanity check
Switch to source-to-pivot, pivot-to-target, and source-to-target instantly and perform any linguistic tasks for the selected language pair.
You can perform all editing tasks for each language pair including AI Operations, without creating separate processes.
Target-native proofreaders may incorporate English as a pivot language to conduct sanity checks for validation purposes.
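Conceptually, pivot translation is the composition of two translation steps, with the intermediate English text kept for sanity checks. The sketch below uses dictionary lookups as stand-ins for real translation engines; `pivot_translate` and the sample data are illustrative assumptions, not part of the product.

```python
from typing import Callable, Tuple

Translator = Callable[[str], str]


def pivot_translate(
    text: str, source_to_pivot: Translator, pivot_to_target: Translator
) -> Tuple[str, str]:
    """Translate via a pivot language and return (pivot_text, target_text).

    Keeping the pivot text lets a proofreader validate the target
    against the pivot as well as against the source.
    """
    pivot_text = source_to_pivot(text)
    target_text = pivot_to_target(pivot_text)
    return pivot_text, target_text


# Stand-in "engines" for illustration only.
SK_EN = {"Dobrý deň": "Good day"}
EN_KO = {"Good day": "안녕하세요"}

pivot, target = pivot_translate("Dobrý deň", SK_EN.__getitem__, EN_KO.__getitem__)
print(pivot, "->", target)
```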
Demo #8, Translation with customized style guide
One of our clients requested the incorporation of a specific style guide for translating video subtitles of a church sermon, with the most important guideline being Bible citation. The customized AI translation was developed quickly, in just one day, and the result was highly successful.
Bible citation
Citation typography
Abbreviated book title
Instructed to use the New International Version of the Bible for English translation. The system retrieves exact verses from this specified version of the English Bible.
The Korean citation format [book chapter:verse] has been changed to (book chapter:verse) according to the specified style guide.
The citation uses an abbreviated book title. The English translation maintains the standard abbreviated book title in accordance with the instructed style guide.
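The citation typography rule can be expressed as a simple pattern rewrite. This sketch only covers the bracket change from [book chapter:verse] to (book chapter:verse); retrieving the verse text and mapping abbreviated book titles are handled by the AI in the demo and are not modeled here. The pattern and function name are assumptions.

```python
import re

# Matches "[book chapter:verse]" or "[book chapter:verse-verse]".
CITATION = re.compile(r"\[([^\[\]]+?\s\d+:\d+(?:-\d+)?)\]")


def restyle_citations(text: str) -> str:
    """Rewrite square-bracket Bible citations to parentheses."""
    return CITATION.sub(r"(\1)", text)


print(restyle_citations("하나님이 세상을 이처럼 사랑하사 [요 3:16]"))
print(restyle_citations("See [John 3:16-17] and [note]."))
```

Bracketed text that does not look like a chapter:verse reference, such as "[note]", is left unchanged.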
Demo #9, End-to-End Automated Process
WiseTMS offers comprehensive CAT features and AI operations, allowing users to perform specific AI tasks and to edit translations. WisePronto provides an end-to-end automated process that integrates essential AI tasks to enhance both quality and speed.
Content type detection
Automated MTPE
Ready for quick proofread
Detects the most relevant content types and fields of expertise instantly when a user uploads documents.
Runs a fully automated process, including preprocessing the source content, pre-translating, editing the translation, and fixing inconsistencies.
Ready for human proofreaders to quickly validate the translation output by focusing on specific areas such as pronouns, short phrases, broken segments, and readability.
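The automated process described above can be thought of as an ordered chain of steps applied to a job. The following is a minimal sketch of that idea only; the `Job` structure, step names, and stub bodies are hypothetical and do not reflect the WisePronto implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    source_segments: List[str]
    target_segments: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


Step = Callable[[Job], Job]


def preprocess(job: Job) -> Job:
    # Stand-in for source preprocessing: trim stray whitespace.
    job.source_segments = [s.strip() for s in job.source_segments]
    return job


def pretranslate(job: Job) -> Job:
    # Placeholder output; a real step would call an MT engine here.
    job.target_segments = [f"<MT:{s}>" for s in job.source_segments]
    return job


def edit_translation(job: Job) -> Job:
    return job  # stub: an LLM review step would go here


def fix_inconsistencies(job: Job) -> Job:
    return job  # stub: term extraction and consistency fixes would go here


def run_pipeline(job: Job, steps: List[Step]) -> Job:
    """Run each step in order and record which steps were applied."""
    for step in steps:
        job = step(job)
        job.log.append(step.__name__)
    return job


job = run_pipeline(
    Job(["  정 관 "]),
    [preprocess, pretranslate, edit_translation, fix_inconsistencies],
)
print(job.log)
```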
Solution Components
SOLUTION
Language Technologies
Large Language Model
Cloud CAT Machines
Human Proofreaders
Automated Workflow Management
Language Tech - NLP, NMT, CAT
Preprocess (NLP)
‘상’으로서의 신체는 남성과 여성 사이에 큰 차이를 지니는데, 남녀 간에 계속 일어나는
다양한 신체 현상과 경험을 통해 강화되고 형성된다. 이처럼 남녀의 ‘상’으로서의 신체 차
이에 따라 복식의 스타일도 변화할 수 있다.
그러나 복식의 젠더 차이가 큰 현대에서 하체를 기준으로 살펴보면 남성은 바지, 여성은
치마라는 엄격한 분류 체계가 존재하기 때문에 여성은 바지를 입을 수 있지만, 남성은 치마를 입을 수 없다.
‘상’으로서의 신체는 남성과 여성 사이에 큰 차이를 지니는데, 남녀 간에 계속 일어나는 다양한 신체 현상과 경험을 통해 강화되고 형성된다. 이처럼 남녀의 ‘상’으로서의 신체 차이에 따라 복식의 스타일도 변화할 수 있다.
그러나 복식의 젠더 차이가 큰 현대에서 하체를 기준으로 살펴보면 남성은 바지, 여성은 치마라는 엄격한 분류 체계가 존재하기 때문에 여성은 바지를 입을 수 있지만, 남성은 치마를 입을 수 없다.
Pretranslate (NMT)
Parse content (CAT)
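The Preprocess (NLP) sample above shows hard-wrapped Korean lines being rejoined into full paragraphs, including a word split across a line break (“차” + “이에”). The sketch below illustrates why this needs linguistic knowledge: whether to insert a space at a join depends on whether the two fragments form one word. A small known-word set stands in for real morphological analysis; that set and the function name are assumptions.

```python
def join_wrapped_lines(lines, vocabulary):
    """Join hard-wrapped lines into one paragraph.

    If the last token of one line plus the first token of the next
    forms a known word, the lines are joined without a space.
    """
    if not lines:
        return ""
    paragraph = lines[0].strip()
    for line in lines[1:]:
        line = line.strip()
        last_token = paragraph.rsplit(" ", 1)[-1]
        first_token = line.split(" ", 1)[0]
        separator = "" if last_token + first_token in vocabulary else " "
        paragraph += separator + line
    return paragraph


wrapped = [
    "이처럼 남녀의 ‘상’으로서의 신체 차",
    "이에 따라 복식의 스타일도 변화할 수 있다.",
]
# Stand-in for a morphological analyzer: a set of known word forms.
known_words = {"차이에"}
print(join_wrapped_lines(wrapped, known_words))
```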
Language Tech - LLM (Large Language Model)
The text is about the regulations of a company named WISE WITH, LTD, established on January 30, 2023. It highlights the company's objectives, such as management consulting, software development and supply, real estate leasing, investment, and related businesses. It also discusses the rights and obligations of shareholders.
Classification
Context
Rebuilding
Cloud Machines for AI Operations
Cloud Machines
LLM, NLP, NMT platform
Built with Phrase API
TransMemory, TermBase connection
Automated Workflow
Preprocess source content
Pretranslate and edit
Extract terms and fix inconsistencies
Data Exchanges with LLM
Review Content
Correct following Target text to standard {target_lang}.
Target text: {target_text}
Termi Check
Correct following Target text to standard {target_lang}.
Target text: {target_text}
Proofread
You are going to proofread the translation into the {target_lang} language. ...
Source: {source_text}
Translation: {target_text}
{
  "source_lang": "Korean",
  "target_lang": "English",
  "target_text": "Jeonggwan",
  "source_text": "정 관",
  "template": "Proofread"
}
Wise Editor
Wise Server
Proofread
You are going to proofread the translation into the "English" language. ...
Source: "정 관"
Translation: "Jeonggwan"
Gen AI
DB
Filter AI Responses
Build complete prompt by populating pre-built templates with user selections
{
  "answer": "Articles of Incorporation"
}
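The exchange above (editor request, prompt built from a pre-built template on the server, filtered AI response) can be sketched as follows. The template text is copied from the slide, including its elided "..." portion; the function names and the response-filtering rule are assumptions, and no real LLM API is called.

```python
import json
import re

# Pre-built templates keyed by name; the "..." is elided in the source slide.
TEMPLATES = {
    "Proofread": (
        "You are going to proofread the translation into the {target_lang} language. ...\n"
        "Source: {source_text}\n"
        "Translation: {target_text}"
    ),
}


def build_prompt(request: dict) -> str:
    """Populate the selected template with the fields sent by the editor."""
    template = TEMPLATES[request["template"]]
    fields = {key: value for key, value in request.items() if key != "template"}
    return template.format(**fields)


def filter_response(raw: str) -> str:
    """Extract the "answer" field from the first JSON object in the AI response."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))["answer"].strip()


request = {
    "source_lang": "Korean",
    "target_lang": "English",
    "target_text": "Jeonggwan",
    "source_text": "정 관",
    "template": "Proofread",
}
print(build_prompt(request))
print(filter_response('Here is the result: {"answer": "Articles of Incorporation"}'))
```

Filtering matters because generative models sometimes wrap the requested JSON in extra text; only the extracted answer is written back to the editor.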
LLM Prompt Engineering
Crafting Prompt Templates
LLM Prompt Engineering - Sample (Transcreation)
Act as a professional copywriter of [TARGET_LANGUAGE] language. You're going to transcreate the source text into the [TARGET_LANGUAGE] language.
The transcreation is the process of adapting a message both linguistically and culturally from one language to another requiring a high level of cultural adaptation, while maintaining its intent, style, tone and context, to make it culturally relevant for the targeted local audience. This means that you are free to detach from the wording of the source text to provide a message that would be eye-catching and natural to the reader. You can add, omit or change words in the target text, but you should keep all the concepts from the source text which means you should not add any completely new concepts to the target text (or omit key concepts from the source text either). Creativity and fluency are the primary focus point. The text requires focus on accuracy and appropriateness for the given culture of locale, selection of language, puns, connotations, and emotions that the content evokes. Use wordplays, idioms and references that are unique to [TARGET_LANGUAGE] language and the [TARGET_LOCALE] locale. Literal translations should be avoided.
The source text is written for [CONTENT_TYPE] content of [PRODUCT] product.
[INSTRUCTION]
Using the provided instructions and context above, please transcreate the given source text into [TARGET_LANGUAGE] for the [TARGET_LOCALE]. Additionally, provide a general translation into [TARGET_LANGUAGE] that is not derived from the transcreation. Ensure that both the transcreation and translation are similar in length to the original text. If the source text is a short phrase, keep the transcreation brief. Finally, in 30 words, explain how your transcreation differs from the general translation in English.
Source Text: "{INSERT_PROMPT}"
Provide your responses in exactly three bullet points structured as follows:
1) Transcreation into [TARGET_LANGUAGE]:
2) General translation into [TARGET_LANGUAGE]:
3) Explanation:
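Because the prompt asks for exactly three numbered bullet points, the response can be split back into its parts mechanically. The parser below is a sketch under that assumption; the function name and the dictionary keys are hypothetical, and the sample response is invented for illustration.

```python
import re

# Matches the "1) Label:" style headers requested by the prompt.
BULLET = re.compile(r"^\s*([123])\)\s*[^:\n]*:\s*", re.MULTILINE)
KEYS = ["transcreation", "translation", "explanation"]


def parse_transcreation_response(text: str) -> dict:
    """Split a three-bullet response into transcreation, translation, and explanation."""
    matches = list(BULLET.finditer(text))
    if [m.group(1) for m in matches] != ["1", "2", "3"]:
        raise ValueError("response does not contain bullets 1) to 3) in order")
    parts = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts[KEYS[i]] = text[match.end():end].strip()
    return parts


response = (
    "1) Transcreation into Korean: 지금 시작하세요\n"
    "2) General translation into Korean: 지금 시작하십시오\n"
    "3) Explanation: The transcreation uses a friendlier tone."
)
print(parse_transcreation_response(response))
```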
APPENDIX
Source
Translation by WISE
Inconsistent
Inaccurate
Compare the quality of translation by WISE vs the most popular MTs
Translation by MT B
Translation by MT A
OUTRO
SAVING.
BOOSTING.
INNOVATING.
EMPOWERING.
ALL ABOUT TRANSLATION with TECHNOLOGIES…
WiseTranslate.net
powered by
jnoh@wisest.co.kr, +82 10 8898 5823
Jaehoon Noh, CEO
30 years in Localization
Translation & Tech Expert