1 of 42

2 of 42

Contents

  1. Machine translation
  2. Speech recognition

3 of 42

What is machine translation?

  1. Machine translation (MT), also called automated translation, is the process by which computer software translates text from one language to another without human involvement.

  • At its most basic level, machine translation substitutes individual words in one language with words in another.

  • Put simply, machine translation uses computer software to translate text from a source language into a target language.
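
The word-for-word replacement described above can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary entries and the function name `translate_word_for_word` are hypothetical, and a real system would also have to handle word order, inflection, and ambiguity.

```python
# Toy bilingual dictionary (hypothetical entries, for illustration only).
EN_TO_ES = {"the": "el", "cat": "gato", "drinks": "bebe", "milk": "leche"}


def translate_word_for_word(sentence, dictionary):
    """Replace each source word with its dictionary entry.

    Words missing from the dictionary are passed through unchanged.
    """
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


print(translate_word_for_word("The cat drinks milk", EN_TO_ES))
```

Because each word is replaced in isolation, this approach cannot reorder words or choose between several meanings of the same word, which is why the more elaborate approaches on the following slides exist.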

4 of 42

What Is Machine Translation?

Machine Translation is the process of converting the text in a source language to a required target language.

5 of 42

What Is Machine Translation?

Given a sequence of text in a source language, there is no single best translation of that text into another language.

This is because of the natural ambiguity and flexibility of human language, especially for the many Indian languages that are rarely spoken in other countries.

6 of 42

What Is Machine Translation?

Traditionally, Natural Language Processing of both spoken and written language has been regarded as consisting of the following stages:

  1. Phonology and Phonetics: (Processing of sound)
  2. Morphology: (Processing of word forms)
  3. Lexicon: (Storage of words and associated knowledge)
  4. Parsing: (Processing of structure)
  5. Semantics: (Processing of meaning)
  6. Pragmatics: (Processing of user intention, modeling, etc.)
  7. Discourse: (Processing of connected text)

7 of 42

What Is Machine Translation?

Bernard Vauquois' pyramid shows the comparative depth of intermediate representation for each approach: interlingual machine translation at the peak, followed by transfer-based, then direct translation.

8 of 42

Types Of Machine Translation

There are four main types of machine translation:

        • Statistical Machine Translation or SMT
        • Rule-based Machine Translation or RBMT
        • Hybrid Machine Translation or HMT
        • Neural Machine Translation or NMT

9 of 42

Statistical Machine Translation or SMT

  1. Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated from statistical models whose parameters are derived from the analysis of large bilingual text corpora.

  • A bilingual text corpus is a large, structured collection of texts written in two different languages.

  • SMT aims to determine the correspondence between a word in the source language and a word in the target language.

  • A well-known example is Google Translate, which relied on SMT before it moved to neural machine translation in 2016.

10 of 42

Statistical Machine Translation or SMT

  1. SMT is mostly based on Information Theory which studies the quantification, storage, and communication of information.

  • The statistical models hold statistical information such as the correlation between the source language (SL) and the target language (TL), and what well-formed sentences look like. During translation, these models help find the best translation of the source text.
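
One common way to combine these two kinds of statistical information is to score each candidate translation by how well it corresponds to the source and by how well formed it is, then keep the highest-scoring candidate. The sketch below assumes that formulation; the slides do not name a specific one. All probabilities and the function name `best_translation` are made up for illustration.

```python
import math

# Hypothetical scores for translating the Spanish source "casa blanca".
# translation_model[t] stands in for P(source | t): SL/TL correspondence.
# language_model[t] stands in for P(t): how well formed t is in the TL.
translation_model = {"house white": 0.5, "white house": 0.4, "white home": 0.1}
language_model = {"house white": 0.01, "white house": 0.6, "white home": 0.2}


def best_translation(candidates):
    """Return the candidate maximizing log P(source | t) + log P(t)."""
    def score(t):
        return math.log(translation_model[t]) + math.log(language_model[t])
    return max(candidates, key=score)


print(best_translation(list(translation_model)))
```

Here the literal word order "house white" matches the source best, but the well-formedness score pushes the fluent "white house" to the top.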

11 of 42

Statistical Machine Translation or SMT

Figure: a high-level view of Statistical Machine Translation.

12 of 42

Statistical Machine Translation or SMT

  1. Its most notable disadvantage is that it takes little account of context, so translations are often wrong and high quality should not be expected.

  • There are several types of statistical machine translation models: word-based translation, phrase-based translation, syntax-based translation, and hierarchical phrase-based translation.

13 of 42

Rule-based Machine Translation or RBMT

  1. RBMT translates on the basis of grammatical rules.
  2. Rule-Based Machine Translation (RBMT) relies on countless built-in linguistic rules and on bilingual dictionaries with millions of entries for each language pair.
  3. It performs a grammatical analysis of the source language and the target language to generate the translated sentence.
  4. In RBMT, translations are built on sophisticated rules, but users are still free to apply their own terminology. This is done by creating a user-defined dictionary (consisting of user-defined terminology) that overrides the system’s default settings.
  5. However, RBMT output requires extensive editing, and its heavy reliance on dictionaries means that good quality is reached only after a significant period.
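
The user-defined dictionary override described in point 4 can be sketched as a layered lookup in which user entries are consulted before the system defaults. The dictionary contents and the `lookup` helper are hypothetical; real RBMT systems combine such lookups with grammatical rules that are not shown here.

```python
from collections import ChainMap

# Hypothetical system dictionary and user terminology (English -> French).
SYSTEM_DICTIONARY = {"mouse": "souris", "bug": "insecte", "window": "fenêtre"}
USER_DICTIONARY = {"bug": "bogue"}  # preferred term in a software context

# ChainMap searches its mappings left to right, so user entries win.
lexicon = ChainMap(USER_DICTIONARY, SYSTEM_DICTIONARY)


def lookup(word):
    """Return the translation of a word, preferring user terminology."""
    return lexicon.get(word, word)


print(lookup("bug"), lookup("mouse"))
```

Keeping the user terminology in a separate mapping means the system dictionary stays untouched and the override can be removed at any time.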

14 of 42

Rule-based Machine Translation or RBMT

15 of 42

Hybrid Machine Translation or HMT

  1. As the name suggests, HMT is a method of machine translation that incorporates the use of multiple different machine translation approaches within a single machine translation system.

  • The underlying motivation behind the use of HMT is the fact that a failure of a single machine translation technique should not stop the system from achieving the required level of accuracy.

16 of 42

Hybrid Machine Translation or HMT

  1. HMT, as the name indicates, is typically a mix of RBMT and SMT.
  2. It uses a translation memory, which generally makes it more successful in terms of quality.
  3. Nevertheless, HMT has its own downsides, the biggest of which is the need for extensive editing, so human translators are still required.
  4. There are several approaches to HMT like multi-engine, statistical rule generation, multi-pass, and confidence-based.

17 of 42

Hybrid Machine Translation or HMT

  • Multi-engine: This approach runs multiple MT systems in parallel and produces the final output by combining the outputs of the sub-systems.

  • Statistical rule generation: In this approach, statistical data is used to generate lexical and syntactic rules. It also saves time, as some of the rules are extracted directly from the training data.

18 of 42

Hybrid Machine Translation or HMT

  • Multi-Pass: In this approach, the input is processed multiple times in sequential order. One of the most common techniques that uses this approach is pre-processing of the input data.

  • Confidence-Based: This approach differs from the ones above in that multiple translation technologies are not always used. A confidence metric is produced for each translated sentence, and from it the system decides whether a secondary translation technology is required or the output of the first one is sufficient.
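
A confidence-based decision of this kind might look like the sketch below. The two engines are hypothetical stand-ins that return a translation together with a confidence value between 0 and 1, and the threshold of 0.7 is an arbitrary assumption, not a recommended setting.

```python
def translate_with_fallback(sentence, primary, secondary, threshold=0.7):
    """Use the primary engine unless its confidence is below the threshold."""
    translation, confidence = primary(sentence)
    if confidence >= threshold:
        return translation, "primary"
    translation, _ = secondary(sentence)
    return translation, "secondary"


# Hypothetical engines: each returns (translation, confidence in [0, 1]).
def toy_statistical_engine(sentence):
    known = {"good morning": ("bonjour", 0.95)}
    return known.get(sentence, (sentence, 0.2))


def toy_rule_based_engine(sentence):
    known = {"thank you": ("merci", 0.9)}
    return known.get(sentence, (sentence, 0.5))


print(translate_with_fallback("good morning", toy_statistical_engine, toy_rule_based_engine))
print(translate_with_fallback("thank you", toy_statistical_engine, toy_rule_based_engine))
```

The secondary engine runs only for sentences the primary engine is unsure about, which keeps the cost of the second pass low.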

19 of 42

Neural Machine Translation or NMT

  • In Neural Machine Translation (NMT), we make use of a neural network model to learn a statistical model for machine translation.

  • One of the key advantages of NMT over SMT is that a single system can be trained directly on source and target text, removing the dependency on the pipeline of specialized systems used in SMT.

20 of 42

What Are The Benefits Of Machine Translation?

  1. Rapid translation: One of the crucial benefits of machine translation is speed, since computer programs can translate a huge amount of text rapidly. Human translators work more accurately, but they cannot match the speed of a computer.

  • Less expensive: If you train the machine specifically for your requirements, machine translation offers a good blend of fast and cost-effective translation, as it is less expensive than using a human translator.

  • High learning capability: Another benefit of machine translation is its capability to learn important words and reuse them wherever they might fit. 

21 of 42

Applications of machine translation

    • Text translation
    • Speech translation

22 of 42

Machine Translation vs Human translation

  1. Machine translation hits the sweet spot of cost and speed, offering a fast way for brands to translate their documents at scale without much overhead. However, that doesn't mean it is always appropriate.

On the other hand, human translation is best for projects that require extra care and nuance. Skilled translators work on your brand's content to capture the original meaning and convey that feeling or message faithfully in the target language.

23 of 42

Machine Translation vs Human translation

  1. Depending on how much content needs to be translated, machine translation can deliver translated content very quickly, whereas human translators take more time. Time spent finding, vetting, and managing a team of translators should also be considered.

24 of 42

Machine Translation vs Human translation

  1. Many translation software providers offer machine translation at little or no cost, making it a viable option for organizations that cannot afford professional translation.

25 of 42

Machine Translation vs Human translation

  1. Machine translation is the instant conversion of text from one language to another using artificial intelligence, whereas human translation involves actual brainpower, in the form of one or more translators translating the text manually.

26 of 42

Speech recognition

27 of 42

Building a Speech Recognizer

  1. Speech Recognition or Automatic Speech Recognition (ASR) is the center of attention for AI projects like robotics.

  • Without ASR, it is hard to imagine a cognitive robot interacting with a human. However, building a speech recognizer is not easy.

29 of 42

Difficulties in developing a speech recognition system

Developing a high quality speech recognition system is really a difficult problem.

The difficulty of speech recognition technology can be broadly characterized along a number of dimensions as discussed below −

30 of 42

Difficulties in developing a speech recognition system

Size of the vocabulary 

− Size of the vocabulary impacts the ease of developing an ASR.

Consider the following sizes of vocabulary for a better understanding.

    • A small size vocabulary consists of 2-100 words, for example, as in a voice-menu system
    • A medium size vocabulary consists of several 100s to 1,000s of words, for example, as in a database-retrieval task
    • A large size vocabulary consists of several 10,000s of words, as in a general dictation task.

Note that the larger the vocabulary, the harder it is to perform recognition.

31 of 42

Difficulties in developing a speech recognition system

Channel characteristics −

Channel quality is also an important dimension.

For example, speech captured directly is high bandwidth with the full frequency range, while telephone speech is low bandwidth with a limited frequency range. Recognition is harder in the latter case.

32 of 42

Difficulties in developing a speech recognition system

Speaking mode 

− Ease of developing an ASR also depends on the speaking mode, that is whether the speech is in isolated word mode, or connected word mode, or in a continuous speech mode.

Note that continuous speech is the hardest of these to recognize.

33 of 42

Difficulties in developing a speech recognition system

Speaking style 

− Speech may be read aloud in a formal style, or be spontaneous and conversational in a casual style.

The latter is harder to recognize.

34 of 42

Difficulties in developing a speech recognition system

Speaker dependency 

− A recognizer can be speaker dependent, speaker adaptive, or speaker independent.

A speaker-independent system is the hardest to build.

35 of 42

Difficulties in developing a speech recognition system

Type of noise 

− Noise is another factor to consider while developing an ASR.

Signal-to-noise ratio (SNR) falls into various ranges, depending on how much background noise is present in the acoustic environment −

    • If the SNR is greater than 30 dB, it is considered high range
    • If the SNR lies between 10 dB and 30 dB, it is considered medium range
    • If the SNR is less than 10 dB, it is considered low range

The type of background noise, such as stationary noise, non-human noise, background speech, and crosstalk from other speakers, also contributes to the difficulty of the problem.
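
As a rough illustration of these ranges, the sketch below computes the signal-to-noise ratio in decibels from sample values, using the standard definition of 10 times the base-10 logarithm of signal power over noise power, and classifies it with the thresholds listed above. The function names and the sample data are hypothetical.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, computed from mean sample power."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


def snr_range(db):
    """Classify an SNR value using the 30 dB and 10 dB thresholds."""
    if db > 30:
        return "high"
    if db >= 10:
        return "medium"
    return "low"


# Hypothetical samples: noise amplitude is one tenth of the signal amplitude.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]

value = snr_db(signal, noise)
print(round(value, 1), snr_range(value))
```

A tenfold difference in amplitude is a hundredfold difference in power, which corresponds to about 20 dB and so falls in the medium range.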

37 of 42

Difficulties in developing a speech recognition system

Microphone characteristics 

− The quality of the microphone may be good, average, or below average.

Also, the distance between the mouth and the microphone can vary.

These factors also should be considered for recognition systems.

39 of 42

Types of Models in Speech Recognition

Models in speech recognition can conceptually be divided into an acoustic model and a language model.

The acoustic model solves the problem of turning sound signals into some kind of phonetic representation.

The language model houses the domain knowledge of words, grammar, and sentence structure for the language.

40 of 42

Phonetics

  1. Phonetics is the study of sound in human speech.
  2. Linguistic analysis of languages around the world is used to break down human words into their smallest sound segments, called phonemes.
  3. In any given language, a set of phonemes defines the distinct sounds of that language.
  4. In US English, there are generally 39 to 44 phonemes, depending on the analysis.

  • A grapheme, in contrast, is the smallest distinct unit that can be written in a language.
  • In US English, the smallest grapheme set we can define is the 26 letters of the alphabet plus a space.
  • Unfortunately, we can’t simply map phonemes to graphemes or individual letters, because some letters map to multiple phonemes, and some phonemes map to more than one letter combination.

41 of 42

Phonetics

  1. For example, in English, the letter C sounds different in cat, chat, and circle.

  • Meanwhile, the long E phoneme we hear in receive and beat is represented by different letter combinations.

  • Arpabet was developed in 1971 for speech recognition research and contains thirty-nine phonemes (15 vowel sounds and 24 consonants), each represented as a one- or two-letter symbol.
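
The mismatch between letters and phonemes can be made concrete with a small pronunciation table. The Arpabet transcriptions below omit stress markers and are meant as an illustrative sketch, not an authoritative lexicon; the name `PRONUNCIATIONS` is hypothetical.

```python
# Arpabet transcriptions (stress markers omitted) for the example words.
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "chat": ["CH", "AE", "T"],
    "circle": ["S", "ER", "K", "AH", "L"],
    "receive": ["R", "IH", "S", "IY", "V"],
    "beat": ["B", "IY", "T"],
}

# One letter, several phonemes: each word starts with the letter C,
# yet the first phoneme differs in every case.
for word in ("cat", "chat", "circle"):
    print(word, "->", PRONUNCIATIONS[word][0])

# One phoneme, several spellings: IY is written "ei" in one word
# and "ea" in the other.
words_with_iy = [w for w, p in PRONUNCIATIONS.items() if "IY" in p]
print(words_with_iy)
```

A table of this kind, scaled up to a full vocabulary, is the lexicon that a recognizer uses to move between phoneme sequences and written words.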

42 of 42

Phonetics

  1. Phonemes are often a useful intermediary between speech and text.
  2. If we can successfully produce an acoustic model that decodes a sound signal into phonemes, the remaining task is to map those phonemes to their matching words.
  3. This step is called lexical decoding and is based on a lexicon or dictionary of the data set.
  4. The system can’t tell from the acoustic model alone which combinations of words are most reasonable.
  5. That requires knowledge. We either need to provide that knowledge to the model or give it a mechanism to learn this contextual information on its own. This calls for a language model.
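
The division of labor between lexical decoding and the language model can be sketched as follows. The sketch assumes the acoustic model has already produced phonemes grouped word by word, which is a simplification. The lexicon, the bigram scores, and the `decode` function are all hypothetical.

```python
from itertools import product

# Hypothetical lexicon: phoneme sequence -> candidate words (homophones).
LEXICON = {
    ("AY",): ["I", "eye"],
    ("S", "IY"): ["see", "sea"],
    ("DH", "AH"): ["the"],
    ("S", "AH", "N"): ["sun", "son"],
}

# Hypothetical bigram scores standing in for a trained language model.
BIGRAM_SCORES = {
    ("I", "see"): 0.5,
    ("see", "the"): 0.4,
    ("the", "sun"): 0.3,
    ("the", "son"): 0.1,
}


def decode(phoneme_groups, unseen=0.001):
    """Pick the word sequence with the highest product of bigram scores."""
    # Lexical decoding: every phoneme group yields one or more words.
    candidates = product(*(LEXICON[group] for group in phoneme_groups))

    # Language model: score each candidate sentence by its word pairs.
    def score(words):
        total = 1.0
        for pair in zip(words, words[1:]):
            total *= BIGRAM_SCORES.get(pair, unseen)
        return total

    return max(candidates, key=score)


utterance = [("AY",), ("S", "IY"), ("DH", "AH"), ("S", "AH", "N")]
print(" ".join(decode(utterance)))
```

The lexicon alone cannot choose between "sun" and "son" or between "see" and "sea"; only the word-pair scores, which encode contextual knowledge, separate the candidates.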