Multidimensional Quality Metric Quality Issue Types
version 2.5.5 (2013 September 3)
2.5 (2013 August 28)
2.4 (2013 June 24)
2.3 (2013 June 17)
Note: This version is preliminary, and the content, names, and values of nodes within the hierarchy may change. Citations to this hierarchy must reference the version number and URL to ensure accuracy.
This document describes the issues only. Information on dimensions and scoring methods is maintained separately.
This document provides an overview of the structure of the set of issue types defined by Multidimensional Quality Metrics (MQM). It includes a description and examples of each type, along with graphical representations of the overall structure. This document addresses only product quality issues (i.e., those related to the translation product) and does not address project or production quality.
This version differs significantly from earlier versions in that it makes a distinction between core and extension issue types. Prior to this distinction, MQM had a high degree of complexity that was overkill for most applications. By moving the complexity into modules and maintaining a compact and simple core, MQM implementation will be easier for the majority of cases and it will be clearer when additional complexity should be invoked.
The list of MQM issue types defines a catalog of issue types relevant for assessing the quality of both translated texts and monolingual documents. While many of the issue types will not apply to monolingual documents, the majority do apply and can be used to evaluate source document quality relative to the quality of translated documents.
Multidimensional Quality Metrics (MQM) defines a set of issue types related to translation product quality. It does not address translation project- or production-related issues, even though a full consideration of translation quality would address these issues. Readers interested in these aspects are invited to consult the EN 15038 and ISO 9000-series standards, as well as the relevant sections of ISO/TS 11669.
The term issue as used in this document refers to any potential error detected in a text, even if it is determined not to be an error. For example, if an automated process finds that a term in the source does not appear to have been translated properly, it has identified an issue. If human examination finds that the term was translated improperly, it is an error. However, examination might also find that the issue was not an error because the linguistic structure in the translation dictated that the term be replaced by a pronoun, so the translation is correct.
In most cases of translation quality assessment, issues will be errors, but with automated issue detection, some issues will not be errors. Accordingly, this document refers to issues in most contexts.
Although not covered in this document, the concept of specifications (dimensions) is vital in MQM. More can be learned about specifications/dimensions at the QTLaunchPad website. Specifications help determine what should be counted as an error. For example, if specifications state that a text is being translated for use in a regulated industry, issues related to legal claims will be important in a way that they would not be for a text intended as humorous commentary on contemporary German politics.
As a result, issues/errors should be counted only with respect to the specifications and corresponding metric chosen, including any locale conventions. For example, an informative text about Hungarian culture might mention that Hungarian names use a family-name-first ordering convention. If this text were translated into Hungarian, however, this explanation would be omitted, since any educated reader would already know about Hungarian naming conventions. In such cases the omission would not be an error and should not be counted against the translator.
Issues that could be errors may not be errors if the translator introduced them intentionally and appropriately. Reviewers need to be aware of, and competent in interpreting, specifications and metrics to avoid penalizing translators improperly.
The default MQM scoring method is based on error counts. Errors are counted and weighted by severity to assign penalties, which are deducted from a theoretical perfect score of 100% to yield a percentage quality score. Individual issue types can also be “weighted” to give them more or less importance.
Scoring is described on the QTLaunchPad website.
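The error-count scoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the official MQM scoring method (which is maintained separately): the severity multipliers, the normalization by word count, and the names `Issue` and `quality_score` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity multipliers; MQM's published scoring method
# defines its own values.
SEVERITY_PENALTY = {"minor": 1.0, "major": 5.0, "critical": 10.0}


@dataclass(frozen=True)
class Issue:
    issue_type: str  # an MQM issue type, e.g. "Mistranslation"
    severity: str    # "minor", "major" or "critical"


def quality_score(issues, word_count, weights=None):
    """Deduct weighted, length-normalized penalties from a perfect 100%."""
    weights = weights or {}
    penalty = sum(
        SEVERITY_PENALTY[issue.severity] * weights.get(issue.issue_type, 1.0)
        for issue in issues
    )
    # Normalizing by word count is an assumption, so that a long text is
    # not penalized merely for its length. Scores are floored at 0.
    return max(0.0, 100.0 - 100.0 * penalty / word_count)


example = [Issue("Mistranslation", "major"), Issue("Spelling", "minor")]
print(round(quality_score(example, word_count=500), 2))  # 98.8
```

Passing `weights={"Spelling": 0.5}` would halve the contribution of Spelling issues, which is one way to express the per-issue-type weighting mentioned above.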
The following issue types and structure are based on an analysis of existing human translation-oriented quality metrics and systems. They represent a non-strict superset of the issues found in existing systems: non-strict because MQM does not contain the full granularity of some existing systems. For example, the CheckMate quality system (part of the open-source Okapi Framework) includes very detailed issue types for dealing with whitespace, which are subsumed into a single category in MQM. With the exception of such issues, where some granularity may be lost, existing quality assessment systems can be mapped to MQM and described in terms of the issue types listed in this document.
Quality assessment metrics/tools consulted in creating MQM include the following:
Readers may also wish to consult the following items:
MQM consists of a set of “issue types,” potential errors that can be detected in texts. Although MQM is oriented towards assessing the quality of translations, many of the issues can be applied to monolingual texts as well. The issues in MQM are organized in a hierarchy. At the highest level, they are grouped in five categories (plus Other), as shown below:
Figure 1. Top-level structure
Of these, three categories are considered “core”: Accuracy, Fluency, and Verity. Three additional top-level categories are treated as modules that may be used for special purposes: Design, Internationalization, and Compatibility. In addition, the category Other is reserved for any issues that are not otherwise covered in MQM.
The definitions of these top-level categories are as follows:
Each of the branches listed above expands into a list of specific issue types, arranged hierarchically (with the exception of Compatibility, where the issues are a flat list, and Internationalization, which is presently unelaborated). The following sections will describe the structure of these branches.
Within each of the core branches, some issue types are considered “core” and others are present in extended modules that can be invoked as needed. The MQM core consists of 19 issue types. These are relatively high-level issue types that can account for most issues related to translation itself. The core can be represented as follows:
Figure 2. Core issue types. (An asterisk (*) after an issue name indicates issues that are amenable to automatic detection)
(The labels Content and Mechanical are for convenience in grouping issues.)
The core contains a total of 19 issue types, defined below. It is not anticipated that assessment tasks will use all 19 categories, but rather will use a relevant selection.
NOTE: within any branch of MQM, ordering is significant: If multiple issue types could apply to an issue, the first relevant one should be selected. See the section Guidelines for selecting issue types (below) for more details on selecting issue types.
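The ordering rule amounts to a first-match search over a branch. This sketch assumes a hypothetical ordering of the Accuracy subtypes (taken from the order of the examples in this document, not from an authoritative list) and falls back to the branch name when no subtype applies, since the top-level branches also serve as issue types in their own right.

```python
# Hypothetical ordering of the core Accuracy subtypes, used only to
# illustrate the rule; consult the MQM hierarchy for the real order.
ACCURACY_ORDER = ("Terminology", "Mistranslation", "Omission", "Untranslated", "Addition")


def select_issue_type(applicable, branch_order, fallback):
    """Pick the first issue type, in branch order, that applies to an issue.

    If no subtype applies, the branch itself (e.g. "Accuracy") is used.
    """
    for issue_type in branch_order:
        if issue_type in applicable:
            return issue_type
    return fallback


# A mistranslated term fits both Terminology and Mistranslation;
# only the earlier one in the branch is recorded.
print(select_issue_type({"Mistranslation", "Terminology"}, ACCURACY_ORDER, "Accuracy"))
```

Recording a single type per issue in this way also keeps one error from being counted several times.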
Note that the three high-level branches serve as issue types in the core in their own right. They are used for any cases of issues that fall under their scope but which are not defined by a subtype.
The target text does not accurately reflect the source text, allowing for any differences authorized by specifications.
Note: Most cases of Accuracy are addressed by one of the more specific subtypes listed below.
Example: A French text translates English “e-mail” as “e-mail” but terminology guidelines mandated that courriel be used.
Example: The English musicological term dog is translated (literally) into German as Hund instead of as Schnarre, as specified in a terminology database.
Example: A source text states that a medicine should not be administered in doses greater than 200 mg, but the translation states that it should not be administered in doses less than 200 mg.
Example: A paragraph present in the source is missing in the translation.
Example: A sentence in a Japanese document translated into English is left in Japanese.
Example: A translation includes portions of another translation that were inadvertently pasted into the document.
Issues related to the form or content of a text, irrespective of whether it is a translation or not.
Note: If an issue can be detected only by comparing the source and target, it MUST NOT be categorized as a Fluency issue.
Example: A legal notice in German uses the informal du instead of the formal Sie.
Example: A text uses a confusing style with long sentences that are difficult to understand.
Example: The text states that bug reports should be submitted to a mailing list in one place and via an online bug tracker tool in another.
Example: The German word Zustellung is spelled Zustetlugn.
Example: A text uses punctuation incorrectly.
Example: A text has an extraneous hard return in the middle of a paragraph.
Example: An English text reads “The man was in seeing the his wife.”
Example: An incorrect format for currency is used in a German text, with a comma (,) instead of a period (.) as the thousands separator.
Example: The following text appears in an English translation of a German automotive manual: “The brake from whe this કુતારો િસ S149235 part numbr,,.”
Example: The text states that a feature is present on a certain model of automobile when in fact it is not available.
Example: A process description leaves out key steps needed to complete the process, resulting in an incomplete description of the process.
Example: Specifications stated that FCC regulatory notices be replaced by CE notices rather than translated, but they were translated instead, rendering the text legally problematic for use in Europe.
Example: An advertising text translated for Sweden refers to special offers available only in Germany.
As mentioned above, it is not expected that most assessment tasks will use all of the core categories. Instead, they represent the most common translation quality assessment issue types and can serve as a common set from which to build metrics. For example, consider a task that assesses machine translation used to provide on-demand support for a software package. In this case it is:
The resulting metric, which complies with the MQM Core, might appear as follows, with five issues:
Figure 3. Sample metric for assessing on-demand translation of support materials.
This metric is used to assess the suitability of translations and is concerned only with the extent to which the system produces unreadable content (Unintelligible), translates content incorrectly (Mistranslation), leaves content untranslated (Untranslated), violates terminology requirements defined in a bilingual glossary (Terminology), or misspells words (Spelling). This simple metric might be adequate for determining whether the MT system is producing acceptable results. (Note that Accuracy and Fluency are grayed out, indicating that they are not counted separately and that only the five issue types shown are counted.)
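To make the idea of a restricted metric concrete, here is a minimal sketch that represents the Figure 3 metric as a set of issue-type names and counts only annotations that fall inside it. The string representation and the `count_issues` helper are assumptions for illustration; annotations for grayed-out parents such as Accuracy, or for types outside the metric such as Grammar, are simply ignored.

```python
from collections import Counter

# The five issue types counted by the sample metric in Figure 3.
# Accuracy and Fluency are grayed out there, so they are not counted.
SUPPORT_MT_METRIC = frozenset(
    {"Unintelligible", "Mistranslation", "Untranslated", "Terminology", "Spelling"}
)


def count_issues(annotations, metric):
    """Tally annotated issue types, ignoring types the metric does not count."""
    return Counter(issue_type for issue_type in annotations if issue_type in metric)


annotations = ["Spelling", "Grammar", "Spelling", "Accuracy", "Terminology"]
print(count_issues(annotations, SUPPORT_MT_METRIC))
# Counter({'Spelling': 2, 'Terminology': 1})
```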
It is anticipated that most MQM-compliant metrics would use a small number of issue types. However, because some requirements may dictate additional detail/granularity, MQM contains extensions, as discussed in the next section. Users are encouraged, where possible, to limit issue type selection to the core in order to foster greater interoperability. Where extensions are required, their use should be limited as much as possible and the most abstract level of granularity that meets requirements should be used.
As previously noted, extensions provide a way to add capabilities or granularity to MQM. This section describes the extensions to each branch of the MQM issues, including definitions and examples. As some of the content in each extension consists of categories intended to give deeper levels of granularity to existing categories, categories from the core may be repeated in the extensions, but will be rendered in gray.
As with the core structure, extensions will generally not be used in their entirety, but rather a selection may be used. For example, if an assessment task is being undertaken to understand the status of a particular translation with respect to grammatical issues, the Fluency extension may be used and the more detailed subcategories under Grammar used.
The Accuracy extension consists of nine categories, all of which provide additional granularity beyond the core, as shown in the following diagram:
Figure 4. Structure of the Accuracy extension.
The additional issues in this extension are defined as follows.
Example: A database of legal terms mandates that the English term contract be translated as Auftrag in German, but the more common Vertrag was used.
Example: A Hungarian text contains the phrase Tele van a hocipőd?, which has been translated as “Are your snow boots full?” rather than with the idiomatic meaning of “Feeling overwhelmed?”.
Example: The Italian word simpatico has been translated as sympathetic in English.
Example: A Japanese translation refers to “Apple Computers” as アップルコンピュータ when the English expression should have been left untranslated.
Example: A German source text provides the date 09.02.09 (=February 9, 2009) but the English target renders it as September 2, 2009.
Example: An English source text specifies a time of "4:40 PM" but this is rendered as 04:40 (=4:40 AM) in a German translation.
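Both examples turn on conventions rather than values. A short sketch using Python's standard `datetime` module shows how one string yields two different dates under day-first and month-first conventions, and how a 12-hour time maps to 24-hour notation. The two conventions listed are an illustrative subset, and the helper names are hypothetical.

```python
from datetime import date, datetime

# Two conventions for writing numeric dates (illustrative subset).
CONVENTIONS = {"day.month.year": "%d.%m.%y", "month.day.year": "%m.%d.%y"}


def date_readings(text):
    """Map each convention to the date the string denotes under it."""
    readings = {}
    for name, fmt in CONVENTIONS.items():
        try:
            readings[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # the string is not a valid date under this convention
    return readings


def to_24_hour(text):
    """Convert a 12-hour time such as "4:40 PM" to 24-hour notation."""
    return datetime.strptime(text, "%I:%M %p").strftime("%H:%M")


print(date_readings("09.02.09"))
print(to_24_hour("4:40 PM"))  # 16:40
```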
Example: A source text specifies that an item is 25 centimeters (~10 inches) long, but the target states that it is 25 inches (63.5 cm) long.
Example: The source text specifies that a part is 124 mm long but the target text specifies that it is 147 mm long.
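Number mismatches of this kind lend themselves to automatic detection. A minimal sketch, assuming a hypothetical `number_mismatches` helper that compares the digit strings found in source and target. Because it compares literal strings, it would also report numbers that were correctly reformatted for the target locale; a production checker would normalize separators first.

```python
import re
from collections import Counter

# Digit groups, optionally with internal separators such as 1,234.5.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def number_mismatches(source, target):
    """Return (numbers only in the source, numbers only in the target)."""
    src = Counter(NUMBER.findall(source))
    tgt = Counter(NUMBER.findall(target))
    return src - tgt, tgt - src


print(number_mismatches("The part is 124 mm long.", "Das Teil ist 147 mm lang."))
```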
Example: The source text refers to Dublin, Ohio, but the target incorrectly refers to Dublin, Ireland.
Example: Part labels in a graphic were left untranslated even though the running text was translated.
The Fluency extension consists of 38 additional issue types, including both new high-level categories and additional granularity.
The definitions for the extension issues are as follows:
Twelve (12) issues are added in the content branch, as shown below:
Figure 5. Structure of the Content branch of the Fluency extension.
These issues are defined as follows:
Example: A text refers to dollars as “clams,” when this slang term would be inappropriate.
Example: Company style states that passive sentences may not be used but the text uses passive sentences.
Example: Specifications stated that English text was to be formatted according to the Chicago Manual of Style, but the text delivered followed the American Psychological Association style guide.
Example: A text uses both “app.” and “approx.” for approximately.
Example: A screen shot shows a button with the text “Open other…” but the text referring to the screen shot tells the user to click on the “Open alternative…” button.
Example: The text has a mixture of imperatives, descriptions of actions, and lists within a single process, making it difficult to follow the intended course of action.
Example: The text refers to a component as the brake release lever, brake disengagement lever, manual brake release, and manual disengagement.
NB: This issue should not be used for cases where terminology has been translated incorrectly (Accuracy: Terminology) or cases where the wrong term is used in a source document (Fluency: Content: Monolingual Terminology).
Example: A text reads “The man the man whom she saw…”
Example: The term piano action should be used but piano mechanism is used instead.
Example: A text uses the term “Acme TM200” instead of the mandated “Acme TM2000®”.
Example: A text reads “I cannot recommend this too highly.” (The meaning can be that the speaker cannot make a good recommendation or that it is highly recommended.)
Example: A text reads “After completing this, move to the next step,” but there are a number of possible referents for this in the text.
Twenty-seven (27) issues are added to the Mechanical branch, as shown below:
Figure 6. Structure of the Mechanical branch of the Fluency extension.
Example: The name John Smith is written as “john smith”.
Example: The Hungarian word bőven (using o with a double acute) is spelled as bõven, using a tilde (˜), which is not found in Hungarian.
Example: An English text uses a semicolon where a comma should be used.
Example: A text reads “King Ludwig of Bavaria (1845–1886 was deposed on account of his supposed madness.”
Example: An English text has comed instead of came.
Example: A text reads “Read these instructions careful” instead of “Read these instructions carefully.”
Example: A text reads “They was expecting a report.”
Example: A German text reads “Er hat gesehen den Mann” instead of “Er hat den Mann gesehen.”
Example: A text reads “Check the part number as given in the screen” instead of “…on the screen”.
Example: A text reads “The graphic is then copied into an internal memory” instead of “The graphic is copied to internal memory.”
Example: An English text has “2012-06-07” instead of the expected “06/07/2012.”
Example: A text written for the U.S. uses a 24-hour time notation rather than AM/PM time.
Example: A text for France uses feet and inches and Fahrenheit temperatures.
Example: A German text has 123,456 instead of the locale-appropriate 123.456.
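Locale-dependent number formatting can be illustrated without relying on host settings. Python's standard `locale` module applies separators only for locales installed on the machine, so this sketch (with a hypothetical `format_number` helper) takes the thousands and decimal separators as explicit arguments, using the German convention from the example.

```python
def format_number(value, decimals, thousands_sep, decimal_sep):
    """Format a number with explicit separators instead of host locale data."""
    text = f"{value:,.{decimals}f}"  # English-style first, e.g. "123,456"
    # Swap separators through a placeholder so the replacements cannot collide.
    return (
        text.replace(",", "\0")
        .replace(".", decimal_sep)
        .replace("\0", thousands_sep)
    )


print(format_number(123456, 0, ".", ","))  # 123.456 (German convention)
print(format_number(1234.5, 2, ".", ","))  # 1.234,50
```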
Example: A French text should use guillemets («») but instead uses German-style quotes („”).
Example: A French advertising text uses anglicisms that are forbidden for print texts by the Académie française specifications.
Example: A text document in UTF-8 encoding is opened as ISO Latin-1, resulting in all “upper ASCII” characters being garbled.
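The garbling in this example is easy to reproduce: encoding text as UTF-8 and decoding the bytes as ISO Latin-1 turns each non-ASCII character into two or more wrong characters. The helper names are hypothetical, and `repair` works only if the garbled text has not been modified further.

```python
def misdecode(text):
    """Simulate UTF-8 bytes being opened as ISO Latin-1 (ISO-8859-1)."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Reverse the mistake; valid only when nothing else altered the text."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = misdecode("café")
print(garbled)          # cafÃ©
print(repair(garbled))  # café
```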
Example: A text may not include colons, forward slashes, or backslashes, which might cause confusion with path names on some computer systems, but it contains these characters.
Example: The regular expression ["'”’][,\.;] (i.e., a quote mark followed by a comma, full stop, or semicolon) is defined as not allowed for a project but a text contains the string ”, (closing quote followed by a comma).
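A project-specific rule like this one can be applied with an ordinary regular expression engine. The sketch uses Python's `re` module with the pattern from the example and a hypothetical `find_forbidden` helper; note that the pattern also matches an apostrophe followed by a comma, which is one reason such rules are workflow specific.

```python
import re

# The pattern from the example: a straight or curly quote mark
# followed by a comma, full stop, or semicolon.
FORBIDDEN = re.compile(r"""["'”’][,\.;]""")


def find_forbidden(text):
    """Return (offset, matched string) for each forbidden sequence."""
    return [(m.start(), m.group()) for m in FORBIDDEN.finditer(text)]


print(find_forbidden("He called it “safe”, but it was not."))  # [(18, '”,')]
```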
Example: A listing of items should be in alphabetical order but appears in a random order instead.
Example: A text reading “The harbour connected which to printer is busy or configared not properly” is flagged by a language analysis tool as suspect based on its lack of conformance to an existing corpus.
Example: An HTML document has an href that points to a file that does not exist.
Example: An internal link refers to the location “#section5” but there is no anchor “section5” in the document.
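Internal link targets can be checked mechanically by collecting every `href="#..."` fragment and every `id` or `name` attribute in the document. This is a minimal sketch built on Python's standard `html.parser`; the class and function names are hypothetical, and it does not attempt to check document-external links such as the moved government URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect internal link targets (href="#...") and anchor names/ids."""

    def __init__(self):
        super().__init__()
        self.targets = set()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if href and href.startswith("#") and len(href) > 1:
            self.targets.add(href[1:])
        for key in ("id", "name"):
            if attributes.get(key):
                self.anchors.add(attributes[key])


def broken_internal_links(html):
    """Return fragment identifiers that no element in the document defines."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.targets - collector.anchors


page = '<p>See <a href="#section5">Section 5</a>.</p><h2 id="section4">4</h2>'
print(broken_internal_links(page))  # {'section5'}
```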
Example: A link in an HTML document points to a U.S. government URL that has moved and no longer exists.
Example: A Table of Contents is missing items that should be included.
Example: A table of contents refers to page numbers from the source document that do not apply to the translated text.
Example: A table of contents should be formatted with variable (hierarchical) indenting and tab leader characters, but is instead displayed as a “run-in” list.
Example: A chapter heading is not listed in a Table of Contents.
The Verity extension consists of two categories that extend the granularity of one category, as shown below:
Figure 7. Structure of the Verity extension
Example: A list of items included in a retail package omits a crucial component.
Example: A document describing a procedure to restart a diesel generator omits a crucial step that must be completed prior to performing additional steps.
The Design extension comprises the entire Design branch of MQM. It applies only in cases where formatting is significant. It consists of 36 issue types, in a hierarchy, as shown below:
Figure 8. Structure of the Design extension.
Note that for computational purposes in generating MQM scores, Design is generally counted with Fluency, although individual issues may align more closely with Accuracy in concept.
Example: Headings should be blue but are green instead.
Example: An English source text uses a normal-weight serif font for body text, but the Japanese translation uses a heavy-weight “gothic” (roughly, sans-serif) font appropriate only for headlines.
Example: Specifications state that endnotes should be used with roman numerals but footnotes were used with in-text symbols (*, †, ‡, etc.).
Example: Headers should appear on every page but have been omitted on odd-numbered pages.
Example: Specifications called for 4 cm inside margins, but 2.5 cm margins were used instead.
Example: Specifications state that at least two lines of a paragraph must appear on a page (if the paragraph is more than one line), but a single line starts a page while two appear on the previous page.
Example: There is a page break between a figure and its caption.
Example: A heading should be left-aligned but was centered instead.
Example: The first line of body paragraphs should be indented 4 mm, but some paragraphs were indented 25 mm instead.
Example: Warning texts are set in sans-serif, but one of them appears in a serif font.
Example: A portion of Japanese text is set with an obliqued face (corresponding to italics in the source text) when dot accents should have been used with a non-oblique face.
Example: A book title should have been italicized, but the italics were omitted.
Example: A legal notice should be set in a 9 pt size, but was instead set in 7 pt.
Example: A Japanese text includes カタカナ (full-width kana) when specifications required ｶﾀｶﾅ (half-width kana) instead, due to a limited display size.
Example: The letters T and A in the word TAMPA are spaced too close together and collide.
Example: A translated Japanese text has set lines too close together, making the text difficult to read.
Example: A target text has a set of tags for bold face in the same location where the source has tags for italics.
Example: A segment has three sets of paired formatting tags at the end, after the final full stop (.).
Example: A source segment has no formatting tags, but the target has a set of italic tags.
Example: A source segment has a set of italic tags, but the target text does not have any tags.
Example: A text has opening tags but no closing tags for formatting.
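Several of the markup issues above (mismatched, added, omitted, and unpaired tags) can be flagged by comparing the tags in a source segment with those in its target. This sketch assumes a simple inline tag syntax and hypothetical helper names; it ignores tag position (so misplaced tags are not detected) and does not handle self-closing tags. Flagged differences may still be deliberate and need manual review.

```python
import re
from collections import Counter

# Inline tags such as <b>, </b>, <i class="x">; self-closing tags are not handled.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def tags(segment):
    """List a segment's inline tags as ("open" | "close", name) pairs."""
    return [("close" if slash else "open", name.lower()) for slash, name in TAG.findall(segment)]


def compare_markup(source, target):
    """Report tags missing from, added to, or left unbalanced in the target."""
    src, tgt = Counter(tags(source)), Counter(tags(target))
    opened = Counter(name for kind, name in tags(target) if kind == "open")
    closed = Counter(name for kind, name in tags(target) if kind == "close")
    return {
        "missing": src - tgt,
        "added": tgt - src,
        "unbalanced": (opened - closed) + (closed - opened),
    }


report = compare_markup("Click <i>Save</i>.", "Klicken Sie auf <b>Speichern</b>.")
print(report["missing"], report["added"])
```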
Example: A document uses a string of space characters instead of tabs.
Example: Extra spaces are added at the start of a string.
Example: A graphic is garbled and the wrong version is shown.
Example: A text refers to Figure 1, but Figure 1 appears six pages after the point where it was referred to.
Example: An HTML file has an <img> tag that refers to the wrong location, so no graphic is shown.
Example: During localization the location of numbers used for call-outs has been shifted and the call-outs are no longer usable.
Example: The German translation of an English string in a user interface runs off the edge of a dialogue box and cannot be read.
Example: An English sentence is 253 characters long but its German translation is 51 characters long.
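A length check can be sketched as a ratio test between source and target. The bounds are placeholders, not MQM values: what counts as a significant difference depends on the languages involved, and no general threshold can be given. `length_flag` is a hypothetical name.

```python
def length_flag(source, target, low=0.5, high=2.0):
    """Return True if the target/source length ratio falls outside [low, high].

    The default bounds are placeholders and should be set per project
    and language pair.
    """
    if not source:
        return bool(target)  # any target text for an empty source is suspect
    ratio = len(target) / len(source)
    return not (low <= ratio <= high)


print(length_flag("x" * 253, "y" * 51))  # True: the ratio is about 0.2
```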
The Internationalization extension presently consists of a single issue type, Internationalization, which is used for any internationalization errors. This branch may be expanded in the future.
Example: A document assumes that all addresses use postal codes conforming to the U.S. “zip+four” convention and includes a verification step for postal codes that does not allow for non-U.S. codes.
The Compatibility extension contains items which may be used for compatibility with legacy metrics even though they would otherwise not be included in MQM. Most of these issue types are taken from the LISA QA Model documentation.
Definitions are not included for these issues.
These categories should not be included in MQM-compatible metrics unless used to represent legacy metrics in MQM-compatible form. They relate either to process or project requirements (which are not covered by MQM and would otherwise be out of scope), address functional issues (such as application compatibility) that are unrelated to linguistic quality, or address
These categories should be used for compatibility with legacy metrics (particularly the LISA QA Model) only and their general use is discouraged. These categories are not shown in the graphical overviews in this document.
The Other category is used as a catch-all for any issues not adequately covered by the MQM core or extensions. This category should be used only if it is impossible to assign an issue to an existing category with sufficient granularity.
If such issues are systematically encountered, please inform firstname.lastname@example.org for consideration for inclusion in updates of MQM issue types.
Additional extensions can be defined by users and may be added as official extensions in time. Additional extensions should not conflict with the core or existing extensions or replace any existing categories. They may add granularity to MQM or add new issue types not anticipated in MQM.
In many cases, multiple issue types may describe a single actual issue/error in the text. For example, if a term is translated incorrectly, this is an example of both the Terminology and Mistranslation issue types; if a date is incorrect in the target text, it is simultaneously a Mistranslation, a Date/Time, and an Entity (such as a name or place) issue. In such cases, issues should not be counted multiple times; instead, one issue type should be selected based on the following principles:
 Compatibility is not shown in any of the diagrams in this document. It is used for a number of legacy issue types described in other systems (notably the LISA QA Model) that address issues not related to product quality; the issue types it contains are deprecated for general use.
 These labels can be used as issue types in their own right, but are not counted here.
 Note that the “world of the text” may or may not be the real world. In the case of fiction, propaganda, or marketing, claims may arise that are not, in fact, true to the real world, but which are true to the world assumed by the text.
 The daughter categories under Terminology should be used only when it is necessary to identify the precise nature of the terminology error, e.g., if it is important to distinguish a failure to follow a termbase from a failure to use common terminology for the domain.
 If only the measurement system, rather than the actual value, is wrong, the issue should be treated under Locale convention.
 In most cases Monolingual terminology will apply to source texts. If a formal bilingual glossary is specified for use, terms in the target text that do not match the translations specified in the glossary MUST be classified as Terminology since a bilingual resource was specified in the project requirements. If in doubt, use Terminology.
 This category is to be used only to indicate terminology problems related to a formalized, normative term list when they must be distinguished from errors that do not relate to normative lists.
 For cases of systematic uses of quote mark formats from the wrong locale, use Quote marks format under Locale convention instead.
 The determination of what characters are or are not allowed is workflow specific and cannot be stated as a general rule. This category should be used only when specific characters are forbidden.
 For issues related to the collation of an index or table of contents, use Index/TOC collation instead.
 This category is included for compatibility with the ITS 2.0 specification and readers should consult the ITS 2.0 specification (http://www.w3.org/TR/its20/#lqissue-typevalues) for more information.
 This category and document-external broken link/cross-reference should be used only if it is important to identify the nature of the broken link/cross-reference.
 Given the complexity of the markup, many markup issues will need to be verified manually since changes may be deliberate and needed.
 Issues related to fonts and text in graphics should be handled in other categories, as appropriate.
 NB: Significance in this context depends on the languages involved and many other factors and no general guideline as to what constitutes significance is possible.