Universal Acceptance (UA) Micro-Learning Module 8: Advanced Topics in Internationalized Domain Names (IDNs)
Instructor Guide
1st Edition.
© 2024 Creative Commons License - Attribution 4.0 International (CC BY 4.0).
Universal Acceptance
Advanced Topics in IDNs: UA Micro-Learning Module Objectives:
Note About the Use of Unicode String Literals:
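The code examples in this module write Ethiopic text directly as Unicode string literals (Python 3 source files are UTF-8 by default). Below is a minimal standard-library sketch of what such a literal contains: the code points behind the label አማርኛ, and the ASCII form that DNS actually carries. The helper names `describe_label` and `to_a_label` are illustrative. The A-label is built with the raw `punycode` codec plus the `xn--` prefix. This assumes the input is already a valid, NFC-normalized U-label, because no IDNA2008 validity checks are performed.

```python
import unicodedata


def describe_label(label: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per code point in label."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in label]


def to_a_label(u_label: str) -> str:
    """Prefix the RFC 3492 Punycode form of u_label with the ACE prefix 'xn--'.

    Assumes u_label is already a valid, NFC-normalized U-label; no IDNA2008
    validity checks are performed here.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


label = "አማርኛ"  # a Unicode string literal: four Ethiopic code points
for line in describe_label(label):
    print(line)

a_label = to_a_label(label)
print(a_label)
# Stripping the prefix and decoding the Punycode restores the original literal.
assert a_label[len("xn--"):].encode("ascii").decode("punycode") == label
```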
Understanding and Addressing IDNA2008 Limitations:
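One interoperability limitation is that IDNA2003 (RFC 3490) and IDNA2008 do not always turn the same Unicode label into the same A-label. As a result, applications built on different libraries can disagree about which domain a user meant. Python's built-in `idna` codec implements IDNA2003, whose Nameprep step maps the German ß to "ss". IDNA2008 treats ß as a valid character in its own right, and UTS #46 (references [5] and [9]) offers a "transitional" mode that reproduces the older mapping.

The sketch below contrasts the two outcomes using only the standard library. The IDNA2008 side is approximated with the raw `punycode` codec and skips all IDNA2008 validity rules. The helper names are illustrative; a full IDNA2008 implementation is available in the third-party `idna` package.

```python
def idna2003_a_label(label: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 (RFC 3490), including
    # Nameprep, which maps ß to "ss" before any Punycode step.
    return label.encode("idna").decode("ascii")


def idna2008_style_a_label(label: str) -> str:
    # Approximation only: IDNA2008 keeps ß as a valid character, so the label is
    # Punycode-encoded unchanged. No IDNA2008 validity or context rules are checked.
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


label = "straße"
legacy = idna2003_a_label(label)
current = idna2008_style_a_label(label)
print(f"IDNA2003:       {legacy}")
print(f"IDNA2008-style: {current}")
print("Same DNS label?", legacy == current)
```

The two results name different DNS labels. This is the kind of divergence that UA testing has to account for when a name passes through software built on different IDNA versions.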
Introduction to Label Generation Rules (LGRs):
Evolution of LGR Formats: From Text-Based to XML-Based Standardization (RFCs 3743, 4690, and 7940):
Advantages of the XML-Based LGR Format:
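Because the format is plain XML, a standard parser is enough to load a ruleset. Below is a minimal Python sketch that assumes the element and attribute names defined in RFC 7940: `<data>`, `<char cp=…>`, `<range first-cp=… last-cp=…>`, and `<var cp=… type=…>`, all in the `urn:ietf:params:xml:ns:lgr-1.0` namespace.

The miniature ruleset is hypothetical and is not a complete Amharic LGR. It lists the syllables of አማርኛ (the whole M row as a range) plus ዐ, and declares አ and ዐ as mutual variants. The `type` value "blocked" is a placeholder chosen for illustration. Everything else RFC 7940 supports (code point sequences, contexts, rules, actions) is ignored.

```python
import xml.etree.ElementTree as ET

LGR_NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}

# Hypothetical miniature ruleset in RFC 7940 syntax (not a complete Amharic LGR).
RFC7940_XML = """<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version>1</version>
    <language>am</language>
  </meta>
  <data>
    <char cp="12A0"><var cp="12D0" type="blocked"/></char>
    <char cp="12D0"><var cp="12A0" type="blocked"/></char>
    <range first-cp="1218" last-cp="121F"/>
    <char cp="122D"/>
    <char cp="129B"/>
  </data>
</lgr>"""


def load_repertoire(xml_text: str) -> tuple[set[str], dict[str, list[tuple[str, str]]]]:
    """Return (repertoire, variants) from the <data> section of an RFC 7940 LGR.

    Only single code points are handled; RFC 7940 also allows space-separated
    code point sequences in cp, which this sketch does not support.
    """
    data = ET.fromstring(xml_text).find("lgr:data", LGR_NS)
    repertoire: set[str] = set()
    variants: dict[str, list[tuple[str, str]]] = {}
    for char in data.findall("lgr:char", LGR_NS):
        cp = chr(int(char.get("cp"), 16))
        repertoire.add(cp)
        for var in char.findall("lgr:var", LGR_NS):
            variants.setdefault(cp, []).append((chr(int(var.get("cp"), 16)), var.get("type")))
    for rng in data.findall("lgr:range", LGR_NS):
        first, last = int(rng.get("first-cp"), 16), int(rng.get("last-cp"), 16)
        repertoire.update(chr(c) for c in range(first, last + 1))
    return repertoire, variants


def in_repertoire(label: str, repertoire: set[str]) -> bool:
    """True if every character of label is in the repertoire."""
    return all(ch in repertoire for ch in label)


repertoire, variants = load_repertoire(RFC7940_XML)
print(in_repertoire("አማርኛ", repertoire))  # True
print(variants["አ"])  # [('ዐ', 'blocked')]
```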
Example: Section-by-Section Illustration of an LGR Definition (1/3):
<?xml version="1.0" encoding="UTF-8"?>
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="urn:ietf:params:xml:ns:lgr-1.0 https://www.iana.org/assignments/lgr/lgr-1.0.xsd"
     repertoire="urn:ietf:params:xml:ns:unicode-1.0">
Example: Section-by-Section Illustration of an LGR Definition (2/3):
<!-- Metadata Element -->
<metadata>
<version>1.0</version>
<language>am</language>
<description>Label Generation Rules for Ethiopic script (Amharic)</description>
<author>አበበ ከበደ</author>
</metadata>
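Because the `<lgr>` root declares a default namespace, every child element such as `<metadata>` inherits it. A namespace-unaware lookup like `root.find("metadata")` in Python's ElementTree therefore returns `None`. Below is a minimal sketch that reads the metadata fields shown above using an explicit namespace map. The `lgr` prefix and the helper name `read_metadata` are illustrative choices, and the XML is an abridged copy of this slide's example.

```python
import xml.etree.ElementTree as ET

LGR_NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}

# Abridged copy of the header and metadata sections above.
LGR_XML = """<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <metadata>
    <version>1.0</version>
    <language>am</language>
    <description>Label Generation Rules for Ethiopic script (Amharic)</description>
    <author>አበበ ከበደ</author>
  </metadata>
</lgr>"""


def read_metadata(xml_text: str) -> dict[str, str]:
    """Return the text of each <metadata> child, keyed by its local tag name."""
    root = ET.fromstring(xml_text)
    # Without the namespace map this lookup would return None.
    metadata = root.find("lgr:metadata", LGR_NS)
    if metadata is None:
        raise ValueError("no <metadata> element found")
    # child.tag looks like "{urn:ietf:params:xml:ns:lgr-1.0}version"; keep the local part.
    return {child.tag.split("}", 1)[1]: (child.text or "").strip() for child in metadata}


meta = read_metadata(LGR_XML)
print(meta["language"], meta["version"])
```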
Example: Section-by-Section Illustration of an LGR Definition (3/3):
<rules>
<!-- Classes for different character types -->
<class id="consonant">
<description>Ethiopic consonants</description>
<rule>[ሀ-ፖ]</rule>
</class>
<!-- Additional classes for vowels, punctuation, digits, etc. -->
</rules>
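The `<rule>` text in these class definitions is written like a regular-expression character class. This is this module's own convention, not RFC 7940 syntax. Assuming it is meant to be read that way, Python's `re` module can test whether every character of a label belongs to a class. The helper `matches_class` is hypothetical. The sample range [ሀ-ፖ] is the consonant range used in the full Ethiopic definition and covers U+1200 to U+1356.

```python
import re


def matches_class(label: str, class_rule: str) -> bool:
    """True if every character of label falls inside the bracket expression class_rule."""
    # fullmatch anchors the pattern at both ends; "+" rejects the empty label.
    return re.fullmatch(f"{class_rule}+", label) is not None


consonant_rule = "[ሀ-ፖ]"
print(matches_class("አማርኛ", consonant_rule))   # True: every syllable lies in the range
print(matches_class("አማርኛ1", consonant_rule))  # False: the ASCII digit falls outside
```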
Example: Full LGR Definition for Ethiopic Script:
<?xml version="1.0" encoding="UTF-8"?>
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="urn:ietf:params:xml:ns:lgr-1.0 https://www.iana.org/assignments/lgr/lgr-1.0.xsd"
     repertoire="urn:ietf:params:xml:ns:unicode-1.0">
  <metadata>
    <version>1.0</version>
    <language>am</language>
    <description>Label Generation Rules for Ethiopic script (Amharic)</description>
    <author>አበበ ከበደ</author>
  </metadata>
  <rules>
    <class id="consonant">
      <description>Ethiopic consonants</description>
      <rule>[ሀ-ፖ]</rule>
    </class>
    <class id="vowel">
      <description>Ethiopic vowels</description>
      <rule>[አኡኢኣኤእኦ]</rule>
    </class>
    <class id="punctuation">
      <description>Ethiopic punctuation marks</description>
      <rule>[፠-፧]</rule>
    </class>
    <class id="digit">
      <description>Ethiopic digits and numbers</description>
      <rule>[፩-፼]</rule>
    </class>
    <class id="joiner">
      <description>Joiner character</description>
      <rule>፡</rule>
    </class>
    <class id="diacritic">
      <description>Ethiopic combining marks (diacritics)</description>
      <rule>[&#x135D;-&#x135F;]</rule>
    </class>
  </rules>
</lgr>
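Overlapping classes make an LGR harder to review. In the definition above, the vowel characters also fall inside the consonant range, and the joiner ፡ (U+1361) also falls inside the punctuation range.

Below is a minimal sketch that expands each `<rule>` into a set of code points and reports which class pairs overlap. It assumes the bracket-expression reading of `<rule>` used in this module, supporting only single characters and `x-y` ranges. The helper names `expand_rule` and `find_overlaps` are hypothetical, and the diacritic class is left out for brevity.

```python
from itertools import combinations


def expand_rule(rule: str) -> set[int]:
    """Expand a bracket expression like '[a-cx]' (or a single character) into code points."""
    body = rule[1:-1] if rule.startswith("[") and rule.endswith("]") else rule
    points: set[int] = set()
    i = 0
    while i < len(body):
        # "x-y" denotes an inclusive range; anything else is a single character.
        if i + 2 < len(body) and body[i + 1] == "-":
            points.update(range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            points.add(ord(body[i]))
            i += 1
    return points


def find_overlaps(classes: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (class_a, class_b, shared_count) for every pair of classes that overlap."""
    expanded = {name: expand_rule(rule) for name, rule in classes.items()}
    result = []
    for a, b in combinations(expanded, 2):
        shared = expanded[a] & expanded[b]
        if shared:
            result.append((a, b, len(shared)))
    return result


# Class rules copied from the Ethiopic definition above (diacritic class omitted).
ethiopic_classes = {
    "consonant": "[ሀ-ፖ]",
    "vowel": "[አኡኢኣኤእኦ]",
    "punctuation": "[፠-፧]",
    "digit": "[፩-፼]",
    "joiner": "፡",
}
for a, b, n in find_overlaps(ethiopic_classes):
    print(f"{a} / {b}: {n} shared code point(s)")
```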
Variant Labels and their Definition in LGRs:
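Conceptually, an LGR defines variants per code point. The variant labels of a label are then the combinations obtained by replacing each code point with itself or one of its variants. This mirrors, in simplified form, the permutation approach RFC 7940 describes for determining a label's variants.

Below is a minimal Python sketch that assumes a hypothetical two-entry variant table, in which አ (U+12A0) and ዐ (U+12D0) are mutual variants as in this module's Amharic examples.

```python
from itertools import product

# Hypothetical variant table: each code point maps to its variant code points.
VARIANTS: dict[str, list[str]] = {
    "አ": ["ዐ"],  # U+12A0 GLOTTAL A -> U+12D0 PHARYNGEAL A
    "ዐ": ["አ"],  # and back, keeping the relation symmetric
}


def variant_labels(label: str) -> set[str]:
    """Return every label reachable by per-character variant substitution,
    excluding the original label itself."""
    # For each position, the candidates are the character itself plus its variants.
    choices = [[ch] + VARIANTS.get(ch, []) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


print(variant_labels("አማርኛ"))    # {'ዐማርኛ'}
print(len(variant_labels("አአ")))  # 3
```

The number of combinations grows multiplicatively with each position that has variants. This is why variant generation is normally paired with rules that decide which variants may actually be allocated.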
Supplementary Rules for Defining Variant Labels in LGRs:
Example: Section-by-Section Illustration of a Variant Label Definition (1/7):
<BaseLabel>አማርኛ</BaseLabel>
Example: Section-by-Section Illustration of a Variant Label Definition (2/7):
<VariantLabel>
<Type>Diacritics</Type>
<Rules>
<AddDiacritics>
<!-- No Diacritic Specified -->
<!-- Additional diacritic specifications are empty -->
</AddDiacritics>
</Rules>
</VariantLabel>
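The Diacritics variant type adds diacritic characters to the base label, and that can produce strings IDNA2008 rejects. IDNA2008 (RFC 5891) requires a U-label to be in Unicode Normalization Form C (NFC) and forbids a label from beginning with a combining mark. Below is a minimal sketch of those two checks using the standard `unicodedata` module. The helper `diacritic_problems` is hypothetical. U+135F ETHIOPIC COMBINING GEMINATION MARK serves only as sample input, because the example above specifies no diacritics.

```python
import unicodedata


def diacritic_problems(label: str) -> list[str]:
    """Return the IDNA2008 (RFC 5891) issues this sketch checks for; empty means none found."""
    problems = []
    if not unicodedata.is_normalized("NFC", label):
        problems.append("label is not in NFC")
    # General categories Mn, Mc, and Me are the combining marks.
    if label and unicodedata.category(label[0]).startswith("M"):
        problems.append("label begins with a combining mark")
    return problems


gemination = "\u135F"  # ETHIOPIC COMBINING GEMINATION MARK (category Mn)
print(diacritic_problems("አማርኛ" + gemination))  # mark after a base character
print(diacritic_problems(gemination + "አማርኛ"))  # mark at the start of the label
```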
Example: Section-by-Section Illustration of a Variant Label Definition (3/7):
<VariantLabel>
<Type>ScriptVariant</Type>
<Rules>
<ScriptVariation>ዐማርኛ</ScriptVariation>
</Rules>
</VariantLabel>
Example: Section-by-Section Illustration of a Variant Label Definition (4/7):
<VariantLabel>
<Type>Transliteration</Type>
<Rules>
<Transliteration>Amharic</Transliteration>
</Rules>
</VariantLabel>
Example: Section-by-Section Illustration of a Variant Label Definition (5/7):
<ContextualRules>
<ContextRule>
<Condition>አ</Condition>
<Action>
<AddCharacter>ዐ</AddCharacter>
</Action>
</ContextRule>
</ContextualRules>
Example: Section-by-Section Illustration of a Variant Label Definition (6/7):
<MappingTables>
<Mapping>
<Base>አማርኛ</Base>
<Variants>
<Variant>ዐማርኛ</Variant>
<!-- Additional variant mappings -->
</Variants>
</Mapping>
</MappingTables>
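Variant relations are generally expected to be symmetric (RFC 7940 discusses symmetric and transitive variant mappings). If ዐማርኛ is listed as a variant of አማርኛ, looking up ዐማርኛ should lead back to አማርኛ.

Below is a minimal sketch that reads `<Mapping>` entries in this module's format and reports any base/variant pair that lacks its reverse entry. The helpers `read_mappings` and `find_asymmetric` are hypothetical, and the embedded XML is a copy of the mapping table above.

```python
import xml.etree.ElementTree as ET

# Copy of the <MappingTables> fragment above.
MAPPING_XML = """<MappingTables>
  <Mapping>
    <Base>አማርኛ</Base>
    <Variants>
      <Variant>ዐማርኛ</Variant>
    </Variants>
  </Mapping>
</MappingTables>"""


def read_mappings(xml_text: str) -> dict[str, set[str]]:
    """Map each <Base> label to the set of its <Variant> labels."""
    root = ET.fromstring(xml_text)
    return {
        m.findtext("Base"): {v.text for v in m.findall("Variants/Variant")}
        for m in root.findall("Mapping")
    }


def find_asymmetric(mappings: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (variant, base) pairs whose reverse mapping is missing."""
    return [
        (variant, base)
        for base, variants in mappings.items()
        for variant in sorted(variants)
        if base not in mappings.get(variant, set())
    ]


mappings = read_mappings(MAPPING_XML)
print(find_asymmetric(mappings))
```

For the single mapping shown, the sketch reports that ዐማርኛ has no entry pointing back to አማርኛ. Adding a second `<Mapping>` with the roles swapped would clear the report.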
Example: Section-by-Section Illustration of a Variant Label Definition (7/7):
<ValidityCriteria>
<ValidVariant>
<Variant>ዐማርኛ</Variant>
<Stability>High</Stability>
</ValidVariant>
</ValidityCriteria>
Example: Full XML Code for Defining Label Variants in Ethiopic Script:
<LGR>
  <!-- Base Label -->
  <BaseLabel>አማርኛ</BaseLabel>
  <!-- Variant Label with Diacritics -->
  <VariantLabel>
    <Type>Diacritics</Type>
    <Rules>
      <!-- Define rules for adding diacritics -->
      <AddDiacritics>
        <!-- No Diacritic Specified -->
        <!-- Additional diacritic specifications are empty -->
      </AddDiacritics>
    </Rules>
  </VariantLabel>
  <!-- Variant Label with Script Variation -->
  <VariantLabel>
    <Type>ScriptVariant</Type>
    <Rules>
      <!-- Define rules for script variation -->
      <ScriptVariation>ዐማርኛ</ScriptVariation>
    </Rules>
  </VariantLabel>
  <!-- Variant Label with Transliteration -->
  <VariantLabel>
    <Type>Transliteration</Type>
    <Rules>
      <!-- Define rules for transliteration -->
      <Transliteration>Amharic</Transliteration>
    </Rules>
  </VariantLabel>
  <!-- Contextual Rules -->
  <ContextualRules>
    <!-- Define rules that depend on context -->
    <ContextRule>
      <Condition>አ</Condition>
      <Action>
        <!-- Define actions specific to the context -->
        <AddCharacter>ዐ</AddCharacter>
      </Action>
    </ContextRule>
  </ContextualRules>
  <!-- Actions -->
  <Actions>
    <!-- Define global actions -->
    <GlobalAction>
      <!-- Define global transformations -->
      <SubstituteCharacters>
        <!-- Define character substitutions -->
        <Substitution>
          <Original>አ</Original>
          <Replacement>ዐ</Replacement>
        </Substitution>
      </SubstituteCharacters>
    </GlobalAction>
  </Actions>
  <!-- Mapping Tables -->
  <MappingTables>
    <!-- Define explicit mappings -->
    <Mapping>
      <Base>አማርኛ</Base>
      <Variants>
        <Variant>ዐማርኛ</Variant>
        <!-- Additional variant mappings -->
      </Variants>
    </Mapping>
  </MappingTables>
  <!-- Validity and Stability Criteria -->
  <ValidityCriteria>
    <!-- Define validity criteria for variants -->
    <ValidVariant>
      <Variant>ዐማርኛ</Variant>
      <Stability>High</Stability>
    </ValidVariant>
  </ValidityCriteria>
</LGR>
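The `<Actions>` and `<MappingTables>` sections describe the same variant in two ways, and nothing in the format forces them to agree. Below is a minimal sketch, assuming the element names of the example above. It applies every `<Substitution>` to the base label and checks that the result appears among the `<Variant>` entries of the mapping table.

The helpers `substitution_variant` and `mapped_variants` are hypothetical, and the embedded XML is an abridged copy of the full definition. Note that `str.replace` substitutes every occurrence at once, so this yields one variant rather than every combination.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the full variant definition above (only the sections used here).
VARIANT_XML = """<LGR>
  <BaseLabel>አማርኛ</BaseLabel>
  <Actions>
    <GlobalAction>
      <SubstituteCharacters>
        <Substitution>
          <Original>አ</Original>
          <Replacement>ዐ</Replacement>
        </Substitution>
      </SubstituteCharacters>
    </GlobalAction>
  </Actions>
  <MappingTables>
    <Mapping>
      <Base>አማርኛ</Base>
      <Variants>
        <Variant>ዐማርኛ</Variant>
      </Variants>
    </Mapping>
  </MappingTables>
</LGR>"""


def substitution_variant(root: ET.Element, label: str) -> str:
    """Apply every global <Substitution> (Original -> Replacement) to label."""
    for sub in root.iterfind("Actions/GlobalAction/SubstituteCharacters/Substitution"):
        label = label.replace(sub.findtext("Original"), sub.findtext("Replacement"))
    return label


def mapped_variants(root: ET.Element, base: str) -> set[str]:
    """Collect the <Variant> entries listed for base in <MappingTables>."""
    return {
        v.text
        for m in root.iterfind("MappingTables/Mapping")
        if m.findtext("Base") == base
        for v in m.iterfind("Variants/Variant")
    }


root = ET.fromstring(VARIANT_XML)
base = root.findtext("BaseLabel")
derived = substitution_variant(root, base)
print(derived, derived in mapped_variants(root, base))
```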
Application of LGRs: Python Example on Label Verification and Variant Identification:
This example reads the full variant definition shown above from a file named lgr_ethiopic_variant.xml.
import xml.etree.ElementTree as ET


class LGRValidator:
    def __init__(self, lgr_data):
        self.root = ET.fromstring(lgr_data)
        base_label_element = self.root.find("BaseLabel")
        if base_label_element is None or not base_label_element.text:
            raise ValueError("LGR data has no <BaseLabel> element")
        self.base_label = base_label_element.text.strip()
        self.variant_labels = self._get_variant_labels()

    def validate_label(self, label):
        """Return True if label is exactly the base label defined in the LGR."""
        if not isinstance(label, str):
            raise TypeError("Label must be a string")
        return label == self.base_label

    def get_variant_labels(self):
        return self.variant_labels

    def _get_variant_labels(self):
        # A dict removes duplicates while keeping document order, so the output
        # order is deterministic (iteration order of a set of strings is not).
        variant_labels = {}
        for variant_label in self.root.findall("VariantLabel"):
            variant_type = (variant_label.findtext("Type") or "").strip()
            variant = None
            if variant_type == "Diacritics":
                diacritics = [d.text for d in variant_label.findall("Rules/AddDiacritics/Diacritic") if d.text]
                if diacritics:
                    variant = self._apply_diacritics(label=self.base_label, diacritics=diacritics)
            elif variant_type == "ScriptVariant":
                script_variation = variant_label.findtext("Rules/ScriptVariation")
                if script_variation:
                    variant = self._apply_script_variation(label=self.base_label, script_variation=script_variation.strip())
            elif variant_type == "Transliteration":
                transliteration = variant_label.findtext("Rules/Transliteration")
                if transliteration:
                    variant = self._apply_transliteration(label=self.base_label, transliteration=transliteration.strip())
            if variant and variant != self.base_label:
                variant_labels[variant] = None
        return list(variant_labels)

    def _apply_diacritics(self, label, diacritics):
        # Append the diacritic characters to the label.
        return label + "".join(diacritics)

    def _apply_script_variation(self, label, script_variation):
        # The example LGR stores the complete variant label, so it is returned as-is.
        return script_variation

    def _apply_transliteration(self, label, transliteration):
        # Likewise, the transliterated form is stored verbatim in the LGR.
        return transliteration


def read_lgr_from_file(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        return file.read()


# Specify the path to your LGR file
lgr_file_path = "lgr_ethiopic_variant.xml"
lgr_data = read_lgr_from_file(lgr_file_path)
validator = LGRValidator(lgr_data)
label = "አማርኛ"
is_valid = validator.validate_label(label)
print(f"Is '{label}' valid? {is_valid}")
variant_labels = validator.get_variant_labels()
print(f"Variant labels for '{label}':")
for variant_label in variant_labels:
    if variant_label != label:  # Exclude the base label from the variant labels
        print(variant_label)
Application of LGRs: Java Example on Label Verification and Variant Identification:
Like the Python version, this example reads lgr_ethiopic_variant.xml. Save it as LGRValidatorMain.java, since the public class name must match the file name.
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

class LGRValidator {
    private final Document lgrDocument;
    private final String baseLabel;
    private final Set<String> variantLabels;

    // lgrData holds the XML content itself; main() reads the file beforehand.
    public LGRValidator(String lgrData) throws ParserConfigurationException, IOException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        this.lgrDocument = builder.parse(new InputSource(new StringReader(lgrData)));
        this.lgrDocument.getDocumentElement().normalize();
        this.baseLabel = getNodeTextContent(lgrDocument.getDocumentElement(), "BaseLabel");
        this.variantLabels = extractVariantLabels();
    }

    public boolean validateLabel(String label) {
        return label != null && label.equals(baseLabel);
    }

    public Set<String> getVariantLabels() {
        return variantLabels;
    }

    private Set<String> extractVariantLabels() {
        // LinkedHashSet keeps document order, so the output matches the Python example.
        Set<String> labels = new LinkedHashSet<>();
        NodeList variantLabelNodes = lgrDocument.getElementsByTagName("VariantLabel");
        for (int i = 0; i < variantLabelNodes.getLength(); i++) {
            Element variantLabel = (Element) variantLabelNodes.item(i);
            String variantType = getNodeTextContent(variantLabel, "Type");
            String variant = null;
            // getElementsByTagName matches tag names, not paths such as "Rules/ScriptVariation".
            if ("Diacritics".equals(variantType)) {
                variant = applyDiacritics(baseLabel, variantLabel.getElementsByTagName("Diacritic"));
            } else if ("ScriptVariant".equals(variantType)) {
                variant = getNodeTextContent(variantLabel, "ScriptVariation");
            } else if ("Transliteration".equals(variantType)) {
                variant = getNodeTextContent(variantLabel, "Transliteration");
            }
            if (variant != null && !variant.equals(baseLabel)) {
                labels.add(variant);
            }
        }
        return labels;
    }

    private String applyDiacritics(String label, NodeList diacriticNodes) {
        StringBuilder result = new StringBuilder(label);
        for (int i = 0; i < diacriticNodes.getLength(); i++) {
            result.append(diacriticNodes.item(i).getTextContent().trim());
        }
        return result.toString();
    }

    // Trimmed text of the first descendant of element named tagName, or null if absent.
    private static String getNodeTextContent(Element element, String tagName) {
        NodeList nodeList = element.getElementsByTagName(tagName);
        return nodeList.getLength() > 0 ? nodeList.item(0).getTextContent().trim() : null;
    }
}

public class LGRValidatorMain {
    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        String lgrFilePath = "lgr_ethiopic_variant.xml";
        // Decode explicitly as UTF-8; the platform default charset may differ.
        String lgrData = new String(Files.readAllBytes(Paths.get(lgrFilePath)), StandardCharsets.UTF_8);
        LGRValidator validator = new LGRValidator(lgrData);
        String label = "አማርኛ";
        boolean isValid = validator.validateLabel(label);
        System.out.println("Is '" + label + "' valid? " + isValid);
        Set<String> variantLabels = validator.getVariantLabels();
        System.out.println("Variant labels for '" + label + "':");
        for (String variantLabel : variantLabels) {
            if (!variantLabel.equals(label)) {
                System.out.println(variantLabel);
            }
        }
    }
}
Application of LGRs: Python and Java Output for the Example Code:
Both the Python and Java examples produce the same output:
Is 'አማርኛ' valid? True
Variant labels for 'አማርኛ':
ዐማርኛ
Amharic
References:
[1]. Hoffman, P., & Dürst, M. (2010, July). Internationalized Domain Names in Applications (IDNA2008) (RFC 5893). Retrieved from https://www.ietf.org/rfc/ on 2023-12-06.
[2]. Internet Engineering Task Force (IETF). (2023). IDNA2008 - Internationalized Domain Names in Applications. Retrieved from https://www.ietf.org/ on 2023-12-06.
[3]. Cloudflare, Inc. (2023). Understanding and Addressing IDNA2008 Limitations. Retrieved from https://developers.cloudflare.com/cloudflare-one/account-limits/ on 2023-12-06.
[4]. Internet Engineering Task Force (IETF). (2018, July). Label Generation Rules (LGR) for the ASCII Scripts (RFC 8195). Retrieved from https://www.ietf.org/rfc/ on 2023-12-06.
[5]. Unicode Consortium. (2023, December 5). Unicode Technical Standard #46 (UTS 46): IDNA Compatibility Charts. Retrieved from https://www.unicode.org/ on 2023-12-06.
[6]. Konishi, K., Huang, K., Qian, H., & Ko, Y. (2004, April). Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean (RFC 3743). Retrieved from https://www.ietf.org/rfc/ on 2023-12-06.
[7]. Klensin, J., Faltstrom, P., & Karp, C. (2006, September). Review and Recommendations for Internationalized Domain Names (IDNs) (RFC 4690). Retrieved from https://www.ietf.org/rfc/ on 2023-12-06.
[8]. Davies, K., & Freytag, A. (2016, August). Representing Label Generation Rulesets Using XML (RFC 7940). Retrieved from https://www.ietf.org/rfc/ on 2023-12-06.
[9]. Unicode Consortium. (2023, December 5). Unicode Technical Standard #46 (UTS 46): IDNA Compatibility Processing. Retrieved from https://www.unicode.org/ on 2023-12-06.
[10]. Internet Corporation for Assigned Names and Numbers (ICANN). (2013, March 20). Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels. Retrieved from https://www.icann.org/en/system/files/files/lgr-procedure-20mar13-en.pdf on 2023-12-26.
Author: