Universal Acceptance (UA) Micro-Learning Module: Module 5 - Introducing Internationalized Domain Names (IDNs).

Instructor Guide

1st Edition.

© 2024 Creative Commons License - Attribution 4.0 International (CC BY 4.0).

Universal Acceptance

IDNs UA Micro-Learning Module Objectives:

  • The "Introducing Internationalized Domain Names (IDNs)" micro-learning module provides an introduction to IDNs and their significance in the global Internet ecosystem.
  • At the end of this module, students should be able to:
    • Understand the concept and significance of Internationalized Domain Names (IDNs) in the domain name system (DNS);
    • Identify the different types of top-level domains (TLDs), including generic TLDs (gTLDs) and country-code TLDs (ccTLDs), and their role in organizing domain names;
    • Explain the process of converting Unicode-based domain names into an ASCII-compatible format using the Punycode Algorithm (RFC 3492);
    • Configure DNS servers to support IDN-encoded domain names, ensuring accurate resolution and functionality;
    • Differentiate between IDNA2003 and IDNA2008 standards and their impact on handling IDNs; and
    • Use network troubleshooting commands such as dig, traceroute, nslookup, mtr, and ping to effectively diagnose and resolve IDN-related issues specifically in the context of U-labels.


Introducing the root zone: TLD, gTLD, ccTLD:

  • Top-level domains (TLDs):
    • The highest level in the DNS tree structure.
      • Country Code Top-Level Domains (ccTLDs): ccTLDs are TLDs that are specifically designated for individual countries or geographic regions.
      • Generic Top-Level Domains (gTLDs): gTLDs are TLDs that are not inherently tied to any specific country or geographic region.
  • What is the Root Zone?
    • The root zone refers to the highest level in the hierarchical structure of the Domain Name System (DNS).
    • It is the starting point for resolving domain names on the Internet.
    • It contains the authoritative information about the TLDs, such as the name servers responsible for each TLD.


What is IDN?

  • IDN stands for Internationalized Domain Name:
    • IDNs enable Internet users to register and access domain names in their native languages.
    • They make the Internet more inclusive and accessible to individuals and communities worldwide.
    • Traditionally, domain names were limited to a subset of the ASCII character set (letters, digits, and the hyphen).
    • With IDNs, domain names can now include characters from a wide range of scripts, such as Cyrillic, Arabic, Chinese, Japanese, Korean, and many others.


Why do we need IDNs?

  • IDNs have several important benefits:
    • Linguistic Diversity.
    • Cultural and Linguistic Preservation.
    • Localized Internet Presence.
    • Localized Email Addresses.
    • User-Friendly Experience.


Unicode-Based Domain Names:

  • Unicode-based domain names refer to domain names created from Unicode characters, allowing for representation in various scripts and languages.
    • A-labels (ASCII labels):
      • ASCII-compatible encoding of Unicode-based domain names.
      • Example: the A-label representation of the U-label "экзампл.ком" (in Cyrillic script) would be "xn--80aniges7g.xn--j1aef" (using ASCII characters).
      • A-labels are used for internal processing and storage within the DNS infrastructure.
    • U-labels (Unicode labels):
      • U-labels are the human-readable representation of Unicode-based domain names.
      • They consist of Unicode characters in Normalization Form C, expressed in a standard encoding form such as UTF-8 or UTF-16.
      • U-labels allow domain names to be registered and displayed in languages such as Chinese, Arabic, Cyrillic, and others.
      • Example: "ουτοπία.ευ" is a domain name written with U-labels in Greek script.
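
The relationship between the two label forms can be sketched with Python's built-in "idna" codec. This is only an illustration: the codec implements the older IDNA2003 rules, and the helper names to_a_label and to_u_label are introduced here for the example and do not come from any standard.

```python
def to_a_label(name: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible form."""
    # The "idna" codec handles each dot-separated label separately and
    # adds the "xn--" prefix to labels that contain non-ASCII characters.
    return name.encode("idna").decode("ascii")


def to_u_label(name: str) -> str:
    """Convert an ASCII-compatible domain name back to its Unicode form."""
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_name = to_a_label("экзампл.ком")
    print(ascii_name)              # the form stored in the DNS
    print(to_u_label(ascii_name))  # the form shown to users
```

The round trip shows that both forms identify the same domain name: one is meant for people, the other for the DNS.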


Punycode Algorithm (RFC 3492):

  • It is a vital component in the implementation of Internationalized Domain Names (IDNs).
  • It provides a standardized method for converting Unicode-based domain name labels into ASCII strings.
  • The Punycode algorithm enables IDNs to work with the existing DNS without changes to the DNS protocol.
  • The Punycode algorithm allows non-ASCII characters to be represented in a compatible format within the DNS infrastructure.


Normalization of Domain Name Strings or Labels:

  • It refers to the process of transforming and standardizing domain names to a consistent format.
  • Some key aspects related to the normalization of domain name strings or labels:
    • Unicode Normalization: Unicode normalization converts domain names to a standardized Unicode representation, so that equivalent character sequences compare as equal.
    • Case Normalization: Domain names are case-insensitive, meaning that the case of letters in a domain name does not affect its resolution.
    • IDN Normalization: To ensure interoperability, IDNs undergo normalization.
      • Unicode Normalization Form C (NFC): canonical composition, the form IDNA2008 requires for U-labels.
      • Normalization Form D (NFD): canonical decomposition, the decomposed counterpart of NFC.
    • Punycode Encoding:
      • IDNs encoded in Punycode are first decoded to obtain the original Unicode representation.
      • Normalization is then applied to the Unicode form of the domain name.


Normalization Requirements of Domain Name Strings or Labels:

  • Some of the factors that make domain name normalization unique:
    • Case Insensitivity.
    • ASCII Compatibility.
    • Label Separators.
    • Unicode Normalization:
      • NFC.
    • DNS Considerations:
      • Label Length: a label can be at most 63 octets long; for an IDN, the limit applies to the A-label.
      • Domain Name Length: an FQDN can be at most 253 characters long in its textual form.
      • Path Length: a complete domain name, as encoded on the wire with its length octets, can be at most 255 octets long.
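
The label and name limits above can be checked with a few lines of code. The sketch below assumes the limits are applied to the ASCII-compatible form of each label; to_ascii_label and length_problems are hypothetical helpers, and normalization and character validity checks are left out on purpose.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_CHARS = 253


def to_ascii_label(label):
    """Return the A-label for a non-ASCII label; ASCII labels pass through."""
    if label.isascii():
        return label
    # Raw Punycode plus the ACE prefix (no normalization in this sketch).
    return "xn--" + label.encode("punycode").decode("ascii")


def length_problems(name):
    """Return a list of length problems for a domain name (empty if none)."""
    labels = [to_ascii_label(part) for part in name.rstrip(".").split(".")]
    problems = []
    for label in labels:
        if len(label) > MAX_LABEL_OCTETS:
            problems.append(f"label too long ({len(label)} octets)")
    ascii_name = ".".join(labels)
    if len(ascii_name) > MAX_NAME_CHARS:
        problems.append(f"name too long ({len(ascii_name)} characters)")
    return problems


if __name__ == "__main__":
    print(length_problems("example.com"))
    # 60 Cyrillic letters fit in 63 characters, but their A-label does not.
    print(length_problems("я" * 60 + ".com"))
```

The second call illustrates why a label that looks short enough in Unicode can still be rejected: the encoded form is what has to fit.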


Example: Punycode Algorithm Steps.

  • Punycode algorithm using an example domain name in Arabic script: العربية.com (the word "Arabic" in Arabic script):
    • Input: the Unicode input string is العربية.com.
    • Prepare the Input: Apply NFKC
    • Encoding
    • ASCII Conversion
    • Basic Encoding
    • Handling Non-Basic Code Points
    • Handling Bias
    • Output: the Punycode representation for العربية.com would be "xn--mgbcd4a2b0d2b.com".
    • Conversion Complete: the resulting Punycode representation, "xn--mgbcd4a2b0d2b.com", is now in ASCII-compatible encoding (ACE), and it can be used within the existing DNS infrastructure.
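
The encoding step can be reproduced with Python's built-in "punycode" codec. This is a minimal sketch that encodes a single label and adds the ACE prefix by hand; it skips the preparation step (normalization and mapping), which is acceptable here only because the example label is already in normalized form.

```python
ACE_PREFIX = "xn--"

# The label "العربية", written with escapes so the code point order is explicit.
label = "\u0627\u0644\u0639\u0631\u0628\u064a\u0629"

# Encode: Unicode label -> Punycode digits -> A-label with the ACE prefix.
encoded = label.encode("punycode").decode("ascii")
a_label = ACE_PREFIX + encoded

# Decode: strip the prefix and reverse the Punycode step.
decoded = a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

if __name__ == "__main__":
    print(a_label + ".com")
    print(decoded == label)
```

The "com" label is plain ASCII, so it is left untouched; only labels with non-ASCII characters receive the prefix.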


Configuring DNS for IDN-Encoded Domain Names:

  • Configure DNS for IDN-encoded domain names:
    • selecting the appropriate encoding,
    • mapping: establishing a relationship between the IDN-encoded domain name and its corresponding ASCII-compatible representation (Punycode), and
    • configuring DNS records and name servers accordingly.


Configuring DNS for Non-ASCII Domain Names: Key Steps

  • Verify Registrar Support: Ensure that your domain name registrar supports Internationalized Domain Names (IDNs).
  • Choose IDN Encoding: Select the appropriate encoding mechanism for representing non-ASCII characters in your domain name.
    • The standard encoding scheme is Punycode, which converts non-ASCII characters into an ASCII-compatible format.
  • Encode the Domain Name: Apply the chosen encoding mechanism (e.g., Punycode) to convert your domain name with non-ASCII characters into an ASCII-compatible representation.
  • Configure DNS Records: Set up DNS records for your IDN-encoded domain name.
  • Name Server Configuration: Configure the name servers for your domain to handle IDN-encoded domain names.
  • Test and Validate: Perform thorough testing and validation of your DNS configuration to ensure that the IDN-encoded domain name resolves correctly.
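
The "Encode the Domain Name" and "Configure DNS Records" steps can be combined in a small sketch that produces a zone-file style address record. Everything specific here is an assumption for illustration: zone_a_record is a hypothetical helper, 192.0.2.10 is an address reserved for documentation, and the TTL of 3600 seconds is arbitrary. The encoding uses Python's built-in "idna" codec, which follows IDNA2003.

```python
def zone_a_record(u_name: str, ipv4: str, ttl: int = 3600) -> str:
    """Build a zone-file style A record line for a Unicode domain name."""
    # DNS zone data carries the ASCII-compatible (A-label) form of the name.
    a_name = u_name.encode("idna").decode("ascii")
    # The trailing dot marks the name as fully qualified.
    return f"{a_name}. {ttl} IN A {ipv4}"


if __name__ == "__main__":
    print(zone_a_record("экзампл.ком", "192.0.2.10"))
```

The printed line shows the point of the encoding step: the name server only ever sees the A-label, never the Unicode form.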


Internationalized Domain Names in Applications (IDNA) 2003 (1/2):

  • Key features of IDNA2003 include:
    • Punycode Encoding: IDNA2003 utilizes Punycode encoding to represent non-ASCII characters in an ASCII-compatible format.
    • Unicode Normalization: IDNA2003 requires Unicode normalization to ensure consistency and avoid variations in equivalent character sequences.
    • Mapping Characters: IDNA2003 has specific rules for mapping characters that are not allowed in domain names.
    • Label Length Limit: IDNA2003 imposes a limit of 63 characters for each label within an IDN. The limit applies to the ASCII-compatible (Punycode) form of the label, including the "xn--" prefix.


Internationalized Domain Names in Applications (IDNA) 2003 (2/2):

  • It has critical limitations:
    • Limited Character Set or Restricted Character Repertoire: IDNA2003 supports only the characters defined in Unicode 3.2.
    • Language-specific Rules: IDNA2003 uses language-specific rules for handling certain characters, which leads to inconsistencies and conflicts.
    • Lack of Script Awareness: IDNA2003 does not have built-in script awareness.
    • Limited Normalization: IDNA2003 uses a normalization process called Nameprep, which is based on Unicode 3.2.
    • Security Vulnerabilities: IDNA2003 introduced security concerns related to homograph attacks.
    • Lack of error handling: IDNA2003 does not provide explicit error handling mechanisms.


Internationalized Domain Names in Applications (IDNA) 2008:

  • Key features and Improvements of IDNA2008 include:
    • Extended Character Set: the permitted characters are derived from Unicode character properties rather than a fixed list, with the initial tables computed for Unicode 5.2.
    • Backward Compatibility: While IDNA2008 is not fully backward compatible with IDNA2003, efforts were made to minimize disruption during the transition.
    • Script-aware Processing: IDNA2008 introduces script awareness.
    • Contextual Rules: IDNA2008 introduced contextual rules for certain characters.
    • Enhanced Normalization: IDNA2008 requires labels to be in Normalization Form C (NFC) instead of applying the NFKC mapping built into Nameprep.
    • Bidi (bidirectional) Support: IDNA2008 addresses bidirectional text handling.
    • Security enhancements: IDNA2008 introduces several security measures to mitigate homograph attacks.
    • Error handling: IDNA2008 provides explicit error handling mechanisms.


Ensuring Proper Display: How Does IDNA2008 Manage Bidirectional Text Ordering?

  • To ensure the accurate ordering and representation of domain names that contain right-to-left scripts, IDNA2008 defines the Bidi Rule (RFC 5893):
    • Directional Formatting Characters (DFCs): IDNA2008 does not allow directional formatting characters inside labels; display order is left to the Unicode Bidirectional Algorithm.
    • RTL and LTR Labels: In IDNA2008, a domain name is divided into labels separated by dots. A label is a Right-to-Left (RTL) label if it contains at least one character of bidi type R, AL, or AN; any other label is a Left-to-Right (LTR) label.
    • RTL Embedding and LTR Embedding: the RTL Embedding (RLE) and LTR Embedding (LRE) characters are DFCs and are therefore not permitted in labels.
    • Punctuation Handling: the Bidi Rule restricts which bidi character types, including separators and other neutral characters, can appear in an RTL or LTR label.
    • Contextual Rules: IDNA2008 incorporates contextual rules (CONTEXTJ and CONTEXTO) that permit certain characters only in specific contexts within a label.
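
RFC 5893 classifies a label as right-to-left when it contains at least one character whose bidi type is R, AL, or AN. That single definition can be sketched with Python's unicodedata module. The helper is_rtl_label is hypothetical, and the sketch covers only this classification step, not the full Bidi Rule.

```python
import unicodedata

# Bidi types that make a label an RTL label under RFC 5893.
RTL_TYPES = {"R", "AL", "AN"}


def is_rtl_label(label: str) -> bool:
    """Return True if the label contains any character of type R, AL, or AN."""
    return any(unicodedata.bidirectional(ch) in RTL_TYPES for ch in label)


if __name__ == "__main__":
    arabic = "\u0627\u0644\u0639\u0631\u0628\u064a\u0629"  # "العربية"
    for label in (arabic, "com"):
        kind = "RTL" if is_rtl_label(label) else "LTR"
        print(ascii(label), kind)
```

A name such as العربية.com therefore mixes an RTL label with an LTR label, which is the situation the bidi rules are designed to handle.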


Potential Compatibility Issues between IDNA2008 and IDNA2003:

  • IDNA2003 vs. IDNA2008: the two standards differ in how they handle and interpret certain characters. Some examples:
    • Character Set Differences: IDNA2008 includes an expanded character set compared to IDNA2003.
    • Unicode Normalization: IDNA2008 requires labels to be in NFC before encoding, whereas IDNA2003 applies NFKC as part of Nameprep.
    • Contextual Rules: IDNA2008 introduces refined contextual rules for character handling, especially in the context of bidi text.
    • Error Handling: IDNA2008 provides more explicit error handling mechanisms compared to IDNA2003.
    • Mapping and Compatibility: IDNA2008 no longer maps characters as part of the protocol, so some names are interpreted differently. For example, "ß" is mapped to "ss" under IDNA2003 but is a valid character in its own right under IDNA2008.
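
A well-known case of such a difference is the German letter "ß". Python's built-in "idna" codec follows IDNA2003 and maps it to "ss", while encoding the unmapped label with raw Punycode gives the A-label that an IDNA2008 implementation would be expected to produce. The sketch below uses only the standard library; treating the raw Punycode result as the IDNA2008 outcome is an assumption that holds for this particular label.

```python
name = "stra\u00dfe.de"  # "straße.de"

# IDNA2003 behaviour: Nameprep maps "ß" to "ss" before encoding,
# so the result is a plain ASCII name.
idna2003_name = name.encode("idna").decode("ascii")

# Keeping "ß" as a character in its own right: encode the label
# without any mapping and add the ACE prefix by hand.
label = "stra\u00dfe"
unmapped_name = "xn--" + label.encode("punycode").decode("ascii") + ".de"

if __name__ == "__main__":
    print(idna2003_name)
    print(unmapped_name)
```

The two results are different domain names, so the same input can lead to different servers depending on which standard the software implements.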


IDN Support in FTP, HTTP, and HTTPS: Addressing the Limitations.

  • These protocols were designed around ASCII host names, which limits the direct use of U-labels:
    • FTP (File Transfer Protocol):
      • Limitations on character encoding.
    • HTTP (Hypertext Transfer Protocol):
      • Limitations on character encoding.
    • HTTPS (HTTP Secure):
      • Inherits the same limitations as HTTP.
  • Note:
    • While there are limitations in directly supporting IDNs in protocols like FTP, HTTP, and HTTPS, the use of Punycode encoding allows for the representation and usage of IDNs in these protocols.
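
The note above can be illustrated by converting only the host part of a URL to its A-label form before handing it to a client. This is a minimal sketch using Python's urllib.parse and the built-in "idna" codec (IDNA2003); to_ascii_url is a hypothetical helper, and it deliberately ignores user information and non-ASCII characters in the path.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return the URL with its host name converted to A-label form."""
    parts = urlsplit(url)
    # Only the host name is converted; scheme, path, and query are kept.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_ascii_url("http://экзампл.ком/index.html"))
```

Browsers perform a comparable conversion behind the scenes, which is why users can type the Unicode form while the request itself carries the ASCII form.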


Network Troubleshooting Commands for IDNs: dig, traceroute

  • dig (Domain Information Groper):
    • Usage Example: Suppose you want to query information for the U-label "普遍适用测试.我爱你".
    • Convert it to Punycode (A-label) format: "xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl".
    • Run the query with the A-label:

      dig xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl

  • traceroute:
    • Usage Example: you can use traceroute with either U-labels or A-labels (Punycode), provided the installed version supports IDNs:

      traceroute 普遍适用测试.我爱你

      traceroute xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
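
When a tool does not accept U-labels, the conversion can be done before the command is run. The sketch below only builds the argument list; build_command is a hypothetical helper, the conversion uses Python's built-in "idna" codec (IDNA2003), and whether a given tool is installed is left to the reader.

```python
def build_command(tool: str, name: str) -> list:
    """Return an argument list for a network tool, using the A-label form."""
    # Converting first means the tool never has to handle non-ASCII input.
    a_name = name.encode("idna").decode("ascii")
    return [tool, a_name]


if __name__ == "__main__":
    command = build_command("dig", "普遍适用测试.我爱你")
    print(" ".join(command))
    # To actually run it, pass the list to subprocess.run(command).
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual characters.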


Network Troubleshooting Commands for IDNs: nslookup, ping, and mtr:

  • nslookup (Name Server Lookup):
    • Usage Example: to query information for the U-label "普遍适用测试.我爱你", convert it to Punycode (A-label) format: "xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl".

      nslookup xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl

  • mtr (My Traceroute):
    • Usage Example: to diagnose network issues with the U-label "普遍适用测试.我爱你", convert it to its Punycode representation: "xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl".

      mtr xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl

  • ping:
    • Usage Example: you can use ping with either U-labels or A-labels (Punycode), provided the installed version supports IDNs:

      ping 普遍适用测试.我爱你

      ping xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl


Reference:

[1]. Internet Corporation for Assigned Names and Numbers (ICANN). (2023, October 4). Top-level domains (TLDs). Retrieved from https://www.icann.org/resources/pages/tlds-2012-02-25-en.

[2]. Internet Corporation for Assigned Names and Numbers (ICANN). (2023, November). Guidelines for the Implementation of Internationalized Domain Names, Version 3.0. Retrieved November 25, 2023, from https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en.

[3]. Internet Assigned Numbers Authority (IANA). (2022, May 23). Root zone. Retrieved from https://www.iana.org/domains/root.

[4]. Internet Corporation for Assigned Names and Numbers (ICANN). (2023, October 4). Internationalized Domain Names (IDNs). Retrieved from https://www.icann.org/resources/pages/idn-2012-02-25-en.

[5]. Internet Corporation for Assigned Names and Numbers (ICANN). (2023, October 4). Unicode-based domain names (IDNs). Retrieved from https://www.icann.org/resources/pages/idn-2012-02-25-en.


[6]. Costello, A. (2003, March). Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA). IETF. https://doi.org/10.17487/RFC3492.

[7]. Internet Engineering Task Force (IETF). (2003, March). Internationalizing Domain Names in Applications (IDNA): Requirements and Solutions. Retrieved from https://doi.org/10.17487/RFC3492.

[9]. Internet Engineering Task Force (IETF). (2008, June). Internationalizing Domain Names in Applications (IDNA): Current Status. Retrieved from https://doi.org/10.17487/RFC5893.


Author:

  • Dessalegn Mequanint Yehuala, dessalegn.mequanint@aau.edu.et
