Universal Acceptance (UA) Micro-Learning Module 12: Unicode Support in Operating Systems
Instructor Guide
1st Edition.
© 2024 Creative Commons License - Attribution 4.0 International (CC BY 4.0).
Universal Acceptance
UA Micro-Learning Module Objectives (Unicode Support in Operating Systems):
Note About the Use of Unicode String Literals:
Why do we need Unicode in Operating Systems?
Some Examples of Operating Systems that Offer Unicode Support:
Unicode String Manipulation Functions, Collation and Sorting in Operating Systems:
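Example: Setting the Locale on a Linux Operating System:
locale -a                   # list the locales installed on the system
export LC_ALL=ar_MA.UTF-8   # use Arabic (Morocco) with UTF-8 encoding in this shell; the locale must be installed
locale                      # confirm the active locale settings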
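Exporting LC_ALL to a locale that has not been generated typically makes programs fall back to the C locale, so a script may want to check first. The following is a minimal sketch, assuming a glibc-based Linux system where "locale -a" prints UTF-8 locales in a normalized spelling such as ar_MA.utf8; the helper names normalize_locale and has_locale are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: export a locale only if it is actually installed.
# Assumption: glibc-style "locale -a" output, e.g. "ar_MA.utf8" rather than
# "ar_MA.UTF-8". normalize_locale and has_locale are hypothetical helpers.

# normalize_locale: lowercase the codeset and drop its dashes
# (ar_MA.UTF-8 -> ar_MA.utf8); names without a codeset are returned unchanged.
normalize_locale() {
  local name="$1"
  local base="${name%%.*}"
  if [[ "$name" == "$base" ]]; then
    printf '%s\n' "$name"
    return
  fi
  local codeset="${name#*.}"
  codeset="${codeset,,}"     # lowercase (bash 4 or later)
  codeset="${codeset//-/}"   # remove dashes
  printf '%s.%s\n' "$base" "$codeset"
}

# has_locale: succeed if the requested locale appears in "locale -a".
has_locale() {
  local wanted line
  wanted="$(normalize_locale "$1")"
  while IFS= read -r line; do
    [[ "$(normalize_locale "$line")" == "$wanted" ]] && return 0
  done < <(locale -a 2>/dev/null)
  return 1
}

target="ar_MA.UTF-8"
if has_locale "$target"; then
  export LC_ALL="$target"
  echo "LC_ALL set to $target"
else
  echo "$target is not installed; generate it first (for example with locale-gen on Debian/Ubuntu)" >&2
fi
```

Comparing normalized spellings avoids false negatives when the requested name and the "locale -a" listing differ only in how the codeset is written.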
Unicode-aware Collation in Operating Systems:
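A quick way to see why collation needs locale support is to sort the same words with and without it. The sketch below assumes GNU or BSD sort; sort_bytes and sort_in_locale are hypothetical helper names. Under LC_ALL=C, sort compares raw bytes, and every non-ASCII character is encoded in UTF-8 with bytes of 0x80 or higher, so a word starting with a letter such as É lands after every ASCII word.

```shell
#!/usr/bin/env bash
# Sketch: byte-order sorting versus locale-aware collation.
# sort_bytes and sort_in_locale are hypothetical helpers.

# Built from octal escapes so the bytes are unambiguous:
# \303\211 is "É" (U+00C9) encoded in UTF-8.
words=$(printf 'zebra\n\303\211cole\necole\nApple')

sort_bytes() { LC_ALL=C sort; }          # compare raw bytes
sort_in_locale() { LC_ALL="$1" sort; }   # collation rules of the given (installed) locale

echo "C locale (byte order):"
printf '%s\n' "$words" | sort_bytes      # Apple, ecole, zebra, École
```

With a French locale installed, printf '%s\n' "$words" | sort_in_locale fr_FR.UTF-8 is expected to place École next to ecole instead of at the end; if the locale is missing, the order may fall back to byte order.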
Examples of Unicode String Manipulation Functions Performed by Operating Systems (1/5):
$ echo -n "تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا" | wc -m
wc -m: This counts characters rather than bytes (wc -c counts bytes); the character count is correct only under a UTF-8 locale.
Output: 49
$ echo "Arabic Email Address: " > file1
$ echo "تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا" >> file1
$ cat file1
Output: Arabic Email Address:
تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا
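Because wc -m depends on the locale, it can help to measure a string in a way that does not. The sketch below assumes an iconv that supports UTF-32LE (GNU libc's iconv does); byte_len and codepoint_len are hypothetical helpers. Every code point occupies exactly 4 bytes in UTF-32, so the converted byte count divided by 4 is the number of code points. Code points are not always what a reader perceives as one character (an "e" followed by a combining accent counts as two).

```shell
#!/usr/bin/env bash
# Sketch: byte length versus code-point length of a UTF-8 string.
# byte_len and codepoint_len are hypothetical helpers.

# byte_len: number of bytes, independent of the locale
byte_len() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# codepoint_len: UTF-32LE uses 4 bytes per code point (no BOM), so divide by 4
codepoint_len() {
  local bytes
  bytes=$(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32LE | wc -c)
  echo $(( bytes / 4 ))
}

addr="تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا"
echo "bytes:       $(byte_len "$addr")"
echo "code points: $(codepoint_len "$addr")"
```

Each Arabic letter takes 2 bytes in UTF-8 and each ASCII character 1 byte, so for this address (43 Arabic letters plus 6 ASCII characters) the script should print 92 bytes and 49 code points.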
Examples of Unicode String Manipulation Functions Performed by Operating Systems (2/5):
$ echo تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا | cut -d '@' -f2
cut: This command cuts the text into fields based on a delimiter.
-d '@': This specifies the delimiter as "@" (at symbol).
-f2: This tells cut to print the second field (everything after the "@").
Output: تجربة-القبول-الشامل.موريتانيا
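cut -f2 returns only the second field, so an address with more than one "@" (possible inside a quoted local part) would be split in the wrong place. A minimal sketch using bash parameter expansion instead splits at the last "@"; email_local and email_domain are hypothetical helpers. Because the byte 0x40 ("@") never appears inside a multi-byte UTF-8 character, this split is safe whether or not a UTF-8 locale is active.

```shell
#!/usr/bin/env bash
# Sketch: split an internationalized email address at its LAST "@".
# email_local and email_domain are hypothetical helpers; an input without "@"
# is returned unchanged by both.

email_local()  { printf '%s\n' "${1%@*}"; }    # remove the shortest "@..." suffix
email_domain() { printf '%s\n' "${1##*@}"; }   # remove the longest "...@" prefix

addr="تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا"
email_local "$addr"    # تجربة-بريد-الكتروني
email_domain "$addr"   # تجربة-القبول-الشامل.موريتانيا
```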
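$ echo "Hello, 世界" | tr '[:lower:]' '[:upper:]'
Output: HELLO, 世界
tr: GNU tr fully supports only single-byte characters, so it converts ASCII letters and leaves non-ASCII text such as 世界 or Greek letters unchanged. The next example uses Python for full Unicode case conversion.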
$ echo "αγορί" | python3 -c "import sys; print(sys.stdin.read().upper(), end='')"
Output: ΑΓΟΡΊ
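Python's str.upper applies the full Unicode case mappings, which are not always one-to-one: the German ß, for instance, becomes the two letters SS. The one-liner above decodes standard input using the locale's encoding; the sketch below, with the hypothetical helper to_upper, decodes and encodes UTF-8 explicitly so it behaves the same in any locale (assuming python3 is installed).

```shell
#!/usr/bin/env bash
# Sketch: locale-independent Unicode uppercasing through Python 3.
# to_upper is a hypothetical helper; it reads raw bytes, decodes them as UTF-8,
# uppercases with full Unicode case mapping, and writes UTF-8 bytes back.
to_upper() {
  printf '%s' "$1" | python3 -c '
import sys
text = sys.stdin.buffer.read().decode("utf-8")
sys.stdout.buffer.write(text.upper().encode("utf-8"))
'
}

to_upper "straße"; echo   # ß has no single uppercase letter, so the result grows to STRASSE
```

For example, to_upper "αγορί" prints ΑΓΟΡΊ, matching the one-liner above.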
Examples of Unicode String Manipulation Functions Performed by Operating Systems (3/5):
$ echo -n "تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا" | iconv -f UTF-8 -t UTF-16 | hexdump -C
00000000  ff fe 2a 06 2c 06 31 06  28 06 29 06 2d 00 28 06  |..*.,.1.(.).-.(.|
00000010  31 06 4a 06 2f 06 2d 00  27 06 44 06 43 06 2a 06  |1.J./.-.'.D.C.*.|
00000020  31 06 48 06 46 06 4a 06  40 00 2a 06 2c 06 31 06  |1.H.F.J.@.*.,.1.|
00000030  28 06 29 06 2d 00 27 06  44 06 42 06 28 06 48 06  |(.).-.'.D.B.(.H.|
00000040  44 06 2d 00 27 06 44 06  34 06 27 06 45 06 44 06  |D.-.'.D.4.'.E.D.|
00000050  2e 00 45 06 48 06 31 06  4a 06 2a 06 27 06 46 06  |..E.H.1.J.*.'.F.|
00000060  4a 06 27 06                                       |J.'.|
00000064
The first two bytes (ff fe) are the byte order mark of little-endian UTF-16. Each Arabic letter then takes two bytes, for example 2a 06 for U+062A (ت), and the ASCII characters "-", "@" and "." become 2d 00, 40 00 and 2e 00.
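In the dump, each two-byte group read in little-endian order is a UTF-16 code unit, which for these characters equals the code point (2a 06 is U+062A). A small sketch can print code points directly; it assumes an iconv that supports UTF-32BE and the standard od utility, and codepoints is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the Unicode code point of every character in a UTF-8 string.
# codepoints is a hypothetical helper. UTF-32BE stores each code point in
# 4 bytes, most significant byte first and without a BOM; od prints the bytes
# in hex, and each 8-hex-digit group is printed in U+XXXX notation.
codepoints() {
  local hex i
  hex=$(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -v -t x1 | tr -d ' \n')
  for (( i = 0; i < ${#hex}; i += 8 )); do
    printf 'U+%04X\n' "$(( 16#${hex:i:8} ))"
  done
}

codepoints "تجربة"
```

For the first word of the address this prints U+062A, U+062C, U+0631, U+0628 and U+0629, one per line, which matches the byte pairs on the first line of the dump above.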
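Examples of Unicode String Manipulation Functions Performed by Operating Systems (4/5):
#!/bin/bash
# The two greetings look alike, but string2 contains one extra code point:
# the Arabic diacritic fathatan (U+064B). The comparison is code point by code point.
string1="مرحبا"
string2="مرحبًا"
if [[ "$string1" = "$string2" ]]; then
    echo "Equal"
else
    echo "Not equal"
fi
Output: Not equal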
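The comparison above is exact, so text that looks identical can still compare unequal. In the Arabic example the strings really differ, but the same result appears when one string uses a precomposed letter and the other a base letter plus a combining mark; normalizing both sides to NFC resolves that second case. A minimal sketch, assuming python3 is available; nfc is a hypothetical helper around Python's unicodedata.normalize.

```shell
#!/usr/bin/env bash
# Sketch: exact comparison versus comparison after NFC normalization.
# nfc is a hypothetical helper. Both strings display as "é"; they are built
# from octal escapes so their code points are unambiguous.
composed=$(printf '\303\251')      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed=$(printf 'e\314\201')   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

nfc() {
  python3 -c '
import os, sys, unicodedata
text = os.fsencode(sys.argv[1]).decode("utf-8")
sys.stdout.buffer.write(unicodedata.normalize("NFC", text).encode("utf-8"))
' "$1"
}

if [[ "$composed" == "$decomposed" ]]; then echo "raw: equal"; else echo "raw: not equal"; fi
if [[ "$(nfc "$composed")" == "$(nfc "$decomposed")" ]]; then echo "NFC: equal"; else echo "NFC: not equal"; fi
```

Expected output: raw: not equal, then NFC: equal. Normalization does not make the Arabic pair equal, because fathatan is an additional mark rather than an alternative encoding of the same text; ignoring diacritics would be a separate, deliberate step.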
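$ echo "Hello, 世界" | grep -oP "\p{Han}"
grep -oP: -P enables Perl-compatible regular expressions, in which \p{Han} matches any character of the Han (Chinese) script, and -o prints each match on its own line. This requires a UTF-8 locale and a grep built with PCRE support.
Output:
世
界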
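Examples of Unicode String Manipulation Functions Performed by Operating Systems (5/5):
$ echo -n "é" | iconv -f UTF-8 -t UTF-8//IGNORE | hexdump -C
iconv -t UTF-8//IGNORE: The //IGNORE suffix asks iconv to skip characters it cannot convert instead of stopping at the first error, so valid UTF-8 text such as "é" passes through unchanged and hexdump shows its bytes.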
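Without //IGNORE, the same conversion also works as a UTF-8 validator, because iconv exits with a non-zero status when it meets an invalid byte sequence. The sketch below assumes GNU libc's iconv; is_valid_utf8 and strip_invalid_utf8 are hypothetical helpers, and strip_invalid_utf8 uses iconv's -c option, which omits invalid characters from the output.

```shell
#!/usr/bin/env bash
# Sketch: validate UTF-8 input and strip invalid bytes with iconv.
# is_valid_utf8 and strip_invalid_utf8 are hypothetical helpers.

# is_valid_utf8: iconv fails (non-zero status) on input that is not valid UTF-8
is_valid_utf8() {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# strip_invalid_utf8: -c drops what cannot be converted; iconv still signals the
# problem through its exit status, which is deliberately ignored here
strip_invalid_utf8() {
  printf '%s' "$1" | iconv -c -f UTF-8 -t UTF-8 2>/dev/null || true
}

good=$(printf 'caf\303\251 au lait')   # "café au lait" in UTF-8 (é = C3 A9)
bad=$(printf 'caf\351 au lait')        # same text in Latin-1: a lone E9 byte is invalid UTF-8

if is_valid_utf8 "$good"; then echo "good: valid UTF-8"; fi
if ! is_valid_utf8 "$bad"; then echo "bad: invalid UTF-8"; fi
strip_invalid_utf8 "$bad"; echo
```

On such a system the last line is expected to print "caf au lait", with the stray Latin-1 byte removed.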
Operating Systems APIs or System Calls for Converting and Handling IDNs:
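Examples of Handling IDNs in Linux:
sudo apt install idn2    # Debian/Ubuntu: install the idn2 command-line tool from GNU Libidn2
idn2 "सार्वभौमिक-स्वीकृति-परीक्षण.संगठन"    # convert the Unicode (U-label) form to the ASCII (A-label) form
idn2 -d "xn--lnfbb8fe3cvkui0de0bcg5hxagsg7d5lwail.xn--i1b6b1a6a2e"    # decode the A-label form back to Unicode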
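Scripts that accept domain names from users often need to decide which direction to convert. A minimal sketch, assuming bash 4 or later: a name containing any non-ASCII byte is passed to idn2 for A-label conversion, a name with a label starting with the ACE prefix xn-- is decoded with idn2 -d, and anything else is passed through. The helper names needs_to_ascii, has_alabel and convert_domain are hypothetical; only the two idn2 invocations come from the examples above.

```shell
#!/usr/bin/env bash
# Sketch: choose the idn2 direction for a domain name (hypothetical helpers).

# needs_to_ascii: true if the name contains any non-ASCII byte (U-label input)
needs_to_ascii() {
  [ -n "$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')" ]
}

# has_alabel: true if any dot-separated label starts with "xn--" (any case)
has_alabel() {
  local label
  local -a labels
  IFS='.' read -r -a labels <<< "$1"
  for label in "${labels[@]}"; do
    [[ "${label,,}" == xn--* ]] && return 0
  done
  return 1
}

convert_domain() {
  if needs_to_ascii "$1"; then
    idn2 "$1"              # U-label form -> A-label form
  elif has_alabel "$1"; then
    idn2 -d "$1"           # A-label form -> U-label form
  else
    printf '%s\n' "$1"     # plain ASCII name: nothing to convert
  fi
}

convert_domain "example.org"
```

With idn2 installed, convert_domain "सार्वभौमिक-स्वीकृति-परीक्षण.संगठन" hands the name to idn2 for A-label conversion.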
EAI Compatibility of Basic Mail Command Line Utilities:
Operating Systems APIs or System Calls for Handling EAI in Email Clients:
File Systems Support for Unicode:
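Example of Unicode Support in File Systems:
touch "日本語ファイル.txt"      # create an empty file with a Japanese name
mkdir "مجلد عربي"               # create a directory with an Arabic name (quoted because it contains a space)
cd "مجلد عربي"                  # change into the Arabic-named directory
cat "../日本語ファイル.txt"      # read the Japanese-named file in the parent directory (it is empty, so nothing is printed)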
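File names raise the same equivalence question as string comparison: two names can look identical yet differ in their code points. The sketch below creates two files that both display as café.txt, one with a precomposed é (U+00E9) and one with e plus COMBINING ACUTE ACCENT (U+0301). Assumption: on Linux file systems such as ext4 without case-folding enabled, names are stored as opaque bytes, so two separate entries are expected; a normalization-insensitive file system may report only one.

```shell
#!/usr/bin/env bash
# Sketch: do canonically equivalent file names refer to the same file?
demo_dir=$(mktemp -d)                    # scratch directory
nfc_name=$(printf 'caf\303\251.txt')     # precomposed é (U+00E9)
nfd_name=$(printf 'cafe\314\201.txt')    # e + U+0301

touch "$demo_dir/$nfc_name" "$demo_dir/$nfd_name"
entries=$(find "$demo_dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
echo "directory entries: $entries"

rm -rf -- "$demo_dir"                    # clean up
```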
Working with Unicode: Case-sensitive vs Case-insensitive vs Case-insensitive but Case-preserving Filename Handling:
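The three behaviors named in this heading can be told apart empirically by creating a file and looking it up with different letter case. A minimal sketch; probe_case_sensitivity is a hypothetical helper that creates and removes a scratch directory inside the directory it is given. As usually configured, ext4 on Linux is expected to report case-sensitive, and a default macOS APFS volume case-insensitive, case-preserving.

```shell
#!/usr/bin/env bash
# Sketch: classify how a directory's file system handles letter case in names.
# probe_case_sensitivity is a hypothetical helper.
probe_case_sensitivity() {
  local dir result
  dir=$(mktemp -d "${1:-.}/casetest.XXXXXX") || return 1
  touch "$dir/probe.txt"
  if [ -e "$dir/PROBE.TXT" ]; then
    # A different spelling found the file, so lookups ignore case; check
    # whether the listing still shows the spelling used at creation.
    if [ "$(ls "$dir")" = "probe.txt" ]; then
      result="case-insensitive, case-preserving"
    else
      result="case-insensitive"
    fi
  else
    result="case-sensitive"
  fi
  rm -rf -- "$dir"
  printf '%s\n' "$result"
}

probe_case_sensitivity /tmp
```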
Author: