Universal Acceptance (UA) Micro-Learning Module: Module 2 - Unicode Advanced Programming in Java
Instructor Guide
1st Edition.
© 2024 Creative Commons License - Attribution 4.0 International (CC BY 4.0).
Universal Acceptance
Unicode Advanced Programming - Micro-Learning Module Objectives
| 2
Note About the Use of Unicode String Literals:
| 3
Character-glyph Model
| 4
Examples of How the Character-Glyph Model Is Used:
public class UnicodeStringIteration {
public static void main(String[] args) {
// Declare a Unicode string.
// "你好" (nǐ hǎo) is a Chinese greeting, and "नमस्ते" (namaste) is a greeting in Hindi.
String unicodeString = "Hello, 你好, नमस्ते";
// Convert the string to an array of characters
char[] charArray = unicodeString.toCharArray();
// Iterate through the characters in the string
for (char c : charArray) {
System.out.println(c);
}
}
}
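The loop above walks UTF-16 code units, so a combining mark is printed on its own line and a character outside the Basic Multilingual Plane would be split into two surrogates. The following is a minimal sketch, not part of the original module, that contrasts code units, code points, and the character boundaries reported by the JDK's java.text.BreakIterator. The class and method names (TextUnits, clusters) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

class TextUnits {
    // Splits text at the boundaries reported by BreakIterator.getCharacterInstance().
    static List<String> clusters(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // "e" + COMBINING ACUTE ACCENT, then GRINNING FACE (U+1F600, a surrogate pair)
        String text = "e\u0301\uD83D\uDE00";
        System.out.println("UTF-16 code units: " + text.length());                          // 4
        System.out.println("Code points:       " + text.codePointCount(0, text.length())); // 3
        System.out.println("Clusters:          " + clusters(text).size());                  // 2
    }
}
```

The three counts differ because one glyph on screen can be backed by several code points, and one code point by two char values.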
| 5
Comparing Unicode Strings:
public class UnicodeComparison {
public static void main(String[] args) {
// Define two Unicode characters using character literals
char char1 = 'ሀ'; // the first character (ha) in the Ethiopic script
char char2 = 'ለ'; // the character "le" in the Ethiopic script
// Compare the characters directly
if (char1 < char2) {
System.out.println(char1 + " comes before " + char2);
} else {
System.out.println(char1 + " comes after " + char2);
}
}
}
| 6
Text Rendering: Engines, Fonts, and Glyph Shapers
| 7
How do Glyph Shapers Handle the Shaping of Characters
with Diacritical Marks or Vowel Signs?
| 8
Normalization of Unicode Strings - NFC and NFD (1/2)
| 9
Normalization of Unicode Strings - NFC and NFD (2/2)
| 10
Unicode Text Normalization with NFC and NFD: Example Using Java (1/3):
import java.text.Normalizer;
public class UnicodeNormalization {
public static void main(String[] args) {
// Example text
String text = "Café";
// Normalize to NFC
String nfcText = Normalizer.normalize(text, Normalizer.Form.NFC);
System.out.println("NFC normalized text: " + nfcText);
// Print the code point of every UTF-16 code unit in the NFC text
for (int i = 0; i < nfcText.length(); i++) {
char ch = nfcText.charAt(i);
System.out.println(ch + " U+" + String.format("%04X", (int) ch));
}
// In NFC, the accented letter "é" is the single code point U+00E9.
| 11
Unicode Text Normalization with NFC and NFD: Example Using Java (2/3):
// Normalize to NFD
String nfdText = Normalizer.normalize(text, Normalizer.Form.NFD);
System.out.println("NFD normalized text: " + nfdText);
// Print the code point of every UTF-16 code unit in the NFD text.
// The NFD text is one code unit longer than the NFC text, so it must not be
// indexed against the original string.
for (int i = 0; i < nfdText.length(); i++) {
char ch = nfdText.charAt(i);
System.out.println(ch + " U+" + String.format("%04X", (int) ch));
}
// In NFD, "é" becomes "e" (U+0065) followed by the combining acute accent (U+0301).
}
}
| 12
Unicode Text Normalization with NFC and NFD: Example Using Java (3/3):
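One practical consequence of the two forms is that visually identical strings can fail a plain equals() check. The sketch below, an illustration rather than part of the original slides, normalizes both sides to NFC before comparing. The class and method names (CanonicalEquality, canonicallyEqual) are hypothetical.

```java
import java.text.Normalizer;

class CanonicalEquality {
    // Compares two strings after bringing both to the same normalization form.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "Caf\u00E9";    // precomposed e-acute (U+00E9)
        String decomposed = "Cafe\u0301"; // "e" followed by COMBINING ACUTE ACCENT
        System.out.println(composed.equals(decomposed));                               // false
        System.out.println(canonicallyEqual(composed, decomposed));                    // true
        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFC)); // false
    }
}
```

Either form works for the comparison as long as both operands use the same one; NFC is chosen here only because it is the more compact form.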
| 13
Exploring the Unicode Character Database (UCD) for Better Text Processing:
| 14
How to Access the Unicode Character Database: Some of the
Methods and Tools:
| 15
Accessing UCD Using Programming Languages:
public class UCDExample2 {
public static void main(String[] args) {
// Retrieve character properties
char character = 'ሀ'; // Ethiopic Syllable Ha (ሀ)
int codePoint = character;
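// Character.getType() returns a numeric constant (for example Character.OTHER_LETTER),
// not the two-letter UCD abbreviation, so map it before printing.
int category = Character.getType(codePoint);
System.out.println("Character: " + character);
System.out.println("Category: " + (category == Character.OTHER_LETTER ? "Lo" : String.valueOf(category)));
//Output: Character: ሀ
//Output: Category: Lo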
}
}
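Beyond the general category, java.lang.Character exposes a few other UCD-derived properties. The following sketch is an addition for illustration; the class and method names (CharacterProperties, describe) are hypothetical, while the Character methods it calls are standard JDK APIs.

```java
class CharacterProperties {
    // Formats a few UCD-derived properties exposed by java.lang.Character.
    static String describe(int codePoint) {
        return String.format("U+%04X %s script=%s letter=%b",
                codePoint,
                Character.getName(codePoint),
                Character.UnicodeScript.of(codePoint),
                Character.isLetter(codePoint));
    }

    public static void main(String[] args) {
        // Ethiopic Syllable Ha
        System.out.println(describe(0x1200));
        // prints: U+1200 ETHIOPIC SYLLABLE HA script=ETHIOPIC letter=true
    }
}
```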
| 16
Accessing UCD Using Command Line Utilities and Complementary Command Line Utilities:
| 17
Accessing UCD Using Web Interfaces
| 18
Comparing Unicode Strings: Case-Insensitive and Locale-Based Comparisons (1/2):
public class CaseInsensitiveComparison {
public static void main(String[] args) {
String string1 = "Café";
String string2 = "café";
if (string1.equalsIgnoreCase(string2)) {
System.out.println("The strings are equal (case-insensitive comparison)");
} else {
System.out.println("The strings are not equal (case-insensitive comparison)");
}
}
}
| 19
Comparing Unicode Strings: Case-Insensitive and Locale-Based Comparisons (2/2):
import java.text.Collator;
import java.util.Locale;
public class LocaleBasedComparison {
public static void main(String[] args) {
Locale locale = Locale.US; // Example: English (United States)
// Create a Collator object with the specified locale
Collator collator = Collator.getInstance(locale);
// Define the strings to compare
String string1 = "café";
String string2 = "cafe";
// Perform a locale-based comparison
int result = collator.compare(string1, string2);
if (result < 0) {
System.out.println(string1 + " comes before " + string2 + " in the specified locale.");
} else if (result > 0) {
System.out.println(string1 + " comes after " + string2 + " in the specified locale.");
} else {
System.out.println(string1 + " and " + string2 + " are equivalent in the specified locale.");
}
}
}
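How strictly a Collator distinguishes accents and case depends on its strength setting. The sketch below is an illustrative addition, assuming the JDK's java.text.Collator and the en-US locale. The class and method names (CollatorStrengthDemo, compareAt) are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {
    // Compares two strings with an en-US Collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String accented = "caf\u00E9"; // "cafe" with a precomposed e-acute
        // PRIMARY: only base letters matter, so accents and case are ignored
        System.out.println(compareAt(Collator.PRIMARY, accented, "CAFE") == 0);    // true
        // SECONDARY: accents matter, case does not
        System.out.println(compareAt(Collator.SECONDARY, "cafe", "CAFE") == 0);    // true
        System.out.println(compareAt(Collator.SECONDARY, accented, "cafe") == 0);  // false
        // TERTIARY: case matters as well
        System.out.println(compareAt(Collator.TERTIARY, "cafe", "CAFE") == 0);     // false
    }
}
```

PRIMARY strength is a common choice for accent- and case-insensitive search, while TERTIARY is closer to what users expect from a sorted list.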
| 20
Bidirectional Scripts and Shaped Scripts:
| 21
Bidirectional Display Format:
| 22
Reshaping Text Using the ICU Library:
import com.ibm.icu.text.Bidi;
public class ReshapeArabicText {
public static String reshapeArabicText(String text) {
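// Note: Bidi reorders text from logical (key-press) order to visual order. It does not
// select contextual letter forms; that is the job of ArabicShaping or the rendering engine.
Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
// writeReordered() takes output options such as DO_MIRRORING, not a REORDER_* mode constant
return bidi.writeReordered(Bidi.DO_MIRRORING);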
}
public static void main(String[] args) {
String text = "مرحبا بكم"; // 'Marhaban bikum' in Arabic
String reshapedText = reshapeArabicText(text);
System.out.println(reshapedText);
}
}
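To see what the bidirectional algorithm decides before any reordering happens, the directional runs can be inspected directly. The sketch below is an illustrative addition that uses the JDK's java.text.Bidi instead of ICU so that it runs without extra dependencies. The class and method names (BidiRuns, runs) are hypothetical.

```java
import java.text.Bidi;

class BidiRuns {
    // Describes each directional run as "start-limit:level".
    // Even levels are left-to-right, odd levels are right-to-left.
    static String runs(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(bidi.getRunStart(i)).append('-').append(bidi.getRunLimit(i))
              .append(':').append(bidi.getRunLevel(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abc " followed by the Arabic word "marhaban", stored in logical order
        String mixed = "abc \u0645\u0631\u062D\u0628\u0627";
        System.out.println(runs(mixed)); // prints: 0-4:0 4-9:1
    }
}
```

The stored string is unchanged; only the run levels tell a renderer which spans to lay out right to left.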
| 23
File Storage in Key-press Order in Bidirectional and Shaped Scripts:
| 24
Glyph Shapers in Bidirectional and Shaped Scripts:
| 25
ICU Examples for Complex Script Shaping and Rendering Using Java:
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
public class ArabicTextShaping {
public static void main(String[] args) throws ArabicShapingException {
// Create an ICU Arabic shaper; the constructor requires an options mask
ArabicShaping arabicShaper = new ArabicShaping(ArabicShaping.LETTERS_SHAPE);
// Define the Arabic text to shape and render
String arabicText = "السلام عليكم"; // 'Assalamu alaikum' in Arabic
// Shape the Arabic text (in logical order) into presentation forms
String shapedText = arabicShaper.shape(arabicText);
// Reorder the shaped text for display using ICU's BiDi (Bi-Directional) algorithm
Bidi bidi = new Bidi(shapedText, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
// Display the text in visual order
System.out.println(bidi.writeReordered(Bidi.DO_MIRRORING));
}
}
| 26
Unicode in Other File Formats and their Handling -
JSON File Unicode Handling:
| 27
Unicode in JSON File Format Using Java (1/3):
import java.io.FileWriter;
import java.io.FileReader;
import java.io.IOException;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
public class ContactInfo {
public static void main(String[] args) {
JSONObject contactInfo = new JSONObject();
// Define the contact information
JSONObject name = new JSONObject();
name.put("Ethiopic", "ዮናስ ተስፋዬ");
name.put("Arabic", "يونس تسفاي");
name.put("Sinhala", "යුනෝස් ටෙස්ෆායි");
name.put("Japanese", "ユナス・テスファイ");
name.put("Chinese", "尤纳斯·特斯法伊");
name.put("Latin", "Yonas Tesfaye");
contactInfo.put("name", name);
| 28
Unicode in JSON File Format Using Java (2/3):
JSONObject emailAddress = new JSONObject();
emailAddress.put("Ethiopic", "ኢሜይል-ሙከራ@ሁለንአቀፍ-ተቀባይነት-ሙከራ.com");
emailAddress.put("Arabic", "تجربة-بريد-الكتروني@تجربة-القبول-الشامل.موريتانيا");
emailAddress.put("Sinhala", "ඉ-තැපැල්-පිරික්සුම@විශ්ව-සම්මුති-පිරික්සුම.ලංකා");
emailAddress.put("Japanese", "ユナス・テスファイ@ユナス・テスファイ");
emailAddress.put("Chinese", "電子郵件測試@普遍適用測試.台灣");
emailAddress.put("Latin", "yonas.tesfaye@domain.com");
contactInfo.put("email_address", emailAddress);
JSONObject jobTitle = new JSONObject();
jobTitle.put("Ethiopic", "ሶፍትዌር አልሚ");
jobTitle.put("Arabic", "مهندس برمجيات");
jobTitle.put("Sinhala", "වෘත්තීය ගැටළුවක්");
jobTitle.put("Japanese", "ソフトウェアエンジニア");
jobTitle.put("Chinese", "软件工程师");
jobTitle.put("Latin", "Software Engineer");
contactInfo.put("job_title", jobTitle);
| 29
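Unicode in JSON File Format Using Java (3/3):
// Saving data to a JSON file named contactinfo.json.
// The charset is set explicitly so the output does not depend on the platform default
// (the FileWriter(String, Charset) constructor requires Java 11 or later).
try (FileWriter file = new FileWriter("contactinfo.json", java.nio.charset.StandardCharsets.UTF_8)) {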
file.write(contactInfo.toJSONString());
System.out.println("Successfully wrote JSON object to file.");
} catch (IOException e) {
e.printStackTrace();
}
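// Opening the file named "contactinfo.json" and accessing the loaded data.
// Read with the same explicit charset that was used for writing (Java 11 or later).
try (FileReader reader = new FileReader("contactinfo.json", java.nio.charset.StandardCharsets.UTF_8)) {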
JSONParser jsonParser = new JSONParser();
JSONObject loadedData = (JSONObject) jsonParser.parse(reader);
// Accessing and printing the loaded data
System.out.println("Name: " + loadedData.get("name"));
System.out.println("Email Address: " + loadedData.get("email_address"));
System.out.println("Job Title: " + loadedData.get("job_title"));
} catch (IOException | ParseException e) {
e.printStackTrace();
}
}
}
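JSON also allows any character to be written as a \uXXXX escape, which keeps a file pure ASCII when the encoding of a downstream consumer is uncertain. The sketch below is an illustrative addition and not a full JSON string encoder: it assumes quotes, backslashes, and control characters have already been handled. The class and method names (JsonAsciiEscaper, escapeNonAscii) are hypothetical.

```java
class JsonAsciiEscaper {
    // Replaces every non-ASCII UTF-16 code unit with a six-character JSON escape
    // (backslash, 'u', four hex digits). A supplementary character becomes two
    // escapes, one per surrogate, which is how JSON represents it.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input ends in a precomposed e-acute; the output contains only ASCII
        System.out.println(escapeNonAscii("{\"name\":\"caf\u00E9\"}"));
    }
}
```

A JSON parser turns the escapes back into the original characters, so the escaped and unescaped files carry the same data.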
| 30
Unicode String Manipulations Using Programming Language Specific Libraries and ICU: Java Code Snippet.
import java.util.Iterator;
String text = "こんにちは世界"; // Japanese greeting "Hello, World"
// Character count
int characterCount = text.codePointCount(0, text.length());
System.out.println("Character count: " + characterCount);
// Character iteration and properties
Iterator<Integer> codePointIterator = text.codePoints().iterator();
while (codePointIterator.hasNext()) {
int codePoint = codePointIterator.next();
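// Casting to char would truncate code points above U+FFFF, so build the string from the code point
System.out.println("Character: " + new String(Character.toChars(codePoint)));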
System.out.println("Character code point: " + codePoint);
System.out.println("Character name: " + Character.getName(codePoint));
System.out.println("--------------------");
}
| 31
Java - Using the java.text.Normalizer Class for Unicode Normalization:
import java.text.Normalizer;
public class UnicodeStringManipulationJava {
public static void main(String[] args) {
String input = "Café";
// Normalize the string to NFC form
String normalized = Normalizer.normalize(input, Normalizer.Form.NFC);
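// Remove diacritical marks: decompose to NFD first so that each mark becomes a
// separate code point (in NFC, "é" is a single code point and \p{M} matches nothing)
String withoutDiacritics = Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll("\\p{M}", "");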
System.out.println("Normalized: " + normalized);
System.out.println("Without Diacritics: " + withoutDiacritics);
}
}
| 32
Java - Using ICU (icu4j) for Unicode Normalization and Case Folding:
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.Transliterator;
public class UnicodeStringManipulationICUJava {
public static void main(String[] args) {
String input = "Café";
// Normalize the string to NFC form using ICU
Normalizer2 normalizer = Normalizer2.getNFCInstance();
String normalized = normalizer.normalize(input);
// Remove diacritical marks using ICU
Transliterator diacriticRemover = Transliterator.getInstance("Any-NFD; [:M:] Remove; NFC");
String withoutDiacritics = diacriticRemover.transform(normalized);
System.out.println("Normalized: " + normalized);
System.out.println("Without Diacritics: " + withoutDiacritics);
}
}
| 33
ICU Library in Java for Text Collation:
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
public class StringComparison {
public static String compareStrings(String text1, String text2, String locale) {
Collator collator = Collator.getInstance(new ULocale(locale));
int result = collator.compare(text1, text2);
if (result < 0) {
return text1 + " comes before " + text2;
} else if (result > 0) {
return text1 + " comes after " + text2;
} else {
return text1 + " is equal to " + text2;
}
}
public static void main(String[] args) {
String string1 = "تفاحة"; // 'tuffaha' in Arabic, Apple in English
String string2 = "موز"; // 'mawz' in Arabic, Banana in English
String comparison = compareStrings(string1, string2, "ar");
System.out.println("Comparison Result: " + comparison);
}
}
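A collator is most often used as a comparator when sorting a list. The sketch below is an illustrative addition that uses the JDK's java.text.Collator as a stand-in for ICU's Collator, an assumption made so the example runs without icu4j. The class and method names (CollatedSort, sorted) are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatedSort {
    // Returns a copy of the list sorted by the collation rules of the given locale.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");

        // Natural String ordering compares UTF-16 code units, so uppercase sorts first
        List<String> byCodeUnit = new ArrayList<>(words);
        Collections.sort(byCodeUnit);
        System.out.println(byCodeUnit);                     // [Cherry, apple, banana]

        // Collation follows language rules instead
        System.out.println(sorted(words, Locale.ENGLISH));  // [apple, banana, Cherry]
    }
}
```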
| 34
Reference:
| 35
Author:
| 36