Keynote Presentation: Unicode Myths
Mark Davis - IBM Centre for Java Technology SV
Intended Audience: Manager, Software Engineer, Systems Analyst, Marketer
Session Level: Beginner, Intermediate, Advanced
Much of what people believe about Unicode is simply not true. This paper
discusses some of the most common misconceptions about Unicode, including:
- All characters in Unicode are in sorted order (or should be)
- Language information is required for correct use of Unicode
- Unicode is missing characters for (Lithuanian/Yoruba/Czech/...)
- Combining marks are not necessary: normalized text (NFC) does not contain them
- Unicode should have a "decimal point" character as well as a period
- Case mappings are 1-1
- All compatibility characters are (good/bad: pick one)
- Every 16-bit Unicode value represents a character
- (UTF-8/UTF-16/UTF-32: pick one) is better than (UTF-8/UTF-16/UTF-32: pick one)
- You can use any unassigned code point for internal use
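Several of these myths can be checked directly with Python's standard library. The sketch below is illustrative only. The helper names (code_point_order, upper_full, nfc_has_combining_mark, utf_sizes) are hypothetical and were made up for this example. It covers only some of the myths, and results for edge cases depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def code_point_order(words):
    """Hypothetical helper: plain sorted() compares raw code point values."""
    return sorted(words)


def upper_full(text):
    """Hypothetical helper: str.upper applies full case mappings,
    so the result can be longer than the input."""
    return text.upper()


def nfc_has_combining_mark(text):
    """Hypothetical helper: True if the NFC form still contains a combining mark."""
    nfc = unicodedata.normalize("NFC", text)
    return any(unicodedata.combining(ch) for ch in nfc)


def utf_sizes(text):
    """Hypothetical helper: byte length in each encoding form (little-endian, no BOM)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Myth: code point order is a sensible sort order.
    # Uppercase "Z" sorts before lowercase, and "é" sorts after "z".
    print(code_point_order(["banana", "éclair", "Zebra", "apple"]))

    # Myth: case mappings are 1-1. "ß" uppercases to "SS",
    # and lowercasing the result does not give back "ß".
    print(upper_full("straße"), upper_full("straße").lower())

    # Myth: NFC text has no combining marks. Yoruba e with dot below and
    # acute has no single precomposed character, so NFC keeps a mark.
    yoruba_e = "e\u0323\u0301"  # e + combining dot below + combining acute
    print(nfc_has_combining_mark(yoruba_e))

    # Myth: every 16-bit value is a character. U+D800 is a surrogate (Cs),
    # U+FFFF is a noncharacter (Cn), U+E000 is private use (Co).
    for cp in (0xD800, 0xFFFF, 0xE000):
        print(hex(cp), unicodedata.category(chr(cp)))

    # Myth: one encoding form is always smallest. It depends on the text.
    for s in ("A", "é", "中", "\U0001F600"):
        print(repr(s), utf_sizes(s))
```

Note that the private use area (category Co) is the range set aside for internal use; an unassigned code point (also reported as Cn) may be assigned a character in a later version of Unicode.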