A. The following table lists the areas for which the Unicode Consortium
provides specifications, with a brief description
of each.

| Area | Specification |
|---|---|
| **General** | |
| Character Properties: common properties such as Name, Alphabetic, Letter, White-Space, General Category, Default-Ignorable, plus those used in other specifications | Ch 4 |
| Character Properties for CJK Ideographs: property information specific to CJK ideographs and character properties | UAX 38 |
| Unicode Character Database: general documentation about the UCD | UAX 44 |
| UCD in XML: description of the XML representation of the UCD | UAX 42 |
| Case Operations: conversion/detection of Upper/Lower/Titlecase, case folding, case matching. See also § 4.2 Case. | § 3.13 |
| Characters with Unusual Properties: characters that implementers need to pay special attention to | § 4.11 |
| Use of Characters in Markup Contexts: guidelines for XML and other markup languages | UTR 20 |
| Script Names: usage model for determining text runs in a given script | UAX 24 |
| Use of Characters in Mathematical Contexts: guidelines for mathematical usage | UTR 25 |
| Unicode Named Character Sequences: specifies the syntax for named character sequences | UAX 34 |
| **Encodings** | |
| Unicode Encoding Forms: UTF-8, UTF-16, UTF-32 conversion and validation | § 3.9 |
| Unicode Encoding Schemes: UTF-8, UTF-16 (BE/LE), UTF-32 (BE/LE) conversion and validation | § 3.10 |
| Binary Order: UTF-8 order vs. UTF-16 order | § 5.17 |
| Character Mapping Markup Language: mapping Unicode to and from legacy code pages | UTS 22 |
| A Standard Compression Scheme for Unicode: how to compress Unicode to about the same size as legacy encodings | UTS 6 |
| UTF-EBCDIC: encapsulating Unicode on EBCDIC systems | UTR 16 |
| Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8): a compatibility 8-bit encoding scheme | UTR 26 |
| Ideographic Variation Database: repository of variation sequences for specified collections of Han glyphs | UTS 37 |
| **Comparison** | |
| Canonical Equivalence: when character sequences are equivalent; canonical ordering | § 3.11 |
| Unicode Normalization Forms: how to normalize text for comparison | UAX 15 |
| Unicode Collation Algorithm: the default mechanism for comparing, searching, and matching Unicode text | UTS 10 |
| **Parsing** | |
| Hangul Syllables: boundaries, parsing, (de/)composition, names | § 3.12 |
| Decimal Numbers: conversion and validation | § 5.5 |
| Unicode Regular Expression Guidelines: the features required in supporting regular expressions with Unicode | UTS 18 |
| Identifier and Pattern Syntax: how to parse identifiers | UAX 31 |
| Language Information in Plain Text: see also § 16.9 Tag Characters | § 5.10 |
| Variation Selectors: usage, validation | § 16.4 |
| Ideographic Description Sequences: use, validation | § 12.2 |
| **Segmentation** | |
| Newline Guidelines: how to handle newline characters | § 5.8 |
| Line Breaking Algorithm: the default way to determine where to linewrap | UAX 14 |
| Text Segmentation: the default way to break text into user characters, words, and sentences | UAX 29 |
| **Rendering** | |
| The Bidirectional Algorithm: required for display of Arabic and Hebrew text | UAX 9 |
| East Asian Width: the default determination of character width in East Asian contexts | UAX 11 |
| Minimal shaping requirements for Arabic, Devanagari, Tamil, etc. | Ch 8-10 |
| **Locale Data** | |
| Locale Data Markup Language (LDML): used for interchange of locale data used for internationalization | UTS 35 |
| Common Locale Data Repository (CLDR): a repository of LDML data for hundreds of locales | CLDR |
| **Security** | |
| Unicode Security Considerations: guidelines for recognizing Unicode security problems and dealing with them | UTR 36 |
| Unicode Security Mechanisms: useful tools for detecting spoofs | UTS 39 |
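
As a rough illustration of three of the areas listed above (character properties, canonical equivalence with normalization, and binary order), the following sketch uses Python's standard `unicodedata` module. The helper names `describe`, `canonically_equivalent`, `utf8_key`, and `utf16_key` are hypothetical and are not defined by any Unicode specification. Property values come from whichever Unicode version is bundled with the Python build in use.

```python
import unicodedata
from typing import Tuple


def describe(ch: str) -> Tuple[str, str]:
    """Return the Name and General Category properties of one character."""
    return unicodedata.name(ch), unicodedata.category(ch)


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD (see UAX 15)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def utf8_key(s: str) -> bytes:
    """Sort key giving UTF-8 binary order (same as code point order)."""
    return s.encode("utf-8")


def utf16_key(s: str) -> bytes:
    """Sort key giving UTF-16 code unit order (big-endian bytes, no BOM)."""
    return s.encode("utf-16-be")


composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # base letter followed by a combining acute accent

print(describe("A"))
print(composed == decomposed)                        # raw comparison
print(canonically_equivalent(composed, decomposed))  # after normalization

# A BMP character above the surrogate range and a supplementary character
# sort differently under the two binary orders.
pair = ["\uff5e", "\U00010000"]
print([hex(ord(c)) for c in sorted(pair, key=utf8_key)])
print([hex(ord(c)) for c in sorted(pair, key=utf16_key)])
```

The last two lines show why binary order matters: in UTF-16, supplementary characters are encoded with surrogate code units in the range D800..DFFF, so they sort before BMP characters in the range E000..FFFF, whereas UTF-8 byte order follows code point order.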