#
# FORMAT
#
# Each line has three fields, as described here:
#
# First field: Code point
# Second field: Alias
# Third field: Type
#
# The Type labels used are: correction, control, alternate, figment, abbreviation
#
# Those Type labels can be mapped to other strings for display, if desired.
However, those Type values are not documented in either of the
mentioned files or in the header. I suggest adding something simple to
the header, like the following:
correction - A corrected name, for use in UIs where the formal Unicode
name is mistaken or misleading. For a given code point, there is at most
one value with Type=correction.
control - The most commonly used names for a control code. (For
historical reasons, the control codes don't have formal Unicode names.)
For a given code point, there may be multiple values with Type=control
(such as U+008D, with "REVERSE LINE FEED" and "REVERSE INDEX").
[Question: If one of the aliases is more commonly used, is it listed
first? That would be useful...]
alternate - An alternate name. For a given code point, there may be
multiple values with Type=alternate.
figment - ??? [Ken would have to explain this.]
abbreviation - A common abbreviation for the character name or
control code name. For a given code point, there may be multiple
values with Type=abbreviation.
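
To make the proposed definitions concrete, here is a minimal Python sketch that reads lines in the three-field format and checks the "at most one Type=correction per code point" rule. It assumes the fields are separated by semicolons and that "#" starts a comment; the function names and the sample rows are illustrative only, not part of the data file or its header.

```python
from collections import defaultdict

# The five Type labels listed in the header.
KNOWN_TYPES = {"correction", "control", "alternate", "figment", "abbreviation"}


def parse_alias_lines(lines):
    """Return (code_point, alias, type) tuples, skipping comments and blanks.

    Assumes semicolon-separated fields and '#' comments.
    """
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {raw!r}")
        code_point, alias, alias_type = fields
        if alias_type not in KNOWN_TYPES:
            raise ValueError(f"unknown Type label: {alias_type!r}")
        entries.append((int(code_point, 16), alias, alias_type))
    return entries


def code_points_with_extra_corrections(entries):
    """Return code points that carry more than one Type=correction alias."""
    counts = defaultdict(int)
    for code_point, _alias, alias_type in entries:
        if alias_type == "correction":
            counts[code_point] += 1
    return sorted(cp for cp, n in counts.items() if n > 1)


# Illustrative sample rows (assumed layout), not the full data file.
SAMPLE = """\
# comment line
008D;REVERSE LINE FEED;control
008D;REVERSE INDEX;control
008D;RI;abbreviation
"""

entries = parse_alias_lines(SAMPLE.splitlines())
violations = code_points_with_extra_corrections(entries)
```

An empty `violations` list means the sample respects the proposed rule; multiple Type=control rows for U+008D are accepted, as the definition above allows.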
NOTE: In http://www.unicode.org/reports/tr18/#Name_Properties
we recommend that regular expressions support both the formal names
and the Name_Aliases. However, the 'figment' type above looks
suspicious: is it something that we should not recommend people match?
Hard to tell without knowing what it means...
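
If it turns out that some types should not be matched, an implementation could filter by Type when building its name table. The sketch below is hypothetical: build_name_lookup and its excluded_types parameter are invented for illustration, the rows are sample data in an assumed (code point, alias, type) shape, and nothing here decides whether 'figment' should in fact be excluded.

```python
def build_name_lookup(entries, excluded_types=frozenset()):
    """Map each alias to its code point, skipping aliases of excluded types.

    Hypothetical helper: matching is by exact alias string only.
    """
    lookup = {}
    for code_point, alias, alias_type in entries:
        if alias_type in excluded_types:
            continue
        lookup[alias] = code_point
    return lookup


# Illustrative sample rows, not the full data file.
ENTRIES = [
    (0x0080, "PADDING CHARACTER", "figment"),
    (0x008D, "REVERSE LINE FEED", "control"),
    (0x008D, "REVERSE INDEX", "control"),
]

all_names = build_name_lookup(ENTRIES)
without_figments = build_name_lookup(ENTRIES, excluded_types={"figment"})
```

Both control aliases for U+008D resolve to the same code point in either table; only the figment row differs between the two.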
====
FYI: in U6.3, there are the following counts in NameAliases.txt:
352 | abbreviation
 84 | control
 17 | correction
  3 | figment
  1 | alternate
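
Counts like these can be reproduced with a simple tally over the Type field. A minimal sketch, assuming the entries have already been parsed into (code point, alias, type) tuples; the sample rows are illustrative and do not reproduce the U6.3 totals above.

```python
from collections import Counter


def count_by_type(entries):
    """Tally how many aliases carry each Type label."""
    return Counter(alias_type for _cp, _alias, alias_type in entries)


# Illustrative sample rows, not the full data file.
ENTRIES = [
    (0x008D, "REVERSE LINE FEED", "control"),
    (0x008D, "REVERSE INDEX", "control"),
    (0x008D, "RI", "abbreviation"),
]

counts = count_by_type(ENTRIES)

# Print the tally in descending order, in the same "count | type" layout.
for alias_type, n in counts.most_common():
    print(f"{n:3d} | {alias_type}")
```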
There is only one non-control-character with more than one alias: