From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 09 2005 - 01:16:15 CST
From: "Kenneth Whistler" <kenw@sybase.com>
>> U-nnnn already exists (or I should say, it has existed).
>
> U+nnnn, actually. The U- notation was introduced by Amd 9 to 10646
> in 1997. It was never adopted for any use with Unicode, per se.
I have no access to the text of ISO 10646. You're probably right there, but
it's a fact that there has always been confusion about the U-n...n notation,
whether or not it was standardized by ISO. This confusion led several
implementations to use the notation to denote negative code units, which are
not code points (or code positions, if you prefer the ISO terminology).
Anyway, all the justifications for introducing the 8-digit U-n..n notation in
ISO are now moot, because ISO will not use them. There's no reason to speak
about maintaining it in any standard.
Only the U+[n][n]nnnn notation is unambiguous (and it makes a better symbolic
reference than the normative character names, once the characters are encoded
in the standard with definitive code points).
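
As a minimal sketch of that notation, assuming Python and two hypothetical
helper names (format_code_point and parse_code_point are mine, not part of
any standard API), here is how the 4-to-6-digit form can be produced and
checked. The parser deliberately rejects anything else, including the
8-digit U-nnnnnnnn form:

```python
import re

MAX_CODE_POINT = 0x10FFFF  # upper limit of the Unicode code space

def format_code_point(cp: int) -> str:
    """Format an integer code point in U+ notation (4 to 6 hex digits)."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp!r}")
    # Pad to at least 4 digits; larger values naturally take 5 or 6.
    return f"U+{cp:04X}"

def parse_code_point(text: str) -> int:
    """Parse U+ notation back to an integer; reject every other form."""
    match = re.fullmatch(r"U\+([0-9A-Fa-f]{4,6})", text)
    if match is None:
        raise ValueError(f"not in U+nnnn notation: {text!r}")
    cp = int(match.group(1), 16)
    if cp > MAX_CODE_POINT:
        raise ValueError(f"beyond the Unicode code space: {text!r}")
    return cp

print(format_code_point(0x41))       # U+0041
print(format_code_point(0x10FFFF))   # U+10FFFF
print(parse_code_point("U+1F600"))   # 128512
```

Since negative values are refused on output and the sign-like "U-" prefix is
refused on input, the hyphen can never be misread as a minus sign.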
The normative character names in ISO (and Unicode) only create constant
confusion, whereas these names should be maintainable for each actual
language (including English and French, the two normative languages of the
ISO standard): once the characters are definitively encoded, those
normative names should not remain normative except for the technical
ISO/Unicode language itself (which is not the normal English language).
That's why I advocate classifying those names as representative not of any
human language, but of a technical locale (similar to the default "C" locale
in POSIX apps, or the "root" locale in CLDR).
There should now be actual translations for English (the "en" locale in
CLDR and POSIX) and French (the "fr" locale in CLDR and POSIX), to allow
corrections by substitution in those locales. The existing normative
pseudo-French names should go into a technical variant (like "fr-ISO" in
POSIX and CLDR), not directly into the default "fr" locale. The CLDR would
then contain the complete list of corrected French names in "fr" wherever
they differ from the technical "root" locale, which uses the normative
pseudo-English ISO names, and only the normative differences in the
pseudo-French "fr-ISO" or "C-fr" locales.
Why isn't there a project in CLDR to create such supplementary data for
translated character names (that won't be identifiers, the only identifiers
being the normative 4-to-6-digit hexadecimal code points)?
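
To make the proposal concrete, here is a rough sketch of the lookup I have in
mind, assuming Python. The OVERRIDES table, the "fr-ISO" tag and the functions
fallback_chain and character_name are all hypothetical, and the French
entries are illustrative placeholders, not real CLDR or ISO data. Only the
"root" step uses something that exists today: the normative names exposed by
Python's unicodedata module.

```python
import unicodedata

# Hypothetical per-locale overrides, keyed by code point (the only identifier).
# The strings are placeholders for illustration, not real CLDR or ISO data.
OVERRIDES = {
    "fr": {0x0041: "lettre majuscule latine a (placeholder)"},
    "fr-ISO": {0x0041: "LETTRE MAJUSCULE LATINE A (placeholder)"},
}

def fallback_chain(locale):
    """Yield the locale, then its parents, by truncating at each '-'."""
    parts = locale.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()

def character_name(cp, locale):
    """Return the first override found along the chain, else the root name."""
    for loc in fallback_chain(locale):
        name = OVERRIDES.get(loc, {}).get(cp)
        if name is not None:
            return name
    # "root" locale: the normative name, or the U+ notation if there is none.
    return unicodedata.name(chr(cp), f"U+{cp:04X}")

print(character_name(0x0041, "fr-ISO"))  # technical variant override
print(character_name(0x0041, "fr"))      # corrected French name
print(character_name(0x0042, "fr"))      # no override: falls back to root
```

Each locale then stores only its differences from its parent, and the code
point stays the key throughout, so no translated name ever has to serve as an
identifier.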