U+nnnn notation and normative identifiers.

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 09 2005 - 01:16:15 CST


    From: "Kenneth Whistler" <kenw@sybase.com>
    >> U-nnnn already exists (or I should say, it has existed).
    >
    > U+nnnn, actually. The U- notation was introduced by Amd 9 to 10646
    > in 1997. It was never adopted for any use with Unicode, per se.

    I have no access to the text of ISO 10646. You're probably right there, but
    it's a fact that there has always been confusion about the U-n...n notation,
    whether or not it was standardized by ISO. This confusion led several
    implementations to use this notation to denote negative code units, which
    are not code points (or code positions, if you prefer the ISO terminology).

    Anyway, all the justifications for introducing the 8-digit U-n..n notation
    in ISO are now moot, because ISO will not use it. There's no reason to
    speak about maintaining it in any standard.
    Only the U+[n][n]nnnn notation is unambiguous (and it is still a better
    symbolic reference than normative character names, once the characters are
    encoded in the standard with definitive code points).
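
As a minimal sketch of the U+[n][n]nnnn form described above (4 to 6
hexadecimal digits), the following Python formats and parses such
references. The function names are hypothetical, and accepting uppercase
digits only, with optional leading zeros beyond four digits, is an assumption
of this sketch rather than something stated in the message.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space

# Assumption: "U+" followed by 4 to 6 uppercase hexadecimal digits.
_U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")


def format_code_point(cp: int) -> str:
    """Return the U+ notation for a code point, padded to at least 4 digits."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a code point: {cp!r}")
    return f"U+{cp:04X}"


def parse_code_point(text: str) -> int:
    """Parse a U+ reference; reject anything else, including the U- form."""
    match = _U_PLUS.fullmatch(text)
    if match is None:
        raise ValueError(f"not in U+nnnn notation: {text!r}")
    cp = int(match.group(1), 16)
    if cp > MAX_CODE_POINT:
        raise ValueError(f"beyond U+10FFFF: {text!r}")
    return cp
```

Because the parser accepts only a literal "U+" prefix, a string such as
"U-00000041" is rejected outright instead of being read as a negative value.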

    The normative character names in ISO (and Unicode) only create constant
    confusion, whereas these names should be maintainable for each actual
    language (including English and French, the two normative languages of the
    ISO standard): once these characters are definitively encoded, those
    normative names should not remain normative except for the technical
    ISO/Unicode language itself (which is not normal English).

    That's why I argue for classifying those names as not representative of
    any human language, but of a technical locale (similar to the default "C"
    locale in POSIX apps, or the "root" locale in CLDR).
    There should now be actual translations for English (the "en" locale in
    CLDR and POSIX) and French (the "fr" locale in CLDR and POSIX), to allow
    corrections by substitution in those locales. The existing normative
    pseudo-French names should go into a technical variant (like "fr-ISO" in
    POSIX and CLDR), not directly into the default "fr" locale. The CLDR
    would then contain the complete list of corrected French names in "fr"
    where they differ from the technical "root" locale (which uses the
    normative pseudo-English ISO names), and only the normative differences in
    the CLDR pseudo-French "fr-ISO" or "C-fr" locales.
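
The locale layering proposed above can be sketched as a simple fallback
lookup. This is only an illustration under stated assumptions: the table
contents, the "fr-ISO" tag and the function names are hypothetical, no such
data exists in CLDR, and the French strings are placeholders, not official
names.

```python
# Hypothetical name tables, keyed by locale and then by code point.
# "root" holds the normative ISO/Unicode names; the other locales hold
# only the entries that differ from their parent.
NAMES = {
    "root": {
        0x41: "LATIN CAPITAL LETTER A",
        0x42: "LATIN CAPITAL LETTER B",
    },
    "fr": {0x41: "lettre latine majuscule A"},      # placeholder string
    "fr-ISO": {0x41: "LETTRE MAJUSCULE LATINE A"},  # placeholder string
}


def fallback_chain(locale: str) -> list[str]:
    """Drop trailing subtags one by one, ending at "root"."""
    if locale == "root":
        return ["root"]
    parts = locale.split("-")
    chain = []
    while parts:
        chain.append("-".join(parts))
        parts.pop()
    chain.append("root")
    return chain


def lookup_name(cp: int, locale: str) -> str:
    """Return the first name found along the chain, else the U+ identifier."""
    for loc in fallback_chain(locale):
        name = NAMES.get(loc, {}).get(cp)
        if name is not None:
            return name
    # The code point itself stays the only true identifier.
    return f"U+{cp:04X}"
```

With these tables, a request for U+0042 in "fr" falls through to the "root"
name, and a code point missing from every table comes back as its U+
reference, which matches the idea that translated names are display data and
never identifiers.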

    Why isn't there a project in CLDR to create such supplementary data for
    translated character names (which would not be identifiers, the only
    identifiers being the normative 4-to-6-digit hexadecimal code points)?


