Re: ISO 15924: zh-Hani for general Chinese (was: Different Arabic scripts?)

From: Doug Ewell
Date: Fri Nov 25 2005 - 19:33:42 CST

    Getting a little off-topic for Unicode here...

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > In a locale, what differences does it make between "zh" (any Chinese
    > language) and "zh-Hani" (any Han script) ? Except if one expects a
    > difference for "zh-Latn" (Pinyin) or "zh-Bopo" (Bopomofo), it is
    > unlikely that a resource localized for "zh" would use something else
    > than a Han orthography, the alternatives being encoded separately for
    > special local use.

    You've just discovered the premise behind Suppress-Script, an attribute
    of language subtags developed for the forthcoming RFC 3066bis.

    Certain languages are written much more commonly with one particular
    script than with any other, to the extent that specifying that script in
    a language tag would be pointless. Examples might include French in
    Latin script, Arabic in Arabic script, or Chinese in "Han" (simplified
    vs. traditional unspecified). For some of those languages, the RFC
    3066bis registry will include a Suppress-Script entry, indicating that
    the use of that script subtag with that language subtag is discouraged,
    though not forbidden.

    For example:

    Type: language
    Subtag: fr
    Description: French
    Added: 2005-10-16
    Suppress-Script: Latn

    This entry means that in most circumstances, the tag "fr-Latn" conveys
    no additional information over simply "fr", and therefore the script
    subtag "Latn" should be suppressed. Languages such as Serbian, for
    which there is no overwhelming "majority" script, don't have a
    Suppress-Script entry; a script subtag usually does add some information
    in these cases.
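
    As a rough illustration of how a tag processor might apply this, here
    is a minimal Python sketch. It parses a registry record like the one
    above and drops a script subtag that matches the language's
    Suppress-Script value. The function names and the one-entry registry
    are hypothetical, and the sketch assumes the script subtag, if present,
    immediately follows the language subtag; it is not a full RFC 3066bis
    parser.

```python
# Sample record copied from the message; a real registry has many records.
RECORD = """
Type: language
Subtag: fr
Description: French
Added: 2005-10-16
Suppress-Script: Latn
"""


def parse_record(text):
    """Parse 'Field: value' lines into a dict (hypothetical helper)."""
    record = {}
    for line in text.strip().splitlines():
        field, _, value = line.partition(":")
        record[field.strip()] = value.strip()
    return record


def suppress_script(tag, registry):
    """Drop the script subtag when the registry lists it as Suppress-Script.

    Assumes the script subtag, if any, is the second subtag of the tag.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        language = subtags[0].lower()
        script = subtags[1]
        # Script subtags (ISO 15924 codes) are four letters long.
        if len(script) == 4 and script.isalpha():
            suppressed = registry.get(language, {}).get("Suppress-Script")
            if suppressed and suppressed.lower() == script.lower():
                del subtags[1]
    return "-".join(subtags)


# Toy registry keyed by language subtag, holding only the French record.
_record = parse_record(RECORD)
REGISTRY = {_record["Subtag"]: _record}

print(suppress_script("fr-Latn-CA", REGISTRY))  # fr-CA
print(suppress_script("sr-Latn", REGISTRY))     # sr-Latn (no Suppress-Script entry)
```

    Because Suppress-Script discourages the script subtag without
    forbidding it, a processor would more likely use a step like this when
    generating or matching tags than to reject "fr-Latn" as input.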

    Not all languages that are predominantly written in a particular script
    have been assigned a Suppress-Script entry. There are provisions in RFC
    3066bis to register Suppress-Script information for additional
    languages. The review process must be undertaken carefully to ensure
    adequate expertise and a lack of political motivation (e.g. someone
    trying to define Latin as the "default" script for Serbian).

    The tags "zh-Hans" and "zh-Hant" certainly can add information as
    compared to "zh" alone. Even "zh-Hani" might be an improvement over
    "zh" in contexts where non-Han transcriptions (such as, but not limited
    to, Pinyin) might be expected.

    Transcription systems in general have been suggested as a reasonable use
    for variant subtags in RFC 3066bis. It's not a good idea to infer a
    particular transcription given only the script. "Korean in Latin
    script," for example, could be McCune-Reischauer, Yale, Revised
    Romanization, or even something else.

    Further discussion on this topic should be carried out on, not on the Unicode list.

    Doug Ewell
    Fullerton, California, USA
