ISO-15924 script codes and UAX#24 script IDs

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 17 2004 - 13:19:28 CDT

    I know that ISO-15924 now publishes 4-letter codes for scripts used in
    bibliographic references, and that it contains more scripts than Unicode,
    because ISO-15924 needs separate codes for variants that are unified for
    Unicode encoding.

    I also understand that Unicode defines its own "Hiragana_or_Katakana" code,
    which is needed to classify a few characters. This code is not used as a
    script code in bibliographic references, which is a good justification for
    listing this "script" name in parentheses in ISO-15924, since it is a
    technical requirement. ISO-15924 nevertheless agreed to encode it under
    N°=412, Code=Hrkt.

    I also understand that Unicode needs script IDs for "Common" and "Inherited".

    ISO-15924 also includes an "ID" column that should reflect the script IDs used
    in Unicode character properties.
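
    As a rough illustration of how such an "ID" column links the two systems, the
    sketch below builds a two-way index between ISO-15924 codes and Unicode script
    IDs. The rows are a hypothetical excerpt chosen for the example (IDs written in
    the underscore form used by UAX #24), not the official table, and the helper
    name build_index is made up.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical excerpt of ISO-15924 rows: (4-letter code, Unicode script ID or None).
# Illustrative values only, not the official table.
ROWS: Tuple[Tuple[str, Optional[str]], ...] = (
    ("Hrkt", "Hiragana_or_Katakana"),
    ("Ital", "Old_Italic"),
    ("Linb", "Linear_B"),
    ("Cans", "Canadian_Aboriginal"),
    ("Phnx", None),  # an ISO-15924 code with no Unicode script ID yet
)


def build_index(rows: Iterable[Tuple[str, Optional[str]]]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build code -> Unicode ID and Unicode ID -> code lookups from the ID column."""
    code_to_id: Dict[str, str] = {}
    id_to_code: Dict[str, str] = {}
    for code, script_id in rows:
        if script_id is None:
            continue  # no Unicode counterpart to index
        code_to_id[code] = script_id
        id_to_code[script_id] = code
    return code_to_id, id_to_code


CODE_TO_ID, ID_TO_CODE = build_index(ROWS)
print(ID_TO_CODE["Hiragana_or_Katakana"])  # Hrkt
print("Phnx" in CODE_TO_ID)  # False
```

    With such an index, a script ID found in Unicode character properties can be
    turned back into its 4-letter code, while codes without a Unicode ID simply
    have no entry.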

    BUT:

    I note these quirks:
    - The ISO-15924 text incorrectly references the IDs "Old Italic", "Linear B",
    and "Canadian Aboriginal" with spaces, but the actual script IDs defined in
    UAX #24 use underscores.
    - Unicode already defines three script IDs that have no counterpart in
    ISO-15924: "Limbu", "Tai_Le", and "Cypriot". Should a request now be made to
    map these scripts to IDs in ISO-15924?
    - Doesn't the Unicode script ID "Common" correspond to the ISO-15924 code
    "Zyyy" (N°=998) for undetermined script?
    - Should ISO-15924 have a "Zwww" code for the Unicode "Inherited" script ID?
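
    The first two quirks can be checked mechanically. A minimal sketch, assuming
    the "ID" column values are available as plain strings: it rewrites spaces as
    underscores (the UAX #24 form) and lists Unicode IDs that have no ISO-15924
    counterpart. The helper names are made up, and the set of Unicode IDs is a
    hypothetical sample, not the full UAX #24 list.

```python
from typing import Iterable, List, Set


def to_uax24_id(name: str) -> str:
    """Normalize a script name to the underscore form used by UAX #24 IDs."""
    return "_".join(name.split())


def unmapped_ids(unicode_ids: Set[str], iso_id_column: Iterable[str]) -> List[str]:
    """Return Unicode script IDs that match no (normalized) ISO-15924 ID value."""
    normalized = {to_uax24_id(value) for value in iso_id_column}
    return sorted(unicode_ids - normalized)


# "ID" column values as quoted above, written with spaces.
ISO_ID_COLUMN = ["Old Italic", "Linear B", "Canadian Aboriginal"]
# Hypothetical subset of UAX #24 script IDs, for illustration only.
UNICODE_IDS = {"Old_Italic", "Linear_B", "Canadian_Aboriginal", "Limbu", "Tai_Le", "Cypriot"}

print(to_uax24_id("Old Italic"))  # Old_Italic
print(unmapped_ids(UNICODE_IDS, ISO_ID_COLUMN))  # ['Cypriot', 'Limbu', 'Tai_Le']
```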

    Many ISO-15924 codes exist for scripts that are candidates for encoding in
    Unicode, with their own script IDs still to be defined. For example, these
    have already been discussed here:
        %N°;code;English name;nom français
        100;Mero;Meroitic;méroïtique
        115;Phnx;Phoenician;phénicien
        120;Tfng;Tifinagh (Berber);tifinagh (berbère)
        140;Mnda;Mandaean;mandéen //Is it the same as the Mende "Kikakui" syllabary?
        282;Plrd;Pollard Phonetic;phonétique de Pollard
        300;Brah;Brahmi;brâhmî
        360;Java;Javanese;javanais
        365;Batk;Batak;batak
        ...
    and many others that are on the Unicode roadmap.
    For these scripts, does Unicode need to define its own script IDs?
    Or should Unicode simply deprecate its script IDs in favor of ISO-15924 codes?
    This may matter because UAX #24 is a normative reference in the W3C CSS3
    specification; perhaps the existing Unicode IDs should become aliases of
    ISO-15924 codes.
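
    To make the aliasing idea concrete, here is a minimal sketch, assuming the
    semicolon-separated layout of the list above (N°;code;English name;nom
    français) and a purely hypothetical alias table in which existing Unicode IDs
    resolve to ISO-15924 codes. It does not describe any existing Unicode or CSS3
    mechanism; parse_rows, resolve and UNICODE_ALIASES are illustrative names.

```python
from typing import Dict, NamedTuple


class ScriptEntry(NamedTuple):
    number: int
    code: str
    english: str
    french: str


def parse_rows(text: str) -> Dict[str, ScriptEntry]:
    """Parse 'N°;code;English name;nom français' lines into entries keyed by code."""
    entries: Dict[str, ScriptEntry] = {}
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing "//" remarks
        if not line or line.startswith("%") or line == "...":
            continue  # skip blank lines, the header line and ellipses
        number, code, english, french = line.split(";")
        entries[code] = ScriptEntry(int(number), code, english, french)
    return entries


# Hypothetical alias table: existing Unicode script IDs resolving to ISO-15924 codes.
UNICODE_ALIASES = {"Old_Italic": "Ital", "Linear_B": "Linb", "Hiragana_or_Katakana": "Hrkt"}


def resolve(name: str) -> str:
    """Map a Unicode alias to its ISO-15924 code; pass other names through unchanged."""
    return UNICODE_ALIASES.get(name, name)


SAMPLE = """
%N°;code;English name;nom français
115;Phnx;Phoenician;phénicien
120;Tfng;Tifinagh (Berber);tifinagh (berbère)
300;Brah;Brahmi;brâhmî
"""

entries = parse_rows(SAMPLE)
print(entries[resolve("Tfng")].english)  # Tifinagh (Berber)
print(resolve("Linear_B"))  # Linb
```

    In this design the 4-letter code stays the primary key, and the Unicode IDs
    are only a lookup layer in front of it, which is one way the "aliases"
    question above could be read.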



    This archive was generated by hypermail 2.1.5 : Mon May 17 2004 - 13:19:53 CDT