Re: ISO 8859-11 (Thai) cross-mapping table

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Oct 07 2002 - 20:15:09 EDT

    Elliotte Harold asked:

    > The Unicode data files at
    > http://www.unicode.org/Public/MAPPINGS/ISO8859/ do not include a mapping
    > for ISO-8859-11, Thai. Is there any particular reason for this?

    Just that nobody got around to submitting and posting one.

    Since there was a lot of discussion about this over the weekend,
    I took it upon myself to create and post one in the same format
    as the other ISO8859 tables.

    Let me know if anybody spots any problems in the table -- but
    it really is pretty straightforward, as others noted: TIS 620-2533 (1990)
    with one addition: 0xA0 NO-BREAK SPACE.
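
    As a rough illustration of how small the table is, here is a minimal
    Python sketch that generates the upper half (0xA0..0xFF) in roughly the
    three-column layout of the other ISO8859 mapping files. The function
    names are hypothetical, and the offset rule (byte minus 0xA0, added to
    U+0E00) is my own shorthand for the Thai block layout rather than
    something taken from the posted table, so check the output against
    the real file before relying on it.

```python
import unicodedata


def iso8859_11_high_half():
    """Return {byte: code point} for 0xA0..0xFF (hypothetical helper)."""
    # The one addition over TIS 620-2533 (1990): NO-BREAK SPACE at 0xA0.
    table = {0xA0: 0x00A0}
    for byte in range(0xA1, 0x100):
        # 0xDB..0xDE and 0xFC..0xFF are unassigned, so they get no entry.
        if 0xDB <= byte <= 0xDE or byte >= 0xFC:
            continue
        # Assumption: Thai letters sit at a fixed offset into U+0E00..U+0E7F.
        table[byte] = 0x0E00 + (byte - 0xA0)
    return table


def format_rows(table):
    """Format entries as: byte, Unicode value, '#', Unicode character name."""
    rows = []
    for byte in sorted(table):
        cp = table[byte]
        name = unicodedata.name(chr(cp))
        rows.append(f"0x{byte:02X}\t0x{cp:04X}\t#\t{name}")
    return rows


if __name__ == "__main__":
    print("\n".join(format_rows(iso8859_11_high_half())))
```

    The sketch leaves out 0x00..0x9F, which the other ISO8859 tables map
    one-to-one onto the same Unicode values.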

    Doug dug out:

    > These 9 code positions (0xA0, 0xDB..0xDE, 0xFC..0xFF) appear to be
    > undefined in TIS 620.2533. Reference [3] below does show a "word
    > separator character" at 0xDC, which I interpret as U+200B ZERO WIDTH
    > SPACE, but the other positions are still undefined.

    Reference [3] is online Tru64 Unix documentation about its Thai support,
    which claims that:

    "- No-Break space. The character code is A0.
     ...
     - Word separator. The word separator defined in TIS 620-2533."

    This despite the fact that the table it shows has no no-break space
    at A0 (and TIS 620-2533 (1990) does not have one either), and that
    0xDC is undefined in TIS 620-2533, even though the table in the
    Tru64 Unix documentation shows "word sep." there.
    The table is labelled the "TACTIS Codeset" for "Thai API Consortium/
    Thai Industrial Standard." I surmise that this is some vendor
    extension to the actual TIS 620-2533 (1990). The actual standard
    states clearly (in Thai) that 0x80..0xA0, 0xDB..0xDE, and 0xFC..0xFF
    are reserved (unassigned), and the tables in the standard match that.

    So there may be some implementation practice that uses 0xDC for
    U+200B ZERO WIDTH SPACE in Thai code pages, but that is not
    part of either TIS 620-2533 (1990) or ISO 8859-11:2001.
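
    For anyone who has to read data produced under such a practice, one
    way to keep the extension separate from the standard mapping is to
    make it an explicit opt-in. The following Python sketch is only an
    illustration: decode_thai and its vendor_word_sep flag are hypothetical
    names, the 0xDC to U+200B mapping is the suspected vendor behavior
    rather than anything in either standard, and passing 0x00..0xA0
    through unchanged is an assumption borrowed from how the ISO8859
    tables treat that range.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def decode_thai(data, vendor_word_sep=False):
    """Decode ISO 8859-11 bytes; optionally accept 0xDC as a word separator.

    Hypothetical helper: vendor_word_sep models the suspected vendor
    extension and is off by default, so standard data decodes strictly.
    """
    out = []
    for pos, b in enumerate(data):
        if b <= 0xA0:
            # Assumption: ASCII, C1 controls, and NO-BREAK SPACE map 1:1.
            out.append(chr(b))
        elif b == 0xDC and vendor_word_sep:
            # Not part of TIS 620-2533 (1990) or ISO 8859-11:2001.
            out.append(ZWSP)
        elif 0xDB <= b <= 0xDE or b >= 0xFC:
            raise ValueError(f"byte 0x{b:02X} at offset {pos} is unassigned")
        else:
            # Thai characters: fixed offset into the U+0E00 block.
            out.append(chr(0x0E00 + (b - 0xA0)))
    return "".join(out)
```

    With the flag off, a stray 0xDC raises an error instead of being
    silently turned into a character the standards do not define.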

    --Ken


