Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Apr 28 2003 - 02:39:59 EDT

    John Hudson <tiro at tiro dot com> wrote:

    >> No Dutchman - whether he is involved in type or not - can be amazed
    >> by the existence of IJ.
    >
    > No one is amazed that it exists as a grapheme, but my Dutch colleagues
    > are frequently surprised to discover that it is a *character* in
    > Unicode, and they wonder why. Perhaps this is one of those characters
    > that needs its story told: I've heard that it was encoded for
    > backwards compatibility with an existing standard, but no one I've
    > asked seems to know which standard, or whether this standard is still
    > in use by anyone.

    The standard is ISO/IEC 6937.

    First developed in the early 1980s, this was a supplementary set of 96
    code points intended for use in conjunction with ISO 646 (ASCII) to
    cover as many European languages as possible, within the ISO 2022
    framework. It featured a set of non-spacing diacritical marks, the
    forerunners of Unicode's combining marks, although they appeared before
    the base letter instead of after it as in Unicode, and were not
    considered characters in their own right. About 330 characters could be
    encoded when all the combining marks were taken into account.
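
    To make the ordering difference concrete, here is a minimal sketch of
    how a decoder might turn a 6937-style "mark, then base letter" byte
    pair into Unicode's "base letter, then combining mark" order. The byte
    values in the table (0xC1 grave, 0xC2 acute, 0xC3 circumflex, 0xC8
    diaeresis) are my assumption of the 6937 assignments and should be
    checked against the standard; the function name is made up for
    illustration, and only ASCII base letters are handled.

```python
import unicodedata

# ASSUMPTION: a small subset of ISO 6937 non-spacing mark bytes, mapped to
# the corresponding Unicode combining marks. Verify against the standard.
NONSPACING_MARKS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}


def decode_6937_sketch(data: bytes) -> str:
    """Hypothetical decoder: reorder 'mark + base' into 'base + mark'."""
    out = []
    pending_mark = None  # mark seen but not yet attached to a base letter
    for byte in data:
        if byte in NONSPACING_MARKS:
            if pending_mark is not None:
                # Only one mark per base letter is allowed in ISO 6937.
                raise ValueError("two non-spacing marks in a row")
            pending_mark = NONSPACING_MARKS[byte]
        elif byte < 0x80:
            out.append(chr(byte))  # ASCII base letter comes first in Unicode
            if pending_mark is not None:
                out.append(pending_mark)  # the mark follows the base
                pending_mark = None
        else:
            raise ValueError(f"byte 0x{byte:02X} not covered by this sketch")
    if pending_mark is not None:
        raise ValueError("non-spacing mark with no base letter")
    # Compose to a single precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(out))


a_acute = decode_6937_sketch(bytes([0xC2, 0x61]))  # acute mark, then 'a'
```

    Under those assumptions, the two-byte sequence for a with acute comes
    out as the single Unicode character U+00E1.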

    ISO 6937 had some significant drawbacks that prevented its widespread
    deployment, at least in North America. The combining marks could only
    be used in certain prescribed combinations (a with acute was legal but g
    with acute was not), and only one combining mark per base letter was
    allowed, which made ISO 6937 useless for languages like Vietnamese that
    require multiple diacritics. Furthermore, because it lived in the ISO
    2022 world, ISO 6937 had to be "announced" via an escape sequence. And
    of course, there was the usual resistance to encoding a single character
    like á with a two-byte sequence. ISO 6937 never achieved great
    popularity, although I have heard it saw some use in the Netherlands.

    The capital IJ digraph is encoded at position 6/6 in ISO 6937, which
    means it would normally be expressed with byte 0xE6 (assuming 6937 was
    defined as the G1 or "high-bit" character set). The small ij digraph is
    encoded at 7/6 (0xF6).
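
    The arithmetic behind those byte values can be sketched as follows.
    In column/row notation the column supplies the high four bits and the
    row the low four bits; using the set as G1 in an 8-bit environment
    sets the top bit. The function name is hypothetical, and the Unicode
    code points noted in the comments (U+0132 and U+0133) are the current
    homes of the two digraphs.

```python
def code_position_to_byte(column: int, row: int, high_bit: bool = True) -> int:
    """Convert a 7-bit column/row position such as 6/6 to a byte value.

    high_bit=True models the set being used as the G1 ("high-bit") set,
    which is the assumption made in the text above.
    """
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0..7 and row 0..15")
    value = (column << 4) | row  # 6/6 -> 0x66, 7/6 -> 0x76
    return (value | 0x80) if high_bit else value


capital_ij_byte = code_position_to_byte(6, 6)  # corresponds to U+0132
small_ij_byte = code_position_to_byte(7, 6)    # corresponds to U+0133
```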

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


