Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 24 2004 - 20:10:48 CDT


    RE: [BULK] - Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why
    Jun 7? fervor)
    From: Mike Ayers
    > Another such code is VISCII for Vietnamese.

    Recte: VISCII does not claim to be ASCII. It claims to be a separate 8-bit
    encoding that includes the US-ASCII printable characters but is not compatible
    with ASCII, because it replaces some C0 controls with Latin characters, which
    breaks the ISO 646 conformance model.

    So the MCW representation of Hebrew letters with 7-bit codes, which fit in
    systems built to transport or store only ASCII safely, is a charset under the
    IANA definition: the association of a character repertoire (or Unicode subset),
    an encoding that assigns a unique numeric code to each character, and a
    serialization syntax that maps these codes to streams of bytes (here a simple
    identity function).
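
    As a rough sketch of that three-part definition (repertoire, coded
    character set, serialization), here is a toy charset in Python. The table is a
    three-entry stand-in and NOT the real MCW table: the "(" for alef pairing is
    taken from the example in this message, while "B" and "G" for bet and gimel are
    assumptions made only for illustration.

```python
# Toy charset illustrating the three parts of the IANA notion of a charset.
# NOT the real MCW table: only "(" -> alef comes from the message; the other
# two pairings are assumptions for illustration.

# 1. Character repertoire: a small subset of Unicode.
REPERTOIRE = "\u05D0\u05D1\u05D2"  # alef, bet, gimel

# 2. Coded character set: one unique 7-bit code per character.
CODE_OF = {
    "\u05D0": 0x28,  # "(" as in the message's example
    "\u05D1": 0x42,  # "B" (assumed)
    "\u05D2": 0x47,  # "G" (assumed)
}
CHAR_OF = {code: ch for ch, code in CODE_OF.items()}


def encode(text: str) -> bytes:
    """3. Serialization: each code becomes exactly one byte (identity)."""
    return bytes(CODE_OF[ch] for ch in text)


def decode(data: bytes) -> str:
    """Inverse mapping from bytes back to the repertoire."""
    return "".join(CHAR_OF[b] for b in data)


wire = encode(REPERTOIRE)
print(wire)                        # every byte is below 0x80
print(decode(wire) == REPERTOIRE)  # round trip through the toy charset
```

    The bytes happen to be valid ASCII code values, but their meaning is
    defined by the table above and not by ASCII.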

    Whether or not it is registered with IANA as a "charset" usable for
    interchange (for example in MIME content types) does not change its status: this
    MCW encoding (like VISCII) is definitely *NOT* ASCII (i.e. ISO 646-US), and
    it does not comply with the ISO 646 encoding rules (which *require* mapping the
    invariant subset, with no other interpretation, to Basic Latin letters, digits
    and punctuation)!

    One proof is the encoding of alef as a left parenthesis: it breaks the use of
    paired parentheses, prevents using parentheses in Hebrew text, and does not allow
    putting negative numbers in parentheses. It also gives wrong results if case
    mapping is performed, legitimately, as if the text were ASCII (which breaks
    case-insensitive searches).
    Any MCW-encoded text presented as if it were ASCII will be exposed to many
    interoperability problems, *unless* the text is correctly tagged as using
    a charset other than ASCII.
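
    To make the two failure modes concrete, here is a small Python sketch. It
    uses a hypothetical two-entry table (again, not the real MCW table: "(" for
    alef follows the example above, "B" for bet is assumed) and shows what an
    ASCII-minded tool does to such data.

```python
# Hypothetical two-entry table for illustration only; not the real MCW table.
MCW_SAMPLE = {"(": "\u05D0", "B": "\u05D1"}  # "(" -> alef (from the message), "B" -> bet (assumed)


def decode_sample(text: str) -> str:
    """Decode with the sample table; raises KeyError on an unmapped code."""
    return "".join(MCW_SAMPLE[c] for c in text)


def parens_balanced(text: str) -> bool:
    """A typical ASCII-minded check that treats "(" and ")" as a pair."""
    depth = 0
    for c in text:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


data = "(B"  # alef followed by bet, in the sample table

# Failure 1: a tool that believes the data is ASCII sees an unclosed parenthesis.
print(parens_balanced(data))

# Failure 2: case folding is harmless for ASCII text but changes the codes here.
folded = data.lower()
try:
    decode_sample(folded)
    survived = True
except KeyError:
    survived = False
print(survived)
```

    Neither tool is wrong by ASCII rules; the data was simply mislabeled.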

    The fact that this encoding is private should not be a limitation. For example, an
    MCW-encoded text could be transported with the following MIME content type:
        Content-Type: text/plain; charset=x-MCW
    with the following transfer encoding:
        Content-Transfer-Encoding: 7bit
    or with other transfer encodings (Base64, Quoted-Printable...) or compression
    (deflate...).
    Email offers more than enough options to transport private encodings
    safely, without claiming to be ASCII when they are not.
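
    For what it is worth, such a tagged message can be built with Python's
    standard email package, roughly as below. The label x-MCW is the private,
    unregistered name suggested above, the payload is a placeholder and not real
    MCW text, and the header is spelled the way MIME defines it
    (Content-Transfer-Encoding with the value 7bit).

```python
from email import message_from_string
from email.message import Message

# Build a message whose body is labeled with a private charset name.
msg = Message()
msg.add_header("Content-Type", "text/plain", charset="x-MCW")  # private label
msg["Content-Transfer-Encoding"] = "7bit"
msg.set_payload("(B")  # placeholder 7-bit payload, not real MCW text

wire = msg.as_string()
print(wire)

# A receiver can read the label back and refuse to treat the body as ASCII.
received = message_from_string(wire)
charset = received.get_content_charset()  # the library lowercases the name
print(charset)
if charset != "us-ascii":
    print("body needs a", charset, "decoder before any text processing")
```

    The point of the sketch is only that the label travels with the data, so
    nothing downstream has to guess.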


