Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 24 2004 - 20:10:48 CDT

Next message: Michael Everson: "Re: Classification; Phoenician"

Previous message: Dean Snyder: "Re: Classification; Phoenician"
In reply to: Mike Ayers: "RE: [BULK] - Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)"
Next in thread: Doug Ewell: "VISCII (was: Re: [BULK] - Re: MCW encoding of Hebrew)"

RE: [BULK] - Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)
From: Mike Ayers
> Another such code is VISCII for Vietnamese.

Recte: VISCII does not claim to be ASCII. It claims to be a separate 8-bit
encoding that includes the US-ASCII printable characters but is not compatible
with ASCII, because it replaces some C0 controls with Latin characters, breaking
the conformance model of ISO 646.

So the MCW representation of Hebrew letters with 7-bit codes, which fits in
systems made to transport or store only ASCII safely, is a charset under the IANA
definition: i.e. the association of a character repertoire (or Unicode subset),
an encoding that assigns a unique numeric code to each character, and a
serialization syntax that maps these codes to streams of bytes (here a simple
identity function).
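
To make the three parts of that definition concrete, here is a minimal Python
sketch. The two-letter table is a hypothetical excerpt chosen only for
illustration (it is not the MCW table), and the names CODE_OF, CHAR_OF, encode
and decode are my own.

```python
# Hypothetical two-letter excerpt of a transliteration table (illustration only).
# Repertoire: the keys. Coded character set: the char -> number mapping.
CODE_OF = {
    "\u05D1": 0x42,  # HEBREW LETTER BET   -> 0x42 (the byte ASCII uses for "B")
    "\u05D2": 0x47,  # HEBREW LETTER GIMEL -> 0x47 (the byte ASCII uses for "G")
}
CHAR_OF = {code: ch for ch, code in CODE_OF.items()}


def encode(text: str) -> bytes:
    """Map each character to its code, then serialize one byte per code."""
    codes = [CODE_OF[ch] for ch in text]
    return bytes(codes)  # identity serialization: code value == byte value


def decode(data: bytes) -> str:
    """Inverse mapping: each byte is a code, each code is one character."""
    return "".join(CHAR_OF[b] for b in data)


hebrew = "\u05D1\u05D2"
wire = encode(hebrew)
```

The bytes in wire happen to be valid ASCII, but their meaning comes from this
table, not from ASCII; that is what makes it a distinct charset.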

Whether or not it is registered with IANA as a "charset" usable for
interchange (for example in MIME content types) does not change its status: this
MCW encoding (like VISCII) is definitely *NOT* ASCII (i.e. ISO 646-US), and
it does not comply with the ISO 646 encoding rules (which *require* mapping the
invariant subset, with no other interpretation, to the Basic Latin letters,
digits and punctuation)!

One proof is the encoding of alef as a left parenthesis: it breaks the use of
paired parentheses, prevents using parentheses in Hebrew text, and does not allow
putting negative numbers in parentheses; it also gives wrong results if case
mapping is performed, legitimately, as if the text were ASCII (breaking
case-insensitive searches).
Any MCW-encoded text presented as if it were ASCII will be exposed to many
interoperability problems, *unless* the text is correctly tagged as using
a charset other than ASCII.
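
A small Python sketch of both failures, under stated assumptions: the word
below is made up, "(" is taken to stand for a letter as described above, and
the scheme's table is assumed to use uppercase letters only. None of this is
the actual MCW table.

```python
def parens_balanced(text: str) -> bool:
    """Ordinary ASCII-minded check that parentheses come in matched pairs."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Hypothetical transliterated word in which "(" is a letter, not punctuation.
word = "(BG"

# An ASCII tool sees an unclosed parenthesis.
balanced = parens_balanced(word)

# ASCII case folding rewrites the letter codes, so the bytes no longer match
# an uppercase-only table (assumption) and searches compare altered data.
folded = word.lower()
```

Here balanced is False and folded differs from word, although the text itself
is perfectly well formed in its own encoding.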

The fact that this encoding is private should not be a limitation. For example,
MCW-encoded text could be transported with the following MIME content type:
text/plain; charset=x-MCW
with the following Content-Transfer-Encoding: 7bit
or with other transfer encodings (Base64, Quoted-Printable...) or compression
(deflate...)
There are more than enough options in email to transport private encodings
safely, without claiming to be ASCII when they are not.
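
As a rough sketch of such tagging, using Python's standard email package: the
payload is a made-up placeholder, and x-MCW is the private, unregistered label
suggested above, not a name any software is known to recognize.

```python
from email.message import Message

# Hypothetical transliterated payload; any 7-bit text would do here.
body = "BG (BG\r\n"

msg = Message()
# Private charset label: tells the receiver this is NOT to be read as ASCII.
msg["Content-Type"] = "text/plain; charset=x-MCW"
# The payload is 7-bit clean, so no Base64 or Quoted-Printable is needed.
msg["Content-Transfer-Encoding"] = "7bit"
msg.set_payload(body)

wire = msg.as_string()
```

A receiver that does not know x-MCW can at least see that it does not know the
charset, instead of silently treating the bytes as ASCII.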

This archive was generated by hypermail 2.1.5 : Mon May 24 2004 - 20:11:09 CDT