Re: Is there Unicode mail out there?

From: Tex Texin (texin@progress.com)
Date: Fri Jul 13 2001 - 11:07:35 EDT


Doug,
I thought I had acknowledged the rationale for labeling each message
with the minimal charset based on its contents at the beginning
of the third paragraph, but maybe I should have expanded on it.
Anyway, despite the benefit, it is a significant problem that
it is unreliable and that "past performance does not
predict future performance", or whatever the phrase is that the
financial markets use.

I was mostly stage setting for the idea that there should be a
clear indicator for a failed character conversion. The last resort
proposal is ok. I agree with you about seeing the hex value for the
missing character with the symbol. (I've already been forced to learn
the Unicode code point for the Euro by heart... I would probably
recognize most of the commonly failed characters if the code points
were available.) Maybe writing the value
as an HTML numeric character reference (e.g. &#x20AC;) would also
make it easier for processes reading files saved by the mailer
to recover the character. (By using a "standard representation" and
also one that is not likely to appear in an email, unless the email
is about character references...)
For the Unicode-unaware, the syntax could allow inclusion of the
original code page label: &#X0080:windows1256;
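
As a rough illustration of the numeric character reference idea, here
is a minimal Python sketch. It assumes the mailer is free to pick its
own fallback when a character does not fit the target charset. The
function names encode_with_ncr and recover are hypothetical; the
"xmlcharrefreplace" error handler and html.unescape are standard
Python. Note that the built-in handler writes decimal references
(&#8364;) rather than hex ones, and it does not cover the extended
form with a code page label.

```python
import html


def encode_with_ncr(text: str, charset: str) -> bytes:
    """Encode text; characters the charset lacks become decimal NCRs."""
    # "xmlcharrefreplace" is a built-in error handler for str.encode().
    return text.encode(charset, errors="xmlcharrefreplace")


def recover(data: bytes, charset: str) -> str:
    """Decode the bytes, then turn character references back into characters."""
    # Caveat: html.unescape also expands named references such as &amp;,
    # so a message that discusses character references would be altered.
    return html.unescape(data.decode(charset))


original = "Price: \u20ac100"
wire = encode_with_ncr(original, "iso-8859-1")
print(wire)                                     # b'Price: &#8364;100'
print(recover(wire, "iso-8859-1") == original)  # True
```

A reader that knows nothing about Unicode still shows the literal
text &#8364;, which is much harder to overlook than a '?'.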

Anyway, this problem, that characters that do not convert in mails
are not being clearly indicated:
        occurs frequently,
        can have significant impact on users,
        seems to have some cheap workarounds
that are better than either just relabeling to the lowest common
denominator or preventing communications entirely.
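
To show how cheap such a workaround can be, here is a minimal Python
sketch of a conversion that marks each unconvertible character with
its code point, in the "[U+A068]" style suggested in the quoted
message below. It is only an assumption about how a mailer might do
it, not a description of any existing product. The handler name
"visible_marker" is hypothetical; codecs.register_error is standard
Python.

```python
import codecs


def visible_marker(err):
    """Replace each unencodable character with a [U+XXXX] marker."""
    if not isinstance(err, UnicodeEncodeError):
        raise err  # this sketch only handles encoding failures
    bad = err.object[err.start:err.end]
    marker = "".join(f"[U+{ord(ch):04X}]" for ch in bad)
    # Return the replacement text and the position to resume encoding at.
    return marker, err.end


codecs.register_error("visible_marker", visible_marker)

text = "Price: \u20ac100"
print(text.encode("ascii", errors="replace"))         # b'Price: ?100'
print(text.encode("ascii", errors="visible_marker"))  # b'Price: [U+20AC]100'
```

The first result is the easily overlooked '?' replacement; the second
makes the failure obvious and keeps the code point recoverable.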

tex

DougEwell2@cs.com wrote:
>
> In a message dated 2001-07-12 8:55:07 Pacific Daylight Time,
> texin@progress.com writes:
>
> > So the proposal is that minimizing the charset is a good thing?
> >
> > This means that you and I start out in a conversation about a
> > product I am trying to sell you, it happens to be all in ascii
> > and we exchange several mails successfully. Then I quote you
> > a price in Euros and my 1252 message gets corrupted by your
> > reader which can handle either only 8859-1 or ASCII, and
> > you miss the fact that the Euro is corrupted and think we
> > are talking dollars or some other currency.
> >
> > Although I understand why you would want a minimal charset in order
> > to not needlessly prevent communications, the implication of
> > reliability and trust that is built by having some success is
> > a problem. You think you are communicating successfully but when it
> > is critical it may not...
>
> The premise seems to be that we should reject, or at least issue a warning
> against, the earlier messages on the basis that the sender *might* be able to
> send characters in the future that the receiver could not receive. Sorry,
> but I can't buy into that. That would prevent the CP1252 user from ever
> being able to communicate adequately with anyone who has "only" ISO 8859-1.
>
> What if I am trying to exchange mail with a user of Windows-1256? Lots of
> roadblocks would be erected because of the chance that the guy *might* send
> me ARABIC LETTER ALEF WITH HAMZA BELOW and I couldn't interpret it. And I
> couldn't exchange mail with UTF-8 users either, because of that YI SYLLABLE
> BBOP they might send me some day.
>
> > Perhaps if a harder line was taken when characters
> > are used that cannot be converted, this would make more sense.
> > (ie give a very clear recognizable indication of corruption or
> > conversion failures)
>
> That's reasonable. Simply replacing unknown characters with '?' doesn't
> work; the character is too easily overlooked. I would like to see mailers
> replace unsupported characters with a Unicode representation like "[U+A068]".
> (That would certainly help with this spate of CJK characters that people are
> sending lately on the Unicode list!) I suspect that's too much Unicode
> awareness to ask of an otherwise Unicode-unaware product, though.
>
> -Doug Ewell
> Fullerton, California

-- 
---------------------------------------------------------------
Tex Texin                      Director, International Business
mailto:Texin@Progress.com      +1-781-280-4271
Fax:+1-781-280-4655
the Progress Company           14 Oak Park, Bedford, MA 01730
---------------------------------------------------------------


