Unclear text in the UBA (UAX#9) of Unicode 6.3
verdy_p at wanadoo.fr
Fri May 2 09:57:36 CDT 2014
The email was sent from Gmail's webmail, French edition.
Maybe Gmail is causing this; it is not expected, and I don't know why
Gmail converts the text to ISO 8859-1 (and alters the text without
notice; it could have used windows-1252, which has completely superseded
ISO 8859-1 with HTML5).
But the HTML part was intact and that's the HTML part that I see (I almost
never look at the generated plain text part which always has caveats if it
is not sent with UTF-8).
In my opinion it's a bad choice by Gmail to replace guillemets with ugly
pairs of ASCII symbols (not even used in French contexts), which also
clash with the conventional notation for quoted text. In fact, if it really
wanted to use ISO 8859-1, it should have used the "ASCII double quotes".
Is « 20 °C » OK with the degree symbol? So the guillemets should be OK too
and I don't think this makes emails less readable for the recipients
reading only the plain text part in their old email agents. I just hope
that Gmail does not make things worse by using UTF-8 and still making these
ugly substitutions for characters that have been widely supported for about
30 years on so many systems.
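To illustrate the point, here is a minimal Python sketch (the names SAMPLE
and encodable are only illustrative) that checks that the guillemets and the
degree sign each fit in a single ISO 8859-1 byte, so no ASCII substitution
is needed for them:

```python
# Guillemets and degree sign, written as escapes so the source stays ASCII.
SAMPLE = "\u00ab 20 \u00b0C \u00bb"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# All three characters have code points in ISO 8859-1 (0xAB, 0xB0, 0xBB).
latin1_bytes = SAMPLE.encode("iso-8859-1")

print(encodable(SAMPLE, "iso-8859-1"))
print(encodable(SAMPLE, "ascii"))
print(latin1_bytes)
```

The same bytes decode identically as windows-1252, since the two encodings
agree on these positions.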
And I wonder then why Gmail offers immediate support for HTML composition,
and a tool to remove the non-plain-text formatting that still preserves the
UTF-8 encoding, if it then modifies what we write.
But Gmail would do better to send UTF-8 plain text parts instead of
replacing any character that has no match in its default legacy 8-bit
encoding, notably because it does not offer any option to select the
encoding that will be used (either in HTML or in the plain-text version).
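As a rough sketch of what I mean (this is not how Gmail works internally,
and the message text is made up), Python's standard email package can build
a two-part message whose plain-text part is declared as UTF-8, so nothing
needs to be substituted:

```python
from email.message import EmailMessage

# Hypothetical message body containing guillemets and a degree sign.
text = "Il fait \u00ab 20 \u00b0C \u00bb aujourd'hui."
html = "<p>Il fait \u00ab 20 \u00b0C \u00bb aujourd'hui.</p>"

msg = EmailMessage()
msg["Subject"] = "Test"
# Plain-text part, explicitly labelled as UTF-8.
msg.set_content(text, charset="utf-8")
# Adding an HTML alternative turns the message into multipart/alternative.
msg.add_alternative(html, subtype="html")

plain_part = msg.get_body(preferencelist=("plain",))
print(msg.get_content_type())
print(plain_part.get_content_charset())
```

A recipient reading only the plain text part then gets the same characters
as one reading the HTML part.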
If you just read the plain text part, you know that it is lossy, so you can
get random replacements for many characters in lots of scripts (even
non-Latin ones), and symbols if it's not sent with UTF-8.
So this is not embarrassing for the Unicode mailing list; it is embarrassing
for Google, which leaves its users unaware of what it will silently do.
And sorry, I no longer use any standalone mail agent; I prefer using web
storage, so that I don't lose all my emails when I use another device or
install a new OS.
2014-05-02 0:44 GMT+02:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:
> On Thu, 24 Apr 2014 17:19:57 -0700
> Asmus Freytag <asmusf at ix.netcom.com> wrote:
> > On this side show, Philippe finally is correct, because I received
> > his message without ASCII-i-fication; he cc'd me directly, and I
> > never saw the mangled text. It's a bit embarrassing for a Unicode mail
> > list to not even be able to let guillemets through unmolested.
> Are you sure it's the mail list that did the mangling? As I got the
> post, it had two parts, plain-text in ISO-8859-1, with '<<' and '>>'
> substituted for the guillemets '«' and '»', and HTML, also in
> ISO-8859-1, with character entities &laquo; and &raquo;. I suspect
> Philippe's e-mail client may be at fault.