RE: browser encoding settings

From: Dean Harding (dean.harding@dload.com.au)
Date: Wed Jun 01 2005 - 02:29:48 CDT


    I've been trying this out myself today, and from what I can gather Hotmail,
    Yahoo and Gmail all accept UTF-8 quoted-printable mails (that is, they at
    least appear to decode them properly). The problem with both Hotmail and
    Yahoo is that they tell the browser that the webpage is encoded as
    ISO-8859-1, so any non-ASCII characters in the UTF-8 text come out as
    garbage. Gmail, on the other hand, is fine, since it reports an encoding
    of UTF-8 to the browser, so non-US-ASCII characters display correctly
    there.
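    A minimal Python sketch of the failure described above (Python and the
    sample word are illustrative choices, not from the thread): the server
    sends UTF-8 bytes, the page is labelled ISO-8859-1, and every byte is
    shown as a separate Latin-1 character. Because ISO-8859-1 assigns a
    character to every byte value, nothing is actually lost, which is why
    switching the browser to UTF-8 by hand recovers the text.

```python
# Sample Urdu word "urdu" (alef, reh, dal, waw), written with escapes so
# the source file itself stays ASCII.
text = "\u0627\u0631\u062f\u0648"

# What the mail server sends: the UTF-8 bytes of the text
# (2 bytes per Arabic-script letter).
raw = text.encode("utf-8")

# What the browser shows when the page is labelled ISO-8859-1:
# each UTF-8 byte becomes one Latin-1 character (mojibake).
garbled = raw.decode("iso-8859-1")

# What the user gets after manually switching the page to UTF-8:
# the same bytes, now decoded with the right charset.
recovered = garbled.encode("iso-8859-1").decode("utf-8")

print(len(text), len(garbled))  # → 4 8
print(recovered == text)        # → True
```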

    Perhaps the POP3 interface to Hotmail/Yahoo will work properly, but I
    haven't bothered to sign up for it to test :)

    Faraz, my suggestion to you is that you continue sending emails with the
    UTF-8 charset and quoted-printable (since this seems to be the most
    widely supported combination) and that you also advise your users NOT to
    use Hotmail or Yahoo to view Urdu emails. I don't believe there's anything
    you can do that will get them to display properly without your users
    having to manually change the page's encoding to UTF-8. You can get your
    users to sign up for Gmail instead (it's still invite-only, but you can
    always forward yourself an invite via http://isnoop.net/gmail/), since
    Gmail works fine.
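    One way to produce the UTF-8 + quoted-printable combination suggested
    above, sketched with Python's standard email package. The thread does
    not say what software Faraz uses; the addresses, subject and the helper
    name build_urdu_message are placeholders.

```python
from email.message import EmailMessage


def build_urdu_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Plain-text message labelled charset=utf-8 and sent as quoted-printable."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject  # a non-ASCII subject would be RFC 2047-encoded on output
    # charset= ends up in Content-Type; cte= sets Content-Transfer-Encoding.
    msg.set_content(body, charset="utf-8", cte="quoted-printable")
    return msg


msg = build_urdu_message(
    "sender@example.com",        # placeholder addresses
    "reader@example.com",
    "Urdu test",
    "\u0627\u0631\u062f\u0648",  # alef, reh, dal, waw
)
print(msg.as_string())
```

    This only controls what you send; it cannot stop a webmail front end
    from telling the browser the page is ISO-8859-1.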

    Dean.

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    Behalf Of Philippe Verdy
    Sent: Wednesday, 1 June 2005 3:12 am
    To: paul@sustainableGIS.com; unicode@unicode.org
    Subject: Re: browser encoding settings

    The same is true in France: some servers were initially configured to
    accept emails using 8-bit encodings, and were later reverted to accept
    only 7-bit encodings.

    Many mail servers in Japan support only 7-bit emails, because they were
    tweaked locally long ago to support Shift-JIS and were never reconfigured
    to support other 8-bit encodings except through the ugly MIME 7-bit
    transfer encodings.
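    As a rough illustration of what "only support 7-bit" rules out, here is a
    hypothetical helper that checks a body against the MIME "7bit" limits of
    RFC 2045 (octets below 128, no NUL, lines of at most 998 octets). Any
    UTF-8 Urdu text fails it, so it has to be given a 7-bit transfer encoding
    or a 7-bit charset before such a server can pass it safely.

```python
def is_7bit_safe(body: bytes, max_line: int = 998) -> bool:
    """Rough check against the RFC 2045 "7bit" rules.

    Checks that every octet is below 128, that there is no NUL, and that no
    line exceeds max_line octets. Bare CR/LF placement is not checked.
    """
    if any(b == 0 or b >= 0x80 for b in body):
        return False
    return all(len(line) <= max_line for line in body.splitlines())


print(is_7bit_safe(b"Hello\r\n"))                                # → True
print(is_7bit_safe("\u0627\u0631\u062f\u0648".encode("utf-8")))  # → False
```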

    As for Indian charsets, there is no better-supported encoding (ISCII is
    rarely supported by browsers, mail agents or webmail servers), so the
    only choice that remains is UTF-7.

    But some webmail servers do not support UTF-7 either, only UTF-8, so users
    reading their emails online will be disappointed to see messages garbled
    with unreadable sequences like "+AO7-"... I think that Urdu readers need
    to use POP3 email agents, or choose a webmail service that supports
    decoding UTF-7 (in addition to UTF-8).
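    A short Python sketch of UTF-7 (RFC 2152) output; the sample text is
    arbitrary. Each run of non-ASCII characters becomes a "+...-" block of
    modified Base64, so the result is pure 7-bit ASCII, but a reader that
    does not decode UTF-7 shows those blocks literally, just like the "+AO7-"
    mentioned above.

```python
# Sample text (arbitrary): an ASCII prefix plus the Urdu word "urdu"
# (alef, reh, dal, waw).
text = "Urdu: \u0627\u0631\u062f\u0648"

encoded = text.encode("utf-7")
print(encoded)  # the Arabic letters appear as one "+...-" block

# Every octet is 7-bit ASCII, so a 7-bit-only server can carry it unchanged.
print(all(b < 0x80 for b in encoded))   # → True

# A UTF-7-aware reader gets the original text back ...
print(encoded.decode("utf-7") == text)  # → True

# ... while a reader that treats the bytes as UTF-8 shows the block literally.
print(encoded.decode("utf-8"))
```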

    The alternative, then, is to use UTF-8 with a MIME 7-bit transfer
    encoding (quoted-printable). Base64 would probably be more efficient, but
    some mail servers reject all Base64-encoded emails because they assume
    such emails contain only binary attachments, which are considered
    undesirable for security reasons (simply because Base64 is most often
    used for those binary attachments).
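    To put rough numbers on "more efficient", this sketch (hypothetical
    sample text, Python standard library only) encodes the same UTF-8 Urdu
    text both ways. Quoted-printable turns each non-ASCII octet into three
    characters ("=XX"), and every Arabic-script letter is two octets in
    UTF-8, while Base64 costs about four characters per three octets.

```python
import base64
import quopri

# Hypothetical sample: the Urdu word "urdu" (alef, reh, dal, waw) repeated
# 20 times, separated by spaces.
text = " ".join(["\u0627\u0631\u062f\u0648"] * 20)
raw = text.encode("utf-8")

qp = quopri.encodestring(raw)   # each non-ASCII octet becomes "=XX"
b64 = base64.encodebytes(raw)   # 4 characters per 3 octets, in 76-character lines

print(len(raw), len(qp), len(b64))
```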

    If I had to send Urdu emails, I would choose UTF-8 with quoted-printable.
    It is ugly because this is an inefficient encoding (so emails are larger),
    but at least it works on most platforms. The recipients then need a
    browser or email agent capable of displaying Urdu text (this is a separate
    issue: if your email is in Urdu, you can expect that users capable of
    reading this language have set up an environment with fonts and renderers
    suitable for the extended Arabic script, and with Bidi rendering).

    A more efficient encoding would be the ISO-8859-6 (Latin/Arabic) charset,
    also with quoted-printable (but here again, the Latin/Arabic charset is
    not supported by many webmail agents).
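    A sketch comparing the two quoted-printable options on a sample word
    whose letters all exist in ISO-8859-6. One caveat worth checking before
    relying on that charset for Urdu: letters used in Urdu but not in Arabic,
    such as TTEH (U+0679) and YEH BARREE (U+06D2), have no ISO-8859-6 code
    point, so Python's iso8859_6 codec refuses them. The helper
    fits_iso8859_6 is hypothetical.

```python
import quopri

# "salam" (seen, lam, alef, meem): all four letters exist in ISO-8859-6.
common = "\u0633\u0644\u0627\u0645"

qp_iso = quopri.encodestring(common.encode("iso8859_6"))  # 1 octet per letter
qp_utf8 = quopri.encodestring(common.encode("utf-8"))     # 2 octets per letter
print(len(qp_iso), len(qp_utf8))  # → 12 24


def fits_iso8859_6(text: str) -> bool:
    """True if every character of text has an ISO-8859-6 code point."""
    try:
        text.encode("iso8859_6")
        return True
    except UnicodeEncodeError:
        return False


# TTEH (U+0679) is an Urdu letter with no ISO-8859-6 code point.
print(fits_iso8859_6(common), fits_iso8859_6("\u0679"))  # → True False
```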

    BiDi text rendering is also an issue: if your email is plain text, not all
    email agents will render it properly (and the BiDi override controls
    defined in Unicode are too often ignored by console applications, as they
    have no equivalent in legacy Arabic charsets). If you use HTML instead,
    you could alternatively store the characters in "visual" order and use
    the HTML <BDO> override element, though this will complicate the
    composition of your email text...
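    The thread suggests visual-order text inside <BDO>. A different and
    commonly used approach, shown here only as a substitute sketch and not as
    the author's method, keeps the text in logical order and marks its
    direction: RLE (U+202B) and PDF (U+202C) around plain text, or a
    dir="rtl" container in HTML. The helper names and the lang="ur" default
    are assumptions.

```python
import html

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def rtl_plain(text: str) -> str:
    """Plain text: wrap logical-order text in RLE ... PDF.
    Many console programs ignore these controls."""
    return RLE + text + PDF


def rtl_html(text: str, lang: str = "ur") -> str:
    """HTML: keep logical order and let dir="rtl" drive the browser's
    bidi algorithm. The lang default is an assumption."""
    return f'<div dir="rtl" lang="{lang}">{html.escape(text)}</div>'


sample = "\u0627\u0631\u062f\u0648"  # alef, reh, dal, waw
print(rtl_html(sample))
```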

    ----- Original Message -----
    From: "Paul Hastings" <paul@sustainableGIS.com>
    To: <unicode@unicode.org>
    Sent: Tuesday, May 31, 2005 7:36 AM
    Subject: Re: browser encoding settings

    > Dean Harding wrote:
    >> Like most character set conversions, they probably convert it from
    >> whatever the source encoding is to some form of Unicode (usually whatever
    >> is most convenient for the platform), and then into whatever output
    >> encoding they wanted (in this case UTF-8).
    >
    > i'm not sure that's true for yahoo. we've had numerous headaches sending
    > utf-8 mail to their users. from what we were able to tease out of their
    > html it looks like the encoding is dependent on "where" the yahoo mail
    > server is. some "US" servers don't seem to have any html encoding hints at
    > all, "Chinese" servers seem to use GB2312, etc. users have had to manually
    > swap their browser's encoding, usually messing up the rest of the yahoo
    > content around the email. we couldn't find any official yahoo docs on this
    > (though maybe we didn't look hard enough or in the right places). we more
    > or less gave up on it and included an idiotic "if you can't read this
    > email...." tag.
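    The pipeline Dean describes in the quoted text (source encoding, then
    Unicode, then the output encoding) can be sketched in Python as below;
    transcode and representable are hypothetical helpers. The second check
    bears on Paul's GB2312 observation: a page whose output encoding really
    is GB2312 has no way to represent Urdu letters at all.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding to a Unicode str, then
    re-encode that str in the target encoding."""
    return data.decode(source).encode(target)


def representable(text: str, encoding: str) -> bool:
    """True if every character of text has a code in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


chinese = "\u4e2d\u6587"           # two Han characters, present in GB2312
urdu = "\u0627\u0631\u062f\u0648"  # alef, reh, dal, waw

print(transcode(chinese.encode("gb2312"), "gb2312") == chinese.encode("utf-8"))  # → True
print(representable(urdu, "gb2312"))  # → False
```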



    This archive was generated by hypermail 2.1.5 : Wed Jun 01 2005 - 02:31:38 CDT