charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

From: Mark Davis ☕ (mark@macchiato.com)
Date: Mon Jun 28 2010 - 13:38:12 CDT


    I'll overlook the lack of civility, since I can understand that kind of
    frustration when something doesn't work.

    This is the first I've heard of this as a problem with Google Groups. I
    filed a bug against Groups for this issue; I'll see what they find out. I
    don't know what is going on there, since Gmail handled your message
    correctly; that is, your message was tagged as:

    Content-Type: TEXT/PLAIN; charset=ISO-8859-7

    Gmail picked the right charset to convert into Unicode, and Greek characters
    are correctly displayed in my Gmail window, which is UTF-8.
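
A minimal sketch of what honoring the charset parameter looks like, using Python's standard `email` package. The raw message below is a hypothetical stand-in (not the actual message from this thread), and this is not a description of how Gmail is implemented; it only illustrates reading `charset=ISO-8859-7` from the Content-Type header and using it to convert the body bytes to Unicode.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical raw message: the Greek letters "αβγ" sent as ISO-8859-7 bytes.
raw = (
    b"Content-Type: TEXT/PLAIN; charset=ISO-8859-7\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "αβγ".encode("iso-8859-7")
)

msg = message_from_bytes(raw, policy=default)

# The declared charset, lower-cased by the library.
declared = msg.get_content_charset()

# Raw body bytes, then an explicit decode using the declared charset.
body_bytes = msg.get_payload(decode=True)
text = body_bytes.decode(declared)

# Re-encode for a UTF-8 display surface.
utf8_bytes = text.encode("utf-8")
```

Once the text is a Unicode string, the page it is displayed in can be UTF-8 regardless of what charset the sender used.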

    BTW, does the same thing happen if you send your email in UTF-8?

    The problem with slavishly following the charset parameter is that it is
    often incorrect. However, the charset parameter is a signal into the
    character detection module, so if the charset is correctly supplied by the
    message, then the results of the detection will be weighted in that direction.
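
The idea of treating the declared charset as one weighted signal, rather than as the final answer, can be sketched as below. Everything here is an illustrative assumption: the scoring heuristic, the candidate list, the `declared_bonus` value, and the function names are all hypothetical, and this is not the detection module actually used by Gmail or Google Groups.

```python
import codecs
from typing import Optional, Sequence


def canonical(name: Optional[str]) -> Optional[str]:
    """Return Python's canonical codec name, or None if unknown or missing."""
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def score(data: bytes, charset: str) -> float:
    """Toy plausibility score: share of decoded characters that look like text."""
    try:
        text = data.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return 0.0
    if not text:
        return 0.0
    ok = sum(ch.isalpha() or ch.isspace() or ch in ".,;:!?'\"-" for ch in text)
    return ok / len(text)


def detect(
    data: bytes,
    declared: Optional[str] = None,
    candidates: Sequence[str] = ("utf-8", "iso-8859-1", "iso-8859-7", "iso-8859-15"),
    declared_bonus: float = 0.2,  # assumed weight, chosen only for illustration
) -> Optional[str]:
    """Pick the best-scoring candidate, nudged toward the declared charset."""
    declared_name = canonical(declared)
    best, best_score = None, -1.0
    for cs in candidates:
        s = score(data, cs)
        # The declared charset only helps if the bytes actually decode with it.
        if s > 0 and declared_name is not None and canonical(cs) == declared_name:
            s += declared_bonus
        if s > best_score:
            best, best_score = cs, s
    return best


greek = "αβγ δ".encode("iso-8859-7")
guess_without_hint = detect(greek)
guess_with_hint = detect(greek, declared="ISO-8859-7")
```

With this toy scorer, short Greek text in ISO-8859-7 is equally plausible as ISO-8859-1, so the content alone cannot settle it and the first tied candidate wins; supplying the declared charset breaks the tie in favor of ISO-8859-7. The same weighting means a wrong declaration can also pull the result the wrong way if the bonus is too large.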

    Mark

    — Il meglio è l’inimico del bene —

    2010/6/28 Andreas Prilop <prilop4321@trashmail.net>

    > Full-quoting upside-down, Mark Davis wrote:
    >
    > > What I'm guessing is that message was sent in Latin-15,
    > > which can't be reliably distinguished from Latin-1.
    >
    > How come that all browsers show a euro sign at
    > http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html
    > ??
    >
    > I tell you the secret:
    > It is the "charset" parameter, which is now 18 (eighteen!) years old.
    > Only Mark Davis and inept Google programmers still don't get it
    > even in the year 2010.
    >
    >
    > But it is not just the euro sign. Here is the Greek alphabet:
    >
    > Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
    > α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω
    >
    > I post this message also to <news:uk.test> so that others
    > can view at http://groups.google.co.uk/group/uk.test/topics
    > what Google makes out of it.
    >
    >
    > Nearly all messages from
    >
    > http://groups.google.co.uk/group/pl.test/browse_thread/thread/cea5fce42379a48e
    > are correctly treated as ISO-8859-2.
    >
    > Only my message
    > http://groups.google.co.uk/group/pl.test/msg/813fd6f1cdfa7d57
    > is treated as ISO-8859-1. Why?
    >
    > It seems that the silly Google "algorithm" checks whether a poster
    > writes from Poland or from Germany.
    >
    >
    > Groups.google is now infamous in Germany for fucking up special characters.
    > Just one example is
    >
    > http://groups.google.co.uk/group/de.comm.provider.usenet/msg/eac9d334f8c32578?dmode=source&output=gplain
    > All umlauts are fucked up by Google.
    >
    > I do call this brain-dead.
    >
    > I call it brain-dead to ignore the charset parameter and
    > to make silly guesses instead.
    >
    >
    > --
    > Inept programmers are not fired by Google;
    > they must work at groups.google.com.
    >
    >
    >


