Re: Aw: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice" from Buck Golemon on 2014-01-30 (Unicode Mail List Archive)

From: Buck Golemon <buck_at_yelp.com>
Date: Thu, 30 Jan 2014 10:15:47 -0800

While I understand your argument, my intent was to suggest that
"mysql-latin1" was *not* as good as some other name. Surely you're not
arguing that all names are equivalently good. Obviously "mnmmmnmn" is a
worse name than "mysql-latin1".

"Mysql" has less to do with the issue than "whatwg" or "web", since this
codec is necessary any time you want to reproduce browser decoding,
regardless of whether mysql is involved. I contend that mysql adopted this
implementation because mysql is so widely used for web applications.
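Here is a rough illustration of what "reproducing browser decoding" means in Python. The stock cp1252 codec rejects the five bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D. The WHATWG windows-1252 decoder instead maps each of them to the C1 control character with the same code point. This minimal sketch assumes Python 3 and patches that gap with codecs.register_error rather than a full codec. The handler name "web-fallback" and the helper decode_like_browser are hypothetical, chosen only for this example.

```python
import codecs

# Bytes that Python's cp1252 codec leaves undefined. The WHATWG windows-1252
# decoder instead maps each one to the C1 control with the same value.
_WEB_ONLY_BYTES = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def _web_fallback(exc):
    """Error handler: pass the five undefined bytes through as U+0081 etc."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if all(b in _WEB_ONLY_BYTES for b in bad):
            return "".join(chr(b) for b in bad), exc.end
    # Anything else (including encode errors) is re-raised unchanged.
    raise exc

# "web-fallback" is a hypothetical handler name chosen for this sketch.
codecs.register_error("web-fallback", _web_fallback)

def decode_like_browser(data: bytes) -> str:
    """Hypothetical helper: decode bytes the way a browser treats windows-1252."""
    return data.decode("cp1252", errors="web-fallback")
```

This only covers decoding. Encoding text back to bytes would need the reverse mapping as well.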

"latin1" is less directly accurate than "cp1252". While whatwg requires
that latin1 be an alias of cp1252, it does the same for ascii, and it
maintains that the canonical name is "windows-1252".
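The aliasing point can be made concrete. Under the WHATWG Encoding Standard, a page labelled "latin1", "iso-8859-1" or even "ascii" is decoded as windows-1252, whereas Python's own "latin-1" codec is true ISO-8859-1. The label set below is my transcription of the spec's windows-1252 entry and should be checked against the current standard. resolve_label is a hypothetical helper, not part of any library.

```python
import string

# Labels the WHATWG Encoding Standard maps to windows-1252
# (transcribed for illustration; verify against the current spec).
WINDOWS_1252_LABELS = frozenset({
    "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819",
    "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1",
    "iso_8859-1:1987", "l1", "latin1", "us-ascii", "windows-1252", "x-cp1252",
})

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def resolve_label(label: str):
    """Hypothetical helper: return "windows-1252" if the label maps there, else None."""
    # The spec trims ASCII whitespace and ASCII-lowercases before matching.
    key = label.strip("\t\n\f\r ").translate(_ASCII_LOWER)
    return "windows-1252" if key in WINDOWS_1252_LABELS else None

print(resolve_label("  LATIN1\n"))          # windows-1252
print(resolve_label("ascii"))               # windows-1252
# Python's "latin-1" turns byte 0x93 into the C1 control U+0093,
# while a browser shows U+201C LEFT DOUBLE QUOTATION MARK.
print(hex(ord(b"\x93".decode("latin-1"))))  # 0x93
print(hex(ord(b"\x93".decode("cp1252"))))   # 0x201c
```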

Ideally you'd want to update the name of your project, but if not, that's
your preference :)

However, if I can get some consensus on a least-bad name ("web-cp1252" with
alias "web-windows-1252" seems to be in the lead), I plan to release such a
codec.
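If the codec were released under the proposed names, Python users could look it up like any built-in encoding. This minimal sketch assumes Python 3 and only the standard codecs module. It builds a 256-entry table from Python's cp1252 and fills the five undefined slots with the matching C1 controls, as the WHATWG decoder does. It then registers a search function that answers to "web-cp1252" and "web-windows-1252". It reuses the codecs.charmap_decode / charmap_encode / charmap_build helpers that the stdlib's generated encodings/cp1252.py module relies on. The function names are illustrative; this is not a published package.

```python
import codecs

def _build_decoding_table() -> str:
    """256-char table: Python's cp1252, with the five bytes it leaves
    undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) mapped to the C1 control of
    the same value, as the WHATWG windows-1252 decoder does."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return "".join(chars)

_DECODING_TABLE = _build_decoding_table()
_ENCODING_TABLE = codecs.charmap_build(_DECODING_TABLE)

def web_cp1252_decode(data, errors="strict"):
    # Returns (text, bytes_consumed), as the codec protocol expects.
    return codecs.charmap_decode(data, errors, _DECODING_TABLE)

def web_cp1252_encode(text, errors="strict"):
    # Returns (bytes, chars_consumed).
    return codecs.charmap_encode(text, errors, _ENCODING_TABLE)

# Names proposed in the thread; "web-cp1252" is treated as canonical here.
_NAMES = {"web_cp1252", "web_windows_1252"}

def _search(name):
    # Python lowercases the name, and 3.9+ also turns hyphens into
    # underscores, so normalise both spellings before comparing.
    if name.replace("-", "_") in _NAMES:
        return codecs.CodecInfo(
            encode=web_cp1252_encode,
            decode=web_cp1252_decode,
            name="web-cp1252",
        )
    return None

codecs.register(_search)
```

After registration, b"\x81\x93".decode("web-windows-1252") yields U+0081 followed by U+201C. The same call with plain "cp1252" raises UnicodeDecodeError.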

This issue also extends far beyond python. Any language that deals with the
web (i.e. all of them) and wants to be able to interpret (legacy) bytes
exactly as a browser would (admittedly a niche but still important task)
needs such a codec. I believe unicode.org should eventually recognize such
a codec. Ideally it would reflect that this is the most common
implementation of cp1252, but if I need to use a different name, that's
better than nothing at all.
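This matters most for the problem in the thread's subject line. Text was UTF-8 encoded, decoded by a browser-style windows-1252 decoder, and then saved as UTF-8 again. Undoing that means re-encoding with the browser's mapping and decoding the result as UTF-8. A stock cp1252 encoder fails whenever the original UTF-8 contained one of the five undefined bytes; for example, "Á" is C3 81 in UTF-8. The sketch below assumes Python 3. undo_double_utf8 is a hypothetical helper name, and the reverse mapping is a plain dict built from the browser-style table.

```python
def _web_cp1252_table() -> str:
    """Python's cp1252 table with 0x81, 0x8D, 0x8F, 0x90, 0x9D filled in
    as U+0081, U+008D, U+008F, U+0090, U+009D (browser behaviour)."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return "".join(chars)

_DECODE = _web_cp1252_table()
# Every entry is distinct, so the table can be inverted into char -> byte.
_ENCODE = {ch: b for b, ch in enumerate(_DECODE)}

def undo_double_utf8(mojibake: str) -> str:
    """Hypothetical helper: reverse text that was UTF-8 encoded, misread as
    browser windows-1252, then stored again. Raises KeyError for characters
    outside windows-1252 and UnicodeDecodeError if the bytes are not UTF-8."""
    original_bytes = bytes(_ENCODE[ch] for ch in mojibake)
    return original_bytes.decode("utf-8")
```

For instance, undo_double_utf8("Ã\x81") recovers "Á", whereas "Ã\x81".encode("cp1252") raises UnicodeEncodeError on the U+0081.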

On Jan 30, 2014 12:31 AM, Jörg Knappen <jknappen_at_web.de> wrote:

> When you are looking for a *new* name for that encoding, why don't you
> just adopt the pythonese precedent mysql-latin1? It is as good or as bad
> as any other name, but has some footing just now.
>
> --Jörg Knappen
>
> *Sent:* Wednesday, 29 January 2014 at 21:12
> *From:* "Anne van Kesteren" <annevk_at_annevk.nl>
> *To:* "Buck Golemon" <buck_at_yelp.com>
> *Cc:* "Markus Scherer" <markus.icu_at_gmail.com>, "Jörg Knappen" <jknappen_at_web.de>,
> "Frédéric Grosshans" <frederic.grosshans_at_gmail.com>,
> unicode <unicode_at_unicode.org>, unicode_at_norbertlindenberg.com
> *Subject:* Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice"
> On Wed, Jan 29, 2014 at 11:57 AM, Buck Golemon <buck_at_yelp.com> wrote:
> > Anne: Given that the intent is to implement exactly the whatwg spec, and the
> > group is currently called "whatwg" (even though it may eventually become a
> > historical artifact), is "whatwg-1252" most appropriate?
>
> It's up to you, I suppose, but "whatwg-1252" just seems like long term
> it will lose its meaning. For the web "windows-1252" will always have
> this meaning due to deployed content, so "web-windows-1252" if you
> need to disambiguate from a different implementation of windows-1252
> makes sense to me.
>
>
> --
> http://annevankesteren.nl/
>
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
>

Received on Thu Jan 30 2014 - 12:17:02 CST

This archive was generated by hypermail 2.2.0 : Thu Jan 30 2014 - 12:17:04 CST