Re: Aw: Re: Re: Do you know a tool to decode "UTF-8 twice"

From: Frédéric Grosshans <frederic.grosshans_at_gmail.com>
Date: Wed, 30 Oct 2013 15:34:08 +0100

On 29/10/2013 17:15, "Jörg Knappen" wrote:
> After running this script, a few more things were there:
> Non-normalised accents and some really strange
> encodings I could not really explain but rather guess their meanings, like
> s/Ãœ/Ü/g
> s/Ã‰/É/g
> s/AÌ€/À/g
> s/aÌ€/à/g
> s/EÌ€/È/g
> s/eÌ€/è/g
> s/â€ž/„/g
> s/â€œ/“/g
> s/ÃŸ/ß/g
> s/â€™/’/g
> s/Ã„/Æ/g

This was probably not UTF-8 read as Latin-1 and re-encoded as UTF-8, but
UTF-8 read as Windows-1252 (
http://en.wikipedia.org/wiki/Windows-1252 ) and re-encoded as UTF-8. Each
of the combinations above contains a character absent from Latin-1
(œ‰€žŸ™„), and some of these (‰™„) exist only in Windows-1252 and not
in Latin-9 (ISO 8859-15), the other likely source of the mistake.
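
The scenario described above can be reproduced in a few lines of Python.
This is only a sketch, assuming the standard "cp1252", "latin-1" and
"iso8859-15" codecs; the helper names garble, repair and absent_from are
made up for illustration and do not come from the thread.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo one round of garble(): back to bytes, then read as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


def absent_from(text: str, codec: str) -> list[str]:
    """Return the characters of text that the given codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# A few of the characters from the substitution list quoted above.
for original in ["Ü", "É", "ß", "A\u0300", "„", "’"]:
    garbled = garble(original)
    assert repair(garbled) == original  # one round trip restores the text
    print(
        original,
        garbled,
        absent_from(garbled, "latin-1"),     # characters Latin-1 lacks
        absent_from(garbled, "iso8859-15"),  # characters Latin-9 lacks
    )
```

One caveat: Python's cp1252 codec leaves five byte values (81, 8D, 8F,
90, 9D) undefined, so garble() raises UnicodeDecodeError for characters
whose UTF-8 form contains one of them.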

I've checked that this is consistent with Ü, É and ß, but not with your Æ.
This double encoding would give Ä:
Ã„ = Win1252(C3 84) = 110.00011 10.000100 = UTF8(00011 000100)
= Unicode 00C4 = Ä (and not Æ)
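
The bit arithmetic for the bytes C3 84 can be checked mechanically. The
following is a minimal sketch; decode_two_byte is a hypothetical helper
written for this illustration, and it handles only the two-byte UTF-8
form 110xxxxx 10yyyyyy.

```python
def decode_two_byte(b1: int, b2: int) -> int:
    """Decode a two-byte UTF-8 sequence 110xxxxx 10yyyyyy by hand."""
    assert b1 >> 5 == 0b110, "first byte must start with 110"
    assert b2 >> 6 == 0b10, "second byte must start with 10"
    # Keep the 5 payload bits of b1 and the 6 payload bits of b2.
    return ((b1 & 0b00011111) << 6) | (b2 & 0b00111111)


raw = bytes([0xC3, 0x84])
code_point = decode_two_byte(*raw)

print([format(b, "08b") for b in raw])         # the two bytes in binary
print(f"U+{code_point:04X}", chr(code_point))  # the decoded character
print(raw.decode("cp1252"))                    # same bytes read as Windows-1252

# The hand decoding agrees with Python's own UTF-8 decoder.
assert chr(code_point) == raw.decode("utf-8")
```

For comparison, Æ is U+00C6, whose UTF-8 form is C3 86, so the same
misreading would have produced Ã† rather than Ã„.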

        Frédéric
Received on Wed Oct 30 2013 - 09:37:45 CDT
