Jörg, by any chance would this do what you need?
http://www.kreativekorp.com/software/recode/#reinterpret
-- Rebecca Bettencourt
On Mon, Oct 28, 2013 at 9:48 AM, Buck Golemon <buck_at_yelp.com> wrote:
>
> On Mon, Oct 28, 2013 at 6:06 AM, "Jörg Knappen" <jknappen_at_web.de> wrote:
>>
>> Hi Steffen,
>>
>> The data aren't that easy. There are non-Latin-1 characters encoded in the
>> UTF-8 part. I expect, among others, typographic apostrophes, Polish
>> characters, and some medievalist characters like ũ (u with tilde). Maybe
>> there is also some Greek in there, but I am not sure about that.
>>
>> --Jörg Knappen
>>
>> Sent: Monday, 28 October 2013, 12:34
>> From: "Steffen \"Daode\" Nurpmeso" <sdaoden_at_gmail.com>
>> To: "Jörg Knappen" <jknappen_at_web.de>
>> Cc: unicode_at_unicode.org
>> Subject: Re: Do you know a tool to decode "UTF-8 twice"
>> "Jörg Knappen" <jknappen_at_web.de> wrote:
>> | Is there a ready made tool that decodes "UTF-8 twice" while keeping
>> | UTF-8 proper in place?
>>
>> Isn't a shell script with a truly validating iconv(1) enough?
>> This works for me if utf8.1 contains 'ÄEIÖÜ' in UTF-8 and I run
>>
>> ?0[steffen_at_sherwood tmp]$ iconv -f latin1 -t utf8 < utf8.1 > utf8.2
>>
>> to create a double-encoded copy, utf8.2. With this script (utf8dec.sh):
>>
>> for i in utf8.1 utf8.2; do
>>   # If the file still validates as UTF-8 after one UTF-8 -> Latin-1
>>   # pass, it was encoded twice.
>>   if iconv -f utf8 -t latin1 < "${i}" |
>>       iconv -f utf8 -t utf8 >/dev/null 2>&1; then
>>     echo "${i}: bummer, going home by one"
>>     iconv -f utf8 -t latin1 < "${i}" > "${i}.new"
>>   else
>>     echo "${i}: valid UTF-8"
>>   fi
>> done
>>
>> I end up with
>>
>> ?0[steffen_at_sherwood tmp]$ sh utf8dec.sh
>> utf8.1: valid UTF-8
>> utf8.2: bummer, going home by one
>> ?0[steffen_at_sherwood tmp]$
>>
>> Ciao,
>>
>> | --Jörg Knappen
>>
>> --steffen
>
>
> Jörg: There's no ready-made tool, but it's easy to write in Python.
> I'll provide you a well-tested function in a few minutes.
>
>
>
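
For illustration only, here is a minimal sketch of what such a Python function might look like. It is not the function promised above; the names fix_double_utf8, _SUSPECT and _repair are hypothetical. It assumes the file as a whole still decodes as UTF-8 and that the second encoding step went through Latin-1 (if it went through Windows-1252 instead, which typographic apostrophes can hint at, the character ranges would need adjusting). It repairs only those runs of characters that, read as Latin-1 bytes, form one valid UTF-8 sequence, and leaves everything else in place, so proper UTF-8 text such as Polish or Greek letters is untouched.

```python
import re

# Runs of characters that, taken as Latin-1 bytes, have the shape of one
# multi-byte UTF-8 sequence: a lead byte plus 1 to 3 continuation bytes.
_SUSPECT = re.compile(
    "[\u00c2-\u00df][\u0080-\u00bf]"
    "|[\u00e0-\u00ef][\u0080-\u00bf]{2}"
    "|[\u00f0-\u00f4][\u0080-\u00bf]{3}"
)


def _repair(match):
    """Undo one level of UTF-8 encoding for a suspect run, if possible."""
    chunk = match.group(0)
    try:
        # All matched characters are below U+0100, so encoding cannot fail.
        return chunk.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 after all (e.g. an overlong form): keep it as is.
        return chunk


def fix_double_utf8(text):
    """Return text with "UTF-8 twice" runs decoded once more.

    Assumption: text was obtained by decoding the input as UTF-8, and the
    extra encoding step treated the original bytes as Latin-1.
    """
    return _SUSPECT.sub(_repair, text)


if __name__ == "__main__":
    print(fix_double_utf8("JÃ¶rg"))  # Jörg
```

The sketch cannot tell a damaged character from a genuine run such as "Ã" followed by "¶" that was really meant, so its output is worth spot-checking on real data.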
Received on Mon Oct 28 2013 - 11:59:27 CDT