Re: Problems encoding the spanish o

From: Pim Blokland (pblokland@planet.nl)
Date: Mon Nov 17 2003 - 07:26:19 EST

Next message: Marco Cimarosti: "RE: Problems encoding the spanish o"

Previous message: Marco Cimarosti: "RE: Problems encoding the spanish o"
In reply to: pepe pepe: "Problems encoding the spanish o"
Next in thread: Marco Cimarosti: "RE: Problems encoding the spanish o"

pepe pepe wrote:

> We have the following sequence of characters "...ización Map.." that is
> the same than "...ización Map..." that after suffering some
> transformations becomes to "...izaci&#56186;&56333;ap...."
> AS you can see the two characters 56186 and 56333 seem to represent this
> sequences "ón M". Any idea?.

Yes. Your input text is evidently being treated as UTF-8, even
though it is actually Latin-1 (or some other code page that has ó at
position 243, hex F3).
Not only that, but the process that mistakes it for UTF-8 also fails
to report an error when it encounters malformed byte sequences, AND
it outputs the result as two 16-bit numbers (a UTF-16 surrogate
pair) instead of one 21-bit number.
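
For comparison, here is a minimal sketch of what a strict decoder does with the same four characters. Python is my own choice for illustration; nothing in the original report says what software is actually involved.

```python
# "ón M" stored as Latin-1: the ó is the single byte 243 (hex F3).
raw = "ón M".encode("latin-1")
assert raw[0] == 243

# A strict UTF-8 decoder refuses these bytes: F3 announces a
# four-byte sequence, but 6E ("n") is not a continuation byte.
try:
    raw.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(rejected)  # True
```

A process that behaved like this would have stopped with an error instead of producing the two odd numbers.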

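If you take the byte sequence (hex) F3 6E 20 4D and treat it as
UTF-8 without caring that it is not valid (F3 announces a four-byte
sequence, but none of the next three bytes is a continuation byte),
it maps to the value (hex) EE80D. Again, not caring that this is not
an assigned character, turning it into UTF-16 yields the surrogate
pair DB7A DC0D, decimal 56186 and 56333, which is what you got in
your output.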
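
The arithmetic can be checked with a short sketch. The two helper functions are hypothetical names of my own, written to mimic a decoder that skips validation; they are not taken from any particular library or from the software that produced the output.

```python
def lenient_utf8_decode4(data):
    """Combine 4 bytes as if they were one UTF-8 sequence, with NO validation."""
    lead, b1, b2, b3 = data
    # 3 payload bits from the lead byte, 6 from each following byte.
    return ((lead & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


def to_surrogate_pair(code_point):
    """Split a value above FFFF into UTF-16 high and low surrogates."""
    v = code_point - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


raw = "ón M".encode("latin-1")        # bytes F3 6E 20 4D
value = lenient_utf8_decode4(raw)
high, low = to_surrogate_pair(value)

print(hex(value), high, low)          # 0xee80d 56186 56333
```

The two decimal numbers are exactly the ones that appear in the mangled text.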

Pim Blokland


This archive was generated by hypermail 2.1.5 : Mon Nov 17 2003 - 08:16:50 EST