RE: Byte Order Marks

From: Yves Arrouye (yves@realnames.com)
Date: Fri Apr 20 2001 - 04:12:50 EDT

Next message: Yves Arrouye: "RE: Byte Order Marks"
Previous message: Ollikainen, Jari: "XML encoding problem?"
Maybe in reply to: Tomas McGuinness: "Byte Order Marks"
Next in thread: Markus Scherer: "Re: Byte Order Marks"
Reply: Markus Scherer: "Re: Byte Order Marks"

> > Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not
> > UTF16_BigEndian?
>
> ICU does not do Unicode-signature or other encoding detection
> as part of a converter. When you get text from some protocol,
> you need to instantiate a converter according to what you
> know about the encoding.

So I can't pass it some text with a BOM and say "utf-16" and let it run
through that. I guess that explains why I also didn't find converters that
write a BOM at the start of the conversion. Is that something that would be
added to ICU in the future? It would be very nice to have a converter that
would pick up the BOM (and write it back).

And yes, most of the time, when you stay on a given platform, it is very
convenient to use the platform's endianness. My question was rather "why
isn't UTF-16 the one that detects the BOM and defaults to an externalized
form, BE, and then people on a given platform would just use UTF-16PE (of
which UTF-16 is an alias in ICU)?". That would facilitate interchange of
information.
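
To make the behavior I am asking about concrete, here is a minimal sketch in
Python rather than in terms of ICU's API. The function names decode_utf16 and
encode_utf16 are hypothetical, made up for this illustration. The decoder
looks for a BOM, falls back to big-endian when there is none, and reports the
byte order it used so that the encoder can write the same BOM back.

```python
import codecs
from typing import Tuple


def decode_utf16(data: bytes) -> Tuple[str, str]:
    """Decode UTF-16 bytes; return (text, byte_order), byte_order is 'BE' or 'LE'.

    Hypothetical helper: a BOM selects the byte order and is stripped,
    and input without a BOM is assumed to be big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be"), "BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le"), "LE"
    # No signature found: default to the externalized (big-endian) form.
    return data.decode("utf-16-be"), "BE"


def encode_utf16(text: str, byte_order: str = "BE") -> bytes:
    """Encode text as UTF-16 with a leading BOM in the requested byte order."""
    if byte_order == "LE":
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


if __name__ == "__main__":
    text, order = decode_utf16(b"\xff\xfeA\x00")  # little-endian input with BOM
    round_trip = encode_utf16(text, order)        # BOM is written back
    print(repr(text), order, round_trip)
```

With something like this, a platform-endian converter would remain a separate
choice for data that never leaves the machine.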

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT