RE: Byte Order Marks

From: Yves Arrouye (yves@realnames.com)
Date: Fri Apr 20 2001 - 04:12:50 EDT


> > Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not
> > UTF16_BigEndian?
>
> ICU does not do Unicode-signature or other encoding detection
> as part of a converter. When you get text from some protocol,
> you need to instantiate a converter according to what you
> know about the encoding.

So I can't pass it some text with a BOM, say "utf-16", and let it run
through that. I guess that explains why I also didn't find converters that
write a BOM at the start of the conversion. Is that something that would be
added to ICU in the future? It would be very nice to have a converter that
would pick up the BOM on input (and write it back on output).
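
To make the request concrete, here is a rough sketch of the behaviour I have
in mind. It is not ICU code. It uses Python's standard codecs module as a
stand-in, and the names decode_utf16 and encode_utf16 are made up for the
illustration. The big-endian fallback for BOM-less input is my assumption
about what the default should be.

```python
import codecs

# Map each explicit-endian codec name to its BOM bytes.
_BOMS = {
    "utf-16-be": codecs.BOM_UTF16_BE,  # b"\xfe\xff"
    "utf-16-le": codecs.BOM_UTF16_LE,  # b"\xff\xfe"
}


def decode_utf16(data: bytes, default: str = "utf-16-be"):
    """Hypothetical helper: sniff the BOM, then decode.

    Returns (text, codec_used, had_bom). When no BOM is present the
    `default` codec is used (assumed here to be big-endian).
    """
    for codec, bom in _BOMS.items():
        if data.startswith(bom):
            # Strip the two BOM bytes and decode with the matching codec.
            return data[len(bom):].decode(codec), codec, True
    return data.decode(default), default, False


def encode_utf16(text: str, codec: str = "utf-16-be", write_bom: bool = True) -> bytes:
    """Hypothetical helper: encode and optionally prepend the BOM."""
    body = text.encode(codec)
    return _BOMS[codec] + body if write_bom else body


if __name__ == "__main__":
    text, codec, had_bom = decode_utf16(b"\xff\xfeh\x00i\x00")
    print(text, codec, had_bom)
    # Write it back with the same byte order and a BOM.
    print(encode_utf16(text, codec))
```

Remembering the codec and the had_bom flag from the input is what would let
a converter "write it back" the way it came in.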

And yes, most of the time, when you stay on a given platform, it is very
convenient to use the platform's endianness. My question was rather "why
isn't UTF-16 the one that detects the BOM and defaults to an externalized
form, BE, so that people on a given platform would just use UTF-16PE (of
which UTF-16 is currently an alias in ICU)?". That would facilitate
interchange of information.
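
A small illustration of why the default matters for interchange, again in
Python rather than ICU. The function names are hypothetical, and "UTF-16PE"
is modelled simply as whichever explicit-endian codec matches sys.byteorder.

```python
import sys


def utf16_pe_codec() -> str:
    """Hypothetical stand-in for "UTF-16PE": the platform-endian codec."""
    return "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def decode_platform(data: bytes) -> str:
    """Decode BOM-less data using the local machine's byte order."""
    return data.decode(utf16_pe_codec())


def decode_interchange(data: bytes) -> str:
    """Decode BOM-less data using a fixed, externalized default (BE)."""
    return data.decode("utf-16-be")


if __name__ == "__main__":
    wire = "A".encode("utf-16-be")      # b"\x00A", no BOM
    print(repr(decode_interchange(wire)))  # same result on every machine
    print(repr(decode_platform(wire)))     # depends on the local byte order
```

With the fixed default, the same bytes decode the same way everywhere. With
the platform-endian default, a little-endian reader would see U+4100 instead
of "A" for the two bytes above.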

YA


