RE: MS/Unix BOM FAQ again (small fix)

From: Yves Arrouye (yves@realnames.com)
Date: Wed Apr 10 2002 - 14:06:50 EDT


> The reason for ICU's "UTF-16" converter not trying to auto-detect the BOM
> is that this seems to be something that the _application_ has to decide,
> not the _converter_ that the application instantiates.
> This converter name is (currently) only a convenience alias for "use the
> UTF-16 byte serialization that is normally used on this machine".
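The "platform byte order" behavior described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: decode_utf16_platform is a hypothetical helper, not an ICU API. It picks the byte order from sys.byteorder and does no BOM detection at all.

```python
import sys

def decode_utf16_platform(data: bytes) -> str:
    """Decode UTF-16 using this machine's native byte order.

    Hypothetical helper: no BOM detection is done, so a leading BOM is
    returned as U+FEFF, or as U+FFFE if the data has the opposite order.
    """
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return data.decode(codec)
```

On a little-endian machine, big-endian input with a BOM such as b"\xfe\xff\x00A" comes back as "\ufffe\u4100" rather than "A", which is the "file labeled UTF-16 but not readable" situation.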

I agree that the application may know better. It is just unfortunate that
the converter is not named "UTF-16PE", which would remind people that it
is about platform endianness. Also, a script that uses, say, uconv does
not have access to ucnv_detectUnicodeSignature(), so you can end up with
a file identified as being in "UTF-16" that the "UTF-16" converter then
fails to read correctly. If instead "UTF-16PE" were the convenience name
for platform-endian UTF-16, and "UTF-16" handled the BOM and the default
byte order expectation (conformance clause C3 of The Unicode Standard),
it would be much easier on newcomers.
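A minimal Python sketch of what a BOM-aware "UTF-16" decoder along these lines might do: a leading BOM selects the byte order and is consumed, and data without a BOM is read as big-endian. Treating big-endian as the no-BOM default is an assumption here about the "default byte order expectation" mentioned above. decode_utf16_bom_aware is a hypothetical name, not an ICU function; the codecs.BOM_UTF16_BE/LE constants and the "utf-16-be"/"utf-16-le" codecs are standard Python.

```python
import codecs

def decode_utf16_bom_aware(data: bytes) -> str:
    """Decode UTF-16, honoring a leading byte order mark.

    Hypothetical helper: a BOM picks the byte order and is stripped;
    without a BOM the data is assumed to be big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")            # no BOM: assume big-endian
```

Unlike a platform-endian converter, this gives the same result on every machine, so a script could rely on the name "UTF-16" without first calling a signature-detection function.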

YA



This archive was generated by hypermail 2.1.2 : Wed Apr 10 2002 - 12:39:33 EDT