MURATA Makoto wrote:
> I prefer UTF-16, since XML documents in legacy encodings never parse
> as UTF-16 and those in UTF-16 never parse as legacy encodings.
This seems confusing, especially with Unicode 3.0 where so much
of the BMP is now in use. Invalid UTF-8 is easy to spot, but
I think it would be easy to accept any non-ISO-2022 legacy
encoding (SJIS, e.g.) as UTF-16 and produce nonsense.
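
A rough sketch of the point above, assuming Python 3 and its built-in "shift_jis" and "utf-16-be" codecs. The sample string and the helper name decodes_as are made up for illustration and are not from the thread. Every character in the sample takes two bytes in SJIS, so the byte count is even and each byte pair lands on a non-surrogate BMP code unit:

```python
# Illustrative sample only: a short Japanese string, not taken from the thread.
text = "日本語のテキスト"
sjis = text.encode("shift_jis")  # 8 characters, 2 bytes each


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes under the given encoding without error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# SJIS lead bytes 0x81-0x9F are bare continuation bytes in UTF-8,
# so a UTF-8 decoder rejects the data at the first byte.
as_utf8 = decodes_as(sjis, "utf-8")

# Read as big-endian UTF-16, the same byte pairs are ordinary BMP
# code units, so decoding succeeds and yields unrelated characters.
as_utf16 = decodes_as(sjis, "utf-16-be")
garbage = sjis.decode("utf-16-be")
```

With this sample the UTF-8 check fails and the UTF-16 check passes. Text that mixes in single-byte ASCII can end up with an odd byte count or a stray surrogate and be rejected, so this shows only that false acceptance is possible, not that it always happens.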
> As Tim knows very well, UTF-16 has a number of problems about byte
> ordering. On the other hand, UTF-8 is not free from such problems.
> UTF-8 from Microsoft appears to begin with the zero-width non-breaking
> space always ;-(
ISO 10646 actually blesses this, although Unicode does not.
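
For concreteness, a small sketch of what that leading character looks like on the wire, assuming Python 3 and its standard codecs module. The XML payload is a made-up example. U+FEFF encoded in UTF-8 is the three bytes EF BB BF:

```python
import codecs

# Hypothetical payload, used only to show the effect of the signature.
payload = "<?xml version='1.0'?><doc/>"
with_sig = codecs.BOM_UTF8 + payload.encode("utf-8")

# The plain "utf-8" codec keeps U+FEFF as the first decoded character.
plain = with_sig.decode("utf-8")

# The "utf-8-sig" codec drops a leading signature if one is present.
stripped = with_sig.decode("utf-8-sig")
```

A consumer that decodes with plain "utf-8" has to skip the leading U+FEFF itself before looking for the XML declaration.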
--
John Cowan http://www.reutershealth.com jcowan@reutershealth.com
Schlingt dreifach einen Kreis vom dies! / Schliess eurer Aug vor heiliger Schau
Den er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
-- Coleridge (tr. Politzer)