From: Mark Davis (mark.davis@jtcsv.com)
Date: Mon Nov 15 2004 - 11:33:17 CST
Every few years it seems that this subject blossoms on the list.
Remember that this stuff was done a long time ago. A variant of UTF-8 was
devised by the Java people that would allow them to *losslessly* convert
between String and a representation that C could handle. Losslessly
means that, since U+0000 is legal in a String, it had to be representable
anywhere in the C string without being read as the terminating NUL. This
was done very early in the development of
Java, even before there was an internationalization group in Javasoft.
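To make the difference concrete, here is a minimal sketch comparing the two encodings of a String that contains U+0000. It assumes that DataOutputStream.writeUTF is a fair stand-in for the variant under discussion (it writes a two-byte length prefix, which the sketch strips off). The class and method names are made up for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not part of any API.
class ModifiedUtf8Demo {
    // Standard UTF-8: U+0000 becomes a single 0x00 byte.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java's modified UTF-8 as produced by DataOutputStream.writeUTF,
    // with the leading two-byte length prefix removed.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a\u0000b";
        System.out.println("standard: " + hex(standardUtf8(s)));
        System.out.println("modified: " + hex(modifiedUtf8(s)));
    }
}
```

The standard form contains a 0x00 byte, which a C string routine would read as the end of the string. The modified form uses the two-byte sequence C0 80 instead, so no zero byte appears in the encoded data.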
The only real problem with this was that they simply called this UTF-8 at
that time. They have since documented, in response to requests by the
Unicode Consortium, that this is a modified, variant UTF-8. It is worked
too heavily into the structure of Java for them to do much beyond
documenting it, and I really haven't heard of real cases where this has
caused a problem.
I doubt that any further discussion of this will be productive.
Mark