From: John Cowan (cowan@ccil.org)
Date: Sat Nov 13 2004 - 11:13:45 CST
Theodore H. Smith scripsit:
> I'm just curious about the \0 thing. What problems would having a \0 in
> UTF-8 present, that are not presented by having \0 in ASCII? I can't
> see any advantage there.
AFAICT it was a hack so that arbitrary Java strings could be encoded
as C strings; that is, with no 0x00 bytes in them, even when the
string contained a U+0000. This is the format used in Java class
files for string constants as well.
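A minimal sketch of the difference, assuming a standard JDK. The class name ModifiedUtf8Demo and the helpers encode and hex are invented for illustration; DataOutputStream.writeUTF and String.getBytes are the real library calls. It encodes a string containing U+0000 both ways:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Hypothetical helper: the exact bytes writeUTF emits for s.
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Hypothetical helper: space-separated uppercase hex dump.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "a\u0000b";
        // writeUTF: 2-byte length prefix, then U+0000 as C0 80.
        System.out.println(hex(encode(s)));                           // 00 04 61 C0 80 62
        // Standard UTF-8: U+0000 is a single 00 byte.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));  // 61 00 62
    }
}
```

Note that the first two bytes are writeUTF's length prefix, which may itself contain 0x00; only the encoded characters after it are free of zero bytes.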
The important thing to note is that the readUTF and writeUTF methods are
*binary* I/O; they are the standard way of serializing strings,
just as the standard way of serializing ints is to write them out
as a 4-byte big-endian sequence.
They simply have nothing to do with character encoding at all.
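To make the analogy concrete, here is a rough sketch, again assuming a standard JDK. BinaryIoDemo and serialize are made-up names; writeInt, writeUTF, readInt and readUTF are the real DataOutputStream and DataInputStream methods. It writes an int and a string into one binary record and reads them back:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BinaryIoDemo {
    // Hypothetical helper: serialize an int followed by a string.
    static byte[] serialize(int n, String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(n);  // 4 bytes, big-endian
            out.writeUTF(s);  // 2-byte length, then modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = serialize(0x01020304, "hi");
        // record is: 01 02 03 04 00 02 68 69

        // The matching read calls recover the values in the same order.
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(record))) {
            int n = in.readInt();
            String s = in.readUTF();
            System.out.println(Integer.toHexString(n) + " " + s);  // 1020304 hi
        }
    }
}
```

The reader has to know the record layout in advance; nothing in the bytes marks where the int ends and the string begins, which is what makes this binary I/O rather than text.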
--
John Cowan  cowan@ccil.org  http://www.reutershealth.com  http://www.ccil.org/~cowan
He made the Legislature meet at one-horse tank-towns out in the alfalfa
belt, so that hardly nobody could get there and most of the leaders
would stay home and let him go to work and do things as he pleased.
        --Mencken, Declaration of Independence