From: Jill Ramonsky (Jill.Ramonsky@aculab.com)
Date: Wed Oct 15 2003 - 08:23:00 CST
 > -----Original Message-----
 > From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
 >
 > The same
 > optimization can be done in Java by subclassing the String
 > class to add a "form" field and related form conversion (getters)
 > and test methods.
Only slightly confused about this. The Java String class is declared 
*final* in the API, and therefore cannot be subclassed. One would have 
to write an alternative String class (not rocket science of course, but 
still a tad more involved than subclassing).
 > In fact, to further optimize and reduce the
 > memory footprint of Java strings, I chose to store
 > the String in an array of bytes
Okay. That explains that then.
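Philippe's two ideas, a "form" field and byte-array storage, can be combined without subclassing, since String is final. Below is a minimal sketch. It assumes "form" means a Unicode normalization form, and the class name FormString and its methods are hypothetical, not Philippe's actual code. The class wraps the text by composition, stores it as UTF-8 bytes, and records the known normalization form, so that text already in the requested form is returned as-is instead of being normalized again.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical replacement for java.lang.String. String is final, so the
// text is wrapped by composition rather than extended by subclassing.
final class FormString {
    private final byte[] utf8;           // compact storage: UTF-8 bytes
    private final Normalizer.Form form;  // known normalization form, or null

    private FormString(byte[] utf8, Normalizer.Form form) {
        this.utf8 = utf8;
        this.form = form;
    }

    // Wraps arbitrary text whose normalization form is not yet known.
    static FormString of(String s) {
        return new FormString(s.getBytes(StandardCharsets.UTF_8), null);
    }

    // Returns the text in the requested form; if the recorded form already
    // matches, the same object is returned and no normalization is done.
    FormString toForm(Normalizer.Form target) {
        if (form == target) {
            return this;
        }
        String normalized = Normalizer.normalize(toString(), target);
        return new FormString(normalized.getBytes(StandardCharsets.UTF_8), target);
    }

    // True only if this object is known to be in the given form.
    boolean isKnownToBe(Normalizer.Form f) {
        return form == f;
    }

    int byteLength() {
        return utf8.length;
    }

    // Decodes the stored bytes back into an ordinary String.
    @Override
    public String toString() {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FormString && Arrays.equals(utf8, ((FormString) o).utf8);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(utf8);
    }

    public static void main(String[] args) {
        FormString decomposed = FormString.of("e\u0301");        // 'e' + COMBINING ACUTE ACCENT
        FormString nfc = decomposed.toForm(Normalizer.Form.NFC);  // single U+00E9
        System.out.println(decomposed.byteLength() + " -> " + nfc.byteLength()); // 3 -> 2
        System.out.println(nfc.toForm(Normalizer.Form.NFC) == nfc);             // true
    }
}
```

The trade-off in this sketch is that anything needing characters must decode the bytes again, so the memory saving is paid for in CPU time.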
 > It is possible, with a custom class loader, to override the default
 > String class used in the Java core libraries
Ouch. Never taken Java that far myself. I like the idea though. Is it 
difficult?
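As far as I know, on current JDKs an ordinary custom class loader cannot simply substitute its own java.lang.String. String is loaded by the bootstrap loader, which getClassLoader() reports as null, and ClassLoader.defineClass rejects any class name in a java.* package with a SecurityException before it looks at the class bytes. The small check below tests both points. The class names StringOverrideCheck and NaiveLoader are hypothetical.

```java
// Hypothetical check: can an ordinary class loader define its own java.lang.String?
final class StringOverrideCheck {

    // A naive custom loader that tries to define a class under the given name.
    static final class NaiveLoader extends ClassLoader {
        Class<?> tryDefine(String name) {
            byte[] bogus = new byte[0];   // not a class file; the name check runs first
            return defineClass(name, bogus, 0, bogus.length);
        }
    }

    // True if String comes from the bootstrap loader (reported as null).
    static boolean stringIsBootstrapLoaded() {
        return String.class.getClassLoader() == null;
    }

    // True if defineClass refuses the java.lang.String name outright.
    static boolean overrideRejected() {
        try {
            new NaiveLoader().tryDefine("java.lang.String");
            return false;
        } catch (SecurityException e) {
            return true;   // prohibited package name
        }
    }

    public static void main(String[] args) {
        System.out.println("bootstrap-loaded: " + stringIsBootstrapLoaded());
        System.out.println("override rejected: " + overrideRejected());
    }
}
```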
 > Looking at the Java VM specification, there does not
 > seem to be anything implying that a Java "char" is necessarily a
 > 16-bit entity. So I think that some day there will be a conforming
 > Java VM that returns UTF-32 code points in a single char, or
 > some derived representation using 24-bit storage units.
I've wondered about that ever since Unicode went to 21 bits. Actually, of 
course, it's C (and C++), not Java, that has the real problem. C is 
(supposed to be) portable but fast on all architectures, so all of the 
built-in types have platform-dependent widths. (So far, so good.) The 
annoying thing is that, BY DEFINITION, the *sizeof()* operator returns 
the size of an object /measured in chars/, and sizeof(char) is always 1. 
Therefore, it is a violation of the rules of C to have an addressable 
object smaller than a char. One /can/ have 32-bit chars, but /only/ if 
you give up 8-bit bytes and 16-bit words as separately addressable 
types. *sizeof()* is not allowed to return a fraction. Sigh! If only C 
had seen fit to measure addressable locations in /bits/, or even in 
architecture-specific /atoms/ (which would have been 8 bits wide on most 
systems), then we could have had sizeof(char) returning 4 or something. 
Ah well.
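On the Java side of Philippe's question: the Java Language Specification defines char as an unsigned 16-bit integer type. A code point above U+FFFF is therefore stored as a surrogate pair of two chars, and the code-point methods on String and Character (available since Java 5) are what count whole characters. The short illustration below uses U+1D11E MUSICAL SYMBOL G CLEF as an arbitrary example. The class name is hypothetical.

```java
// Shows how a code point beyond U+FFFF is represented with 16-bit Java chars.
final class CharWidthDemo {

    static final int G_CLEF = 0x1D11E;   // arbitrary supplementary code point

    // Builds a one-code-point string from G_CLEF (a surrogate pair).
    static String gClef() {
        return new String(Character.toChars(G_CLEF));
    }

    public static void main(String[] args) {
        String s = gClef();
        System.out.println((int) Character.MAX_VALUE);              // 65535: char is 16 bits
        System.out.println(s.length());                             // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));        // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 1d11e
    }
}
```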
 
 > This leads to many discussions about what a "character" is
I think we just had that discussion. If it happens again, I'm probably 
not going to join in (though it was quite amusing).
Jill
This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST