RE: Java and Unicode

From: Marco.Cimarosti@icl.com
Date: Wed Nov 15 2000 - 11:00:15 EST


Elliotte Rusty Harold wrote:

> One thing I'm very curious about going forward: Right now character
> values greater than 65535 are purely theoretical. However this will
> change. It seems to me that handling these characters properly is
> going to require redefining the char data type from two bytes to
> four. This is a major incompatible change with existing Java.
> (...)

John O'Conner just wrote something about surrogates
(http://www.unicode.org/unicode/faq/utf_bom.html#16) and UTF-16
(http://www.unicode.org/unicode/faq/utf_bom.html#5) in Java, but your
message was probably already on its way:

> You can currently store UTF-16 in the String and StringBuffer
> classes. However, all operations are on char values or 16-bit code
> units. The upcoming release of the J2SE platform will include
> support for Unicode 3.0 (maybe 3.0.1) properties, case mapping,
> collation, and character break iteration. There is no explicit
> support for surrogate pairs in Unicode at this time, although you
> can certainly find out if a code unit is a surrogate unit.
>
> In the future, as characters beyond 0xFFFF become more important,
> you can expect that more robust, official support will follow.
>
> -- John O'Conner
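
To make John's last point concrete, here is a minimal sketch of how one
might test whether a char is a surrogate code unit, using nothing but range
comparisons on the 16-bit value. The class name SurrogateCheck and its
methods are hypothetical helpers written for this illustration, not part of
any Java API; the ranges D800..DBFF (high) and DC00..DFFF (low) and the
pair-combining arithmetic come from the UTF-16 definition.

```java
public class SurrogateCheck {

    // High (leading) surrogates occupy the range D800..DBFF.
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy the range DC00..DFFF.
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine a valid high/low pair into the scalar value it encodes.
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    // Count characters rather than 16-bit code units: a well-formed
    // surrogate pair counts once, everything else counts per unit.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isHighSurrogate(c)
                    && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'A', one surrogate pair (encoding the first code point
        // beyond 0xFFFF), then 'B': four code units in total.
        String s = "A\uD800\uDC00B";
        System.out.println("code units:  " + s.length());
        System.out.println("code points: " + countCodePoints(s));
    }
}
```

The point of the sketch is that char can stay 16 bits wide: code that cares
about characters beyond 0xFFFF walks the string pair by pair, and code that
does not care keeps working on code units as before.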

_ Marco


