William Overington wrote:
> In Java source code one may currently represent a 16-bit
> Unicode character by using \uhhhh, where each h is any
> hexadecimal digit.
>
> How will Java, and maybe other languages, represent 21 bit unicode
> characters?
A \uhhhh escape in Java source becomes a value of the 16-bit primitive
datatype "char".
A char holds a UTF-16 code unit, which either represents a Unicode
character on its own or forms one half of a surrogate pair. In the latter
case, it takes a sequence of two "char"s to make one Unicode character. It
is my understanding that Java's character encoding and decoding mechanisms
already handle this, although that is not obvious from the Java platform
documentation.
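For illustration, a sketch of a surrogate pair in source, using a helper
(Character.toCodePoint) that only appeared later, in Java 5; this post
predates it, but the principle is the same:

    public class SurrogateDemo {
        public static void main(String[] args) {
            // U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range,
            // so in source it must be written as a surrogate pair:
            String gClef = "\uD834\uDD1E";
            System.out.println(gClef.length()); // prints: 2 -- two chars...
            // ...reassembled into one scalar value:
            int scalar = Character.toCodePoint('\uD834', '\uDD1E');
            System.out.println(Integer.toHexString(scalar)); // prints: 1d11e
        }
    }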
I do agree that it would be more convenient to be able to refer to Unicode
characters in Java source by their scalar value, so one would not need any
knowledge of UTF-16.
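Such an escape (say, a hypothetical \u{hhhhhh} form; Java has no such
syntax) would simply push the UTF-16 arithmetic into the compiler. A
sketch of that arithmetic for scalar values at or above U+10000:

    public class ScalarToUtf16 {
        public static void main(String[] args) {
            // Split a 21-bit scalar value into its UTF-16 surrogate pair,
            // per the standard formula.
            int scalar = 0x1D11E;                      // MUSICAL SYMBOL G CLEF
            int v = scalar - 0x10000;                  // 20 significant bits remain
            char high = (char) (0xD800 + (v >> 10));   // top 10 bits
            char low  = (char) (0xDC00 + (v & 0x3FF)); // bottom 10 bits
            // prints: U+1D11E -> \uD834 \uDD1E
            System.out.println("U+" + Integer.toHexString(scalar).toUpperCase()
                + " -> \\u" + Integer.toHexString(high).toUpperCase()
                + " \\u" + Integer.toHexString(low).toUpperCase());
        }
    }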