From: John Cowan (jcowan@reutershealth.com)
Date: Wed Oct 15 2003 - 15:18:05 CST
Philippe Verdy scripsit:
> [...] char, whose values are 16-bit unsigned integers
> representing Unicode characters (section 2.1).
Despite your ingenious special pleading, I don't see how this can mean
anything except that chars must be 16-bit unsigned integers.
> The Java language still lacks a way to specify a literal for a character outside
> the BMP. Of course, one can use the syntax '\uD800\uDC00', but this would
> not compile with current _compilers_, which expect only one char in the
> literal. In a String literal, "\uD800\uDC00" becomes the 4-byte UTF-8
> sequence for _one_ Unicode codepoint in the compiled class.
Character literals are crocky anyhow. IMHO modern programming languages
should not have a Character type, but deal only in Strings.
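To illustrate the quoted point about surrogate pairs, here is a minimal Java sketch. It is not from the original thread. The class `SupplementaryDemo` and its helper methods are hypothetical, and `codePointAt`, `codePointCount` and `Character.toChars` only arrived in Java 5, after this message was written. One detail worth knowing: javac stores string constants in the class file using Java's "modified UTF-8", which encodes each surrogate separately in 3 bytes. The pair therefore occupies 6 bytes there, not the 4 bytes of standard UTF-8. `DataOutputStream.writeUTF` uses the same modified encoding, so the sketch uses it to measure that length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SupplementaryDemo {
    // Number of bytes in standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in Java's "modified UTF-8" (as written by
    // DataOutputStream.writeUTF), excluding the 2-byte length prefix.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeUTF(s);
        } catch (IOException e) {
            // Only thrown for strings whose encoding exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        // A String literal can hold a surrogate pair; this one is U+10000.
        // A char literal containing both escapes does not compile.
        String s = "\uD800\uDC00";

        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
        System.out.printf("U+%X%n", s.codePointAt(0));       // U+10000

        // Outside the BMP, a code point is carried in an int, not a char.
        String t = new String(Character.toChars(0x10000));
        System.out.println(t.equals(s));                     // true

        System.out.println(utf8Length(s));                   // 4
        System.out.println(modifiedUtf8Length(s));           // 6
    }
}
```

This also bears on Cowan's remark: the only way to name a single non-BMP character is an `int` code point or a `String`, never a `char`.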
> 2. The initial ISO spec of UTF-32 and UTF-8 allowed many more planes, with
> 31-bit codepoints, and maybe there will be an agreement sometime in the
> future between ISO and Unicode to define new codepoints beyond the first
> 17 planes of the current standard that can be safely converted with UTF-16,
I doubt it very much. 17 planes is waaaay more than sufficient.
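The 17-plane ceiling follows from UTF-16 itself. A surrogate pair combines one of 1024 high surrogates with one of 1024 low surrogates, giving 2^20 supplementary code points above the BMP. That is 16 planes, plus the BMP, up to U+10FFFF. A minimal sketch of the arithmetic in Java (the class `Utf16Range` and its method are hypothetical; the result is cross-checked against the standard `Character.toChars`):

```java
public class Utf16Range {
    // Highest code point UTF-16 can express: BMP size plus 2^20 pair values, minus 1.
    static final int MAX_UTF16 = 0x10000 + (1 << 20) - 1;

    // Encode a supplementary code point as a high/low surrogate pair.
    static char[] toSurrogates(int cp) {
        if (cp < 0x10000 || cp > MAX_UTF16) {
            throw new IllegalArgumentException(
                "not a supplementary code point: " + Integer.toHexString(cp));
        }
        int v = cp - 0x10000;                        // 20-bit offset
        char high = (char) (0xD800 + (v >> 10));     // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));    // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        System.out.printf("U+%X%n", MAX_UTF16);          // U+10FFFF
        System.out.println((MAX_UTF16 + 1) / 0x10000);   // 17 planes
        char[] pair = toSurrogates(MAX_UTF16);
        System.out.printf("%X %X%n", (int) pair[0], (int) pair[1]); // DBFF DFFF
    }
}
```

Any code point above U+10FFFF, which the original 31-bit ISO space allowed, has no surrogate-pair encoding, so `toSurrogates` rejects it.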
-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
Assent may be registered by a signature, a handshake, or a click of a computer
mouse transmitted across the invisible ether of the Internet. Formality
is not a requisite; any sign, symbol or action, or even willful inaction,
as long as it is unequivocally referable to the promise, may create a contract.
        --_Specht v. Netscape_