Re: terminology

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Wed May 08 2002 - 08:48:21 EDT


-----BEGIN PGP SIGNED MESSAGE-----

Kenneth Whistler wrote:
> So the bone of contention is whether Unicode Scalar Value should be
> defined as equivalent to "code point", as in the current glossary,
> or should be defined as equivalent to "nonsurrogate code point", which
> is more consistent with the character encoding model and the definition
> of the UTF-16 encoding form. The latter, by the way, is the consensus
> which was just reached by the UTC meeting last week.

IMHO it's the definition of "Unicode code point" that is problematic,
not "Unicode scalar value". They should be synonyms and should have the
domain 0..0xD7FF union 0xE000..0x10FFFF. (This is consistent with the
definition of "code point" for other CCSs, where there is no requirement
for the domain to be a contiguous range of integers. The fact that
there are properties in the UCD for 0xD800..0xDFFF is just a historical
artifact.)
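
As a minimal sketch of the domain proposed above, the following Python checks whether an integer falls in 0..0xD7FF union 0xE000..0x10FFFF. The names `is_surrogate`, `is_scalar_value` and `scalar_value_count` are hypothetical and chosen for illustration only; they are not taken from the Unicode Standard or from any library.

```python
# Illustrative constants; the names are assumptions, not standard identifiers.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_VALUE = 0x10FFFF


def is_surrogate(n: int) -> bool:
    """True if n lies in the surrogate range 0xD800..0xDFFF."""
    return SURROGATE_FIRST <= n <= SURROGATE_LAST


def is_scalar_value(n: int) -> bool:
    """True if n lies in 0..0xD7FF or 0xE000..0x10FFFF."""
    return 0 <= n <= MAX_VALUE and not is_surrogate(n)


def scalar_value_count() -> int:
    """Size of the non-contiguous domain: everything up to MAX_VALUE minus the surrogates."""
    return (MAX_VALUE + 1) - (SURROGATE_LAST - SURROGATE_FIRST + 1)
```

Under this definition the domain has 1,112,064 members, and the 2,048 values in 0xD800..0xDFFF are simply outside it.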

It is UTF-16 (CEF) code units that have the domain 0..0xFFFF.
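
To illustrate the distinction between scalar values and UTF-16 code units, here is a rough sketch of the UTF-16 encoding form in Python. The function name `utf16_code_units` is hypothetical. It maps one scalar value to one or two 16-bit code units, and the values 0xD800..0xDFFF appear only on the code-unit side.

```python
from typing import List


def utf16_code_units(scalar: int) -> List[int]:
    """Encode one scalar value as a list of 16-bit UTF-16 code units.

    Hypothetical helper for illustration; rejects surrogates and
    out-of-range integers, which are not scalar values.
    """
    if not (0 <= scalar <= 0x10FFFF) or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"not a scalar value: {scalar:#x}")
    if scalar < 0x10000:
        # Values below 0x10000 are represented by a single code unit.
        return [scalar]
    # Supplementary values are split into a high and a low surrogate,
    # each carrying 10 bits of the 20-bit offset from 0x10000.
    offset = scalar - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return [high, low]
```

For example, `utf16_code_units(0x1D11E)` yields `[0xD834, 0xDD1E]`: both code units are surrogates, and neither exceeds 0xFFFF.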

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPNkeETkCAxeYt5gVAQFx1wf8CO4Lc/E1ac1QCcMetVVkeeUagQ1EDJDA
bJfIT5xCV02zLRXTYox9IW1x8umOxNKv7RPWj48uA4RKOQKlSsuDHwG5O8z0qu78
ODi3rbB3pk+xTx9lqYdj7uiONwDhaVyzc7rPcmDueH6GVff/zlGW7YhbfZVRR7iy
oRPo9FhOkNN1u9C3UPbyYj2r8Mt1+INDxdYzXPQd0VwL5+9n8gM+T5YXUiO7BhAH
JM2nh0x1O4AJcxABxg31DLMowr6K9/dT7G2hH+SK7PJxzToCj6P3NDpYwUjEid7R
kFRJhA5xyGzG5pQJWloGo+aXxFcuPsBQw2WA66EEDi+T/5qSJLekdg==
=Gyp+
-----END PGP SIGNATURE-----


