Re: surrogate terminology (was Re: Surrogate support in *ML?)

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Sep 12 2000 - 14:18:52 EDT


Peter_Constable@sil.org wrote:
 
> - BMP characters: characters in the BMP; note that d800-dfff are not
> characters; fffe and ffff are also not characters

"BMP character" is not in the Glossary, but "BMP" is.

> - "astral"/supplementary/extended-plane/?? characters: everything in planes
> 1 - 16 (excluding anything ending in fffe and ffff)

We do need a term for this.
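
For illustration only, the plane arithmetic behind this category can be
sketched in Python. The function names are made up for this sketch and are
not terms from the Unicode Standard, and the noncharacter test covers only
the values ending in fffe and ffff mentioned in the quoted text.

```python
def plane(cp):
    """Return the plane number: 0 is the BMP, 1-16 are the other planes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("value is outside the Unicode code space")
    return cp >> 16


def is_noncharacter(cp):
    """True if the low 16 bits are fffe or ffff (in any plane)."""
    return (cp & 0xFFFE) == 0xFFFE


def is_supplementary(cp):
    """True for values in planes 1-16, excluding those ending in fffe/ffff."""
    return plane(cp) >= 1 and not is_noncharacter(cp)
```

Under these assumptions, is_supplementary(0x10400) is true, while
is_supplementary(0xFFFF) and is_supplementary(0x10FFFF) are both false.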

> - codepoint: I'm inclined to use this as an alternate term for Unicode
> Scalar Value; note that by this def'n d800 - dfff, fffe, etc. are *not*
> codepoints

Same as the Glossary. Note that "code point" can also be applied to non-Unicode
standards: 0x30 is the code point for DIGIT ZERO in US-ASCII.

> - code values: integers within the space of some encoding form; d800 - dfff
> *are* code values, but not codepoints

According to the Glossary, code values are bit strings, not integers.
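
A rough sketch of the distinction, assuming Python 3 and its built-in
"utf-8" and "utf-16-be" codecs: the same character yields code values of
different widths in different encoding forms, and each one can be written
out as a fixed-width bit string. The helper names are hypothetical.

```python
def utf8_code_values(s):
    """Return the 8-bit code values of s in UTF-8, as integers."""
    return list(s.encode("utf-8"))


def utf16_code_values(s):
    """Return the 16-bit code values of s in UTF-16, as integers."""
    data = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [(hi << 8) | lo for hi, lo in zip(data[0::2], data[1::2])]


def as_bits(value, width):
    """Render one code value as a bit string of the given width."""
    return format(value, "0{}b".format(width))
```

For U+10400, utf16_code_values("\U00010400") gives [0xD801, 0xDC00], and
as_bits(0xD801, 16) gives "1101100000000001".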

> - surrogate: I'm inclined to say that this should refer *only* to a UTF-16
> code value in the range d800 - dfff; equal to "surrogate code value"

Yes, this is the obvious abstraction from D25 and D26.

> - surrogate pair: a valid pair of UTF-16 surrogate code values used to
> encode an "astral" character; note that a surrogate pair is *different*
> from the character they encode: surrogates come from the sphere of code
> values, not the sphere of characters/codepoints

Matches D27 and the Glossary.
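
The separation between the pair and the character it encodes can be seen in
the UTF-16 arithmetic itself. The following is a minimal Python sketch with
hypothetical function names, not text from the Standard: it maps a scalar
value above the BMP to its two surrogate code values and back.

```python
def is_surrogate(code_value):
    """True if a UTF-16 code value lies in the range d800 - dfff."""
    return 0xD800 <= code_value <= 0xDFFF


def to_surrogate_pair(scalar):
    """Split a scalar value in planes 1-16 into (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only values above the BMP need a surrogate pair")
    offset = scalar - 0x10000              # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a valid (high, low) surrogate pair into a scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Here to_surrogate_pair(0x10400) returns (0xD801, 0xDC00): two code values,
both surrogates, neither of which is the scalar value 0x10400 itself.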

Summary: the Unicode Standard's terms are in good shape.

-- 
There is / one art                   || John Cowan <jcowan@reutershealth.com>
no more / no less                    || http://www.reutershealth.com
to do / all things                   || http://www.ccil.org/~cowan
with art- / lessness                 \\ -- Piet Hein


