Peter_Constable@sil.org wrote:
> - BMP characters: characters in the BMP; note that d800-dfff are not
> characters; fffe and ffff are also not characters
"BMP character" is not in the Glossary, but "BMP" is.
> - "astral"/supplementary/extended-plane/?? characters: everything in planes
> 1 - 16 (excluding anything ending in fffe and ffff)
We do need a term for this.
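The ranges proposed in the two quoted definitions above can be sketched in Python. This is only an illustration of those definitions as worded in this thread: the function name `classify` and the label strings are hypothetical, not terms from the Standard, and "character" here means only "a value the definitions allow to be a character", whether or not one is assigned.

```python
def classify(value: int) -> str:
    """Classify an integer using the ranges quoted in this thread.

    Hypothetical helper; the labels are illustrative, not official terms.
    """
    if not 0 <= value <= 0x10FFFF:
        return "out of range"
    if 0xD800 <= value <= 0xDFFF:
        # d800-dfff are not characters
        return "surrogate code value"
    if value & 0xFFFF in (0xFFFE, 0xFFFF):
        # anything ending in fffe or ffff, in any plane, is not a character
        return "noncharacter"
    if value <= 0xFFFF:
        return "BMP character"
    # planes 1 through 16
    return "supplementary character"
```

For example, `classify(0x41)` falls in the BMP, while `classify(0x10000)` is the first value in plane 1.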
> - codepoint: I'm inclined to use this as an alternate term for Unicode
> Scalar Value; note that by this def'n d800 - dfff, fffe, etc. are *not*
> codepoints
Same as the Glossary. Note that "code point" can also be applied to non-Unicode
standards: 0x30 is the code point for DIGIT ZERO in US-ASCII.
> - code values: integers within the space of some encoding form; d800 - dfff
> *are* code values, but not codepoints
According to the Glossary, code values are bit strings, not integers.
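To make the code point / code value distinction concrete, here is a small Python sketch that lists the code values one character produces in two encoding forms. The helper names `utf16_code_values` and `utf8_code_values` are hypothetical, and the code values are shown as integers for convenience, which follows the quoted definition rather than the Glossary's bit-string wording.

```python
import struct


def utf16_code_values(text: str) -> list:
    """Return the 16-bit UTF-16 code values for text (hypothetical helper)."""
    data = text.encode("utf-16-be")
    # Each UTF-16 code value is one big-endian unsigned 16-bit unit.
    return list(struct.unpack(f">{len(data) // 2}H", data))


def utf8_code_values(text: str) -> list:
    """Return the 8-bit UTF-8 code values for text (hypothetical helper)."""
    return list(text.encode("utf-8"))
```

A single supplementary code point such as U+1D11E yields two code values in UTF-16 and four in UTF-8, none of which equals the code point itself.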
> - surrogate: I'm inclined to say that this should refer *only* to a UTF-16
> code value in the range d800 - dfff; equal to "surrogate code value"
Yes, this is the obvious abstraction from D25 and D26.
> - surrogate pair: a valid pair of UTF-16 surrogate code values used to
> encode an "astral" character; note that a surrogate pair is *different*
> from the character they encode: surrogates come from the sphere of code
> values, not the sphere of characters/codepoints
Matches D27 and the Glossary.
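The relationship between a supplementary code point and its surrogate pair is plain arithmetic, sketched below in Python. The function names are hypothetical; the constants are the standard UTF-16 ones (high surrogates d800-dbff, low surrogates dc00-dfff).

```python
def to_surrogate_pair(scalar: int) -> tuple:
    """Map a supplementary code point to its (high, low) UTF-16 code values."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = scalar - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Map a valid (high, low) surrogate pair back to the code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

The two functions take and return different kinds of things, which is the point made in the quoted text: the pair lives among the code values, the result among the code points.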
Summary: the Unicode Standard's terms are in good shape.
-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT