From: Arcane Jill (arcanejill@ramonsky.com)
Date: Thu Jan 13 2005 - 03:25:33 CST
I was just looking through Chapter 3 of the online version of the Unicode
Standard (ch03.pdf), and I noticed a curious definition.
First off, there are a couple of extremely LOGICAL definitions. For example:
(1) An "Abstract Character Sequence" is a sequence of "Abstract Character"s.
(2) A "Code Unit Sequence" is a sequence of "Code Unit"s.
This is exactly what I'd expect. But then there's this really ILLOGICAL one:
(3) A "Coded Character Sequence" is a sequence of ... (wait for it) ... "Code
Points".
("Coded Character" is also known as "Encoded Character"; "Coded Character
Sequence" is also known as "Coded Character Representation").
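
As a rough illustration of how these three kinds of sequence differ in practice, here is a minimal Python sketch. The sample string and the variable names are invented for this example; Python's ord() is used as a stand-in for "the code point of this character".

```python
# Sample text: LATIN CAPITAL LETTER A, LATIN SMALL LETTER E WITH ACUTE,
# and MUSICAL SYMBOL G CLEF (a character outside the BMP).
text = "A\u00e9\U0001D11E"

# A sequence of code points: one integer per character.
code_points = [ord(ch) for ch in text]

# A code unit sequence in UTF-8: 8-bit code units.
utf8_units = list(text.encode("utf-8"))

# A code unit sequence in UTF-16: 16-bit code units (big-endian here).
utf16_bytes = text.encode("utf-16-be")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "big")
    for i in range(0, len(utf16_bytes), 2)
]

print([hex(n) for n in code_points])  # ['0x41', '0xe9', '0x1d11e']
print([hex(n) for n in utf8_units])   # ['0x41', '0xc3', '0xa9', '0xf0', '0x9d', '0x84', '0x9e']
print([hex(n) for n in utf16_units])  # ['0x41', '0xe9', '0xd834', '0xdd1e']
```

The same three characters give three code points, seven UTF-8 code units, and four UTF-16 code units.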
Hmm. A sequence of "Code Points" is not called a "Code Point Sequence" but
something else entirely? That was unexpected.
And what do we call a sequence of "Coded Character"s? Clearly we can't call it
a "Coded Character Sequence", since that term seems to have been unhelpfully
reserved for a sequence of "Code Point"s.
Is this important? I dunno, but a "Code Point" is defined as an INTEGER in the
range 0 to 0x10FFFF, whereas a "Coded Character" is defined as a bidirectional
mapping between a single "Abstract Character" and a single "Code Point". So a
"Coded Character" may be thought of as an ordered pair containing a "Code
Point" (an integer) and an "Abstract Character" (an atom of text), whereas a
"Code Point" is just an integer.
So, logically, the sequence ( 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 ) is a
"Coded Character Sequence", even though it's just the start of the Fibonacci
series? The sequence ( 0xD800, 0xFFFF ) is a Coded Character Sequence even
though neither of its elements can be mapped to a coded character?
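
To make the point concrete, here is a small Python sketch under stated assumptions: is_code_point and CodedCharacter are hypothetical names made up for this example (they are not terms or APIs from the Standard), and the character name string is only a label standing in for an abstract character.

```python
import unicodedata
from typing import NamedTuple


def is_code_point(n: int) -> bool:
    """True if n is an integer in the range 0 to 0x10FFFF."""
    return 0 <= n <= 0x10FFFF


class CodedCharacter(NamedTuple):
    """Hypothetical model: a code point paired with an abstract character."""
    code_point: int
    abstract_character: str  # a label standing in for the "atom of text"


fib = (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)
odd_pair = (0xD800, 0xFFFF)

# Every element of both sequences is a valid code point (just an integer).
print(all(is_code_point(n) for n in fib))       # True
print(all(is_code_point(n) for n in odd_pair))  # True

# General categories: Cs = surrogate, Cn = unassigned (includes noncharacters).
for n in odd_pair:
    print(hex(n), unicodedata.category(chr(n)))  # 0xd800 Cs, then 0xffff Cn

# A coded character is a pair, so it is not the same thing as its code point.
cc = CodedCharacter(0x0041, "LATIN CAPITAL LETTER A")
print(cc == 0x0041)  # False
```

Both sequences pass the "sequence of code points" test, even though 0xD800 is a surrogate and 0xFFFF is a noncharacter.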
This curious definition of "Coded Character Sequence" seems a bit strange to
me. Does it seem strange to anyone else? Have I misread something?
Jill