* Kenneth Whistler
|
| Abstract character
|
| that which is encoded; an element of the repertoire (existing
| independent of the character encoding standard, and often
| identifiable in other character encoding standards, as well
| as the Unicode Standard); the implicit basis of transcodings.
|
| Note that while in some sense abstract characters exist a
| priori by virtue of the nature of the units of various writing
| systems, their exact nature is only pinned down at the point
| that an actual encoding is done. They are not always obvious,
| and many new abstract characters may arise as the result of
| particular textual processing needs that can be addressed by
| characters. (E.g. WORD JOINER, OBJECT REPLACEMENT CHARACTER,
| etc., etc.)
This helps a little, but not all that much. I think spelling out the
details of how the term relates to the other terms would help.
The rest of the definitions were quite clear.
* Lars Marius Garshol
|
| - are all assigned Unicode characters also abstract characters?
* Kenneth Whistler
|
| Yes. Or rather: all encoded characters are assigned to abstract
| characters.
Hmmmm. OK. So combining diacritics are also abstract characters? (I
was also unclear on ZWNJ and similar things, but you explicitly
mention that above, so...)
| (Note above -- abstract characters are also a concept which applies
| to other character encodings besides the Unicode Standard, and not
| all encoded characters in other character encodings automatically
| make it into the Unicode Standard, for various architectural
| reasons.)
Right. So VIQR, for example, also has abstract characters, then?
* Lars Marius Garshol
|
| - do <U+00C5> (Å) and <U+0041, U+030A> (A followed by combining ring
| above) represent the same abstract character?
* Kenneth Whistler
|
| Yes. That is the implicit claim behind a specification of canonical
| equivalence.
Right. Then I think I've more-or-less got it.
This helped a lot. Thank you!
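For concreteness, here is a minimal sketch of that canonical equivalence, assuming Python's standard unicodedata module. The names precomposed, decomposed and canonically_equivalent are mine, for illustration only; they come from no specification.

```python
import unicodedata

# The two representations discussed above.
precomposed = "\u00C5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD).

    Illustrative helper: two strings are treated as canonically
    equivalent if their NFD forms are identical.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The raw code point sequences differ, but the strings are
# canonically equivalent.
print(precomposed == decomposed)
print(canonically_equivalent(precomposed, decomposed))
```

The first comparison looks only at code points and reports a difference. The second normalizes both sides first and reports a match.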
However, it does raise a new problem. Isn't the definition of 'string'
in the XPath specification then wrong?
  Strings consist of a sequence of zero or more characters, where a
  character is defined as in the XML Recommendation [XML]. A single
  character in XPath thus corresponds to a single Unicode abstract
  character with a single corresponding Unicode scalar value (see
  [Unicode]); [...]

  <URL: http://www.w3.org/TR/xpath#strings >
As far as I can tell, one of these two claims must be wrong. That is,
either a single XPath character does not necessarily correspond to a
single Unicode abstract character, or else a single XPath character
need not correspond to a single scalar value.
Does that sound reasonable?
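To make the conflict concrete, here is a rough Python sketch. It assumes that an XPath character maps to one code point, which is how the elements of a Python str behave; the helper name scalar_values is hypothetical.

```python
def scalar_values(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]


# Two spellings of what was said above to be the same abstract character.
precomposed = "\u00C5"
decomposed = "\u0041\u030A"

# One abstract character, but one scalar value in the first string
# and two in the second.
print(scalar_values(precomposed), len(precomposed))
print(scalar_values(decomposed), len(decomposed))
```

If both strings represent the same abstract character, then that character has one scalar value in the first case and two in the second, so "one abstract character, one scalar value" cannot hold for both.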
--
Lars Marius Garshol, Ontopian <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC <URL: http://www.garshol.priv.no >