From: Doug Ewell (doug@ewellic.org)
Date: Tue Jan 13 2009 - 22:59:48 CST
Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
>> In http://www.unicode.org/mail-arch/unicode-ml/y2009-m01/0077.html I
>> asked for a pointer to a full definition of "compatibility character"
>> in the Unicode 5.x text that would cover "a character that is
>> *completely unrelated* to any other character in the standard but is
>> encoded due to 'interoperability needs.'"
>
> No need to look very far - just check chapter 2 under compatibility
> character. (You could have easily found out for yourself, since the
> text is online).
No sir. I have the physical book right here in front of me, and here is
what it says on page 23, under the heading "2.3 Compatibility
Characters." Gather around, everyone, and follow along.
"Conceptually, compatibility characters are those that would not have
been encoded except for compatibility and round-trip convertibility with
other standards. They are variants of characters that already have
encodings as normal (that is, non-compatibility) characters in the
Unicode Standard; as such, they are more properly referred to as
compatibility variants."
The remainder of this paragraph, and the next two, refer to Arabic glyph
forms, CJK compatibility ideographs, and other characters that bear
visual similarity to the characters of which they could be considered
"variants."
Then, under the heading "Compatibility Decomposable Characters":
"There is a second, narrow sense of the term 'compatibility character'
in the Unicode Standard, corresponding to the notion of a compatibility
decomposable introduced in Section 2.2, Unicode Design Principles. This
sense is strictly defined as any Unicode character whose compatibility
decomposition is not identical to its canonical decomposition."
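For anyone who wants to test the narrow sense mechanically, here is a minimal sketch in Python. It assumes that comparing a character's NFKD and NFD forms is a fair stand-in for "compatibility decomposition is not identical to its canonical decomposition." The function name is mine, not the standard's, and the results depend on the Unicode data shipped with your Python.

```python
import unicodedata

def is_compatibility_decomposable(ch: str) -> bool:
    """Hypothetical helper: True if the full compatibility decomposition
    (NFKD) of ch differs from its full canonical decomposition (NFD)."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)

# U+FB01 LATIN SMALL LIGATURE FI has only a compatibility mapping to "fi".
print(is_compatibility_decomposable("\uFB01"))
# U+00E9 decomposes canonically (e + combining acute), so both forms agree.
print(is_compatibility_decomposable("\u00E9"))
# U+F900, a CJK compatibility ideograph, has a *canonical* mapping to U+8C48,
# so it is not a compatibility decomposable in the narrow sense.
print(is_compatibility_decomposable("\uF900"))
```

The last case shows why the two senses do not coincide: a CJK compatibility ideograph is a compatibility character in the broad sense but fails the narrow test.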
The remainder of this paragraph, and the next two, make further
reference to characters that are typified by their decomposition
mappings. There is a passage that *almost* appears, at first glance, to
admit entire de novo sets of symbols:
"A large number of compatibility decomposable characters are really
distinct symbols used in specialized notations, whether phonetic or
mathematical. They are therefore not compatibility variants in the
strict sense."
... but then goes on to explain that they still must be some sort of
variant of existing characters:
"Rather, their compatibility mappings express their historical
derivation from styled forms of standard letters. In these and similar
cases, such as fixed-width space characters, the compatibility
decompositions define possible fallback representations."
And finally, on page 25, under the heading "Mapping Compatibility
Characters":
"Identifying one character as a compatibility variant of another
character usually implies that the first can be remapped to the second
without the loss of any textual information other than formatting or
layout. However, such remapping cannot always take place because many
of the compatibility characters are included in the standard precisely
to allow systems to maintain one-to-one mappings to other existing
character encoding standards and code pages. In such cases, a remapping
would lose information that is important to maintaining some distinction
in the original encoding. By definition, a compatibility decomposable
character decomposes into a compatibly equivalent character or character
sequence. Even in such cases, an implementation must proceed with due
caution--replacing one with the other may change not only formatting
information, but also other technical distinctions on which some other
process may depend."
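The loss this passage warns about can be sketched in Python. This is an illustration only, assuming NFKC as the remapping and Python's "shift_jis" codec as the legacy encoding; the helper name is hypothetical.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Hypothetical helper: replace compatibility characters with their
    compatibility equivalents by applying NFKC."""
    return unicodedata.normalize("NFKC", text)

# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A maps one-to-one to a double-byte
# code in Shift_JIS; ordinary "A" maps to a single byte.
original = "\uFF21"
folded = fold_compat(original)

# After folding, the round trip to the legacy encoding no longer
# reproduces the original bytes: the wide/narrow distinction is gone.
print(original.encode("shift_jis") == folded.encode("shift_jis"))

# Formatting can carry meaning too: U+00B2 SUPERSCRIPT TWO folds to "2".
print(fold_compat("x\u00B2"))
```

In both cases the folded text is a plausible fallback, but the distinction that the original encoding maintained cannot be recovered from it.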
This is followed by two paragraphs that go into more detail about the
relationship between a compatibility character and the "standard"
character or sequence with which it is associated, mostly to say that
the stylistic differences could affect the meaning of the text or cause
security problems.
There is NO TEXT HERE that talks about "compatibility characters" that
have no relationship whatsoever to existing "standard" characters or
sequences, but are encoded solely due to "interoperability needs" with
another standard. Even the passage on page 25 about characters that are
included for 1-to-1 mapping to other standards -- which is probably what
I was supposed to notice -- speaks of maintaining "some distinction in
the original encoding." Do we suppose this means a distinction between
a front-facing baby chick and a side-facing baby chick?
If you are seeing words on these three pages that are different from the
ones I am seeing, please quote the words in your reply.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages