From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 07 2003 - 14:00:54 EDT
Theodore Smith asked:
> I am having trouble understanding Unicode's documentation. I tried
> looking through the glossary for an explanation of a term I see a
> lot, "canonical equivalence". This led me back to the original document
> that had been using it a lot, but which still hadn't helped me find
> out what it meant.
The place to look, now that the Conformance chapter for Unicode
4.0 has been posted, is:
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
See Section 3.7 Decomposition, which defines 'decomposition',
'compatibility decomposition', 'canonical decomposition',
'compatibility equivalent', 'canonical equivalent', and so on.
That is as close to the horse's mouth as you are going to get.
>
> On spending more effort trying to understand the document's terse
> format, I found out that it is telling me I have to read some kind of
> table listing.
The decomposition mappings used in the definitions are in
Section 16.1 Character Names List (not online yet) -- but that
is simply the code charts and character names. You can find
equivalent listings just by opening up Unicode 3.0 and looking
at the corresponding Section 14.1 Character Names List. The
conventions for the mappings are explained in great detail
on pp. 333-334 there.
>
> So to work out what this word canonical means, I have to remember one
> table, and to remember the word compatibility I have to remember
> another table.
No, you have to understand the distinction between two different
types of decomposition mappings in *one* table.
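
To make that distinction concrete, here is a minimal Python sketch.
It assumes Python's standard unicodedata module, which exposes the
decomposition mappings for whichever Unicode version that Python
build ships with; the helper names are hypothetical. A mapping that
starts with a tag in angle brackets is a compatibility mapping, and
one with no tag is canonical. Comparing fully decomposed strings
(NFD or NFKD) is used here as a practical check for the two kinds of
equivalence; Section 3.7 remains the authoritative definition.

```python
import unicodedata


def mapping_kind(ch):
    """Classify the decomposition mapping of a single character."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # A leading <tag> marks a compatibility mapping; no tag means canonical.
    return "compatibility" if mapping.startswith("<") else "canonical"


def canonically_equivalent(a, b):
    """True if both strings have the same full canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a, b):
    """True if both strings have the same full compatibility decomposition."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


if __name__ == "__main__":
    # U+00C5 (A with ring above), U+FB01 (fi ligature), plain "A"
    for ch in ("\u00c5", "\ufb01", "A"):
        print(f"U+{ord(ch):04X}", mapping_kind(ch), unicodedata.decomposition(ch))
```

For example, U+00C5 and the sequence U+0041 U+030A are canonically
equivalent, while U+FB01 and the two letters "fi" are only
compatibility equivalent.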
>
> I haven't found those tables yet,
The printed form is in the code charts and character names list
in the standard. (I presume you *have* found those. ;-) )
The ultimate, definitive, and normative source of the decomposition
mappings is the data file, UnicodeData.txt, which is also online.
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
The decomposition mapping field in that data file is used,
programmatically, to generate the decomposition mappings which
are printed in the code charts.
Read:
http://www.unicode.org/Public/UNIDATA/UCD.html
to get information about UnicodeData.txt and any other of the
data files in the Unicode Character Database.
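
As a rough illustration of reading that field yourself, here is a
short Python sketch. It assumes the semicolon-separated layout of
UnicodeData.txt, with the decomposition mapping in the sixth field
(index 5 when counting from zero). The three sample lines are written
in the file's format for illustration, and the function name
parse_decomposition is hypothetical. Check UCD.html for the
authoritative field descriptions before relying on this.

```python
# Sample lines in the UnicodeData.txt format (illustrative subset).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
FB01;LATIN SMALL LIGATURE FI;Ll;0;L;<compat> 0066 0069;;;;N;;;;;
"""


def parse_decomposition(line):
    """Return (code point, kind, tag, targets) for one data line.

    kind is "none", "canonical", or "compatibility"; tag is the text
    inside the angle brackets for compatibility mappings, else None.
    """
    fields = line.rstrip("\n").split(";")
    code = int(fields[0], 16)
    raw = fields[5]  # decomposition mapping field
    if not raw:
        return code, "none", None, []
    parts = raw.split()
    tag = None
    if parts[0].startswith("<"):
        # A leading <tag> marks a compatibility mapping.
        tag = parts[0].strip("<>")
        parts = parts[1:]
    kind = "compatibility" if tag else "canonical"
    return code, kind, tag, [int(p, 16) for p in parts]


if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        code, kind, tag, targets = parse_decomposition(line)
        print(f"U+{code:04X}", kind, tag, [f"U+{t:04X}" for t in targets])
```

Both kinds of mapping live in the same field of the same file; only
the presence or absence of the tag tells them apart.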
> and I'm still wondering why I'm being
> expected to take so many steps just to understand a term that is used
> all over the place.
>
> I'm sure there is a much simpler explanation of canonical and
> compatibility?
Nope. A simpler one would likely not be a correct one. See
Section 3.7 of Chapter 3 of Unicode 4.0 (cited above) to get it
correct.
Although perhaps John Cowan might be persuaded to come up with
the pocket edition explanation, comparable to his famous
list of Unicode conformance requirements:
http://www.unicode.org/faq/basic_q.html#15
:-)
--Ken
> Or else why didn't they just use the terms "table1
> decomposition" and "table2 decomposition"?
>
> I'm guessing those words have a meaning, somehow, but I can't say what
> interpretation of those words has been used. Can anyone explain it for
> me, perhaps?