Re: Unicode, Cure-all or Kill-all?

From: John H. Jenkins (tseng@sj-coop.net)
Date: Wed Aug 14 1996 - 13:08:52 EDT


Martin says:

>John Jenkins wrote:
>
>>The model followed by the IRG -- based ultimately on Japanese standards
>>practice -- makes a three-fold distinction.
>>
>>X-variants: Different semantics, different abstract shape
>>Y-variants: Same semantics, different abstract shape
>>Z-variants: Same semantics, same abstract shape but different actual
>>shape
>
>Given the examples and explanations in the Unicode standard,
>and the example of C"machine"/J"table", this should be different:
>
>X-axis: Different semantics (shape is irrelevant)
>Y-axis: Different abstract shape (semantics is irrelevant)
>Z-axis: Different actual shape (semantics is irrelevant)
>

Yep. You said it best.

>John's explanation looks as if everything can be represented
>as a tree, with X-variants (semantics) sorted out at the first
>level, Y-variants (abstract shape) sorted out at the second
>level, and Z-variants (actual shape) sorted out at the
>third level. This is (mostly) true for Y and Z axes (it would be
>extremely far-fetched to construct examples with different
>abstract shape, but the same actual shape, although with
>some exotic fancy fonts, that might be done). It is however
>not true for the X axis vs. the Y/Z axes. Examples of Tai2
>have different X axis values depending on whether they
>mean "Taiwan", "Typhoon", or "Sir", but they have the same
>Y value, and the same Z value if they come from the same font.
>
>For codepoint identification, only the Y axis is relevant.
>The X axis (semantics) is sometimes consulted to decide
>whether shape differences have to be classified as Y (abstract)
>or Z (actual), because in some cases, a small shape difference
>can make a completely different character. Examples having
>identical shape (Y/Z), however, are given a single codepoint
>irrespective of X value.
>
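
One way to read the point above is that the three axes are independent
coordinates rather than levels of a tree. What follows is only a rough
Python sketch of that reading, not anything taken from the IRG or the
Unicode standard. The Occurrence record, its field names, and the string
labels such as "tai2@fontA" are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One use of a character in text (hypothetical record, illustration only)."""
    semantics: str        # X axis: what the character means in this use
    abstract_shape: str   # Y axis: the abstract shape
    actual_shape: str     # Z axis: the shape as drawn in one particular font


def differing_axes(a: Occurrence, b: Occurrence) -> set[str]:
    """Return the set of axes (X, Y, Z) on which two occurrences differ."""
    axes = set()
    if a.semantics != b.semantics:
        axes.add("X")
    if a.abstract_shape != b.abstract_shape:
        axes.add("Y")
    if a.actual_shape != b.actual_shape:
        axes.add("Z")
    return axes


def same_codepoint(a: Occurrence, b: Occurrence) -> bool:
    """The reading quoted above: only the Y axis decides codepoint identity."""
    return a.abstract_shape == b.abstract_shape


# Tai2 set in the same font: different meanings, one shape.
taiwan = Occurrence("Taiwan", "tai2", "tai2@fontA")
typhoon = Occurrence("Typhoon", "tai2", "tai2@fontA")
# The same meaning set in a second (hypothetical) font.
taiwan_other_font = Occurrence("Taiwan", "tai2", "tai2@fontB")

print(differing_axes(taiwan, typhoon))            # {'X'}
print(differing_axes(taiwan, taiwan_other_font))  # {'Z'}
print(same_codepoint(taiwan, typhoon))            # True
```

Because each axis is compared on its own, two occurrences can differ on X
alone, which a strict X-then-Y-then-Z tree cannot express.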

Semantics do matter in some cases at least, regardless of shape. For
example, you have the "earth" radical and the "knight" radical, which
have the same abstract shape, different actual shapes, and different
semantics. They are not unified. (The formal rule is that historically
non-cognate characters with the same abstract shape aren't unified.)
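
As a rough sketch of that formal rule (an illustration only, with made-up
names): cognate_group below is a hypothetical tag standing in for
historical derivation, which in practice is a scholarly judgment and not
a stored field, and "shape-1" is a placeholder label, not a real
description of either radical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ideograph:
    """Hypothetical record for one candidate character (illustration only)."""
    name: str
    abstract_shape: str   # Y axis label (placeholder value)
    cognate_group: str    # assumed tag for historical origin


def unifiable(a: Ideograph, b: Ideograph) -> bool:
    """Unify only if the abstract shape matches AND the two are cognate."""
    return (a.abstract_shape == b.abstract_shape
            and a.cognate_group == b.cognate_group)


# Same abstract shape, but historically unrelated, so they stay separate.
earth = Ideograph("earth radical", "shape-1", "earth")
knight = Ideograph("knight radical", "shape-1", "knight")

print(unifiable(earth, knight))  # False
```

The second condition is what keeps the two radicals apart even though a
purely shape-based comparison would merge them.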

The problem here -- and I'm sure you're as aware of it as anybody -- is that
there are so dang many ideographs involved that the unification is sometimes
done on a case-by-case basis, and the formal rules are a desperate
attempt after the fact to explain why. :-)

>>"A" and "B" are examples of X-variants.
>As are "proportional to", "difference between", "similar to",
>"APL tilde", "cycle", "not", and "tilde operator", and all the
>different hyphens.
>

Yes.

>>A grotesque "a" and gothic "a" are examples of Y-variants.
>"\" and "-" for set difference would be another example.
>The shapes can in some cases really look totally different,
>without any clue that it is the same if you don't know
>what it means.
>

Yes.

John H. Jenkins
tseng@sj-coop.net
jenkins@apple.com
http://www.sj-coop.net/~tseng


