From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Jul 13 2004 - 17:15:42 CDT
On 13/07/2004 20:02, Asmus Freytag wrote:
> At 11:02 AM 7/13/2004, Peter Kirk wrote:
>
>> I was surprised to see that WG2 has accepted a proposal made by the
>> US National Body to use CGJ to distinguish between Umlaut and Tréma
>> in German bibliographic data.
>
>
> You raise some interesting questions. However, note that CGJ is
> intended for sorting-related distinctions, which are at issue here.
> This is different from variation selectors, which are intended to be
> used for displayed variations.
OK. But this is not a unique case. For example, in Hebrew, Silluq and
Meteg, and Dagesh and Shuruq, are pairs of distinct marks which share a
glyph, and so a Unicode character, but which may need to be distinguished
for certain processes. Should similar encodings with CGJ be proposed to
make these distinctions? For that matter, what if in a certain
(hypothetical) language consonant Y and vowel Y should be collated
differently? Would that justify an encoding of one of them with CGJ? But
then these are not combining characters in the first place. So I must
agree with Doug that "CGJ + COMBINING DIAERESIS is a hack".
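For anyone who wants to see what the CGJ encoding does to the data, Python's standard `unicodedata` module is enough. This is only a sketch: it assumes, for illustration, that the marked form is `u` + CGJ (U+034F) + COMBINING DIAERESIS (U+0308), and the variable names are my own. Because CGJ has combining class zero, NFC cannot compose the marked sequence into U+00FC, so it stays distinct from ordinary ü through normalisation, and the two are not canonically equivalent.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

plain = "u" + DIAERESIS             # ordinary u + diaeresis
cgj_marked = "u" + CGJ + DIAERESIS  # hypothetical CGJ-marked sequence

for label, s in [("plain", plain), ("cgj_marked", cgj_marked)]:
    nfc = unicodedata.normalize("NFC", s)
    print(label, [f"U+{ord(c):04X}" for c in nfc])
# plain ['U+00FC']
# cgj_marked ['U+0075', 'U+034F', 'U+0308']
```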
On 13/07/2004 19:35, Doug Ewell wrote:
> ...
>
>The alternative proposed by DIN, creating a new COMBINING UMLAUT
>character, would have caused *unprecedented and catastrophic*
>equivalence and normalization problems.
>
Understood. But I can argue in the same way that creating a new RIGHT
HOLAM character for Holam Male would cause *catastrophic* equivalence
and normalisation problems, although no longer unprecedented, because we
have the umlaut/tréma precedent. The situation is really very similar:
two combining marks which are not distinguished in most modern
typography, but which are distinguished graphically in some typefaces
(if I remember correctly, in Fraktur as well as in the typefaces
mentioned in Victor Gaultney's paper); which have distinct
interpretations and are distinguished in some existing data in which the
distinction is important; but which should not be split into separate
characters, because this would seriously destabilise the existing data
in the script, the majority of which does not make the distinction.
What many people are telling me to do with Holam Male (e.g. Less
Preferred Option 4 in http://www.qaya.org/academic/hebrew/Holam2.html)
is equivalent to the following solution to the umlaut/tréma problem:
define a new tréma character, or perhaps new umlaut and tréma
characters, to be used only in the German bibliographic data, and ignore
the problem that this makes the bibliographic data incompatible with all
other German text and impossible to display with existing fonts until
they get round to adding the new characters, as well as ignoring the
problem that the precomposed characters would have the wrong
decomposition. (The Hebrew equivalent of this is that U+FB4B should
decompose to Holam Male, not to Vav Haluma.) If that solution was not
acceptable for German, why should it be acceptable for Hebrew?
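The existing decomposition of U+FB4B can be checked the same way. U+FB4B HEBREW LETTER VAV WITH HOLAM decomposes canonically to U+05D5 VAV + U+05B9 HOLAM, and as a Hebrew presentation form it is excluded from recomposition, so both NFC and NFD turn it into that two-character sequence. Whatever new character might be encoded for Holam Male, existing U+FB4B data would normalise to vav + plain holam. The sketch below shows only the current Unicode data; it assumes no new character.

```python
import unicodedata

VAV_WITH_HOLAM = "\uFB4B"  # HEBREW LETTER VAV WITH HOLAM (presentation form)

print(unicodedata.name(VAV_WITH_HOLAM))           # HEBREW LETTER VAV WITH HOLAM
print(unicodedata.decomposition(VAV_WITH_HOLAM))  # 05D5 05B9
for form in ("NFC", "NFD"):
    out = unicodedata.normalize(form, VAV_WITH_HOLAM)
    print(form, [f"U+{ord(c):04X}" for c in out])
# NFC ['U+05D5', 'U+05B9']
# NFD ['U+05D5', 'U+05B9']
```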
>>It seems to me that the UTC should bite the bullet and accept that
>>there is a need for variation sequences for combining marks, and
>>either adjust the definitions of existing variation selectors or
>>encode new specialised variation selectors for them. The adjusted or
>>new variation selectors can then be used for Hebrew as well as for
>>German - see my posting on this subject to the Hebrew list.
>>
>>
>
>"When 256 variation selectors just won't do, invent another."
>(with apologies to Ken Whistler)
>
>
256 variation selectors won't do if they have all been defined
unchangeably with the wrong properties, e.g. combining class. On the other
hand, if the UTC is prepared to ignore the combining class and
normalisation problems involved in using one combining class zero
character, CGJ, to modify a combining mark, it may as well ignore the
identical problems involved in using variation selectors, also combining
class zero, with combining marks.
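The combining-class point can be made concrete. CGJ and the variation selectors (U+FE00..U+FE0F, U+E0100..U+E01EF) all have canonical combining class 0, whereas COMBINING DIAERESIS has 230. A class-zero character between two marks blocks canonical reordering, so mark orders that are otherwise canonically equivalent stop being equivalent once it is inserted. This happens in exactly the same way whether the inserted character is CGJ or a variation selector. The COMBINING DOT BELOW (U+0323, class 220) example and the helper `equivalent` are mine, chosen only to illustrate the effect.

```python
import unicodedata

CGJ, VS1, VS17 = "\u034F", "\uFE00", "\U000E0100"
DIAERESIS, DOT_BELOW = "\u0308", "\u0323"

# Canonical combining classes: 0, 0, 0, 230, 220
for ch in (CGJ, VS1, VS17, DIAERESIS, DOT_BELOW):
    print(f"U+{ord(ch):04X}", unicodedata.combining(ch))

def equivalent(a: str, b: str) -> bool:
    """Canonical equivalence test: identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Without a class-zero separator, the two mark orders are equivalent: True
print(equivalent("u" + DIAERESIS + DOT_BELOW, "u" + DOT_BELOW + DIAERESIS))

# With CGJ or VS1 between the marks, reordering is blocked: False, False
for sep in (CGJ, VS1):
    print(equivalent("u" + DIAERESIS + sep + DOT_BELOW,
                     "u" + DOT_BELOW + sep + DIAERESIS))
```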
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/