From: Doug Ewell (dewell@adelphia.net)
Date: Thu Jul 15 2004 - 10:47:58 CDT
Peter Kirk <peterkirk at qaya dot org> wrote:
>> Nobody doubts that some text exists with multiple accents on vowels.
>> Where the vowels are not Latin a,o,u, there is no issue at all, in
>> this case, since there are no differences in German sorting for them.
>
> Well, yes, but http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2819.pdf does
> not make it clear that the <CGJ, DIAERESIS> sequence is to be used
> only with Latin a, o and u; rather it states "<CGJ, [DIAERESIS]> →
> tréma". Perhaps the proposal needs modification to make this point
> clear, if that is the intention.
The wording you are looking for is in the first paragraph under the
heading "Alternative solution":
"The solution consists, essentially, of using U+034F COMBINING GRAPHEME
JOINER (CGJ), in its intended semantics in 10646/Unicode, to make the
relevant sorting, searching, and data mapping distinctions required for
umlaut versus tréma."
Note carefully the words "relevant" and "required." The solution
proposed in N2819 is:
* RELEVANT only to the characters ( ä ö ü Ä Ö Ü ) which can occur in
German bibliographic data AND in which the diacritic may represent
either umlaut or tréma, and
* REQUIRED only in contexts where this distinction must be made in plain
text.
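
As a rough illustration of the distinction (a sketch only, not anything
taken from N2819): the Python below assumes a bibliographic sort rule in
which an umlauted a, o or u is expanded to ae, oe, ue, while a tréma is
ignored for sorting. The function name sort_key and the expansion table
are hypothetical; only the code points U+034F and U+0308 come from the
proposal.

```python
import unicodedata

CGJ = "\u034f"        # U+034F COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # U+0308 COMBINING DIAERESIS

# Assumed sorting rule (not from N2819): umlaut expands, tréma is ignored.
UMLAUT_EXPANSION = {"a": "ae", "o": "oe", "u": "ue"}


def sort_key(text):
    """Build a simplified sort key that tells umlaut from tréma."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        # Stray combining marks and CGJ carry no weight in this sketch.
        if unicodedata.combining(ch) or ch == CGJ:
            i += 1
            continue
        base = ch.lower()
        following = decomposed[i + 1:i + 3]
        if base in UMLAUT_EXPANSION and following[:1] == DIAERESIS:
            # Bare U+0308 on a, o, u: treated as umlaut.
            out.append(UMLAUT_EXPANSION[base])
            i += 2
        elif following == CGJ + DIAERESIS:
            # <CGJ, U+0308>: treated as tréma, base letter sorts unmodified.
            out.append(base)
            i += 3
        else:
            out.append(base)
            i += 1
    return "".join(out)


umlaut_key = sort_key("M\u00fcller")           # precomposed ü
trema_key = sort_key("Mu\u034f\u0308ller")     # u + CGJ + diaeresis
```

Under these assumptions the first key is "mueller" and the second is
"muller"; base letters other than a, o and u are unaffected either way.
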
N2819 does not propose <CGJ, U+0308> as a general-purpose representation
for combining tréma. It is not being proposed for the text of the
Unicode Standard, or as a UAX or even a public-review issue. The
solution is intended for the German bibliographers, and presumably for
anyone else who needs ("required") to make the same ("relevant")
distinction.
> Second, N2819 does not make it clear that the <CGJ, DIAERESIS>
> sequence is to be used only for Latin script data. I would expect
> (someone can check this of course, and without checking this is indeed
> speculation) that there is Greek text in German bibliographic
> databases in which the Greek diaeresis is represented in ISO 5426 as
> tréma rather than umlaut; that would be correct because the function
> of Greek diaeresis is separation rather than vowel modification.
Unless there is Greek text where U+0308 can represent either a tréma OR
an umlaut, and unless there is a need to make the distinction in plain
text -- both of which we know not to be true -- this solution is neither
relevant nor required.
> And I would expect an implementer reading N2819 to conclude that all
> ISO 5426 trémas should be converted to <CGJ, DIAERESIS> as no mention
> is made of a restriction to Latin script or to just a, o and u.
I would expect an implementer to read the whole document and understand
the context for which it is intended.
> So there is a real chance of a conversion program producing sequences
> which could confuse normalisation, e.g. <IOTA, CGJ, DIAERESIS, ACUTE>,
> although hopefully not <IOTA, ACUTE, CGJ, DIAERESIS> which might be a
> real problem.
Nobody should be rushing to build conversion programs to convert U+0308
sequences as described in N2819, unless their client is the German
library network. Even I won't be doing it, and you know how I am about
conversion programs.
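
For anyone who wants to see what a normalizer actually does with the
Greek sequences quoted above, here is a small Python sketch using the
standard unicodedata module. The helper name nfc_code_points is
hypothetical, and the results depend on the Unicode data shipped with
the Python in use, so treat the comments as what I would expect rather
than a guarantee.

```python
import unicodedata

IOTA = "\u03b9"       # GREEK SMALL LETTER IOTA
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT


def nfc_code_points(text):
    """Normalize to NFC and list the resulting code points as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# No CGJ: both marks compose with the iota.
plain = nfc_code_points(IOTA + DIAERESIS + ACUTE)
# -> ['U+0390']

# CGJ directly after the base: nothing composes, nothing is reordered.
cgj_first = nfc_code_points(IOTA + CGJ + DIAERESIS + ACUTE)
# -> ['U+03B9', 'U+034F', 'U+0308', 'U+0301']

# Acute before CGJ: the acute composes with the iota, the rest is kept.
acute_first = nfc_code_points(IOTA + ACUTE + CGJ + DIAERESIS)
# -> ['U+03AF', 'U+034F', 'U+0308']
```

The third case is the one in which the base letter changes under NFC
while the <CGJ, DIAERESIS> pair is left trailing behind it.
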
> My concern as always is with the apparent inconsistency of bending the
> normal rules or ignoring the normalisation concerns for German while
> refusing to do more or less the same for Hebrew. I appreciate that
> Germany is a larger and richer country than Israel and so, at least
> for commercial interests, its concerns deserve some priority. But that
> should not be a reason to reject as invalid or insignificant issues
> concerning Hebrew. And the issue of avoiding incompatible
> representation of the same data is a real one for Hebrew Holam Male
> vs. Vav Haluma just as it is for German umlaut vs. tréma.
Ken Whistler already tried to explain, I think twice, that this use of
CGJ to affect collation has nothing to do with your proposal to use
variation selectors to affect rendering of combining marks.
And I already tried to explain, at least twice, that the N2819 solution
does *not* affect normalization. This is explained very clearly in the
document. You are not reading.
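
The point can be checked mechanically. The following is a minimal
Python sketch, assuming the standard unicodedata module; the names
TREMA_A, UMLAUT_A and unchanged_by_normalization are made up for the
illustration. It relies on CGJ having canonical combining class 0, so
that it neither reorders nor lets the following U+0308 compose with the
base letter.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

TREMA_A = "a\u034f\u0308"  # <a, CGJ, COMBINING DIAERESIS>
UMLAUT_A = "a\u0308"       # <a, COMBINING DIAERESIS>


def unchanged_by_normalization(text):
    """True if none of the four normalization forms alters the text."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


# CGJ is a starter (combining class 0).
cgj_class = unicodedata.combining("\u034f")

# The CGJ sequence passes through every normalization form untouched.
trema_stable = unchanged_by_normalization(TREMA_A)

# The plain sequence composes to precomposed U+00E4 under NFC as usual.
umlaut_nfc = unicodedata.normalize("NFC", UMLAUT_A)

# The two spellings are not canonically equivalent.
distinct = (unicodedata.normalize("NFD", TREMA_A)
            != unicodedata.normalize("NFD", "\u00e4"))
```

In other words, existing normalizers need no changes: data without CGJ
normalizes exactly as before, and data with CGJ stays as it was written.
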
You will get nowhere at all, and lose any remaining credibility, by
claiming that these decisions are being made based on political or
economic favoritism rather than technical differences.
> I am not actually asking for variation selectors with combining marks
> because I realise that the UTC has already made a decision and is
> unlikely to reverse it. But I am asking for some flexibility on some
> of the principles, of the kind which has been demonstrated with umlaut
> and tréma, and also in the Indic scripts proposal under review, in
> order to find an acceptable solution to a real problem.
OK, readers, whom does Peter sound like?
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/