From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Jul 15 2004 - 04:13:07 CDT
On 15/07/2004 05:00, Asmus Freytag wrote:
> At 01:52 PM 7/14/2004, Doug Ewell wrote:
>
>> It's not German data (with umlauts) that will be affected by this
>> solution, but non-German data (with diaereses) in German bibliographic
>> systems. That makes it a much smaller problem.
>
>
> the use of diaeresis is perfectly valid for words in fields that have
> a language ID 'German'.
>
>> The DIN request and the USNB solution didn't address this, because the
>> problem to be solved was disambiguating {a, o, u}-with-tréma from
>> {a, o, u}-with-umlaut. If there are combinations of (for example)
>> a-with-tréma-and-something-else AND ALSO
>> a-with-umlaut-and-something-else, then those two will need to be
>> disambiguated somehow. But I strongly doubt that the latter case exists
>> in German bibliographic data, though of course one never knows.
>
>
> First off, there have to be corresponding entries in the sorting
> tables used for such data, to make that distinction have the correct
> effect. Since the sorting tables would not support anything other than
> <BASE, CGJ, DIAERESIS> there's no reason to introduce other sequences
> into the data.
>
> Secondly, the dieresis is used to indicate that two vowels are
> pronounced separately. I haven't seen a case where the vowels would
> already be accented.

There are such cases, although in most (but not all) of them the vowel
is technically not "already" accented, because the diaeresis is encoded
closer to the base letter than the accent. This is certainly the case
in Greek, where diaeresis (indicating separate pronunciation) and
accents commonly occur on the same vowel; there are precomposed forms in
the Greek and Coptic and Greek Extended blocks. There are also a number
of precomposed forms in Latin Extended-B and Latin Extended Additional
with both diaeresis and another accent. Presumably these are used for
some language or other (some for Pinyin, some for Livonian, others
unspecified), and so they may occur in German bibliographic data.

In that database each of them must have been encoded either with umlaut
or with tréma, presumably according to whether the mark is understood as
a vowel quality modification or as a separation, and there is at least
the possibility that some combinations have been encoded differently in
different places in the database. (Foreign words may also be used within
book titles marked as German.)

Therefore Unicode does need to consider the issue, both as a theoretical
one (its effect on normalisation is essentially equivalent to that of
the theoretical problem with using variation selectors with combining
characters) and potentially as a practical one.
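
For anyone who wants to check which precomposed forms are involved, here
is a rough Python sketch using the standard unicodedata module. The
function names and the scanned range (U+0080 to U+1FFF, which covers the
Latin and Greek blocks mentioned above) are my own choices for
illustration. The second function assumes the <BASE, CGJ, DIAERESIS>
convention quoted earlier in the thread and only shows how such a
sequence behaves under NFC next to the plain sequence; it does not claim
which of the two a given database uses for umlaut or tréma.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
MACRON = "\u0304"     # COMBINING MACRON, used here as the "other accent"


def diaeresis_plus_other(start=0x0080, end=0x2000):
    """Return precomposed characters in [start, end) whose full canonical
    decomposition contains U+0308 and at least one other combining mark."""
    found = []
    for cp in range(start, end):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if DIAERESIS not in nfd:
            continue
        # Count the combining marks (non-zero combining class) after the base.
        marks = [c for c in nfd[1:] if unicodedata.combining(c)]
        if len(marks) >= 2:
            found.append(ch)
    return found


def nfc_with_and_without_cgj(base):
    """Return the NFC forms of <base, diaeresis, macron> and of
    <base, CGJ, diaeresis, macron> (an illustrative pair only)."""
    plain = base + DIAERESIS + MACRON
    with_cgj = base + CGJ + DIAERESIS + MACRON
    return (unicodedata.normalize("NFC", plain),
            unicodedata.normalize("NFC", with_cgj))


if __name__ == "__main__":
    for ch in diaeresis_plus_other():
        decomposition = " ".join(
            f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {decomposition}")
    print([[f"{ord(c):04X}" for c in s] for s in nfc_with_and_without_cgj("u")])
```

The listing includes forms where the diaeresis comes first in the
decomposition as well as forms where it follows another accent, which
is the "most but not all" distinction made above. In the second
function, CGJ has combining class 0 and so keeps the diaeresis from
composing with the base under NFC, while the plain sequence composes to
a single precomposed character.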
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/