From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 19 2003 - 21:01:00 EDT
Jim Allan summed up:
[I've substituted schematic names for the UTF-8, to avoid
possible Latin-1/UTF-8 mailer trashing. --Ken]
> Accordingly, it would be reasonable that _ae_ and _oe_ be classed as the
> same kind of thing in Unicode, whatever that thing might be. It would be
> reasonable that the same collating rules be applied to both as to
> primary or secondary differences from _a_ and _o_ respectively.
I agree, but others have insisted that _oe_ default to
its ligature treatment in the table, i.e., weighting as
an <o,e> sequence.
The difficulty for _ae_, which many people who opine about this
issue tend to overlook, is that the Unicode Standard also
includes, from Nordic standards, a number of accented _ae_
characters as precomposed characters. These make the table
considerably more complicated if the default treatment for
_ae_ is to weight it as an <a,e> sequence, since you then
have to figure out what to do with the accented forms, for which
you have just drained the base character weighting.
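To make the dependency concrete, here is a minimal Python sketch, not
the actual table construction. It uses the standard unicodedata module
to show that two of the accented _ae_ characters decompose canonically
to _ae_ plus a combining mark, and then builds primary keys from a toy
table under both treatments. The weights in PRIMARY, the EXPAND map,
and the primary_key helper are all hypothetical and are not taken from
allkeys.txt.

```python
import unicodedata

AE = "\u00e6"         # _ae_
AE_ACUTE = "\u01fd"   # _ae_ with acute
AE_MACRON = "\u01e3"  # _ae_ with macron
OE = "\u0153"         # _oe_

# Hypothetical primary weights for a toy table; NOT allkeys.txt values.
PRIMARY = {"a": 10, AE: 15, "b": 20, "e": 50, "o": 150}

# Ligature treatment: _oe_ is weighted as the sequence <o,e>.
EXPAND = {OE: "oe"}


def primary_key(s, expand_ae=False):
    """Return toy primary weights for s.

    With expand_ae=True, _ae_ is weighted as the sequence <a,e>
    instead of as a base character with its own primary weight.
    """
    expand = dict(EXPAND)
    if expand_ae:
        expand[AE] = "ae"
    key = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            continue  # combining marks are ignored at the primary level here
        for base in expand.get(ch, ch):
            key.append(PRIMARY[base])
    return key


# The accented forms decompose to _ae_ plus a combining mark,
# so whatever _ae_ gets, they inherit.
print(unicodedata.decomposition(AE_ACUTE))   # 00E6 0301
print(unicodedata.decomposition(AE_MACRON))  # 00E6 0304

print(primary_key(AE_ACUTE))                  # [15]
print(primary_key(AE_ACUTE, expand_ae=True))  # [10, 50]
```

In this toy model the accented forms simply follow _ae_ through
decomposition. A real table lists the precomposed characters
explicitly, so each of those entries would have to be reworked if
_ae_ stopped being a base character.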
In any case, inconsistent as it is for these two characters,
the allkeys.txt table was constructed as it is for a reason
(or several reasons, actually),
and I'm disinclined to suggest that its handling of _ae_
and _oe_ should be restructured, since that would ripple out
and further destabilize tailorings based on the
current values in the table.
--Ken