From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jul 28 2003 - 16:22:00 EDT
Joan Wardell wrote:
> Ken: Speaking for Sybase products, "fixing" the combining classes of the
> existing vowels would have *no* positive impacts. It would have
> a large number of negative impacts, the ultimate ramifications
> of which I cannot even follow to their eventual conclusions. ...
>
> I hope you will excuse my ignorance, but I do not understand how correcting
> the canonical classes is such a huge technical problem. If anyone has
> already normalized their biblical Hebrew data, they have trashed it, and it
> will have to be re-done anyway.
That is beside the point for the implementations I'm talking
about, actually.
> Secondly, the Character Properties would
> appear to be one huge matrix which would be called by any software needing
> to know these.
It isn't.
> Why can't we just fix the database? :)
Because changing the canonical ordering classes (in ways not
allowed by the stability policies) breaks the normalization
*algorithm* and the expected test results it is tested against.
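A minimal sketch of why the published classes matter, using Python's standard `unicodedata` module. It assumes only the real property values in the Unicode Character Database: PATAH (U+05B7) has class 17 and HIRIQ (U+05B4) has class 14. A lamed carrying patah followed by hiriq, the pattern behind the "Yerushali-am" problem, gets its two points swapped by any conformant normalizer:

```python
import unicodedata

# Lamed, then PATAH (U+05B7), then HIRIQ (U+05B4), in the order typed.
LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
typed = LAMED + PATAH + HIRIQ

# The canonical combining classes the normalization algorithm sorts by.
print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))  # 17 14

# Canonical ordering stable-sorts adjacent non-starters by class,
# so hiriq (14) is moved ahead of patah (17).
normalized = unicodedata.normalize("NFC", typed)
print(normalized == LAMED + HIRIQ + PATAH)  # True
```

Every normalizer, and every conformance test result, depends on those class values staying fixed, which is why changing them breaks the algorithm rather than just "fixing the database".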
> I am completely ignorant of the mechanics of sorting algorithms and
> whatever types of software are required to implement canonical classes.
> However I can tell you it is no small thing to write in some kind of
> intelligence in every future keyboard, conversion table, and search engine
> for Hebrew just to identify how to undo "Yerushali-am". And having to trick
> every browser is no small feat either. And that is only one exception of
> the many that have been discussed.
The CGJ (U+034F COMBINING GRAPHEME JOINER) proposal doesn't involve tricking anybody's display,
if done correctly. And I'm not talking about "undoing" normalized
i-a sequences that have the incorrect order. As you noted above,
any such data is trashed already and will have to be "re-done anyway".
I don't see where conversion tables are at issue here. No character
mapping for Unicode is involved or changed by this. Unless you
are talking about conversion algorithms for batch conversion of
existing Biblical Hebrew repositories into Unicode -- but those
are specialized code to begin with, and asking people to update
the tables in those to insert a CGJ into the point sequences has
much less impact than asking all implementers to deal with the
consequences of broken normalization.
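A sketch of what inserting a CGJ at conversion time could look like. The helper name `insert_cgj` and its rule (add a CGJ wherever a mark has a lower class than the mark before it) are illustrative assumptions, not the text of the proposal; the one real fact relied on is that CGJ has combining class 0, so canonical ordering never moves a mark across it:

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

def insert_cgj(text: str) -> str:
    """Hypothetical conversion helper: insert a CGJ wherever canonical
    ordering would otherwise move a mark ahead of the mark before it."""
    out = []
    prev_ccc = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc != 0 and prev_ccc > ccc:
            out.append(CGJ)  # class-0 character blocks reordering across it
        out.append(ch)
        prev_ccc = ccc
    return "".join(out)

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
protected = insert_cgj(LAMED + PATAH + HIRIQ)
print(protected == LAMED + PATAH + CGJ + HIRIQ)             # True
print(unicodedata.normalize("NFC", protected) == protected)  # True
```

Because every run of marks between class-0 characters comes out in non-decreasing class order, the typed patah-then-hiriq order survives normalization unchanged, and no normalizer has to be modified.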
And I don't think you have thought through the consequences, for
Biblical Hebrew itself, of having inconsistent normalization
implementations (pre-fix and post-fix) floating around. Those
will impact precisely the data you are trying to fix here, in
ways that will force precisely the kinds of fixes in applications
and search engines that you are worried about avoiding.
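To make the "pre-fix and post-fix" hazard concrete, here is a sketch of just the canonical ordering step (decomposition and composition are omitted, since these Hebrew points have neither) run against two class tables. The "post-fix" table is purely HYPOTHETICAL: it gives hiriq the same class as patah so the pair would no longer be swapped. It is not any proposed value:

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

def canonical_order(text, ccc):
    """Stable-sort each run of non-starters (class != 0) by class,
    as the canonical ordering step of normalization does."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # flush the pending run of marks
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)

def pre_fix(ch):
    # Classes as published in the Unicode Character Database.
    return unicodedata.combining(ch)

def post_fix(ch):
    # HYPOTHETICAL "fixed" table: hiriq given patah's class (17). Illustration only.
    return 17 if ch == HIRIQ else unicodedata.combining(ch)

word = LAMED + PATAH + HIRIQ
old = canonical_order(word, pre_fix)
new = canonical_order(word, post_fix)
print(old == LAMED + HIRIQ + PATAH)  # True
print(new == word)                   # True
print(old == new)                    # False: one input, two "normalized" forms
```

Text normalized on one kind of system and searched or compared on the other would then fail to match, which is exactly the Biblical Hebrew data the change was meant to help.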
--Ken
This archive was generated by hypermail 2.1.5 : Mon Jul 28 2003 - 17:07:00 EDT