Re: Yerushala(y)im - or Biblical Hebrew

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jul 28 2003 - 16:22:00 EDT


    Joan Wardell wrote:

    > Ken: Speaking for Sybase products, "fixing" the combining classes of the
    > existing vowels would have *no* positive impacts. It would have
    > a large number of negative impacts, the ultimate ramifications
    > of which I cannot even follow to their eventual conclusions. ...
    >
    > I hope you will excuse my ignorance, but I do not understand how correcting
    > the canonical classes is such a huge technical problem. If anyone has
    > already normalized their biblical Hebrew data, they have trashed it, and it
    > will have to be re-done anyway.

    That is beside the point for the implementations I'm talking
    about, actually.

    > Secondly, the Character Properties would
    > appear to be one huge matrix which would be called by any software needing
    > to know these.

    It isn't.

    > Why can't we just fix the database? :)

    Because changing the canonical ordering classes (in ways not
    allowed by the stability policies) breaks the normalization
    *algorithm* and the expected test results it is tested against.
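
    A minimal sketch of the mechanism, assuming Python's standard
    unicodedata module (the constant and function names are only for
    illustration). It shows that canonical ordering is driven entirely
    by the combining class values, so the lamed + patah + hiriq
    sequence of "Yerushala(y)im" comes out of normalization with the
    two vowels swapped:

```python
import unicodedata

# Hebrew letters/points involved in the "Yerushala(y)im" case.
LAMED = "\u05DC"   # HEBREW LETTER LAMED, combining class 0
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17


def ccc_list(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


# The order wanted for Biblical Hebrew: patah first, then hiriq.
written = LAMED + PATAH + HIRIQ

# Canonical ordering sorts adjacent marks by ascending combining class,
# so normalization puts hiriq (14) in front of patah (17).
normalized = unicodedata.normalize("NFC", written)

print(ccc_list(written))      # [0, 17, 14]
print(ccc_list(normalized))   # [0, 14, 17]
print(written == normalized)  # False
```

    Any change to those class values changes what this step produces,
    which is why the values are tied to the algorithm and its test
    results rather than being a free-standing table.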

    > I am completely ignorant of the mechanics of sorting algorithms and
    > whatever types of software are required to implement canonical classes.
    > However I can tell you it is no small thing to write in some kind of
    > intelligence in every future keyboard, conversion table, and search engine
    > for Hebrew just to identify how to undo "Yerushali-am". And having to trick
    > every browser is no small feat either. And that is only one exception of
    > the many that have been discussed.

    The CGJ proposal doesn't involve tricking out anybody's display,
    if done correctly. And I'm not talking about "undoing" normalized
    i-a sequences that have the incorrect order. As you noted above,
    any such data is trashed already and will have to be "re-done anyway".

    I don't see where conversion tables are at issue here. No character
    mapping for Unicode is involved or changed by this. Unless you
    are talking about conversion algorithms for batch conversion of
    existing Biblical Hebrew repositories into Unicode -- but those
    are specialized code to begin with, and asking people to update
    the tables in those to insert a CGJ into the point sequences has
    much less impact than asking all implementers to deal with the
    consequences of broken normalization.
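
    For illustration only, here is one way such a conversion step
    might look in Python. The function name protect_point_order and
    the rule it applies (insert a CGJ wherever two adjacent marks
    would otherwise be reordered) are assumptions made for this
    sketch, not part of any proposal text or existing converter:

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0
LAMED = "\u05DC"   # HEBREW LETTER LAMED
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17


def protect_point_order(text):
    """Hypothetical helper: insert CGJ between two adjacent combining
    marks whenever canonical ordering would swap them, i.e. when the
    first has a higher nonzero combining class than the second."""
    out = []
    for ch in text:
        if out:
            prev_ccc = unicodedata.combining(out[-1])
            this_ccc = unicodedata.combining(ch)
            if prev_ccc > this_ccc > 0:
                out.append(CGJ)
        out.append(ch)
    return "".join(out)


source = LAMED + PATAH + HIRIQ
protected = protect_point_order(source)

# CGJ has combining class 0, so it blocks reordering across it and
# the protected sequence survives normalization unchanged.
print(unicodedata.normalize("NFC", protected) == protected)  # True
print(unicodedata.normalize("NFC", source) == source)        # False
```

    The point of the sketch is that the change is confined to the
    converter that writes the data; the normalization code that reads
    it is untouched.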

    And I don't think you have thought through the consequences, for
    Biblical Hebrew itself, of having inconsistent normalization
    implementations (pre-fix and post-fix) floating around. Those
    will impact precisely the data you are trying to fix here, in
    ways that will force precisely the kinds of fixes in applications
    and search engines that you are worried about avoiding.
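
    A toy sketch of that scenario, in Python. The reordering function
    below is a simplified stand-in for the canonical ordering step,
    and the "post-fix" class values are invented purely for the
    example; nobody has proposed these particular numbers:

```python
import unicodedata

LAMED = "\u05DC"   # HEBREW LETTER LAMED
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14 today
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17 today


def canonical_reorder(text, ccc):
    """Simplified canonical ordering: stable-sort each run of marks
    with nonzero combining class, using the supplied ccc lookup."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


# Pre-fix implementation: the class values as actually published.
pre_fix_ccc = unicodedata.combining

# Post-fix implementation: HYPOTHETICAL values, invented for this
# sketch, in which patah sorts before hiriq.
HYPOTHETICAL_FIX = {PATAH: 14, HIRIQ: 17}


def post_fix_ccc(ch):
    return HYPOTHETICAL_FIX.get(ch, unicodedata.combining(ch))


stored = LAMED + PATAH + HIRIQ
old_form = canonical_reorder(stored, pre_fix_ccc)
new_form = canonical_reorder(stored, post_fix_ccc)

# The two implementations disagree about the "normalized" form of the
# same text, so a comparison between their outputs fails.
print(old_form == new_form)  # False
```

    Under these assumptions, a search index built by one system and
    queried by the other would miss exactly the words the change was
    meant to help.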

    --Ken


