Re: Biblical Hebrew

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jun 26 2003 - 23:15:12 EDT

    John Hudson wrote:

    > At 03:52 PM 6/26/2003, Rick McGowan wrote:
    >
    > >I'll weigh in to agree with Ken here. The solution of cloning a whole set
    > >of these things just to fix combining behavior is, to understate, not quite
    > >nice.
    >
    > No, but would be far from the not nicest thing in Unicode, and there's a
    > really good reason for it. I was originally intrigued by Ken's ZWJ idea --
    > or by a variant of it using some new re-ordering inhibiting character, to
    > avoid overloading ZWJ any further --, but the more I think about it, the
    > more not nice I think it is to force Biblical scholars to carry the can for
    > errors in the Unicode combining classes.

    One of the reasons I keep poking around for alternatives that might
    work in a different way is that cloning sets of characters like
    this tends just to displace the problem. You don't want to
    force Biblical scholars to "carry the can" for the errors in
    the current combining classes...

    But who then does end up carrying the can eventually, if we go
    the cloning route? Cloning 14 characters creates a *new*
    normalization problem, and forces non-Biblical-scholar users of
    pointed Hebrew text to carry *that* particular can.

    How does a user of pointed Hebrew text know whether they are
    dealing with the legacy points, which people will have gone
    on using, outside the context of the group of cognoscenti who
    switch their applications and fonts over to the corrected set
    of points? What happens if they edit text represented in one
    scheme with a tool meant for the other? What about searches
    on data with pointed Hebrew -- should it normalize the two
    sets of points or not? (And here I am talking about normalization
    by an ad hoc, custom folding, rather than generic Unicode
    normalization.) Who carries the can for writing the conversion
    routines between data in one scheme and the other? How about
    conversion from legacy character sets for bibliographic
    data -- does that need to be upgraded? How about database
    implementations -- do they need custom extensions to do this
    folding as part of their query optimizations? And if the
    problem with the existing set of points is that their
    use in a normalized context eliminates distinctions that
    should be maintained, how do I write any conversion routines
    in such a way as to not corrupt or otherwise contaminate data
    using the new scheme? Who do I blame if my Hebrew font works
    with one set of points but not the other, and I'm getting
    intermittently trashed display as a result? ... and so on...

    I think if you really sit down and think about this in the
    larger context of users of Unicode Hebrew generically, instead
    of merely the Biblical Hebrew community that you are trying
    to find a solution for, you may realize that displacing the
    pain to *other* users may not be the best solution, either.

    While the solution I am suggesting is not without its
    conversion problems, I think they are significantly more
    tractable than those posed by cloning code points. The
    folding issue is much more straightforward, since it would
    consist entirely of ignoring the CGJ and applying standard
    normalization (or not). The new scheme would essentially be transparent
    to systems that don't bother inserting CGJ between points,
    as long as their fonts could handle the combinations.
    Loss of ordering distinctions for data that is exported
    from the new systems and then reimported would be much
    less of an issue, since normalization could not destroy
    the distinctions without further intervention.
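
    The folding described above can be sketched roughly as follows. This
    is a minimal illustration, assuming Python's standard unicodedata
    module; the helper name fold_cgj and the sample sequence (a lamed
    carrying patah and then hiriq) are illustrative choices, not part of
    the proposal itself.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER, canonical combining class 0
LAMED = "\u05DC"    # base consonant
PATAH = "\u05B7"    # Hebrew point, canonical combining class 17
HIRIQ = "\u05B4"    # Hebrew point, canonical combining class 14


def fold_cgj(text: str, form: str = "NFC") -> str:
    """Hypothetical search folding: drop CGJ, then apply standard normalization."""
    return unicodedata.normalize(form, text.replace(CGJ, ""))


# Illustrative sequence: two points on one consonant, entered in an order
# that canonical reordering would change (class 17 before class 14).
plain = LAMED + PATAH + HIRIQ
marked = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, normalization reorders the points by combining class.
reordered = unicodedata.normalize("NFC", plain)

# With CGJ between the points, normalization leaves their order alone.
preserved = unicodedata.normalize("NFC", marked)

# After folding, both spellings compare equal, e.g. for search.
same_after_fold = fold_cgj(plain) == fold_cgj(marked)
```

    Because CGJ has combining class 0, it splits the run of points, so
    canonical reordering never moves a point across it. A system that
    wants the distinction keeps the CGJ; one that does not simply strips
    it before normalizing.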

    > I believe the aim in fixing this
    > problem in Unicode should be to provide Biblical scholars with a good text
    > processing experience, not with awkward kludges,

    Yes, but I believe that is the responsibility of the systems and
    applications designers, given the tools and constraints we have
    to hand.

    > even if that means making
    > the Unicode Hebrew block look weird with duplicated marks.

    I really believe there be dragons there, and the end result will
    be to make it *more* difficult for the systems and applications
    designers to provide a "good text processing experience" to
    all users of pointed Hebrew text.

    --Ken


