Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

From: John Hudson (tiro@tiro.com)
Date: Thu Jun 26 2003 - 20:15:25 EDT

    At 03:36 PM 6/26/2003, Kenneth Whistler wrote:

    >Why is making use of the existing behavior of existing characters
    >a "groanable kludge", if it has the desired effect and makes
    >the required distinctions in text? If there is not some
    >rendering system or font lookup showstopper here, I'm inclined
    >to think it's a rather elegant way out of the problem.

    I think assumptions about not breaking combining mark sequences may, in
    fact, be a showstopper. If <base+mark+mark> becomes
    <base+mark+CtrlChar+mark>, it is reasonable to think that this will not
    only inhibit mark re-ordering but also mark combining and mark
    interaction. Unfortunately, this seems to be the case with every control
    character I have been able to test, using two different rendering engines
    (Uniscribe and InDesign ME -- although the latter already has some problems
    with double marks in Biblical Hebrew). Perhaps we should have a specific
    COMBINING MARK SEQUENCE CONTROL character?
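    For readers who want to see the sequence change concretely, here is a
    minimal sketch at the normalisation level only. It says nothing about how
    Uniscribe or InDesign ME position the marks. The choice of U+200D ZERO
    WIDTH JOINER as the control character is an assumption for illustration;
    any character with combining class 0 would behave the same way here.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
ZWJ = "\u200D"  # assumption: stand-in for whichever control character is chosen

plain = LAMED + PATAH + HIRIQ          # <base+mark+mark>
guarded = LAMED + PATAH + ZWJ + HIRIQ  # <base+mark+CtrlChar+mark>

def codepoints(s):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_guarded = unicodedata.normalize("NFC", guarded)

print(codepoints(nfc_plain))    # U+05DC U+05B4 U+05B7 (marks swapped)
print(codepoints(nfc_guarded))  # U+05DC U+05B7 U+200D U+05B4 (order kept)
```

    The control character does block the re-ordering, but it also means the
    two marks are no longer adjacent in the string, which is exactly what a
    rendering engine's mark-to-mark logic tends to rely on.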

    All that said, I disagree with Ken that this is anything like an elegant
    way out of the problem. Forcing awkward, textually illogical and easily
    forgettable control character usage onto *users* in order to solve a problem
    in the Unicode Standard is not elegant, and it is unlikely to do much for
    the reputation of the standard.

    Q: 'Why do I have to insert this control character between these points?'
    A: 'To prevent them from being re-ordered.'
    Q: 'But why would they be re-ordered anyway? Why wouldn't they just stay in
    the order I put them in?'
    A: 'Because Unicode normalisation will automatically re-order the points.'
    Q: 'But why? Points shouldn't be re-ordered: it breaks the text.'
    A: 'Yes, but the people who decided how normalisation should work for
    Hebrew didn't know that.'
    Q: 'Well can't they fix it?'
    A: 'They have: they've told you that you have to insert this control
    character...'
    Q: 'But *I* didn't make the mistake. Why should I have to be the one to
    mess around with this annoying control character?'

    ... and so on.
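    The 'why would they be re-ordered' part of that exchange can be sketched
    in a few lines. This is a simplified illustration of canonical ordering,
    not a full normaliser: it skips decomposition and simply stable-sorts each
    run of marks by canonical combining class. The function name
    canonical_reorder is made up for this example.

```python
import unicodedata

def canonical_reorder(text):
    """Stable-sort each run of non-starter marks by combining class."""
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter (class 0) ends the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

PATAH, HIRIQ = "\u05B7", "\u05B4"
typed = "\u05DC" + PATAH + HIRIQ  # lamed, patah, hiriq as the user entered them

print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))  # 17 14
reordered = canonical_reorder(typed)
print(reordered == "\u05DC" + HIRIQ + PATAH)  # True
```

    Because each Hebrew point has its own distinct class, the sort imposes one
    fixed order on the points regardless of the order the user typed.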

    Much as the duplication of Hebrew mark encoding may be distasteful, and
    even considering the work that will need to be done to update layout
    engines, fonts and documents to work with the new mark characters, I agree
    with Peter Constable that this is by far the best long term solution,
    especially from a *user* perspective. Over the past two months I have been
    over this problem in great detail with the Society of Biblical Literature
    and their partners in the SBL Font Foundation. They understand the problems
    with the current normalisation, and they understand that any solution is
    going to require document and font revisions; they're resigned to this, and
    they've worked hard to come up with combining class assignments that would
    actually work for all consonant + mark(s) sequences encountered in Biblical
    Hebrew. This work forms the basis of the proposal submitted by Peter
    Constable. Encoding of new Biblical Hebrew mark characters provides a
    relatively simple update path for both documents and fonts, since it
    largely involves one-to-one mappings from old characters to new.
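    To give a sense of how small that update path could be, here is a rough
    sketch of a one-to-one conversion. The target code points are hypothetical
    placeholders taken from the Private Use Area; no real assignments for the
    proposed characters are implied, and the names OLD_TO_NEW, migrate and
    revert are invented for this example.

```python
# Hypothetical mapping: existing Hebrew points -> placeholder code points.
# The PUA values stand in for whatever the standard would actually assign.
OLD_TO_NEW = {
    0x05B4: 0xE004,  # HEBREW POINT HIRIQ -> placeholder
    0x05B7: 0xE007,  # HEBREW POINT PATAH -> placeholder
}
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}

def migrate(text):
    """Replace each old mark with its (hypothetical) new counterpart."""
    return text.translate(OLD_TO_NEW)

def revert(text):
    """Inverse mapping, possible because the table is one-to-one."""
    return text.translate(NEW_TO_OLD)

sample = "\u05DC\u05B7\u05B4"
converted = migrate(sample)
print(len(converted) == len(sample))  # True: nothing inserted or removed
print(revert(converted) == sample)    # True: the conversion is reversible
```

    The same table could in principle drive a font's character map update,
    since each old mark glyph would be reachable from exactly one new
    character.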

    Conversely, insisting on using control characters to manage mark ordering
    in texts will require analysis to identify those sequences that will be
    subject to re-ordering during normalisation, and individual insertion of
    control characters. The fact that these control characters are invisible
    and not obvious to users transcribing text puts an additional burden on
    application and font support, and adds another level of complexity to using
    what are already some of the most complicated fonts in existence (how many
    fonts do you know that come with 18-page user manuals?). I think it is
    unreasonable to expect Biblical scholars to understand Unicode canonical
    ordering to such a deep level that they are able to know where to insert
    control characters to prevent a re-ordering that shouldn't be happening in
    the first place.
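    The kind of analysis described above might look something like the
    following sketch. It flags every adjacent pair of marks that normalisation
    would swap, assuming the text contains no precomposed presentation forms.
    It cannot tell which swaps actually damage the text; that judgement still
    needs someone who knows the orthography. The function name
    find_reordered_pairs is hypothetical.

```python
import unicodedata

def find_reordered_pairs(text):
    """Return (index, first_mark, second_mark) for each adjacent pair of
    marks whose combining classes are out of canonical order."""
    hits = []
    for i in range(1, len(text)):
        prev_cc = unicodedata.combining(text[i - 1])
        cur_cc = unicodedata.combining(text[i])
        # A mark with a lower non-zero class directly after a higher one
        # will be moved ahead of it during normalisation.
        if cur_cc != 0 and prev_cc > cur_cc:
            hits.append((i - 1, text[i - 1], text[i]))
    return hits

# lamed, patah, hiriq, final mem
sample = "\u05DC\u05B7\u05B4\u05DD"
hits = find_reordered_pairs(sample)
for index, first, second in hits:
    print(index, unicodedata.name(first), unicodedata.name(second))
```

    Each hit is a place where a control character would have to be inserted by
    hand or by tool, which is the burden on transcribers that I am objecting
    to.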

    John Hudson

    Tiro Typeworks www.tiro.com
    Vancouver, BC tiro@tiro.com

    If you browse in the shelves that, in American bookstores,
    are labeled New Age, you can find there even Saint Augustine,
    who, as far as I know, was not a fascist. But combining Saint
    Augustine and Stonehenge -- that is a symptom of Ur-Fascism.
                                                                 - Umberto Eco
