Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jun 26 2003 - 16:41:02 EDT

    Peter replied to Karljürgen:

    > Karljürgen Feuerherm wrote on 06/25/2003 08:31:41 PM:
    >
    > > I was going to suggest something very similar, a ZW-pseudo-consonant of
    > > some kind, which would force each vowel to be associated with one
    > > consonant.
    >
    > An invisible *consonant* doesn't make sense because the problem involves
    > more than just multiple written vowels on one consonant;

    I agree that we don't want to go inventing invisible consonants for
    this.

    BTW, there's already an invisible vowel (in fact a pair of them)
    that is unwanted by the stakeholders of the script it was
    originally invented for:

    U+17B4 KHMER VOWEL INHERENT AQ

    This also has cc=0, so it would serve to block canonical reordering
    if placed between two Hebrew vowel points. But I'm sure that if
    Peter thought the suggestion of the ZWJ for this was a "groanable
    kludge", Biblical Hebraicists would probably not take lightly
    to the importation of an invisible Khmer character into their
    text representations. ;-)

    > in fact, that is
    > a small portion of the general problem. If we want such a character, it
    > would notionally be a zero-width-canonical-ordering-inhibitor, and nothing
    > more.

    The fact is that any of the zero-width format controls has the
    side effect of inhibiting (or rather interrupting) canonical reordering
    if inserted in the middle of a target sequence, because its own
    combining class is 0 (cc=0).
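
    A minimal sketch of that effect, using Python's standard
    unicodedata module. The lamed + patah + hiriq sequence is picked
    here purely for illustration (patah is cc=17, hiriq is cc=14, so
    the two points are out of canonical order as typed); the variable
    names are likewise only for this example.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, cc=0
PATAH = "\u05B7"  # HEBREW POINT PATAH, cc=17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, cc=14
ZWJ = "\u200D"    # ZERO WIDTH JOINER, cc=0


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# Patah before hiriq: the classes are out of order (17 > 14), so
# canonical reordering swaps the two points.
plain = LAMED + PATAH + HIRIQ
reordered = nfd(plain)

# With a cc=0 character between them, each point sits in its own
# run of combining marks and there is nothing left to reorder.
blocked = LAMED + PATAH + ZWJ + HIRIQ
preserved = nfd(blocked)

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in preserved])
```

    The first list comes out with hiriq ahead of patah; the second
    keeps the order as entered, with the ZWJ still in place.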

    I'm not particularly campaigning for ZWJ, by the way. ZWNJ or even
    U+FEFF ZWNBSP would accomplish the same. I just suggested ZWJ because
    it seemed in the ballpark. ZWNBSP would likely have fewer side
    effects, since notionally it means just "don't break here",
    which you wouldn't do in the middle of a Hebrew combining character
    sequence, anyway.
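
    As a rough check that the candidates mentioned so far behave
    alike with respect to normalization, the sketch below looks up
    each one's combining class and tests whether it keeps a
    patah + hiriq pair in typed order under NFD. The helper name
    blocks_reordering and the test sequence are assumptions made for
    this example only; it says nothing about rendering or line
    breaking, which is where the candidates would actually differ.

```python
import unicodedata

# Candidate "blockers" mentioned in this thread.
CANDIDATES = {
    "ZWJ": "\u200D",
    "ZWNJ": "\u200C",
    "ZWNBSP": "\uFEFF",
    "KHMER VOWEL INHERENT AQ": "\u17B4",
}

LAMED = "\u05DC"                   # base letter, cc=0
PATAH, HIRIQ = "\u05B7", "\u05B4"  # cc=17 and cc=14


def blocks_reordering(blocker):
    """True if NFD leaves lamed + patah + blocker + hiriq unchanged."""
    text = LAMED + PATAH + blocker + HIRIQ
    return unicodedata.normalize("NFD", text) == text


# Map each candidate name to (combining class, blocks reordering?).
results = {
    name: (unicodedata.combining(ch), blocks_reordering(ch))
    for name, ch in CANDIDATES.items()
}

for name, (cc, blocks) in results.items():
    print(f"{name}: cc={cc}, blocks reordering: {blocks}")
```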

    > And I don't particularly want to think about what happens when people start
    > sticking this thing into sequences other than Biblical Hebrew ("in
    > Unicode, any sequence is legal").

    But don't forget that these cc=0 zero width format controls already
    can be stuck into sequences other than Biblical Hebrew. In some
    instances they have defined semantics there (as for Arabic and
    Indic scripts), but in all cases they would *already* have the
    effect of interrupting canonical reordering of combining character
    sequences if inserted there.
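
    The same thing can be seen outside Hebrew. The Latin sequence
    below is an arbitrary example chosen for this sketch: an acute
    accent (cc=230) typed before a dot below (cc=220) is normally
    reordered, but not once a ZWNJ sits between the two marks.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, cc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, cc=220
ZWNJ = "\u200C"       # ZERO WIDTH NON-JOINER, cc=0

plain = "a" + ACUTE + DOT_BELOW
with_zwnj = "a" + ACUTE + ZWNJ + DOT_BELOW

# Without the format control, NFD moves the lower-class mark first.
nfd_plain = unicodedata.normalize("NFD", plain)

# With it, the two marks are in separate runs and stay as typed.
nfd_zwnj = unicodedata.normalize("NFD", with_zwnj)

print([f"U+{ord(c):04X}" for c in nfd_plain])
print([f"U+{ord(c):04X}" for c in nfd_zwnj])
```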

    --Ken


