Re: [hebrew] Re: Draft proposal for Unicode encoding of holam male

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 09 2004 - 14:01:16 EDT


    ----- Original Message -----
    From: "Peter Constable" <petercon@microsoft.com>
    To: <hebrew@unicode.org>
    Sent: Friday, April 09, 2004 6:50 PM
    Subject: [hebrew] Re: Draft proposal for Unicode encoding of holam male

    > > >this does not make it a vowel.
    > >
    > > Only in the sense that...
    >
    > Bringing the discussion back on topic... Let me try to support Jony's
    > position contra John for a moment. To avoid terminology like consonant
    > and vowel, let me simply refer to "base" characters, meaning everything
    > but the points -- the stuff that would be included in an unpointed text.
    >
    > There is a need to produce unpointed Biblical text, in which case the
    > vav will appear in a single form, regardless of whether it corresponds
    > in pointed text to C (vav) + V (holam) or to holam male. But the same
    > base characters should be used in unpointed and pointed data. This has
    > implications relating to the two alternative solutions:
    >
    > - (in PK's proposed solution) the vav + holam and the holam male text
    > elements are both represented by a single character, VAV, or
    >
    > - (in John's alternate solution) a distinct character, holam male, must
    > have an unpointed glyph variant
    >
    > The latter would be awkward for implementers and users. Therefore, the
    > former is preferable.

    Why not instead encode a VAV variant, to be used when VAV is not a real
    consonant but a special base that alters the meaning of the following vowel,
    i.e. <VAV,VS1,HOLAM>?
    This has the additional benefit that it can still be rendered (not strictly
    correctly) as <VAV,HOLAM>, i.e. vav haluma with the central/right holam dot, if
    the special form is not supported. However, I wonder what impact it would have
    on collation, as variation selectors are normally ignorable... But it allows a
    font to treat <VAV,VS1> as a separate glyph ID with its own ligature and
    positioning pair for the following HOLAM. And it keeps the structure of Hebrew
    as a base consonant followed by optional vowel points.
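
    To make the fallback behaviour concrete, here is a minimal sketch in Python.
    It assumes VS1 means U+FE00 VARIATION SELECTOR-1; <VAV,VS1> is not a
    registered variation sequence, so the "marked" form is purely hypothetical,
    and strip_variation_selectors is an illustrative helper, not an existing API.
    It only shows that a process which ignores the selector sees plain
    <VAV,HOLAM>.

```python
import unicodedata

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
VS1 = "\uFE00"    # VARIATION SELECTOR-1 (assumed meaning of "VS1")

def strip_variation_selectors(text):
    """Drop U+FE00..U+FE0F, as a process that ignores selectors would."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))

# Hypothetical marked sequence for holam male: <VAV, VS1, HOLAM>
marked = VAV + VS1 + HOLAM

# What remains when the selector is not supported or is ignored
fallback = strip_variation_selectors(marked)

for ch in marked:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(fallback == VAV + HOLAM)
```

    The same stripping is what makes the collation question arise: once the
    selector is ignored, the two sequences are no longer distinguishable.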

    By contrast, a separate <HOLAM MALE> vowel codepoint would preferably
    encode both the VAV glyph and the left holam dot already (so <HOLAM MALE,HOLAM>
    would probably be rendered as the HOLAM MALE base letter with a HOLAM point on
    the right, i.e. with two dots above the VAV glyph). It would need a new
    collation rule, and most texts would have to be reencoded if they assume the
    opposite convention, where <VAV, HOLAM> encodes holam male and not the newer
    vav haluma.

    Other proposals based on ZWJ and ZWNJ will just complicate things. In fact, as
    holam male is the most common case and vav haluma is rare, the legacy sequence
    <VAV, HOLAM> should preferably encode holam male (to avoid reencoding too
    many texts).

    But I'd like to see what can be done quite simply for Biblical texts (which are
    well known, stable and easily reencodable), and what is used in modern Hebrew
    with <VAV, HOLAM> (these resources are unbounded and unknown, and it is nearly
    impossible to guarantee that they can be reencoded easily). For example, I am
    thinking of modern personal names, toponyms and trademarks, which should not
    need any reencoding, as well as most modern publications, where this work could
    never be completed (notably the texts of newspapers and lots of cheap books
    and publications).
    If, in modern pointed Hebrew, there's no real distinction between the glyphs
    shown for holam male and vav haluma, reencoding will not be necessary to render
    the text correctly, but it may affect some areas like collation. I suppose then
    that a modern Hebrew collation would probably collate a new <HOLAM MALE>
    codepoint as <VAV, HOLAM> for vav haluma; this is quite simple to do with a
    two-level collation key.
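
    A toy sketch of such a two-level key, in Python. Everything here is an
    assumption for illustration: HOLAM_MALE is a private-use placeholder
    (U+E000), since no such character is encoded, and collation_key is a
    hypothetical helper that uses raw code points as weights rather than a real
    collation table. The first level treats <HOLAM MALE> exactly like
    <VAV, HOLAM>; the second level only breaks ties.

```python
VAV = "\u05D5"         # HEBREW LETTER VAV
HOLAM = "\u05B9"       # HEBREW POINT HOLAM
BET = "\u05D1"         # HEBREW LETTER BET
HOLAM_MALE = "\uE000"  # hypothetical: private-use stand-in, not a real character

def collation_key(text):
    """Two-level key: (primary weights, secondary tie-breakers)."""
    # Level 1: expand the hypothetical HOLAM MALE to <VAV, HOLAM>, so both
    # spellings get identical primary weights (raw code points, for simplicity).
    primary = tuple(ord(ch) for ch in text.replace(HOLAM_MALE, VAV + HOLAM))
    # Level 2: flag each HOLAM MALE so it is distinguished only when the
    # primary weights are equal.
    secondary = tuple(1 if ch == HOLAM_MALE else 0 for ch in text)
    return (primary, secondary)

words = [BET + HOLAM_MALE, BET + VAV + HOLAM, BET]
ordered = sorted(words, key=collation_key)
```

    With this key the two spellings sort next to each other and differ only at
    the second level, so a comparison restricted to the first level would treat
    them as equal.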


