From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 09 2004 - 14:01:16 EDT
----- Original Message -----
From: "Peter Constable" <petercon@microsoft.com>
To: <hebrew@unicode.org>
Sent: Friday, April 09, 2004 6:50 PM
Subject: [hebrew] Re: Draft proposal for Unicode encoding of holam male
> > >this does not make it a vowel.
> >
> > Only in the sense that...
>
> Bringing the discussion back on topic... Let me try to support Jony's
> position contra John for a moment. To avoid terminology like consonant
> and vowel, let me simply refer to "base" characters, meaning everything
> but the points -- the stuff that would be included in an unpointed text.
>
> There is a need to produce unpointed Biblical text, in which case the
> vav will appear in a single form, regardless of whether it corresponds
> in pointed text to C (vav) + V (holam) or to holam male. But the same
> base characters should be used in unpointed and pointed data. This has
> implications relating to the two alternative solutions:
>
> - (in PK's proposed solution) the vav + holam and the holam male text
> elements are both represented by a single character, VAV, or
>
> - (in John's alternate solution) a distinct character, holam male, must
> have an unpointed glyph variant
>
> The latter would be awkward for implementers and users. Therefore, the
> former is preferable.
Why not instead encode a VAV variant, to be used when VAV is not a real
consonant but a special base that alters the meaning of the following vowel,
i.e. <VAV,VS1,HOLAM>?
This has the additional benefit that it can still be rendered (not strictly
correctly) as <VAV,HOLAM>, i.e. vav haluma with the central/right holam dot, if
the special form is not supported. However, I wonder what impact it would have
on collation, as variation selectors are normally ignorable... But it allows a
font to treat <VAV,VS1> as a separate glyph ID which has a distinct ligature and
positioning pair for the following HOLAM. And it keeps the structure of Hebrew
as a base consonant followed by optional vowel points.
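To make the fallback behaviour concrete, here is a minimal sketch in Python. It assumes the hypothetical sequence <VAV,VS1,HOLAM> suggested above (this is only the proposal in this message, not a variation sequence defined by Unicode), and the helper names are made up for illustration. It shows that dropping the variation selector yields the legacy <VAV,HOLAM>, and that both spellings reduce to the same unpointed base letter.

```python
import unicodedata

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
VS1 = "\uFE00"    # VARIATION SELECTOR-1

# Assumption: <VAV, VS1> marking holam male is only the idea floated in this
# message; it is not a variation sequence defined by the standard.
HOLAM_MALE_SEQ = VAV + VS1 + HOLAM
VAV_HALUMA_SEQ = VAV + HOLAM


def strip_variation_selectors(text: str) -> str:
    """Drop VS1..VS16 (U+FE00..U+FE0F), as a renderer without support might."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))


def unpointed(text: str) -> str:
    """Drop all nonspacing marks (category Mn): points, accents and
    variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Fallback: without the selector, the text degrades to the legacy sequence.
fallback = strip_variation_selectors(HOLAM_MALE_SEQ)

# Unpointed text: both spellings share the single base character VAV.
base_male = unpointed(HOLAM_MALE_SEQ)
base_haluma = unpointed(VAV_HALUMA_SEQ)
```

In this sketch the pointed and unpointed data use the same base character, which is the property Peter asked for above.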
By contrast, a separate <HOLAM MALE> vowel codepoint would preferably
already encode both the VAV glyph and the left holam dot (so <HOLAM MALE,HOLAM>
would probably be rendered as the HOLAM MALE base letter with a HOLAM point on
the right, i.e. with two dots above the VAV glyph). It would also need a new
collation rule, as well as reencoding of most texts that assume the opposite
convention, where <VAV, HOLAM> was used to encode holam male and not the newer
vav haluma.
Other proposals based on ZWJ and ZWNJ will just complicate things. In fact, as
holam male is the most common case and vav haluma is rare, the legacy sequence
<VAV, HOLAM> should preferably encode holam male (to avoid reencoding too
many texts).
But I'd like to see what can be done quite simply on Biblical texts (which are
well known, stable and easily reencodable), and what is used in modern Hebrew
with <VAV, HOLAM> (these resources are unlimited, unknown, and nearly
impossible to guarantee as easily reencodable). For example, I am thinking of
modern personal names, toponyms and trademarks, which should not need
any reencoding, as well as most modern publications, where this work would be
impossible to finish (notably the texts of newspapers and lots of cheap books
and publications).
If in modern pointed Hebrew there is no real distinction between the glyphs
shown for holam male and vav haluma, reencoding will not be necessary to render
the text correctly, but it may affect some areas like collation. I suppose then
that a modern Hebrew collation would probably collate a new <HOLAM MALE>
codepoint as <VAV, HOLAM> for vav haluma; this is quite simple to do with a
two-level collation key.
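A rough sketch of what such a two-level key could look like, in Python. Everything here is an assumption for illustration: no <HOLAM MALE> codepoint exists, so the private-use character U+E000 stands in for it, and the key function is a toy, not the Unicode Collation Algorithm. The first level expands the stand-in to <VAV, HOLAM>, so both spellings compare equal there; the second level only breaks ties.

```python
BET = "\u05D1"         # HEBREW LETTER BET
VAV = "\u05D5"         # HEBREW LETTER VAV
HOLAM = "\u05B9"       # HEBREW POINT HOLAM
HOLAM_MALE = "\uE000"  # hypothetical: private-use stand-in, NOT a real codepoint


def collation_key(text: str):
    """Return a (level1, level2) key.

    Level 1 treats HOLAM_MALE as if it were <VAV, HOLAM>.
    Level 2 records where an expansion happened, as a tie-breaker only.
    """
    level1 = []
    level2 = []
    for ch in text:
        if ch == HOLAM_MALE:
            level1.extend([VAV, HOLAM])
            level2.extend([1, 0])
        else:
            level1.append(ch)
            level2.append(0)
    return ("".join(level1), tuple(level2))


with_holam_male = BET + HOLAM_MALE   # uses the hypothetical codepoint
with_vav_haluma = BET + VAV + HOLAM  # legacy sequence

# Equal at the first level, distinguished only at the second.
ordered = sorted([with_holam_male, with_vav_haluma], key=collation_key)
```

With this key, texts that mix the two encodings interleave correctly at the first level, and the legacy <VAV, HOLAM> spelling sorts just before the stand-in when everything else is equal.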