disunifying Dagesh and Shuruk

From: Arno Schmitt (arno@zedat.fu-berlin.de)
Date: Mon Jul 12 1999 - 01:52:56 EDT


There are some mistakes in the Hebrew encoding and/or in the rules
on pages 6-20f.
The rule "There is never more than one vowel [mark] for a base
character ..." either has to go or must be rephrased as: "A base
character (consonant) normally has only one vowel mark." I have
consulted Jonathan Rosenne, and he agrees. The most common example
of a letter with two vowel marks is the lamed in Biblical
"Yerushalaim": it has both the "a" and the "i", qamac and hiriq
(with meteg between them).
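To make the counter-example concrete, here is a minimal Python
sketch (not taken from the Unicode Standard) that pairs each base
character with the vowel points following it. It assumes, for
illustration only, that the Hebrew vowel points are U+05B0..U+05BB
and that the lamed is encoded as lamed + qamac + meteg + hiriq, in
that order; the helper name vowels_per_base is hypothetical.

```python
import unicodedata

# Assumption for this sketch: the Hebrew vowel points are U+05B0..U+05BB
# (sheva through qubuts). Meteg (U+05BD) is deliberately not counted.
VOWEL_POINTS = {chr(cp) for cp in range(0x05B0, 0x05BC)}


def vowels_per_base(text):
    """Hypothetical helper: pair each base character with its vowel points."""
    result = []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A character with canonical combining class 0 starts a new cluster.
            result.append((ch, []))
        elif ch in VOWEL_POINTS and result:
            result[-1][1].append(unicodedata.name(ch))
    return result


# Lamed + qamats + meteg + hiriq, the lamed of Biblical "Yerushalaim"
# as described above (mark order chosen for illustration).
lamed = "\u05DC\u05B8\u05BD\u05B4"
for base, vowels in vowels_per_base(lamed):
    print(unicodedata.name(base), vowels)
```

Two vowel points on one base character is exactly the situation
the current rule says never occurs.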

My second point may be more controversial:
U+05BC stands for Dagesh, Mappik and Shuruk.
Dagesh and Mappik are no problem, since they look alike and --
more importantly -- cannot be confused: Mappik occurs only in
(after) heh, and Dagesh _never_ occurs in heh.
Shuruk occurs only in/with/after waw, but waw can also have a
dagesh.
Some typographers and grammarians (G. Bergsträsser) say the shuruk
dot should stand higher than the dagesh.
Dagesh and shuruk should be disunified because they have different
functions and occur with the same letter. This is unlike the accent
mark, which has different functions in _different_ languages, and
unlike umlaut and diaeresis, which in German are used with different
vowels: the one with a, o and u only, the other with i and e only.
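The argument above can be sketched in a few lines of Python. This
is a minimal illustration that simply encodes the message's premises
(Mappik only in heh, Dagesh never in heh, Shuruk only with waw); the
function readings_of_damaruk is hypothetical and lists which
readings U+05BC can have on a given base letter.

```python
import unicodedata

DAMARUK = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ: one code point, three uses
HEH = "\u05D4"
WAW = "\u05D5"


def readings_of_damaruk(text):
    """Hypothetical helper: for each U+05BC, list its possible readings,
    based only on the base letter it sits on (premises as stated above)."""
    readings = []
    base = None
    for ch in text:
        if unicodedata.combining(ch) == 0:
            base = ch  # remember the current base letter
        elif ch == DAMARUK and base is not None:
            if base == HEH:
                readings.append((base, {"mappik"}))  # dagesh never occurs in heh
            elif base == WAW:
                readings.append((base, {"dagesh", "shuruk"}))  # ambiguous
            else:
                readings.append((base, {"dagesh"}))
    return readings


for sample in ("\u05D1\u05BC", "\u05D4\u05BC", "\u05D5\u05BC"):
    print(unicodedata.name(sample[0]), readings_of_damaruk(sample)[0][1])
```

Only the waw case yields two candidates; that is the ambiguity a
separate shuruk code point would remove.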

They have different names in Hebrew grammar and -- more
importantly -- unifying them turns a Unicode rule into a false one.
On page 6-20 we read: "Dagesh [i.e. U+05BC] is not a vowel [mark]
... The same base consonant can also have a vowel [mark] ..." But a
waw that has a shuruk can _not_ have a vowel mark, because
_in_that_case_ U+05BC is itself a vowel mark.

Smart software can deal with the present situation (for most
texts), but Unicode should at least spell out the rule:
"If shuruk and dagesh share a single code point, damaruk (i.e.
DAgesh/MAppik/shuRUK) in a waw represents a dagesh when the waw
has (another) vowel mark or is followed by a holam gadol (waw with
holam) or a shurek (waw with shuruk). In all other cases -- and in
Yiddish -- damaruk after waw is presumed to be a shuruk." I say
"is presumed" because exceptions are possible in a _partially_
"pointed" text, but in real life dageshes are almost only used in
fully "pointed" texts.
So much for the case where we do _not_ add a new Hebrew point
"shuruk".
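The proposed rule is mechanical enough to sketch in code. The
Python below is a minimal, hypothetical implementation under stated
assumptions: the vowel points are U+05B0..U+05BB, "followed by"
means the next base character, and a yiddish flag stands in for
knowing the language of the text. The function names and the
sample spellings (shuk, tsivva, tsivuy) are illustrative.

```python
import unicodedata

WAW = "\u05D5"
DAMARUK = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ
HOLAM = "\u05B9"
# Assumption: the vowel points are U+05B0..U+05BB; U+05BC itself is excluded.
VOWEL_POINTS = {chr(cp) for cp in range(0x05B0, 0x05BC)}


def clusters(text):
    """Split text into (base, [combining marks]) pairs."""
    out = []
    for ch in text:
        if unicodedata.combining(ch) and out:
            out[-1][1].append(ch)
        else:
            out.append((ch, []))
    return out


def guess_waw_damaruk(text, yiddish=False):
    """Return (cluster index, 'dagesh' or 'shuruk') for each waw + U+05BC."""
    cl = clusters(text)
    guesses = []
    for i, (base, marks) in enumerate(cl):
        if base != WAW or DAMARUK not in marks:
            continue
        if yiddish:
            guesses.append((i, "shuruk"))  # in Yiddish: always shuruk
            continue
        has_other_vowel = any(m in VOWEL_POINTS for m in marks)
        nxt_base, nxt_marks = cl[i + 1] if i + 1 < len(cl) else (None, [])
        # Next letter is a holam gadol (waw + holam) or a shurek (waw + U+05BC).
        before_holam_or_shurek = nxt_base == WAW and (
            HOLAM in nxt_marks or DAMARUK in nxt_marks
        )
        if has_other_vowel or before_holam_or_shurek:
            guesses.append((i, "dagesh"))
        else:
            guesses.append((i, "shuruk"))
    return guesses


shuk = "\u05E9\u05C1\u05D5\u05BC\u05E7"          # shin+shin dot, waw+U+05BC, qof
tsivva = "\u05E6\u05B4\u05D5\u05BC\u05B8\u05D4"  # waw carries U+05BC and qamats
tsivuy = "\u05E6\u05B4\u05D5\u05BC\u05D5\u05BC\u05D9"  # waw+U+05BC twice, yod

print(guess_waw_damaruk(shuk))    # [(1, 'shuruk')]
print(guess_waw_damaruk(tsivva))  # [(1, 'dagesh')]
print(guess_waw_damaruk(tsivuy))  # [(1, 'dagesh'), (2, 'shuruk')]
```

The result is only a presumption, as stated above: a partially
pointed text can defeat it.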

If we get the new Hebrew point, I propose a rule that takes
reality into consideration (reality being existing text encoded
with unified damaruk, people who can't be bothered, and software
that does not store the difference):
"In texts without a separate shuruk, dagesh can represent shuruk
as well. In these texts the dagesh point in a waw represents a
(real) dagesh when the waw has (another) vowel mark or is followed
by a holam gadol (i.e. waw with holam) or a shurek (i.e. waw with
shuruk). In all other cases -- and in Yiddish -- dagesh after waw
is presumed to be a shuruk."
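A sketch of how software might apply this rule when upgrading a
legacy text, in Python. No separate shuruk point exists, so the
Private Use Area code point U+E000 stands in for it here purely as
a hypothetical placeholder; the function upgrade_legacy_text is also
hypothetical. The test for "texts without a separate shuruk" is
simplified to "the text contains no U+E000"; a real system would
more likely record this as metadata.

```python
import unicodedata

WAW = "\u05D5"
DAGESH = "\u05BC"  # existing HEBREW POINT DAGESH OR MAPIQ
HOLAM = "\u05B9"
# HYPOTHETICAL placeholder for the proposed separate shuruk point.
SHURUK = "\uE000"
VOWEL_POINTS = {chr(cp) for cp in range(0x05B0, 0x05BC)}  # assumption


def clusters(text):
    """Split text into (base, [marks]); the placeholder counts as a mark."""
    out = []
    for ch in text:
        if (unicodedata.combining(ch) or ch == SHURUK) and out:
            out[-1][1].append(ch)
        else:
            out.append((ch, []))
    return out


def upgrade_legacy_text(text, yiddish=False):
    """Replace U+05BC on a waw with SHURUK wherever the rule reads shuruk."""
    if SHURUK in text:
        return text  # separate shuruk present: every U+05BC is a real dagesh
    cl = clusters(text)
    out = []
    for i, (base, marks) in enumerate(cl):
        if base == WAW and DAGESH in marks:
            nxt_base, nxt_marks = cl[i + 1] if i + 1 < len(cl) else (None, [])
            real_dagesh = not yiddish and (
                any(m in VOWEL_POINTS for m in marks)
                or (nxt_base == WAW and (HOLAM in nxt_marks or DAGESH in nxt_marks))
            )
            if not real_dagesh:
                marks = [SHURUK if m == DAGESH else m for m in marks]
        out.append(base + "".join(marks))
    return "".join(out)


tsivuy = "\u05E6\u05B4\u05D5\u05BC\u05D5\u05BC\u05D9"
upgraded = upgrade_legacy_text(tsivuy)
print(upgraded == "\u05E6\u05B4\u05D5\u05BC\u05D5\uE000\u05D9")
```

Running it twice is harmless: once a separate shuruk is present,
the function treats every remaining U+05BC as a real dagesh and
leaves the text alone.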

My last point is of no operational importance: On page 6-20 the
Hebrew combining marks are grouped into four categories:
dagesh,
shin- and sin-dot,
vowel points, and
diacritics (i.e. meteg, rafe, varika).
Since rafe is nothing but an anti-dagesh (signaling: this letter
has NO dagesh), rafe should either be grouped with dagesh, or dagesh
should be grouped with the diacritics.

Arno Schmitt


