RE: 5 Hebrew Consonants Shaping

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Thu May 27 1999 - 06:50:04 EDT


Dear Arno,

> Jonathan Rosenne's "argument" is tradition, not reason, so IMHO no
> argument at all:

A script is a tradition for writing the language. All your examples are German traditions. Just because you are familiar with them doesn't mean they are better, more justified or more reasonable. I don't need to convince you or anyone else that our traditions are good, reasonable or justified; I only have to convince you all that they actually are as I say they are. Please let us see whether there are real users of the Hebrew script out there who use only the nominal form plus the invisible marks (let us ignore the Yevsektsia, RIP).

Unicode is not about finding better ways to write various languages, it is about including them all in a single code.

The normalization of Hebrew and Arabic is defined in the Unicode Character Database and the TR. In Arabic the shaped forms are resolved to the nominal forms; in Hebrew they are not.
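
The difference can be checked against the character database itself. The following is a minimal sketch, assuming Python's standard unicodedata module and compatibility normalization (NFKC); the helper name nfkc is only an illustration and does not come from this thread.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply compatibility normalization (NFKC) to text."""
    return unicodedata.normalize("NFKC", text)


# U+FE91 ARABIC LETTER BEH INITIAL FORM is a shaped presentation form.
arabic_beh_initial = "\uFE91"
# U+0628 ARABIC LETTER BEH is the nominal letter.
arabic_beh = "\u0628"

# U+05DA HEBREW LETTER FINAL KAF and U+05DB HEBREW LETTER KAF
# are separate letters with no decomposition linking them.
hebrew_final_kaf = "\u05DA"
hebrew_kaf = "\u05DB"

# The Arabic shaped form folds to the nominal letter.
print(nfkc(arabic_beh_initial) == arabic_beh)

# The Hebrew final form is left unchanged by normalization.
print(nfkc(hebrew_final_kaf) == hebrew_final_kaf)
print(nfkc(hebrew_final_kaf) == hebrew_kaf)
```

Normalization alone therefore never equates a Hebrew final letter with its nominal counterpart; any such folding has to be an explicit, application-level choice.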

> The Hebrew point Dagesh (+u04-) transforms a fe into a pe, and
> since there is no final form of pe (only for FE -- the Unicode
> name for +u04- "final pe" is not correct), PE plus Dagesh should
> be treated as PE plus ZWNJ => no final form even at the end of a
> word.

Wrong on several counts. There is no Hebrew letter fe, and the Dagesh does not transform the letter Pe to any other letter. It only specifies its pronunciation, stress and "weight". The Unicode name is correct - it is the final form of the Hebrew letter Pe. The Dagesh is not required, and whether the final form is to be used or not cannot be deduced solely from the presence or absence of the Dagesh. And the Dagesh "logic" cannot be applied to other final letters.

Jony

> -----Original Message-----
> From: Arno Schmitt [mailto:arno@zedat.fu-berlin.de]
> Sent: Thursday, May 27, 1999 11:02 AM
> To: Unicode List
> Subject: 5 Hebrew Consonants Shaping
>
>
> there is NO GOOD argument for having different keys for the
> final shapes of some Hebrew letters.
>
> Jonathan Rosenne's "argument" is tradition, not reason, so IMHO no
> argument at all:
> >
> > In Hebrew, this is the way we do it since print was invented in the
> > 16th century. This is the way it was implemented in typewriters, in
> > unit record equipment and in computers.
> >
>
> I want to take it a step further -- and thus bring it to the
> area of Unicode proper:
> There is no strong good argument for having different codepoints
> for these final letters. ZWJ and ZWNJ are there precisely for the
> few exceptions.
> The only examples given so far (shlep and Philip) are no good
> argument:
> The Hebrew point Dagesh (+u04-) transforms a fe into a pe, and
> since there is no final form of pe (only for FE -- the Unicode
> name for +u04- "final pe" is not correct), PE plus Dagesh should
> be treated as PE plus ZWNJ => no final form even at the end of a
> word.
>
> In German "Gut" (estate, merchandise) and "gut" (fine) are
> different words -- similar to French "mere" and "mère", "conte" and
> "compte" --,
> "Genossen" (comrades) and "genossen" (enjoyed) are not the same
> word at all. If you consider words like "Liebe" (the love) and
> "liebe (Freunde)" (good friends) there are innumerable pairs. And
> different words should be treated differently in most contexts.
> But tsarich (needing, singular) and tsrichim (needing, plural),
> tsorech (need) and tsrachim (needs) should be treated in many
> contexts as the same word ((the Hebrew spelling for them is the
> same, just the regular masculine plural ending is added to the base
> word)). Although two of these words are written with the final
> shape of KAF (05DA), and the others with the canonical form
> (05DB), they are the same.
>
> Does this answer Jony's question:
> > When do you need such a comparison and for what purpose?
> to
> Mark Leisher wrote:
> >>A common processing example: if we need a comparison or search routine that
> >>treats nominal and contextual forms the same, I don't ask a coder to add
> >>special rules to handle the special cases. I just tell them to ignore control
> >>characters in their algorithm. Opportunities to introduce bugs just got
> >>smaller.
>
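
The two comparison strategies discussed in the quoted message can be sketched side by side. This is only an illustration under stated assumptions: the function names fold_finals and strip_join_controls are hypothetical, and the ZWNJ spelling of "shlep" reflects the proposal in the quoted message, not existing practice. The five final/nominal code point pairs are taken from the Hebrew block of Unicode.

```python
# Separate-code-point model (current Unicode): a comparison that wants to
# treat final and nominal forms alike needs an explicit folding table.
FINAL_TO_NOMINAL = str.maketrans({
    "\u05DA": "\u05DB",  # FINAL KAF   -> KAF
    "\u05DD": "\u05DE",  # FINAL MEM   -> MEM
    "\u05DF": "\u05E0",  # FINAL NUN   -> NUN
    "\u05E3": "\u05E4",  # FINAL PE    -> PE
    "\u05E5": "\u05E6",  # FINAL TSADI -> TSADI
})

# Join-control model (the proposal): exceptions are marked with
# ZWNJ (U+200C) or ZWJ (U+200D), which a comparison simply ignores.
JOIN_CONTROLS = {"\u200C", "\u200D"}


def fold_finals(word: str) -> str:
    """Replace each Hebrew final letter with its nominal counterpart."""
    return word.translate(FINAL_TO_NOMINAL)


def strip_join_controls(word: str) -> str:
    """Drop ZWNJ and ZWJ so that only the letters are compared."""
    return "".join(ch for ch in word if ch not in JOIN_CONTROLS)


# tsarich: TSADI RESH YOD FINAL-KAF
tsarich = "\u05E6\u05E8\u05D9\u05DA"
# tsrichim: TSADI RESH YOD KAF YOD FINAL-MEM
tsrichim = "\u05E6\u05E8\u05D9\u05DB\u05D9\u05DD"

# After folding, the singular matches the stem of the plural.
print(fold_finals(tsrichim).startswith(fold_finals(tsarich)))

# "shlep" as it would be spelled under the proposal: nominal PE plus ZWNJ.
shlep_proposed = "\u05E9\u05DC\u05E2\u05E4\u200C"
print(strip_join_controls(shlep_proposed) == "\u05E9\u05DC\u05E2\u05E4")
```

Either way some folding step is required; the disagreement in the thread is over which model matches how the Hebrew script is actually used.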



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT