Help with Hebrew

Eric Muller emuller at
Sat Jul 26 16:11:40 CDT 2014

Many thanks for all the answers on my Hebrew and Arabic questions.

On 7/6/2014 4:18 AM, Matitiahu Allouche wrote:
> The original text is interesting, combining French, Latin and Hebrew.

There is also a fair amount of Greek, and a couple of Arabic words.

> Unfortunately, the author and/or the type setter were not quite proficient in Hebrew, so that the Hebrew words in the 3 referenced pages contain quite a few errors.

I think it's a safe assumption that the typesetter was not necessarily 
fluent in Hebrew.

> I am not sure if the digitization should reproduce faithfully the flaws of the original document, or if it is an opportunity to correct the errors (which may not be possible for the first page).

I want both! In my XML source, I do record things like "<correction 
original="mistaque">mistake</correction>", and render that in the EPUBs 
I produce by "mistake [mistaque]".
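A minimal sketch of that rendering step in Python. It assumes the `<correction>` elements sit directly inside a paragraph and contain plain text. Only the element and attribute names come from the message; the function `render_corrections` and the `<p>` wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_corrections(fragment: str) -> str:
    """Render each <correction original="x">y</correction> as 'y [x]'.

    Assumes corrections are direct children of the fragment and hold plain
    text; any other child element is flattened to its text content.
    """
    root = ET.fromstring(f"<p>{fragment}</p>")  # hypothetical wrapper element
    parts = [root.text or ""]
    for child in root:
        if child.tag == "correction":
            parts.append(f"{child.text or ''} [{child.get('original', '')}]")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text that follows the element
    return "".join(parts)


print(render_corrections('a <correction original="mistaque">mistake</correction>'))
# a mistake [mistaque]
```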

> 1) Eric's representation of the Hebrew words in f274.image seems correct. So the Unicode sequences are
>     Yod (U+05D9) Segol (U+05B6) Dalet (U+05D3) Segol (U+05B6) Alef (U+05D0)
> And
>     Yod (U+05D9) Segol (U+05B6) Dalet (U+05D3) Segol (U+05B6) He (U+05D4)
> However, the Hebrew words are suspect:
> a. The first one (Yod Dalet Alef) is not a stem known in Hebrew. It could be a deformation of the stem Yod Resh Alef whose meaning is to fear (= the French "craindre").

I would not be surprised if the typesetter confused dalet and resh. The 
good news is that the text I pointed to is one of the many re-editions 
of the work, and we have a facsimile of the original edition:

Here it seems clear that it's a resh in both examples. By the way, the
whole sentence reads roughly "In Hebrew, there are words which are 
different only in that one ends with an aleph, and the other with a he, 
which are not pronounced, as <> which means fear and <> which means 
throw away." This follows a discussion that in French, "champ" and 
"chant" are pronounced the same, with the final p and t silent.

> b. Both grammatical forms (with Segol under the rightmost two letters in both words) do not conform to proper conjugation, as far as I know (conjugation of Hebrew verbs is not a matter for the faint of heart).

The original edition seems to show a qamats. Would that be better?
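The transcription quoted earlier (yod, segol, dalet, segol, then alef or he) can be checked by listing each code point with its Unicode name. Below is a sketch using Python's standard `unicodedata` module; the constant names and the `spell` helper are illustrative. The last line checks that the two words differ only in their final letter.

```python
import unicodedata

# The two words as transcribed in the quoted list: yod segol dalet segol,
# then alef in one case and he in the other.
AS_PRINTED_ALEF = "\u05D9\u05B6\u05D3\u05B6\u05D0"
AS_PRINTED_HE = "\u05D9\u05B6\u05D3\u05B6\u05D4"


def spell(word: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point of word, in logical order."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in word]


for word in (AS_PRINTED_ALEF, AS_PRINTED_HE):
    print(", ".join(spell(word)))

# The pair differs only in the final, unpronounced letter.
print(AS_PRINTED_ALEF[:-1] == AS_PRINTED_HE[:-1])
```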

> 2) The case of f299.image is yet more complicated:

The original edition:

> a. If you compare the rightmost letter in the Hebrew word following "mais dans" with the corresponding letter in the Hebrew word following "pour", you can see that they don't look identical. The first one has a rounded top-right corner while the second one has a more square shape. The first letter looks like a Hebrew letter Resh (U+05E8) and the second looks like a Hebrew letter Dalet (U+05D3, and it is the correct one).

The original seems to show Dalet in all three cases. Overall, what I see 
there is

- dalet, sheva, bet, patah, resh, space, shin, segol, qof, segol, resh
- shin, segol, qof, segol, resh
- dalet, sheva, bet, patah, resh
- dalet, qamats, bet, qamats, resh

The text explains how the genitive is marked differently in Latin and
in Hebrew. In Latin, in verbum falsitatis, it's falsitas that has been
transformed into falsitatis to mark the genitive, while in Hebrew, it's
the word for verbum that is modified (davar becomes dvar).
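One way to turn the four readings listed above into text is to map the informal names to Unicode character names and look the characters up. Below is a sketch; the `NAMES` table and the `transcribe` helper are hypothetical, and only the Unicode character names themselves are standard.

```python
import unicodedata

# Informal names used in the list above -> Unicode character names.
NAMES = {
    "space": "SPACE",
    "dalet": "HEBREW LETTER DALET",
    "bet": "HEBREW LETTER BET",
    "resh": "HEBREW LETTER RESH",
    "shin": "HEBREW LETTER SHIN",
    "qof": "HEBREW LETTER QOF",
    "sheva": "HEBREW POINT SHEVA",
    "patah": "HEBREW POINT PATAH",
    "qamats": "HEBREW POINT QAMATS",
    "segol": "HEBREW POINT SEGOL",
}


def transcribe(spec: str) -> str:
    """Build a string from a comma-separated list of informal names."""
    return "".join(unicodedata.lookup(NAMES[name.strip()]) for name in spec.split(","))


ORIGINAL_EDITION = [
    transcribe("dalet, sheva, bet, patah, resh, space, shin, segol, qof, segol, resh"),
    transcribe("shin, segol, qof, segol, resh"),
    transcribe("dalet, sheva, bet, patah, resh"),
    transcribe("dalet, qamats, bet, qamats, resh"),
]

for reading in ORIGINAL_EDITION:
    print(" ".join(f"U+{ord(c):04X}" for c in reading))
```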

> c. When a word starts with Dalet, there should generally be a Dagesh in the Dalet.

That brings an interesting question. If you look at the French in the 
two editions (1660 and 1810), you will see that they use different
orthographies, and that today's orthography (2014) is yet another one. 
There is no reason this would not happen in the same way for the Hebrew. 
So what I am really after is
- what's on the page
- what was meant to be on the page when the editions were made (1660 and 1810)
- what one would want to put on the page if one were to make a modern 
edition, with modern orthography throughout

Is it plausible that the dagesh would only be in the last case (modern 
orthography), since it's clearly absent in both facsimiles?
> d. The point on the Shin (rightmost letter of the second word) is a Sin Dot, while it should be a Shin Dot,

None in the original edition, apparently.
> The expression was probably quoted from Exodus XXIII, 7, where the vowel under the Bet is a Patah, which is also the way it would be written in modern Hebrew.
> So the right sequences (after correcting the errors in the original document) are
>     - Dalet (U+05D3) Dagesh (U+05BC) Sheva (U+05B0) Bet (U+05D1) Patah (U+05B7) Resh (U+05E8) Space Shin (U+05E9) Shin Dot (U+05C1) Qamats (U+05B8) Qof (U+05E7) Segol (U+05B6) Resh (U+05E8)
>     - Shin (U+05E9) Shin Dot (U+05C1) Qamats (U+05B8) Qof (U+05E7) Segol (U+05B6) Resh (U+05E8)
>     - Dalet (U+05D3) Dagesh (U+05BC) Sheva (U+05B0) Bet (U+05D1) Patah (U+05B7) Resh (U+05E8)
>     - Dalet (U+05D3) Dagesh (U+05BC) Qamats (U+05B8) Bet (U+05D1) Qamats (U+05B8) Resh (U+05E8)

That matches the original edition, except for the dagesh and shin dot,
which seem absent. Some qamats tend to look like segol, but I suspect
that can be attributed to poor printing.
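You can check the corrected sequences against the original edition by dropping the two marks noted above as absent. A related detail matters when storing them. The quoted sequences type the dagesh and shin dot before the vowel point, but Unicode normalization (NFC or NFD) reorders combining marks by canonical combining class and puts the vowel first. Both orders are canonically equivalent.

Below is a minimal sketch with Python's `unicodedata`. It assumes the dagesh and shin dot are the only marks to remove. The constant names and the `as_in_original` helper are illustrative, and Segol is written with its code point U+05B6.

```python
import unicodedata

DALET, BET, RESH, SHIN, QOF = "\u05D3", "\u05D1", "\u05E8", "\u05E9", "\u05E7"
SHEVA, SEGOL, PATAH, QAMATS = "\u05B0", "\u05B6", "\u05B7", "\u05B8"
DAGESH, SHIN_DOT = "\u05BC", "\u05C1"

# Corrected sequences, marks in the order typed in the quoted list.
SHEQER = SHIN + SHIN_DOT + QAMATS + QOF + SEGOL + RESH
DVAR = DALET + DAGESH + SHEVA + BET + PATAH + RESH
DAVAR = DALET + DAGESH + QAMATS + BET + QAMATS + RESH
CORRECTED = [DVAR + " " + SHEQER, SHEQER, DVAR, DAVAR]

# Assumption: dagesh and shin dot are the only marks missing from the
# original edition (qamats/segol differences treated as printing noise).
ABSENT_IN_ORIGINAL = {DAGESH, SHIN_DOT}


def as_in_original(corrected: str) -> str:
    """Drop the marks listed in ABSENT_IN_ORIGINAL, keep everything else."""
    return "".join(c for c in corrected if c not in ABSENT_IN_ORIGINAL)


for word in CORRECTED:
    nfc = unicodedata.normalize("NFC", word)
    # NFC sorts each run of marks by canonical combining class, so the vowel
    # (sheva 10, qamats 18) moves ahead of dagesh (21) and shin dot (24).
    # The precomposed Hebrew presentation forms are composition exclusions,
    # so nothing is composed and NFC equals NFD for these strings.
    print(as_in_original(word) == as_in_original(nfc), word == nfc)
```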

> 3) The case of f310.image is also problematic.

The original edition:

> a. The feminine pronoun is written with 2 errors (not bad for a word with only 2 consonants): firstly, the vowel under the first (rightmost) letter Alef (U+05D0) is missing and should be a Patah. Secondly, there should be a Dagesh (U+05BC) in the leftmost letter (Tav U+05EA). So the proper sequence is Alef (U+05D0) Patah (U+05B7) Tav (U+05EA) Dagesh (U+05BC).

The original edition looks like: alef, qamats, tav, sheva.
> b. The masculine pronoun is also written with 2 errors:  firstly, there should be a Dagesh (U+05BC) in the middle letter (Tav U+05EA). Secondly, the leftmost letter must be a He (U+05D4) and not an Alef as appearing in the original document.
> So the proper sequence is Alef (U+05D0) Patah (U+05B7) Tav (U+05EA) Dagesh (U+05BC) Qamats (U+05B8) He (U+05D4).

The original edition looks like alef, qamats, tav, qamats, alef.
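Both readings are to be kept, so the pronouns can be recorded with the `<correction original="...">` markup described earlier in the message. Below is a sketch using `xml.etree.ElementTree`. The `PRONOUNS` table and the `correction_xml` helper are hypothetical. The strings follow the letter and vowel lists given above for the original edition and for the corrected forms.

```python
import xml.etree.ElementTree as ET

# (as printed in the original edition, as corrected), per the lists above.
PRONOUNS = {
    # alef qamats tav sheva -> alef patah tav dagesh
    "feminine": ("\u05D0\u05B8\u05EA\u05B0", "\u05D0\u05B7\u05EA\u05BC"),
    # alef qamats tav qamats alef -> alef patah tav dagesh qamats he
    "masculine": ("\u05D0\u05B8\u05EA\u05B8\u05D0", "\u05D0\u05B7\u05EA\u05BC\u05B8\u05D4"),
}


def correction_xml(original: str, corrected: str) -> str:
    """Serialize <correction original="...">...</correction> as a str."""
    element = ET.Element("correction", original=original)
    element.text = corrected
    return ET.tostring(element, encoding="unicode")


for gender, (printed, fixed) in PRONOUNS.items():
    print(gender, correction_xml(printed, fixed))
```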

Philippe Verdy wrote:

> Note: the three comma-separated items, if they are just separated by 
> the comma (in that example it is handwritten, but it is the European 
> comma, not the arabic comma) should use bidi-embedding controls

That actually looks like the perfect job for three isolates.
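Isolates were added in Unicode 6.3. Each right-to-left item is wrapped in RIGHT-TO-LEFT ISOLATE (U+2067) ... POP DIRECTIONAL ISOLATE (U+2069). The older embedding controls (RLE ... PDF) do not work this way. The surrounding text treats an isolate as a single neutral character, so the Hebrew inside one item does not affect how the commas and the neighbouring French are ordered.

The sketch below assumes RLI is appropriate for every item. FIRST STRONG ISOLATE (U+2068) would be the choice if the direction of each item were not known in advance. The helper name is hypothetical. The note does not say which three items it refers to, so three of the words discussed above are used for illustration.

```python
import unicodedata

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(items: list[str], separator: str = ", ") -> str:
    """Wrap each item in RLI ... PDI and join the results with separator."""
    return separator.join(f"{RLI}{item}{PDI}" for item in items)


# Illustrative items only: three of the Hebrew words discussed above.
words = [
    "\u05E9\u05B6\u05E7\u05B6\u05E8",  # shin segol qof segol resh
    "\u05D3\u05B0\u05D1\u05B7\u05E8",  # dalet sheva bet patah resh
    "\u05D3\u05B8\u05D1\u05B8\u05E8",  # dalet qamats bet qamats resh
]
text = isolate_rtl(words)
print(unicodedata.bidirectional(RLI), unicodedata.bidirectional(PDI))  # RLI PDI
print(repr(text))
```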

Thanks again to everyone,

More information about the Unicode mailing list