Re: Yiddish digraphs

From: Mark E. Shoulson <mark_at_kli.org>
Date: Wed, 19 Oct 2011 15:08:12 -0400

On 10/19/2011 01:40 PM, Andreas Prilop wrote:
> There are three so-called "Yiddish digraphs" in Unicode:
> U+05F0 wawayim
> U+05F1 waw yod
> U+05F2 yodayim
>
> What is specifically Yiddish about these digraphs?
> They can be used in the same way in Hebrew.
> But this isn't done. Why not?
>
> http://he.wikipedia.org/wiki/%F8%E9%E9_%F7%E5%F8%F6%E5%E5%E9%E9%EC
> http://he.wikipedia.org/wiki/%F8%D6_%F7%E5%F8%F6%D4%D6%EC
>
> Why should Yiddish be written with special digraphs
> but Hebrew with sequences of two letters?
>
> But even in Yiddish, the digraphs are not really used:
>
> http://yi.wikipedia.org/wiki/%F8%F2%F7%E9%E0%E5%E5%E9%F7
> http://yi.wikipedia.org/wiki/%F8%F2%F7%E9%E0%D4%E9%F7
>
>
> The Unicode Standard says:
> | ... to distinguish the digraph double vav from an occurrence
> | of a consonantal vav followed by a vocalic vav.
>
> By that reasoning you would need an English digraph "sh"
> to distinguish "sh" in "shit" from "s-h" in ***hole. ;-)

I think the issue here is (probably) a matter of legacy encodings,
though someone else would need to confirm that. It is true that in
Yiddish the double-vav, vav-yod, and double-yod digraphs are considered
separate letters, but the same is true of Welsh "ch", which we know does
not get its own code-point. Similarly, U+FB2E HEBREW LETTER ALEF WITH
PATAH is just the same thing as an ordinary ALEF with a PATAH
vowel-point, and indeed has just that as its canonical decomposition, so
even Unicode considers the two encodings canonically equivalent (right?
or mostly equivalent, at least), and the same is true for much of the
rest of the Hebrew portion of the Alphabetic Presentation Forms block,
U+FB1D to U+FB4F. Modern Hebrew likely
borrowed the special use (in unpointed text) of double-vav and
double-yod from Yiddish, but they are not normally considered separate
letters in Hebrew.
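To make the contrast concrete, here is a small sketch using Python's standard `unicodedata` module. It is not part of the original message, and its results depend on the Unicode version bundled with the interpreter. It shows that U+FB2E normalizes to ALEF followed by PATAH, while the double-vav digraph U+05F0 never matches two separate vavs, not even under compatibility normalization, because it has no decomposition at all. The helper names are made up for illustration.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


ALEF_WITH_PATAH = "\uFB2E"        # HEBREW LETTER ALEF WITH PATAH
ALEF_PLUS_PATAH = "\u05D0\u05B7"  # HEBREW LETTER ALEF + HEBREW POINT PATAH
DOUBLE_VAV = "\u05F0"             # HEBREW LIGATURE YIDDISH DOUBLE VAV
TWO_VAVS = "\u05D5\u05D5"         # two separate HEBREW LETTER VAVs

# The presentation form is canonically equivalent to ALEF + PATAH ...
print(canonically_equivalent(ALEF_WITH_PATAH, ALEF_PLUS_PATAH))  # True
# ... but the digraph is not equivalent to two VAVs, not even under NFKD.
print(compatibility_equivalent(DOUBLE_VAV, TWO_VAVS))  # False
```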

The only reason I can think of for these characters having their own
code-points is the same reason that U+00E1 LATIN SMALL LETTER A WITH
ACUTE has its own code-point, despite being just an "a" with a combining
acute, or that U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE has
its own, or that U+01F3 LATIN SMALL LETTER DZ has its own: presumably,
there was some earlier encoding that had them, and Unicode needed them
for round-tripping. (Interestingly, the Latin examples all have
decompositions, canonical for U+00E1 and *compatibility* for the other
two, while the Hebrew/Yiddish digraphs don't have any decomposition at
all.)
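The decomposition mappings can be checked directly against the Unicode Character Database. The sketch below is an illustration, not part of the original post. It uses Python's `unicodedata.decomposition`, which returns an empty string when there is no mapping and prefixes compatibility mappings with a tag such as `<compat>`. The `decomposition_kind` helper is hypothetical.

```python
import unicodedata


def decomposition_kind(ch: str) -> str:
    """Classify a character's decomposition mapping in the Unicode Character Database."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings start with a tag such as <compat> or <font>.
    if mapping.startswith("<"):
        return "compatibility"
    return "canonical"


# The Latin examples from the message, then the three Yiddish digraphs.
EXAMPLES = ["\u00E1", "\u0149", "\u01F3", "\u05F0", "\u05F1", "\u05F2"]

for ch in EXAMPLES:
    mapping = unicodedata.decomposition(ch) or "-"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_kind(ch)} ({mapping})")
```

With the Unicode data shipped in current Python releases, U+00E1 should be reported as canonical, U+0149 and U+01F3 as compatibility, and U+05F0 to U+05F2 as having no decomposition.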

The case of U+FB1F HEBREW LIGATURE YIDDISH YOD YOD PATAH, which you did
not mention, is a different situation, in that the patah is written
under *both* yods, so it can't truly be said to decompose into ordinary
Hebrew letters.
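For what it's worth, the Unicode Character Database does give U+FB1F a canonical decomposition, but only to the Yiddish digraph U+05F2 plus PATAH, not to two ordinary yods. This Python sketch, not part of the original message, shows that with the standard `unicodedata` module (again, results depend on the bundled Unicode version).

```python
import unicodedata

YOD_YOD_PATAH = "\uFB1F"  # HEBREW LIGATURE YIDDISH YOD YOD PATAH

# Canonical decomposition: list each resulting code point by name.
nfd = unicodedata.normalize("NFD", YOD_YOD_PATAH)
for c in nfd:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")

# Two ordinary yods (U+05D9) followed by a patah are a different sequence.
TWO_YODS_PATAH = "\u05D9\u05D9\u05B7"
print(nfd == TWO_YODS_PATAH)  # False
```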

If there wasn't an earlier standard, I don't really have a good answer
that isn't contradicted by other examples. I thought the digraphs were
in ISO 8859-8 (Latin/Hebrew), but I don't see them when I look it up.

~mark
Received on Wed Oct 19 2011 - 14:11:02 CDT
