From: Mark E. Shoulson (mark@kli.org)
Date: Sun Nov 20 2005 - 18:47:07 CST
Cary Karp wrote:
> There's more to the use of Hebrew script in IDN than GERESH or
> GERSHAYIM :-)
>
> With specific regard to Yiddish--
>
> The Yiddish digraphs 'tsvey vovn', 'vov yud', and 'tsvey yudn', can be
> entered in two different ways from a Hebrew keyboard. If there are
> single keys for each of them, it is likely that they will produce the
> ligatures HEBREW LIGATURE YIDDISH DOUBLE VAV (U+05F0), HEBREW LIGATURE
> YIDDISH VAV YOD (U+05F1), and HEBREW LIGATURE YIDDISH DOUBLE YOD
> (U+05F2). Even when this option is available, some users may enter
> them as two key combinations, giving HEBREW LETTER VAV - HEBREW LETTER
> VAV (U+05D5 U+05D5), HEBREW LETTER VAV - HEBREW LETTER YOD (U+05D5
> U+05D9), and HEBREW LETTER YOD - HEBREW LETTER YOD (U+05D9 U+05D9). It
> is not apparent that the one form is used preferentially to the other,
> and no attempt at normalizing them has yet been made.
>
> However, in an application such as IDN where a string entered from a
> keyboard needs to be matched exactly with a stored string, and the
> keyboarded string may be represented in different ways, the
> application will obviously need to accommodate all alternative input
> forms. If the registry also contains the corresponding multiple
> representations, the intended result at the user end will be ensured.
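Cary's registry approach, storing every representation of a label, can be sketched in Python. This is an illustration only. `variant_spellings` and the dict-based `registry` are hypothetical, not part of any IDN specification or library, and the label is an arbitrary example.

```python
import itertools

# Each Yiddish ligature and the two-letter sequence it may also be typed as.
LIGATURE_TO_PAIR = {
    "\u05F0": "\u05D5\u05D5",  # DOUBLE VAV <-> VAV + VAV
    "\u05F1": "\u05D5\u05D9",  # VAV YOD    <-> VAV + YOD
    "\u05F2": "\u05D9\u05D9",  # DOUBLE YOD <-> YOD + YOD
}
PAIR_TO_LIGATURE = {pair: lig for lig, pair in LIGATURE_TO_PAIR.items()}


def variant_spellings(label: str) -> set[str]:
    """Hypothetical helper: every spelling of `label` in which each digraph
    is written either as its ligature or as two separate letters."""
    units: list[tuple[str, ...]] = []  # alternatives for each position
    i = 0
    while i < len(label):
        ch, two = label[i], label[i:i + 2]
        if ch in LIGATURE_TO_PAIR:        # ligature typed with one key
            units.append((ch, LIGATURE_TO_PAIR[ch]))
            i += 1
        elif two in PAIR_TO_LIGATURE:     # digraph typed as two keys
            # Greedy left to right: VAV VAV YOD is read as (VAV VAV) YOD only.
            units.append((two, PAIR_TO_LIGATURE[two]))
            i += 2
        else:                             # any other character is fixed
            units.append((ch,))
            i += 1
    return {"".join(choice) for choice in itertools.product(*units)}


# A registry following this approach stores every variant of a label,
# each pointing back to the label as registered.
registered = "\u05F2\u05D3"  # hypothetical label: DOUBLE YOD + DALET
registry = {v: registered for v in variant_spellings(registered)}
```

Whichever way the user types the digraph, the lookup succeeds. The cost is that the number of stored variants doubles with each digraph in the label, which is one argument for a single normalized stored form.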
I started to write an answer, and now I'm pretty sure what I was going
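to say was wrong. I may be missing something, but it looks like these
distinctions aren't being erased (as they should be) by normalization:
U+05F0, U+05F1, and U+05F2 have no decomposition mapping at all, so NFC,
NFD, NFKC, and NFKD all leave them untouched. I would have thought that
would be a no-brainer.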
I'd venture to say that double-vav, vav-yod, and yod-yod ligatures
should have *canonical* decomposition to their constituent letters! I'm
sure that would cause problems of some sort, but at least compatibility
decomposition is necessary.
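Mark's observation can be checked with Python's standard `unicodedata` module, and the decomposition he proposes can be emulated at the application level. A minimal sketch follows. `normalization_unifies` and `fold_yiddish` are hypothetical helpers, and the fold is not anything the Unicode Standard or the IDN specifications define.

```python
import unicodedata

LIGATURE_TO_PAIR = {
    "\u05F0": "\u05D5\u05D5",  # HEBREW LIGATURE YIDDISH DOUBLE VAV -> VAV VAV
    "\u05F1": "\u05D5\u05D9",  # HEBREW LIGATURE YIDDISH VAV YOD    -> VAV YOD
    "\u05F2": "\u05D9\u05D9",  # HEBREW LIGATURE YIDDISH DOUBLE YOD -> YOD YOD
}
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_unifies(a: str, b: str) -> bool:
    """True if any standard normalization form maps a and b to the same string."""
    return any(unicodedata.normalize(f, a) == unicodedata.normalize(f, b)
               for f in FORMS)


# Hypothetical fold emulating the proposed decomposition. NFKC runs first so
# that U+FB1F HEBREW LIGATURE YIDDISH YOD YOD PATAH, which does decompose
# (to U+05F2 U+05B7), is reduced to the bare ligature before expansion.
_EXPAND = str.maketrans(LIGATURE_TO_PAIR)


def fold_yiddish(s: str) -> str:
    return unicodedata.normalize("NFKC", s).translate(_EXPAND)


for lig, pair in LIGATURE_TO_PAIR.items():
    print(f"U+{ord(lig):04X}",
          "decomposition:", repr(unicodedata.decomposition(lig)),
          "unified by normalization:", normalization_unifies(lig, pair),
          "unified by fold:", fold_yiddish(lig) == fold_yiddish(pair))
```

Folding toward the two-letter spelling is the easy direction, since each ligature expands independently. Composing letter pairs into ligatures would need a rule for sequences such as VAV VAV YOD, and would also rewrite Hebrew words where a double vav is not a Yiddish digraph.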
> There are also good reasons for preferring the stored form to be
> unique. At least on first consideration, it would seem to make sense
> for the canonical form to be the one most frequently encountered in
> keyboarding practice. Does anyone on this list know if these three
> digraphs are more frequently entered as single characters, or as
> two-character combinations? What would the likely behavior be if it were
> not clear to the user whether the string to be entered was in Yiddish
> or in Hebrew?
It doesn't really matter which form is entered more frequently; we
normalize strings all the time in Unicode.
~mark
This archive was generated by hypermail 2.1.5 : Sun Nov 20 2005 - 18:48:05 CST