Re: FW: Persian alphabet

From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Fri Mar 16 2001 - 19:47:46 EST


On Fri, 16 Mar 2001, Michael (michka) Kaplan wrote:

> In what way is the MS sort for the Farsi locale inadequate? What locale
> needs are not served, precisely?

The short answer is that I don't know. A colleague has tested it and found
some issues that cannot be fixed because they are hard-wired, so we
dropped the idea of fixing it. We thought we had better agree on an
ordering and standardize it first.

If you have a list, I can comment on its problems. But if you want it
short and precise, this is the current practice:

<Alef With Madda>, <Alef>, <Hamza, Alef With Hamza Above, Waw With Hamza
Above, Yeh with Hamza Above>, <Beh>, <Peh>, <Teh>, <Theh>, <Jeem>,
<Tcheh>, <Hah>, <Khah>, <Dal>, <Dhal>, <Reh>, <Zain>, <Jeh>, <Seen>,
<Sheen>, <Sad>, <Dad>, <Tah>, <Zah>, <Ain>, <Ghain>, <Feh>, <Qaf>,
<Keheh>, <Gaf>, <Lam>, <Meem>, <Noon>, <Waw>, <Heh, Teh Marbuta>, <Farsi
Yeh, Yeh>.
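For anyone who wants to try this order programmatically, here is a
minimal Python sketch of a primary-level sort key built from the list
above. Letters inside one angle-bracket group share a weight, so they
tie at this level; vowel marks, ZWNJ and any other unlisted character
are simply skipped. The names (PERSIAN_ORDER, PRIMARY, primary_key) are
illustrative only, not part of any existing library.

```python
# Letter groups in the order given above. Letters in the same inner list
# share one primary weight, i.e. they tie at this level.
PERSIAN_ORDER = [
    ["\u0622"],                                   # Alef With Madda Above
    ["\u0627"],                                   # Alef
    ["\u0621", "\u0623", "\u0624", "\u0626"],     # Hamza and its carriers
    ["\u0628"], ["\u067E"], ["\u062A"], ["\u062B"],  # Beh, Peh, Teh, Theh
    ["\u062C"], ["\u0686"], ["\u062D"], ["\u062E"],  # Jeem, Tcheh, Hah, Khah
    ["\u062F"], ["\u0630"], ["\u0631"], ["\u0632"],  # Dal, Dhal, Reh, Zain
    ["\u0698"], ["\u0633"], ["\u0634"], ["\u0635"],  # Jeh, Seen, Sheen, Sad
    ["\u0636"], ["\u0637"], ["\u0638"], ["\u0639"],  # Dad, Tah, Zah, Ain
    ["\u063A"], ["\u0641"], ["\u0642"], ["\u06A9"],  # Ghain, Feh, Qaf, Keheh
    ["\u06AF"], ["\u0644"], ["\u0645"], ["\u0646"],  # Gaf, Lam, Meem, Noon
    ["\u0648"],                                   # Waw
    ["\u0647", "\u0629"],                         # Heh, Teh Marbuta
    ["\u06CC", "\u064A"],                         # Farsi Yeh, Yeh
]

# Map each letter to the index of its group.
PRIMARY = {ch: rank for rank, group in enumerate(PERSIAN_ORDER) for ch in group}


def primary_key(word: str) -> tuple:
    """Primary sort key: the group ranks of the word's letters, in order.

    Characters that are not in the table (vowel marks, ZWNJ, ...) are
    ignored at this level. Tuple comparison makes a prefix sort first.
    """
    return tuple(PRIMARY[ch] for ch in word if ch in PRIMARY)


# Beh, Peh, Teh come out in Persian order, not code-point order.
words = ["\u067E", "\u062A", "\u0628"]
print(sorted(words, key=primary_key) == ["\u0628", "\u067E", "\u062A"])
```

A full collation would still need a tie-breaking level for the tied
groups and for the vowel marks; this only covers the base letters.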

The ordering of vowel marks is also different: Fatha, Kasra, Damma,
Fathatan, Kasratan, Dammatan, Shadda, Sukun.
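The vowel-mark order can be applied as a second level, consulted only
when two words have the same letters. A minimal Python sketch, assuming
each mark belongs to the nearest preceding letter and (my assumption,
not stated above) that an unmarked letter sorts before a marked one.
MARK_ORDER and secondary_key are illustrative names.

```python
# Vowel-mark order from the message.
MARK_ORDER = [
    "\u064E", "\u0650", "\u064F",   # Fatha, Kasra, Damma
    "\u064B", "\u064D", "\u064C",   # Fathatan, Kasratan, Dammatan
    "\u0651", "\u0652",             # Shadda, Sukun
]
MARK_RANK = {m: i for i, m in enumerate(MARK_ORDER)}


def secondary_key(word: str) -> tuple:
    """Secondary sort key: for each base character, the ranks of its marks.

    Any character that is not one of the marks above starts a new group.
    A mark with no preceding character is ignored. Ranks inside a group
    are sorted so the key does not depend on the order the marks were
    typed in (e.g. Shadda+Fatha vs. Fatha+Shadda). An unmarked letter
    gives an empty tuple, which compares lower than any non-empty one.
    """
    groups = []
    for ch in word:
        if ch in MARK_RANK:
            if groups:
                groups[-1].append(MARK_RANK[ch])
        else:
            groups.append([])
    return tuple(tuple(sorted(g)) for g in groups)


# Beh with Damma, Kasra, Fatha: Kasra precedes Damma here,
# although in code-point order Damma (U+064F) precedes Kasra (U+0650).
words = ["\u0628\u064F", "\u0628\u0650", "\u0628\u064E"]
print(sorted(words, key=secondary_key) == ["\u0628\u064E", "\u0628\u0650", "\u0628\u064F"])
```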

When sorting Arabic words together with Persian ones, Alef With Hamza
Below, Kaf, Alef Maksura, and Alef Wasla may also be encountered. Kaf and
Alef Maksura should be sorted with Keheh and Farsi Yeh, respectively, but
I don't know about the other two.
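In code, that amounts to folding the two letters onto their Persian
counterparts before building the sort key. A minimal Python sketch
(fold_arabic is a made-up name); Alef With Hamza Below and Alef Wasla
are deliberately left untouched, since their place is still open.

```python
# Arabic-only letters mapped to the Persian letters they sort with.
# Alef With Hamza Below (U+0625) and Alef Wasla (U+0671) are NOT mapped:
# their position is undecided.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u0643": "\u06A9",   # Kaf -> Keheh
    "\u0649": "\u06CC",   # Alef Maksura -> Farsi Yeh
})


def fold_arabic(word: str) -> str:
    """Return a copy of `word` for comparison only, with Kaf and Alef Maksura folded."""
    return word.translate(ARABIC_TO_PERSIAN)


# Arabic spelling of "ketaab" (Kaf, Teh, Alef, Beh) folds to the Persian one.
print(fold_arabic("\u0643\u062A\u0627\u0628") == "\u06A9\u062A\u0627\u0628")
```

The folded string is only for comparison; the original spelling is what
should be displayed.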

Also, if you are doing human-aided sorting, you should be aware of one
thing: if Alef With Madda is used in the middle of a word of Arabic
origin, it should be treated as "Hamza+Alef".
Examples are words like "ghor'aan" (Qaf, Reh, Alef With Madda Above, Noon)
and "ma'aakhez" (Meem, Alef With Madda Above, Khah, Dhal) which are Arabic
words. Counter-examples are words like "raah-aab" (Reh, Alef, Heh, ZWNJ,
Alef With Madda Above, Beh).
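Since the rule depends on etymology, it cannot be decided from the
spelling alone; the origin has to be supplied by a person (or by some
lexicon). A minimal Python sketch that takes that decision as a flag;
expand_madda and arabic_origin are hypothetical names, and treating
"middle of the word" as "anywhere but the first character" is my
simplification.

```python
HAMZA = "\u0621"
ALEF = "\u0627"
ALEF_MADDA = "\u0622"


def expand_madda(word: str, arabic_origin: bool) -> str:
    """Rewrite medial Alef With Madda Above as Hamza + Alef, for sorting only.

    `arabic_origin` must come from outside (a human decision); when it is
    False the word is returned unchanged. A word-initial Alef With Madda
    is never expanded.
    """
    if not arabic_origin:
        return word
    return word[:1] + word[1:].replace(ALEF_MADDA, HAMZA + ALEF)


# "ghor'aan": Qaf, Reh, Alef With Madda Above, Noon (Arabic origin)
print(expand_madda("\u0642\u0631\u0622\u0646", arabic_origin=True)
      == "\u0642\u0631\u0621\u0627\u0646")
# "raah-aab": Reh, Alef, Heh, ZWNJ, Alef With Madda Above, Beh (not Arabic)
print(expand_madda("\u0631\u0627\u0647\u200C\u0622\u0628", arabic_origin=False)
      == "\u0631\u0627\u0647\u200C\u0622\u0628")
```

After the rewrite, the expanded Hamza falls into the Hamza group of the
ordering above rather than into the Alef With Madda slot.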

--roozbeh



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:20 EDT