From: Magda Danish (Unicode) (v-magdad@microsoft.com)
Date: Tue Mar 11 2003 - 13:28:28 EST
Please make sure to copy Vladimir (iranorus@online.ru) on your reply.
Thanks,
Magda
> -----Original Message-----
> From: Vladimir Ivanov [mailto:iranorus@online.ru]
> Sent: Tuesday, March 11, 2003 6:22 AM
> To: Magda Danish (Unicode)
> Subject: ZWNJ & Persian Collation
>
>
> Dear Magda,
>
> Excuse me for bothering you again, but my message was rejected by some
> server on its way to unicode@unicode.org. May I ask you to publish my
> question below? Thank you, Vladimir.
>
>
>
> While sorting Persian words with a utility based on a tailored copy of
> version 3.1.1 of the Allkeys table
> (http://www.unicode.org/reports/tr10/#AllKeys), I've encountered a
> problem that affects the lexicographical order of words in a dictionary.
>
> To my mind, ZWNJ (zero width non-joiner, U+200C; also found among the
> MS Word Special Characters as No-width Optional Break) was invented to
> prevent Arabic letters from joining within a word.
>
> It is used in Persian to show the morphemic boundary in
> compound words like
> خانهداری xānedāri ‘household’. The latter consists of the
> word خانه xāne
> ‘house’ + verb stem دار dār ‘hold’ + suffix ی ‘i’. It can be
> transliterated
> like xāne + ZWNJ + dāri. There are thousands of words with a similar
> structure in Persian, Dari, Tajik and neighboring languages.
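
The structure described above (xāne + ZWNJ + dāri) can be sketched in a few lines of Python. This is only an illustration: the code points chosen for the Persian letters are an assumption about how the word is typically encoded, and the variable names are hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumed encoding of the two halves of the compound
# (in particular, the final letter is taken to be U+06CC).
house = "\u062e\u0627\u0646\u0647"    # xāne 'house'
holding = "\u062f\u0627\u0631\u06cc"  # dār 'hold' + suffix i

# The compound keeps ZWNJ between the two halves, inside the word.
compound = house + ZWNJ + holding

print(len(compound))               # number of code points, ZWNJ included
print(unicodedata.name(ZWNJ))      # official character name
print(unicodedata.category(ZWNJ))  # general category of the character
print(compound.index(ZWNJ))        # position of ZWNJ within the word
```

The point of the sketch is that ZWNJ is an ordinary code point in the middle of the string, with letters on both sides, so any sort routine has to decide what weight to give it.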
>
> Clearly, ZWNJ always has letters on both sides and lies within the
> word boundaries. Placing ZWNJ at the edge of a word doesn't make sense
> in Persian. From this point of view, ZWNJ should be treated as a
> special character rather than a delimiter.
>
> But in the Allkeys table it is placed on line #68, well before other
> common delimiters:
>
> HORIZONTAL TABULATION line #192,
> LINE FEED line #193,
> CARRIAGE RETURN line #196,
> SPACE line #197, etc.
>
> Such an ordering gives wrong sorting results for Persian dictionaries:
> compound words like خانهداری xānedāri ‘household’ appear in
> the list before
> their components like خانه xāne ‘house’.
>
> I've solved this problem for myself by placing ZWNJ somewhere after the
> delimiters, but what are the theoretical reasons for putting it before
> them? What is that meant to achieve, and in what languages?
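
The workaround described above, moving ZWNJ to a position after the delimiters, can be illustrated with a toy weight table. This is a minimal sketch under stated assumptions: the weights are invented for the example and are not the values from the Allkeys table, the helper `make_key` is hypothetical, and the ASCII strings "xane", "dar" and "dari" stand in for the Persian words.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def make_key(zwnj_after_delimiters):
    """Build a sort key from a toy single-level weight table.

    The weights are illustrative only; they are NOT taken from Allkeys.
    """
    delimiters = ["\t", "\n", "\r", " "]
    if zwnj_after_delimiters:
        order = delimiters + [ZWNJ]   # tailored: ZWNJ after the delimiters
    else:
        order = [ZWNJ] + delimiters   # ZWNJ before the delimiters
    order += [chr(c) for c in range(ord("a"), ord("z") + 1)]
    weight = {ch: i for i, ch in enumerate(order)}

    def key(word):
        # Compare words by the list of per-character weights.
        return [weight[ch] for ch in word]

    return key


words = ["xane dar", "xane" + ZWNJ + "dari", "xane"]

zwnj_first = sorted(words, key=make_key(False))
zwnj_last = sorted(words, key=make_key(True))

print([w.replace(ZWNJ, "<ZWNJ>") for w in zwnj_first])
print([w.replace(ZWNJ, "<ZWNJ>") for w in zwnj_last])
```

In this toy model the only thing that changes between the two runs is whether the ZWNJ compound lands before or after the entry containing a space. A real implementation would tailor the collation table of whatever library is in use rather than build keys by hand.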
>
> Is it a Persian-specific problem or a global one? Are there languages
> where ZWNJ marks a word boundary?
>
> By the way, the sorting algorithm built into MS Windows puts compound
> words with ZWNJ AFTER their simple components. So in this respect it
> follows principles different from those of the Allkeys table.
>
>
>
> Thank you,
>
> Vladimir Ivanov
>
>
This archive was generated by hypermail 2.1.5 : Tue Mar 11 2003 - 15:34:18 EST