Re: Pashto yeh characters

From: linguist@artstein.org
Date: Wed Jul 28 2010 - 11:12:46 CDT

Next message: André Szabolcs Szelp: "Re: High dot/dot above punctuation?"

Previous message: Kent Karlsson: "Re: High dot/dot above punctuation?"
In reply to: Andreas Prilop: "Re: Pashto yeh characters"
Next in thread: Andreas Prilop: "Re: Pashto yeh characters"
Reply: Andreas Prilop: "Re: Pashto yeh characters"

Quoting Andreas Prilop <prilop4321@trashmail.net>:

Hi Andreas,

Thanks for the references to the old 7-bit and 8-bit Arabic character sets.

> http://www.itscj.ipsj.or.jp/ISO-IR/089.pdf
> http://www.itscj.ipsj.or.jp/ISO-IR/127.pdf

I think these clearly show that alef maksura was the intention behind
the dotless code point immediately preceding yeh, which later got
incorporated into Unicode as U+0649.

In terms of practice, Arabic-language documents are fairly consistent
about using U+064A for yeh and U+0649 for alef maksura -- except in
Egypt, which has a tradition of not distinguishing between alef
maksura and yeh in final position (both are written without dots).
Here's an arbitrary page from today's Al-Ahram newspaper, where both
yeh and alef maksura are encoded as U+064A (the same holds for other
pages of the site).

http://www.ahram.org.eg/241/2010/07/28/25/31443.aspx

On my computer this looks particularly jarring, because two dots are
displayed on alef maksura in words like 'ila "to" and `ala "on". My
locale is set to en_US; I wonder whether an Egyptian locale setting
would cause U+064A to display without dots.
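
To check how a given text encodes these letters, independently of how a
font renders them, one can tally the code points directly. The following
is only a rough sketch: the function name count_yeh_like and the choice
to look at just the three code points mentioned in this thread are my own
assumptions for illustration.

```python
from collections import Counter
import unicodedata

# Only the three code points discussed in this thread; other yeh-like
# letters exist in Unicode and are deliberately left out of this sketch.
YEH_LIKE = ("\u0649", "\u064A", "\u06CC")


def count_yeh_like(text):
    """Tally yeh-like code points in text, keyed by 'U+XXXX NAME'."""
    counts = Counter(ch for ch in text if ch in YEH_LIKE)
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": n
        for ch, n in counts.items()
    }
```

Running this over the text of a page would show whether U+0649 occurs at
all, without relying on the dots a particular font happens to draw.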

Going back to my original question about Pashto, unfortunately I
cannot use the advice you gave in your initial reply, "Use whatever
you want." I am not creating Pashto documents for print or electronic
distribution, but rather working on automated language-processing
tasks. It seems that the only workable solution would be to unify all
U+064A and U+06CC characters found in Pashto documents into a single
character for processing (and also U+0649 if we encounter it). It is
unfortunate that a distinction between the characters cannot be used
for disambiguating unvocalized Pashto text, but this appears to be the
current state of affairs.
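
A minimal sketch of the unification I have in mind, in Python. The name
unify_yeh is hypothetical, and the choice of U+06CC as the target is
arbitrary (any one of the three would serve for processing purposes).
Only the three code points named above are folded together.

```python
# Code points to be treated as one character during processing.
YEH_VARIANTS = (
    "\u064A",  # ARABIC LETTER YEH
    "\u06CC",  # ARABIC LETTER FARSI YEH
    "\u0649",  # ARABIC LETTER ALEF MAKSURA
)

# Assumption: the target is an arbitrary pick, not a recommendation.
TARGET = "\u06CC"

_YEH_TABLE = str.maketrans({ch: TARGET for ch in YEH_VARIANTS})


def unify_yeh(text: str) -> str:
    """Map every yeh variant in text to the single TARGET character."""
    return text.translate(_YEH_TABLE)
```

This mapping is lossy by design: once applied, whatever distinction the
original encoding carried cannot be recovered, so it is best applied to a
working copy rather than to the source documents.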

-Ron.

This archive was generated by hypermail 2.1.5 : Wed Jul 28 2010 - 11:25:11 CDT