From: Roozbeh Pournader (roozbeh@htpassport.com)
Date: Fri Oct 01 2010 - 16:55:04 CDT
This is a rather late reply, but I think this document should be useful:
http://www.evertype.com/standards/af/af-locales.pdf
The first few pages discuss the various Yeh forms, recommend which ones
to use, and recommend avoiding some in certain forms.
Roozbeh
On Thu, 2010-07-22 at 12:17 -0500, linguist@artstein.org wrote:
> Hi,
>
> This is a query I had originally sent to the Linguist List, modified
> based on feedback I got there. I am hoping that someone in the Unicode
> community can help resolve this.
>
> I'm interested in knowing if there is a standard way to encode the
> various Pashto yeh-characters in Unicode, and if so, what it is. This
> question is a bit more complicated than it sounds, so here's the
> background.
>
> Pashto is written using a derivative of the Arabic script. The Arabic
> language uses a single character for both /j/ and /i:/ sounds. Like
> many Arabic characters, this one is composed of a base form (which
> changes shape based on its position in a word) and dots (in this case,
> two dots below the base form). In most of the Arabic-speaking world
> the dots are present with both the medial and final form, though in
> Egypt (and possibly other places) the convention is to have two dots
> on the medial form but leave them off the final form. The standard
> arrangement of the two dots is horizontal, but they can be placed
> vertically or diagonally with no change in meaning.
>
> Persian also uses a single character for /j/ and /i:/, with the
> convention of two dots on the medial form, no dots on the final form
> (same as in Egypt).
>
> The two conventions for the /j/-/i:/ character were given distinct
> code points in Unicode despite the fact that they do not contrast;
> documentation is scarce, but presumably this was done in order to
> allow writing both Arabic and Persian in the same document. Therefore,
> Unicode has the following code points (I'm not giving the names, but
> rather the typical visual representation of the glyphs and typical use).
>
> U+064A two dots medially and finally (/j/-/i:/ Arabic convention)
> U+06CC two dots medially, none finally (/j/-/i:/ Persian convention)
>
> There are a few additional yeh-base code points defined, some of which
> are relevant to Pashto (see below).
>
> U+0649 no dots medially or finally (Arabic /a/ from etymological /j/)
> U+0626 hamza above medially and finally (Arabic glottal stop in
> certain contexts)
> U+06D0 two dots medially and finally in vertical arrangement
> U+06CD tail and no dots in final position
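
For reference, the formal character names omitted from the two lists above can be looked up programmatically. This is a minimal sketch, assuming Python 3 and its standard unicodedata module; the names YEH_CODE_POINTS and describe are made up for illustration and do not come from the original message.

```python
import unicodedata

# Code points copied from the two lists in the message above. The character
# names are read from Python's bundled Unicode Character Database.
YEH_CODE_POINTS = [0x064A, 0x06CC, 0x0649, 0x0626, 0x06D0, 0x06CD]


def describe(code_point):
    """Return a 'U+XXXX NAME' label for a single code point."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))


for cp in YEH_CODE_POINTS:
    print(describe(cp))
```

The formal names describe identity, not appearance, so they do not replace the visual descriptions given in the message.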
>
> As it so happens, there is much confusion in how these characters are
> used in actual electronic documents, which is not surprising given
> that U+06CC looks like U+064A in medial position but like U+0649 in
> final position. There is an excellent article by Jonathan Kew that
> sorts out what this means for various languages that use derivatives
> of the Arabic script.
>
> http://scripts.sil.org/cms/scripts/render_download.php?site_id=nrsi&format=file&media_id=arabicletterusagenotes&filename=ArabicLetterUsageNotes.pdf
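
One way to see which of these code points a given electronic document actually uses is to count them. The following is a rough sketch under stated assumptions: it is plain Python 3, the helper name count_yeh_code_points is hypothetical, and the set only covers the six yeh-based code points listed in the message. It reports code points only; it cannot tell which visual form or sound was intended.

```python
from collections import Counter

# The six yeh-based code points mentioned in the message.
YEH_LIKE = {"\u064A", "\u06CC", "\u0649", "\u0626", "\u06D0", "\u06CD"}


def count_yeh_code_points(text):
    """Count occurrences of each yeh-based code point in a string.

    Returns a dict keyed by 'U+XXXX' labels; code points that do not
    occur in the text are left out.
    """
    counts = Counter(ch for ch in text if ch in YEH_LIKE)
    return {"U+%04X" % ord(ch): n for ch, n in counts.items()}


# Artificial input (not a real Pashto word), used only to show the call.
print(count_yeh_code_points("\u06CC\u064A\u06CC"))
```

A document that mixes U+06CC, U+064A and U+0649 in word-final position is a hint that more than one encoding convention is in play.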
>
> Unfortunately, this article does not discuss Pashto. I have little
> knowledge of the language, but here's what I managed to understand
> from the inspection of a few documents and with the help of friendly
> people on the Linguist List (and please correct me if I'm wrong).
>
> Traditionally, Pashto used a single character with the same convention
> as in Persian, of two dots in the medial form and none on the final
> form, and with no significance attached to the visual arrangement of
> the dots. The character was three-way ambiguous between the sounds /j/,
> /i:/ and /e/. In recent decades (probably since the 1970s or 1980s)
> there has been some differentiation, partly due to changes in the
> typesetting process and partly due to a deliberate effort of the
> Pashto Academy at the University of Peshawar, Pakistan.
>
> One convention that has gained fairly wide acceptance is a distinction
> between a horizontal arrangement of the dots, representing /j/ or /i:/
> as in Arabic and Persian, and a vertical arrangement representing the
> sound /e/. This distinction is the same as in Uighur, and the
> character with vertical dots has been codified as U+06D0. Additional
> conventions include a hamza (U+0626) or tail (U+06CD) to represent /j/
> at the end of a word in certain grammatical markers. All of these are
> quite standard by now and do not pose much of a problem.
>
> However, a further convention appears to have arisen, which as far as
> I can tell is unique to Pashto in that it distinguishes between /j/
> and /i:/ (though only in word-final position):
>
> /j/ is written with two dots medially, none finally
> /i:/ is written with two dots both medially and finally
>
> I have never seen this codified explicitly, but this is the impression
> I get from examining a few recent Pashto documents. Which brings me to
> my original question, of how to represent these characters in Unicode.
> The linguist in me notices a correspondence between sounds and Unicode
> code points (which, given the history I have just described, is most
> certainly accidental):
>
> /j/ corresponds to U+06CC
> /i:/ corresponds to U+064A
>
> The Wikipedia article on the Pashto alphabet
> http://en.wikipedia.org/wiki/Pashto_alphabet gives a different
> correspondence, based on visual appearance:
>
> forms with dots: U+064A (/i:/ and /j/ medially, /i:/ finally)
> forms without dots: U+0649 (only /j/ in word-final position)
>
> And there is yet a third convention, which I encountered in an
> electronic lexicon and also appears in the following document:
> http://www.afghanan.net/pashto/pashto%20alifba.pdf
>
> U+06CC: medial forms with dots (/i:/ and /j/) and dotless final form (/j/)
> U+064A: final form with dots (/i:/)
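
The three candidate encodings described above can be laid side by side as lookup tables. This is only an illustrative sketch in Python 3: the convention labels, the CONVENTIONS table and the helper names are invented here, word-initial position is left out because the message does not discuss it, and the first table assumes the sound-based correspondence applies in medial as well as final position, which the message does not state explicitly.

```python
# Keys are (sound, position); values are the code point each convention
# would use, as described in the message above.
CONVENTIONS = {
    # /j/ -> U+06CC, /i:/ -> U+064A (assumed to hold in both positions)
    "sound-based": {
        ("j", "medial"): 0x06CC, ("j", "final"): 0x06CC,
        ("i:", "medial"): 0x064A, ("i:", "final"): 0x064A,
    },
    # Dotted forms -> U+064A, dotless final form -> U+0649
    "visual": {
        ("j", "medial"): 0x064A, ("j", "final"): 0x0649,
        ("i:", "medial"): 0x064A, ("i:", "final"): 0x064A,
    },
    # U+06CC except for the dotted final form, which is U+064A
    "lexicon": {
        ("j", "medial"): 0x06CC, ("j", "final"): 0x06CC,
        ("i:", "medial"): 0x06CC, ("i:", "final"): 0x064A,
    },
}


def encode_yeh(convention, sound, position):
    """Return the character a convention uses for a sound in a position."""
    return chr(CONVENTIONS[convention][(sound, position)])


def disagreements():
    """List the (sound, position) cases where the conventions differ."""
    cases = [("j", "medial"), ("j", "final"), ("i:", "medial"), ("i:", "final")]
    return [c for c in cases
            if len({table[c] for table in CONVENTIONS.values()}) > 1]


print(disagreements())
```

Under these assumptions the three conventions agree only on word-final /i:/ (U+064A), which is why text encoded under one convention cannot be searched or compared reliably against text encoded under another.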
>
> To wrap up, are my observations about the Pashto writing conventions
> correct? And is there a standard for assigning the Pashto characters
> representing /j/ and /i:/ to Unicode code points?
>
> -Ron.
>
>
This archive was generated by hypermail 2.1.5 : Fri Oct 01 2010 - 16:58:23 CDT