document fa_AF=prs & merged fa_AF=ps collation tailoring

Reported by: markus
common/collation/fa_AF.xml has <alias source="ps" path="//ldml/collations"/>, which means that Persian-in-Afghanistan supposedly has the same sort order as Pashto, and that order is very different from the Persian tailoring.

This seems strange because Persian (fa) and Pashto (ps) are significantly different languages; see http://en.wikipedia.org/wiki/Pashto_language

The main locale data is not aliased: common/main/fa_AF.xml overrides some of fa.xml and inherits other parts, but does not have any aliases.

I suggest we have someone familiar with languages of that area review this, and consider removing the common/collation/fa_AF.xml file.


comment:1 Changed 5 years ago by ake.persson@…

This resource might be useful, http://www.evertype.com/standards/af/

comment:2 Changed 5 years ago by roozbeh

As far as I remember, Michael Everson and I tried to come up with a unified collation for the languages of Afghanistan when we were writing the UNDP report. This is especially useful because Afghan names may be written in any of the languages. Since the Pashto alphabet is a superset of the Persian alphabet and keeps the order of characters, the unification was very easy and straightforward. So, yes, I would expect Pashto and Afghan Persian collation to be the same.
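
A minimal sketch of why a superset alphabet that keeps the original letter order gives the same sort order for words written in the smaller alphabet. The alphabets below are Latin placeholders, not the real Persian or Pashto letters, and sort_key is a hypothetical helper; real collation has more levels than this single alphabet position.

```python
def sort_key(word, alphabet):
    """Map each character of word to its position in the given alphabet."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in word]


# Placeholder alphabets (assumption for illustration only).
SMALL = "abcd"       # stands in for the smaller alphabet
MERGED = "aXbcYdZ"   # superset: a, b, c, d keep their relative order

# Words that use only letters from the smaller alphabet.
words = ["dab", "cab", "bad", "abc", "add"]

by_small = sorted(words, key=lambda w: sort_key(w, SMALL))
by_merged = sorted(words, key=lambda w: sort_key(w, MERGED))

# Inserting extra letters shifts positions but never swaps two
# letters of the smaller alphabet, so both orders agree.
print(by_small == by_merged)
```

The extra letters (X, Y, Z here) only matter for words that actually contain them, which is why one merged tailoring can serve both languages.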

comment:3 Changed 5 years ago by markus

The Wikipedia article says "Pashto is one of the two official languages of Afghanistan (the other being Dari Persian)", and that article says that Dari Persian has the language code prs, not fa_AF.

If this is correct, then we should remove collation/fa_AF.xml and move its contents (basically an alias to collation/ps.xml) into a new file collation/prs.xml.

In any case, wherever we have this alias relationship, please add a comment in both the aliasing and the aliased files with basically Roozbeh's information.
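
A rough sketch for locating such alias relationships so that each one can be documented. It assumes a directory of collation XML files like common/collation/ and that an alias appears as an <alias source="..."/> element somewhere in the file; find_collation_aliases is a hypothetical helper, not part of any CLDR tooling.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_collation_aliases(collation_dir):
    """Return {file name: alias source} for XML files that contain an <alias>."""
    result = {}
    for path in sorted(Path(collation_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # Record the first <alias> element found anywhere in the file.
        for alias in root.iter("alias"):
            result[path.name] = alias.get("source")
            break
    return result


if __name__ == "__main__":
    # Assumed checkout layout; adjust the path as needed.
    for name, source in find_collation_aliases("common/collation").items():
        print(f"{name} aliases {source}")
```

Each reported pair names both the aliasing file and the aliased locale, which are the two places where the explanatory comment would go.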

comment:4 Changed 5 years ago by roozbeh

For normal locale data, we shouldn't move "fa_AF" to "prs". In the ISO 639-3 model, "fa" is a macrolanguage, like Chinese and Arabic, and it has two sublanguages, "pes" for Iranian Persian and "prs" for Afghan Persian. Also, the differences between written "pes" and "prs" are minimal (and are limited to things like country and language names), which makes any Iranian Persian localization usable for Dari. The current model of having "fa_AF" and "fa" works perfectly for that. But I have no objection to aliasing the other way around: "prs" to "fa_AF" and "pes" to "fa_IR".

I would like to be able to keep the same default locale names for collation too. So Dari wouldn't become fa_AF for normal locale data and prs for collation.

(BTW, I have provided most of the locale information for "fa", "fa_AF", and "ps" myself, the latter two from two field trips to Kabul and months of research.)

comment:5 Changed 5 years ago by markus

Roozbeh: That all sounds reasonable to me. Please document fa_AF=prs in the main/fa_AF and collation/fa_AF files, and add comments in both the collation/fa_AF and collation/ps files about the merged tailoring you created, as you described earlier. Only if it's documented will we avoid further such questions from people who know nothing about these languages (like me).

I actually don't know how we add comments in main/* files since they get generated from the Survey Tool (right?). But I believe that collation/* files are manually maintained.

comment:6 Changed 5 years ago by mark

We can have comments in files, and they are maintained.

comment:7 Changed 4 years ago by markus

TODO: comment 5

Note: The tailorings were changed from a forbidden alias to an import; the documentation should still be added.

comment:9 Changed 4 years ago by markus

Comments in main/* need to be added either before the Survey Tool opens or after data has been imported from it. Otherwise they get clobbered.

comment:10 Changed 4 years ago by markus

Note: Comments to be added in both main/ and collation/, see the list of comments here.

comment:16 Changed 2 years ago by roozbeh

comment:18 Changed 2 years ago by roozbeh

