Re: Kashmiri in the Perso-Arabic script

From: Dheeraj Kumar (dheerajkumar_97@yahoo.com)
Date: Fri Nov 03 2006 - 02:10:54 CST


    Thanks William,

    I think a few Kashmiri characters are missing, and so are a few Hazaragi ones. Given the diversity of languages and scripts in India, I humbly request that the Unicode Consortium move quickly to incorporate other minor Indian languages into the standard, lest non-standard approaches become predominant. Manipuri is another language that may have characters requiring standardization. Khasi and Garo also come to mind as I write this.

    Best regards,
    Dheeraj

    =========================

    ----- Original Message ----
    From: William J Poser <wjposer@ldc.upenn.edu>
    To: dheerajkumar_97@yahoo.com
    Sent: Thursday, November 2, 2006 7:54:10 AM
    Subject: Kashmiri in the Perso-Arabic script

    Assuming that you know Kashmiri, get a copy of the Unicode 5.0 standard or go to the web site and obtain the code charts for Arabic:

    http://www.unicode.org/charts/PDF/U0600.pdf   Arabic
    http://www.unicode.org/charts/PDF/U0750.pdf   Arabic Supplement
    http://www.unicode.org/charts/PDF/UFB50.pdf   Arabic Presentation Forms-A
    http://www.unicode.org/charts/PDF/UFE70.pdf   Arabic Presentation Forms-B

    To the extent that the Unicode names are sufficient for identifying characters, which they very likely won't be, you can also work from the NamesList:

    http://www.unicode.org/Public/UNIDATA/NamesList.txt

    Then go through and see whether you can find all of the characters needed for Kashmiri. The existing standard does include several characters exclusively for Kashmiri, e.g. U+06C4 ARABIC LETTER WAW WITH RING, or for Kashmiri and one or two other languages, e.g. U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW, listed as for Kashmiri and Baluchi, but it is possible that there are still omissions. My somewhat superficial comparison of the Indian PASCII standard with Unicode left me with the impression that Kashmiri and/or Sindhi may require additions.

    You also need to have some understanding of what counts as a character. If, for example, a certain ligature does not appear in Unicode, it may be that Unicode considers it to be a rendering variant of a sequence of two characters and has decided not to encode it separately.

    Bill
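
    The two checks described above (is a character encoded, and is a ligature treated as a sequence of characters) can be roughly sketched in Python with the standard unicodedata module. This is only an illustration, not a substitute for reading the code charts: the CANDIDATES list holds just the two code points named in the message, the helper names describe and compatibility_sequence are made up for this sketch, and the lam-alef ligature U+FEFB is an assumed example chosen because it has a well-known compatibility decomposition. The results reflect whatever Unicode version the installed Python ships with.

```python
import unicodedata

# Code points named in the message above. A real survey would list every
# letter and sign used to write Kashmiri (that full list is not given here).
CANDIDATES = [0x06C4, 0x0673]


def describe(code_point):
    """Return the Unicode character name for code_point, or None if this
    Python's Unicode database has no named character there."""
    return unicodedata.name(chr(code_point), None)


def compatibility_sequence(char):
    """Return the code points that char maps to under NFKD normalization."""
    return [ord(c) for c in unicodedata.normalize("NFKD", char)]


for cp in CANDIDATES:
    print(f"U+{cp:04X} {describe(cp)}")

# U+FEFB is the isolated lam-alef ligature from Arabic Presentation Forms-B
# (an assumed example). Its compatibility decomposition is LAM followed by
# ALEF, i.e. the ligature is a rendering of a two-character sequence.
print([f"U+{cp:04X}" for cp in compatibility_sequence("\uFEFB")])
```

    A name lookup that returns None only means the code point has no named character in that Unicode version. It says nothing about whether the letter could be represented as a sequence of existing characters, which is the point made about ligatures.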


