L2/06-330

Source: Deborah Anderson
Date: Friday, October 06, 2006
To: 'suliana@ftmk.upsi.edu.my'
Subject: RE: Proposal to encode additional Jawi characters

Dear Suliana Sulaiman,

Magda Danish forwarded your email regarding encoding additional Jawi characters. I run a project at UC Berkeley that works with various user communities to ensure that their characters are included in Unicode, and I also work closely with the Unicode Technical Committee.

I have reviewed your proposal with other members of the Unicode Technical Committee. I believe all the characters you are requesting are already in Unicode. This is quite fortunate for Jawi, for if they were missing and had to be proposed, you would have to wait several years to use them.

Your tables for NYA and NGA do refer to the appropriate Unicode codepoints to be used: for NYA, use U+06BD ARABIC LETTER NOON WITH THREE DOTS ABOVE (see note appended below); for NGA, use U+06A0 ARABIC LETTER AIN WITH THREE DOTS ABOVE. The appropriate codepoint for GA should be U+0762 ARABIC LETTER KEHEH WITH DOT ABOVE.

You requested encoding contextual forms for NYA, GA, and NGA. While it is true that the Arabic Presentation Forms blocks encode isolated, medial, initial, and final positional variants, this was only because they had been in older legacy character sets that encoded the presentation forms directly. The approach of Unicode, however, is summarized in the "Encoding Principles" section of 8.2 in The Unicode Standard: "Each letter receives only one Unicode character value in the basic Arabic block, no matter how many different contextual appearances it may exhibit in text" (http://www.unicode.org/versions/Unicode4.0.0/ch08.pdf).

Hence, it is recommended that the characters above (06BD, 06A0, 0762) be used with implementations that can perform glyph shaping (by rendering rules), accessing the appropriate glyphs in fonts. (The specific glyphs for the various positions should be defined properly in the font.)
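As a small illustration of the recommendation above, the following is a minimal sketch, assuming Python's standard `unicodedata` module. It looks up the three recommended codepoints and shows how a legacy presentation form folds back to its single base letter under NFKC normalization. The names `JAWI_LETTERS`, `describe`, and `to_base_letters` are hypothetical helpers chosen for this example, and U+FE91 (the initial form of BEH) is used only as a familiar stand-in for a legacy presentation form.

```python
import unicodedata

# Codepoints recommended in the letter for the three Jawi letters.
# The dictionary name and keys are illustrative, not from any standard.
JAWI_LETTERS = {
    "NYA": "\u06BD",
    "NGA": "\u06A0",
    "GA": "\u0762",
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def to_base_letters(text: str) -> str:
    """Fold legacy Arabic presentation forms to base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for label, ch in JAWI_LETTERS.items():
        print(label, describe(ch))

    # U+FE91 is a legacy presentation form (BEH, initial form).
    # NFKC maps it back to the one base letter stored in plain text.
    print(describe(to_base_letters("\uFE91")))
```

Text stored this way contains only the base letters; choosing the isolated, initial, medial, or final glyph is left to the shaping engine and the font.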
The question of Jawi is now addressed in a Frequently Asked Question on the Unicode Consortium website, at: http://www.unicode.org/faq/middleeast.html.

I hope this is helpful. Do let me know if you have additional questions.

With best regards,
Deborah Anderson

* Note the comment in Chapter 8 of The Unicode Standard (available at http://www.unicode.org/versions/Unicode4.0.0/ch08.pdf):

"Jawi: U+06BD ARABIC LETTER NOON WITH THREE DOTS ABOVE is used for Jawi, which is Malay written using the Arabic script. Malay users know the character as Jawi Nya. Contrary to what is suggested by its Unicode character name, U+06BD displays with three dots below the letter when it is in the initial or medial position. This is done to avoid confusion with U+062B ARABIC LETTER THEH, which appears in words of Arabic origin, and which has the same base letter shapes in initial or medial position, but with three dots above in all positions."

Deborah Anderson
Researcher, Dept. of Linguistics, UC Berkeley
Proj. Leader, Script Encoding Initiative
http://linguistics.berkeley.edu/sei
NOTE NEW Email: dwanders@sonic.net (or dwanders@berkeley.edu)

-----Original Message-----
From: Suliana bt. Sulaiman [mailto:suliana@ftmk.upsi.edu.my]
Sent: Wednesday, October 04, 2006 1:34 AM
To: magda@unicode.org
Subject: Proposal to encode additional Jawi characters

Dear Magda Danish,

Here I attach you some file to review and hopefully Unicode Consortium will accept it.

Regards,
Suliana Sulaiman