From: Maha Hassan (maha.hassan96@yahoo.com)
Date: Sat May 10 2008 - 01:22:16 CDT
Thank you for the explanation.
Correction: I meant U+064F (which existed in 1.0), not U+0619 (which confuses me even more).
I have another question: why were U+0618..U+061A introduced? I understand they are used in Koranic display, but why duplicate marks that already exist, when both have exactly the same effect on pronunciation? Koranic display can be handled at the font level rather than in the encoding.
Thanks,
Maha
----- Original Message ----
From: Kenneth Whistler <kenw@sybase.com>
To: maha.hassan96@yahoo.com
Cc: unicode@unicode.org; kenw@sybase.com
Sent: Friday, May 9, 2008 5:43:08 PM
Subject: Re: Arabic Normalization chart
> Thanks for the references.
> But why does U+06C7 have no decomposition? I can enter
> U+0648\U+0619 from an Arabic keyboard and get the exact glyph of U+06C7.
> How come U+0623 has a decomposition and U+06C7 does not?
> What are the criteria?
It is an interaction of the requirements for normalization
stability with the timing of the addition of various characters
for the Arabic script.
U+06C7 was already an encoded character in Unicode as of Version 1.1,
dating back to 1993.
The "composition version" for Unicode normalization stability
is defined to be Version 3.1, dating back to 2001. See
http://www.unicode.org/reports/tr15/#Versioning
for details. Among other things, that means that no character
that was either decomposed or *not* decomposed as of
Version 3.1 can ever have its decomposition status
changed by a later version of the standard.
Those few Arabic letters that *do* have decompositions,
such as U+0622..U+0626, were *already* decomposed as of
Version 3.1, based on U+0653..U+0655 (madda and/or hamza
above or below), which were also already encoded as
of Version 3.1.
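
As an illustration of the difference described above, the canonical
decomposition mappings can be inspected with Python's standard
unicodedata module. This is only a sketch: the helper name
canonical_decomposition is made up for this example, and the results
reflect whatever Unicode version the Python interpreter ships with.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of code points.

    Returns an empty list when the character has no decomposition, or
    only a compatibility one (those are tagged like "<isolated>").
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []
    return [int(cp, 16) for cp in mapping.split()]


# U+0622..U+0626 versus U+06C7 ARABIC LETTER U
for cp in list(range(0x0622, 0x0627)) + [0x06C7]:
    parts = canonical_decomposition(chr(cp))
    shown = " ".join(f"U+{p:04X}" for p in parts) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {shown}")
```

The loop should list a base letter plus one of U+0653..U+0655 for each
of U+0622..U+0626, and "(none)" for U+06C7.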
But combining marks added *after* Version 3.1 cannot be
used in decompositions of Arabic characters encoded
*before* Version 3.1 (or indeed those added in any
version earlier than when the combining marks themselves
were added).
U+0619 ARABIC SMALL DAMMA was just added in Unicode Version 5.1,
so it cannot be used to decompose any Arabic character from
earlier versions. To do so would destabilize the normalization
of Unicode data.
See:
http://www.unicode.org/policies/stability_policy.html#Normalization
for the formal statement of this requirement for stability.
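
The practical effect can be checked with a small Python sketch using
the standard unicodedata module (the variable names are just for
illustration, and the interpreter's Unicode data must be at least
Version 5.1 to know about U+0619). Normalization neither composes the
typed sequence into U+06C7 nor decomposes U+06C7 into that sequence:

```python
import unicodedata

# Sequence typed from the keyboard: ARABIC LETTER WAW + ARABIC SMALL DAMMA
waw_small_damma = "\u0648\u0619"
# Precomposed ARABIC LETTER U, encoded since Version 1.1
letter_u = "\u06C7"

composed = unicodedata.normalize("NFC", waw_small_damma)
decomposed = unicodedata.normalize("NFD", letter_u)

# NFC leaves the two-character sequence as it is
print(composed == waw_small_damma)
# NFD leaves the single letter as it is
print(decomposed == letter_u)
# so the two spellings never normalize to the same string
print(composed == letter_u)
```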
Also, it should be noted that U+0619 (and similar characters
in the range U+0610..U+0618) are really intended for honorifics and
Koranic annotation -- they are not nuqtas used as diacritics
to create new Arabic characters.
So, for example, U+0615 ARABIC SMALL HIGH TAH is an annotation
mark, and cannot be used to decompose U+0679 ARABIC LETTER TTEH
(which looks like a dotless beh with a small high tah diacritic)
or U+06BB ARABIC LETTER RNOON (which looks like a noon ghunna
with a small high tah diacritic). So even though you could
type such combinations and have them appear like those letters,
they would not be canonical equivalents, nor would applications
consider them to compare equal to each other.
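
A minimal Python sketch of that comparison, assuming an application
that tests canonical equivalence by normalizing both strings to NFD
(the function name canonically_equivalent is hypothetical, not a
library call):

```python
import unicodedata


def canonically_equivalent(a, b):
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A letter that *does* have a decomposition, for contrast:
# ALEF WITH HAMZA ABOVE versus ALEF + HAMZA ABOVE
alef_hamza = canonically_equivalent("\u0623", "\u0627\u0654")

# TTEH versus DOTLESS BEH + SMALL HIGH TAH
tteh = canonically_equivalent("\u0679", "\u066E\u0615")

# RNOON versus NOON GHUNNA + SMALL HIGH TAH
rnoon = canonically_equivalent("\u06BB", "\u06BA\u0615")

print(alef_hamza, tteh, rnoon)
```

Only the first pair should compare equal; the two sequences built with
U+0615 remain distinct from the precomposed letters even if a font
renders them alike.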
I realize that this is complicated and not at all self-evident
from just using an Arabic keyboard and looking at the
Unicode charts. But the constraints are in place because
of the overriding requirement to keep Unicode normalization
stable, not only for Arabic, but for all Unicode characters.
--Ken
This archive was generated by hypermail 2.1.5 : Sat May 10 2008 - 01:25:38 CDT