From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 09 2008 - 16:45:54 CDT
> I am trying to understand the normalization chart for Arabic.
> Why are certain glyphs not decomposed entirely under KD, for example:
> \FBF0 ==> has KD = \064A\0654\06C7 instead of =\064A\0654\0648\0619
> \FBDB ==> KD= \06c8 instead of =\0648\0670
> am I missing something?
Yes.
U+06C7 and U+06C8 have no decompositions.
06C7;ARABIC LETTER U;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH DAMMAH;;;;
                            ^^
06C8;ARABIC LETTER YU;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH ALEF ABOVE;;;;
                             ^^
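As a rough illustration of what the carets point at, the decomposition is the sixth semicolon-separated field of a UnicodeData.txt record (index 5 when counting from zero). The sketch below splits the two records quoted above; the names RECORDS, DECOMPOSITION_FIELD and decomposition_field are made up for this example and are not part of any Unicode tool.

```python
# The two UnicodeData.txt records quoted above, copied verbatim.
RECORDS = [
    "06C7;ARABIC LETTER U;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH DAMMAH;;;;",
    "06C8;ARABIC LETTER YU;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH ALEF ABOVE;;;;",
]

# Zero-based index of the decomposition field in a UnicodeData.txt record
# (hypothetical constant name, chosen for this sketch).
DECOMPOSITION_FIELD = 5


def decomposition_field(record: str) -> str:
    """Return the raw decomposition field of one UnicodeData.txt record."""
    return record.split(";")[DECOMPOSITION_FIELD]


for record in RECORDS:
    code_point = record.split(";")[0]
    # An empty string means the character has no decomposition mapping.
    print(code_point, repr(decomposition_field(record)))
```

An empty field here is what "has no decomposition" means in practice: normalization leaves the character as it is.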
You cannot infer formal decompositions for letters --
particularly for Arabic -- simply by looking at the
characters in the chart. To get the normative decomposition
status of any particular character (which determines
what its NFD or NFKD or NFC or NFKC normalizations will be),
you have to look at the decomposition field in
UnicodeData.txt (or check in NormalizationTest.txt).
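One possible way to do that lookup without opening the file by hand is Python's standard unicodedata module, which exposes the decomposition field and the normalization forms. This is only a sketch: it reflects whichever Unicode version the local Python build ships with, and the helper names code_points and nfkd are invented for the example.

```python
import unicodedata


def code_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def nfkd(ch: str) -> str:
    """Return the NFKD normalization of ch as hexadecimal code points."""
    return code_points(unicodedata.normalize("NFKD", ch))


# The two presentation forms from the question, then the two letters
# that have no decomposition of their own.
for ch in ("\uFBF0", "\uFBDB", "\u06C7", "\u06C8"):
    field = unicodedata.decomposition(ch)  # raw decomposition field, "" if none
    print(f"U+{ord(ch):04X}", "field:", repr(field), "NFKD:", nfkd(ch))
```

The presentation forms decompose until they reach U+06C7 or U+06C8 and stop there, because those two letters carry an empty decomposition field.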
--Ken