From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Oct 02 2006 - 17:41:18 CST
Jarkko Ahonen asked (last week):
> Is Unicode going to have separate Unicode values for the Farsi (Persian)
> and Urdu digits as they now have same values but with glyph variation
> (digits 4, 6 and 7)?
The answer to this has been documented for some time in the
standard. See:
http://www.unicode.org/versions/Unicode4.0.0/ch08.pdf
and look at Table 8-2, Glyph Variation in Eastern Arabic-Indic Digits.
The variation in form for the digits 4, 6, and 7 between Persian,
Sindhi, and Urdu is considered *glyph* variation within the
range of Eastern Arabic-Indic digits. It is comparable, for
example, to the range of glyph shapes found for the ASCII digits
in different parts of the world.
In fact, the main reason the standard distinguishes the range of
Arabic-Indic digits U+0660..U+0669 from the range of Eastern
Arabic-Indic digits U+06F0..U+06F9 at all is not the variation
in glyph forms for 4, 5, 6, and 7, but rather the distinction
in bidirectional character properties: bc=AN for the former versus
bc=EN for the latter, relevant to several rules in the
Bidirectional Algorithm.
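
For anyone who wants to check the property difference directly, here is
a minimal sketch using Python's standard unicodedata module. The helper
name digit_profile is made up for this illustration, and the values
reported depend on the Unicode data bundled with your Python build.

```python
import unicodedata


def digit_profile(start):
    """Return (code point, digit value, bidi class) for the ten digits from start."""
    return [
        (
            f"U+{cp:04X}",
            unicodedata.digit(chr(cp)),          # numeric value 0..9
            unicodedata.bidirectional(chr(cp)),  # Bidi_Class, e.g. "AN" or "EN"
        )
        for cp in range(start, start + 10)
    ]


# Arabic-Indic digits and Eastern Arabic-Indic digits
arabic_indic = digit_profile(0x0660)
eastern_arabic_indic = digit_profile(0x06F0)

# Print the two ranges side by side for comparison
for a, e in zip(arabic_indic, eastern_arabic_indic):
    print(a, e)
```

Both ranges carry the same digit values; only the bidirectional class
differs between them. Nothing in the character data distinguishes
Persian, Sindhi, or Urdu shapes, since that choice is left to the font.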
> How about the Marathi allographs of LA (U+0932) and SHA (U+0936)?
They are allographs, as documented -- hence treated as glyph
variants of those code points. There is no intention of creating
separate encoded characters for them.
--Ken