Re: Pashto alphabets Unicodes

From: Doug Ewell <doug_at_ewellic.org>
Date: Wed, 01 Feb 2012 11:20:00 -0700

This discussion belongs on the Unicode list, not cldr-users.

Said Zazai <saidsemail at gmail dot com> wrote:

> I understand what you are saying, but looking at these charts, you
> do see all 4 (or 2) codepoints for the letters of other languages in the
> region. It would make life easier for developers if these 4/2 shapes of
> each letter had Unicode codepoints.

It doesn't make life easier for developers, because developers should
not be using these compatibility characters to encode Arabic-script
text, in Pashto or any other language.
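As a minimal sketch of that advice, assuming Python's standard unicodedata module, incoming text that already contains Arabic presentation forms can be folded back to the nominal letters before any further processing. Applying NFKC only to characters in the two presentation-forms blocks (U+FB50 to U+FDFF and U+FE70 to U+FEFF) is a choice made here so that unrelated compatibility characters, such as the Latin ligature U+FB01, are left alone; the function name is hypothetical.

```python
import unicodedata


def fold_arabic_presentation_forms(text: str) -> str:
    """Replace Arabic presentation-form characters with nominal letters.

    Only code points in Arabic Presentation Forms-A (U+FB50..U+FDFF) and
    Arabic Presentation Forms-B (U+FE70..U+FEFF) are normalized; every
    other character is copied through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF:
            # NFKC maps e.g. U+FB68 TTEH INITIAL FORM to U+0679 TTEH.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(fold_arabic_presentation_forms("\ufb68\ufb67") == "\u0679\u0679")
```

After folding, the stored text contains only the "real" Arabic letters, and the renderer chooses contextual shapes at display time.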

> If Urdu's \u0679 can get into Arabic Presentation Forms-A, then why
> couldn't \u067C?

This "fairness" argument might apply to encoding normal characters, but
not compatibility presentation forms. The ones that are there exist
because they were required by legacy processing in 1991 or before. It's
not an inherently open-ended set.
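The fixed nature of the set can be checked against the character database itself. The sketch below assumes Python's standard unicodedata module and treats U+FB50 to U+FDFF and U+FE70 to U+FEFF as the presentation-forms ranges; it collects the single-letter positional variants whose compatibility decomposition points at a given nominal letter. The helper name is hypothetical, and results reflect the Unicode version bundled with the interpreter.

```python
import unicodedata

# Arabic Presentation Forms-A and Arabic Presentation Forms-B.
PRESENTATION_RANGES = (range(0xFB50, 0xFE00), range(0xFE70, 0xFF00))
POSITIONAL_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}


def presentation_forms(letter: str) -> dict:
    """Map positional tag -> presentation-form character for one letter."""
    forms = {}
    for rng in PRESENTATION_RANGES:
        for cp in rng:
            ch = chr(cp)
            # Unassigned code points return "", which splits to [].
            parts = unicodedata.decomposition(ch).split()
            # Keep only single-letter variants such as "<final> 0679";
            # ligatures decompose to two or more code points and are skipped.
            if (
                len(parts) == 2
                and parts[0] in POSITIONAL_TAGS
                and chr(int(parts[1], 16)) == letter
            ):
                forms[parts[0]] = ch
    return forms


if __name__ == "__main__":
    print(presentation_forms("\u0679"))  # Urdu TTEH
    print(presentation_forms("\u067c"))  # Pashto TEH WITH RING
```

For U+0679 it finds the four forms U+FB66 to U+FB69; for U+067C it finds none, which is consistent with the point that these blocks mirror legacy needs rather than covering every letter.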

> I don't see the justification in your statement. I don't want to
> sound like a madman, but if these glyphs/shapes had their own codepoints,
> then we could make a lot of applications that are still widely used
> backward compatible for the Pashto language.

It's possible to build a conversion algorithm between a legacy Pashto
encoding, with separate code points for contextual forms, and Unicode
with its contextual shaping model, without adding new compatibility
characters as a pivot. Real Unicode applications should use the "real"
Arabic characters.
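A minimal sketch of such a conversion in both directions, in Python. Everything specific here is hypothetical: the legacy byte values, the three-letter table, and the function names are invented for illustration. A real converter would need the full mapping for the actual legacy encoding and the Joining_Type data from the Unicode Character Database (ArabicShaping.txt), and this sketch ignores transparent marks such as harakat as well as the lam-alef ligature.

```python
# HYPOTHETICAL legacy code table: (Unicode letter, positional form) -> byte.
# The byte values are invented for illustration only.
LEGACY_CODES = {
    ("\u0628", "isolated"): 0xA0, ("\u0628", "final"): 0xA1,
    ("\u0628", "initial"): 0xA2, ("\u0628", "medial"): 0xA3,
    ("\u067C", "isolated"): 0xA4, ("\u067C", "final"): 0xA5,
    ("\u067C", "initial"): 0xA6, ("\u067C", "medial"): 0xA7,
    ("\u0627", "isolated"): 0xA8, ("\u0627", "final"): 0xA9,
}
# Reverse direction: every contextual form maps back to the nominal letter.
DECODE = {code: letter for (letter, _form), code in LEGACY_CODES.items()}

# Simplified Joining_Type values: D = dual-joining, R = right-joining.
JOINING = {"\u0628": "D", "\u067C": "D", "\u0627": "R"}


def legacy_to_unicode(data: bytes) -> str:
    """Decode legacy bytes; positional information is simply dropped."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # ASCII passes through unchanged
        else:
            out.append(DECODE[b])  # contextual form -> nominal letter
    return "".join(out)


def positional_form(text: str, i: int) -> str:
    """Choose a contextual form from the neighbours' joining types."""
    cur = JOINING.get(text[i])
    prev = JOINING.get(text[i - 1]) if i > 0 else None
    nxt = JOINING.get(text[i + 1]) if i + 1 < len(text) else None
    joins_prev = prev == "D" and cur in ("D", "R")
    joins_next = cur == "D" and nxt in ("D", "R")
    if joins_prev and joins_next:
        return "medial"
    if joins_prev:
        return "final"
    if joins_next:
        return "initial"
    return "isolated"


def unicode_to_legacy(text: str) -> bytes:
    """Encode nominal Unicode letters, recomputing contextual forms."""
    out = bytearray()
    for i, ch in enumerate(text):
        if ch in JOINING:
            out.append(LEGACY_CODES[(ch, positional_form(text, i))])
        elif ord(ch) < 0x80:
            out.append(ord(ch))  # ASCII passes through unchanged
        else:
            raise ValueError(f"no legacy mapping for U+{ord(ch):04X}")
    return bytes(out)
```

The Unicode side only ever stores nominal letters such as U+067C; the positional choice is recomputed when converting back, which is the same shaping step a Unicode renderer performs, so no new compatibility characters are needed as a pivot.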

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Wed Feb 01 2012 - 12:26:09 CST