Re: Arabic letters and Extended Arabic letters

From: Kenneth Whistler (
Date: Wed Sep 17 1997 - 19:20:00 EDT

> ! < Bismellah ar-Rahman ar-Raheem >
> Hi;
> I don't know the criteria followed to order the letters in Unicode. For
> example why are the extended Arabic letters not imbedded within the
> Arabic letters. At least Urdu and Persian languages are still living
> languages using these letters.

And many other living languages as well, a few of which are listed on
page 6-22. Arabic is, indeed, one of the major cosmopolitan world scripts.

The explanation for the order chosen is also to be found on page 6-22.
The order for the "basic" Arabic alphabet followed ISO/IEC 8859-6, which
was in turn derived from the European standard ECMA-114, and that, in
turn, was derived from the Arabic standard ASMO 449. Several national
bodies required that this order be retained when ISO 10646 was first
approved, and so all Arabic letters used in the extended Arabic script
(for Urdu, Farsi, Sindhi, Pashto, etc.) were added following the basic
script, but in the same Arabic script block.
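
As a rough illustration of that layout, here is a minimal Python sketch.
It assumes the standard library's "iso8859_6" codec and the character
data bundled with the Python interpreter (a later Unicode version than
the one under discussion). The names LETTER_BYTES, letter_offsets,
BASIC_LAST and EXTENDED are made up for this example. It checks that the
8859-6 letters map into Unicode at one constant offset, and that a few
extended letters sit after the basic alphabet in the same block.

```python
import unicodedata

# Letter positions defined by ISO/IEC 8859-6: 0xC1..0xDA and 0xE0..0xEA.
# (0xDB..0xDF are unassigned in that standard, so they are skipped.)
LETTER_BYTES = list(range(0xC1, 0xDB)) + list(range(0xE0, 0xEB))


def letter_offsets():
    """Return the set of (code point - byte value) differences."""
    return {ord(bytes([b]).decode("iso8859_6")) - b for b in LETTER_BYTES}


# Last letter of the basic alphabet, looked up by character name.
BASIC_LAST = ord(unicodedata.lookup("ARABIC LETTER YEH"))

# A few letters of the extended set (used for Urdu, Persian, etc.).
EXTENDED = {
    name: ord(unicodedata.lookup(name))
    for name in (
        "ARABIC LETTER TTEH",
        "ARABIC LETTER PEH",
        "ARABIC LETTER GAF",
    )
}

if __name__ == "__main__":
    print("offsets:", [hex(o) for o in letter_offsets()])
    for name, cp in EXTENDED.items():
        print(f"U+{cp:04X} {name} (after U+{BASIC_LAST:04X})")
```

A single offset means the 8859-6 order was carried over unchanged, which
is why the additional letters could only be appended after it.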

> Another question is why is the shaping included (it's even far from the
> isolated shape)? Shouldn't that be in the font page or something? I mean
> if a (Arabic letter ALEF) is followed by an (Arabic letter BAA) I expect
> one shape for the ALEF and one shape for the BAA. So it could be
> deduced by the operating system or whatever is responsible for rendering.
> The same could be said about the digits; why wasn't this left for font
> pages?

The encoding of positional shape variants was only done for compatibility
with existing encodings which used distinct characters for each different
positional variant. The Arabic script introduction in the Unicode
Standard clearly spells out the preferred way of encoding Unicode
text using the Arabic script characters.
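
To make the compatibility status concrete, here is a small Python
sketch. It assumes the character data shipped with Python's unicodedata
module, and the names BEH, BEH_FORMS and to_nominal are invented for the
example. It shows that the four positional presentation forms of BEH all
carry a compatibility decomposition back to the single nominal letter,
so text stored with the nominal letter leaves shaping to the renderer.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the nominal (unshaped) character

# Compatibility presentation forms of BEH, one per position.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def to_nominal(text):
    """Fold compatibility characters back to nominal ones (NFKC)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for position, ch in BEH_FORMS.items():
        # decomposition() returns e.g. "<initial> 0628" for these forms.
        print(position, f"U+{ord(ch):04X}", unicodedata.decomposition(ch))
        print("  folds to BEH:", to_nominal(ch) == BEH)
```

NFKC is used here only as a convenient way to show the mapping; it also
folds many unrelated compatibility characters, so it is not a
general-purpose "unshaping" tool.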

Digits are another matter. True Arabic-Indic digits are not simply font
substitutions for Latin Arabic (i.e. 0..9) digits. Among other things,
they have different directional properties that interact with the
bidirectional algorithm. The Eastern Arabic-Indic digits are also
not just Urdu font variants; they behave differently
for bidirectional formatting. See page 3-17 of the standard.
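
The difference is visible in the character properties themselves. The
following Python sketch assumes the property data bundled with Python's
unicodedata module (a later Unicode version than the one discussed
here); ZEROS and bidi_classes are names made up for the example. It
prints the bidirectional class of the digit zero in each of the three
sets.

```python
import unicodedata

# The digit zero from each of the three digit sets.
ZEROS = {
    "European (ASCII)": "0",
    "Arabic-Indic": "\u0660",
    "Eastern Arabic-Indic": "\u06F0",
}


def bidi_classes():
    """Map each digit set to the bidirectional class of its zero."""
    return {
        label: unicodedata.bidirectional(ch)
        for label, ch in ZEROS.items()
    }


if __name__ == "__main__":
    for label, ch in ZEROS.items():
        print(
            f"U+{ord(ch):04X}",
            label,
            "value:", unicodedata.digit(ch),
            "bidi:", unicodedata.bidirectional(ch),
        )
```

In that data the Arabic-Indic digits are class AN (Arabic number), while
the European and Eastern Arabic-Indic digits are class EN (European
number), so a font substitution alone could not reproduce the behavior.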

--Ken Whistler

> If these are trivial questions tell me where to find answers.
> Thanks and Salam
> **********************************************************************
> * Hazem Mahsoub Soliman *
> * SAQQARA Systems Inc. *
> * 1230 Oakmead Parkway, Suite 218 Tel: (408) 738-3962 *
> * Sunnyvale, CA 94086 Fax: (408) 738-8345 *
> **********************************************************************
