24th Internationalization and Unicode Conference

Guidelines for Use of Arabic Characters

Kamal Mansour - Agfa Monotype Corporation

Intended Audience:	Software Engineers, Systems Analysts, Font Designers, Site Coordinators, Technical Writers, Web Administrators, Designers
Session Level:	Intermediate, Advanced

The Unicode Standard (TUS) encodes the basic characters of the Arabic alphabet, as well as a multitude of compatibility characters and presentation forms. The repertoire of Arabic characters in TUS is sufficient to represent the three major languages that use Arabic script (Arabic, Persian, Urdu), in addition to many other languages such as Sindhi, Kurdish, Jawi, Baluchi, and Pashto. Because of the large number of compatibility characters, users may have difficulty choosing the best characters for the most compact representation of a particular language. Just as Roman characters are now used by many languages for which they were not originally intended, Arabic characters are used by a large number of languages belonging to a variety of language families. Whenever the repertoire of basic Arabic characters proved insufficient for a particular language, it was extended by creating new variants of characters. Sometimes this extension was accomplished through the use of different diacritic marks, while at other times a glyph variant in an Arabic-language context was taken to represent a unique character for a different language.
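To make the distinction between basic characters and presentation forms concrete, the following minimal Python sketch uses the standard unicodedata module to detect positional presentation forms and fold them back to basic characters through NFKC normalization. The function names are hypothetical and chosen only for illustration; the session itself does not prescribe this approach.

```python
import unicodedata

# Compatibility decomposition tags that mark positional presentation forms.
POSITIONAL_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")


def is_presentation_form(ch: str) -> bool:
    """Return True if ch decomposes as a positional presentation form."""
    return unicodedata.decomposition(ch).startswith(POSITIONAL_TAGS)


def fold_presentation_forms(text: str) -> str:
    """Replace compatibility characters with their basic equivalents.

    NFKC also folds other compatibility characters (for example
    fullwidth Latin letters), so this is broader than Arabic alone.
    """
    return unicodedata.normalize("NFKC", text)


# Beh initial form, alef final form, beh isolated form.
shaped = "\uFE91\uFE8E\uFE8F"
basic = fold_presentation_forms(shaped)

# The lam-alef ligature folds to the two basic letters lam and alef.
ligature = fold_presentation_forms("\uFEFB")
```

Storing only the basic characters and leaving contextual shaping to the rendering system is one way to obtain the compact representation described above.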

When does a particular shape represent a different character, and when is it just an alternative shape? Which variations are based on locale, language, or just style? What about the use of numerals in different languages? What compromises are necessary in order to accommodate existing national standards? What are some typographic conventions that have changed over time? How can search engines (pattern matching) cope with the multiplicity of alternative characters? We will examine these and other common questions about the choice of characters in Arabic, Persian, and Urdu.
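As one possible illustration of the search question, the sketch below builds a folded key for matching, so that text typed with alternative characters compares equal. The folding table is an assumption made for this example, not a recommendation from the session: it treats Keheh (U+06A9) as Kaf (U+0643) and Farsi Yeh (U+06CC) as Yeh (U+064A), drops tatweel and nonspacing marks, and maps every decimal digit to ASCII. Such folding is lossy and suits matching only, never storage.

```python
import unicodedata

# Illustrative folds only (an assumption of this sketch).
LETTER_FOLDS = {
    0x06A9: 0x0643,  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
    0x06CC: 0x064A,  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
    0x0640: None,    # ARABIC TATWEEL (kashida) is removed
}


def search_key(text: str) -> str:
    """Return a lossy key for comparing Arabic-script strings."""
    # Fold presentation forms first, then apply the letter folds.
    text = unicodedata.normalize("NFKC", text).translate(LETTER_FOLDS)
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn":
            # Skip nonspacing marks such as fatha, kasra, and shadda.
            continue
        if category == "Nd":
            # Arabic-Indic and Extended Arabic-Indic digits become ASCII.
            out.append(str(unicodedata.digit(ch)))
        else:
            out.append(ch)
    return "".join(out)


# The same word typed with Keheh (Persian, Urdu) and with Kaf (Arabic).
same_word = search_key("\u06A9\u062A\u0627\u0628") == search_key("\u0643\u062A\u0627\u0628")
```

Whether a given fold is appropriate depends on the language: two letters that are interchangeable variants in one language may be distinct characters in another, which is the kind of question the session raises.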

When the world wants to talk, it speaks Unicode

International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS). GMS is pleased to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

30 May 2003, Webmaster