Persian Computing with Unicode

Behdad Esfahbod - FarsiWeb Project

Intended Audience: Managers, Software Engineers, Content Developers, Font Designers, Technical Writers, Testers
Session Level: Beginner, Intermediate

In efforts to internationalize software, the Persian language has largely been treated as a variant of Arabic. This misconception arises mainly because the two languages share a common script. While the shared script is an important similarity, the semantics and conventions behind the two languages are completely different. Incorporating Persian support successfully requires some knowledge of the semantics of the language. Using real-life examples, this tutorial session aims to give non-Persian developers and software engineers an in-depth understanding of the characteristics of Persian computing.

After briefly explaining the way Persian is handled in the Unicode standard, the audience will be presented with new concepts that are critical for proper Persian support in internationalized software. These concepts do not fall into the domain of the Unicode standard.

Topics discussed include, but are not limited to: which Unicode characters to use in Persian text and for what purpose, the basic pipeline for rendering Persian text, which ligatures to use and which to avoid, and how Persian fonts should look. In addition, we will take an in-depth look at the special handling needed in Persian text processing, including spell-checking, loose searching, and the different sorting schemes used side by side in Iran.
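
As a rough illustration of the loose-searching topic, the sketch below folds a few characters that are commonly interchanged between Arabic and Persian text before comparing strings. The folding table and the names FOLD, loose_key, and loose_contains are assumptions made for this example, not a scheme defined by the session or by the Unicode standard; a real implementation would likely need a longer, carefully reviewed table.

```python
import unicodedata

# Illustrative folding table (an assumption, not an exhaustive or standard list).
FOLD = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u200C": "",        # ZERO WIDTH NON-JOINER is ignored when matching
    "\u0640": "",        # ARABIC TATWEEL (kashida) is ignored when matching
}


def loose_key(text):
    """Return a folded form of text intended only for loose comparison."""
    # NFD splits precomposed letters into base letter + combining marks,
    # so the marks can be dropped below. Note this also folds Alef with
    # Madda to plain Alef, which is a deliberate loosening in this sketch.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if unicodedata.category(ch) == "Mn":  # nonspacing marks (harakat etc.)
            continue
        out.append(FOLD.get(ch, ch))
    return "".join(out)


def loose_contains(haystack, needle):
    """True if needle occurs in haystack after folding both sides."""
    return loose_key(needle) in loose_key(haystack)
```

With this folding, a query typed with Arabic Yeh and without a zero-width non-joiner would still match Persian text that uses Farsi Yeh and a zero-width non-joiner. The stored text itself is left unchanged; only the comparison keys are folded.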

The inherent complexity of the language has delayed Persian support in software internationalization efforts. In recent years, however, many Persian computing issues have been addressed and improved. In keeping with these improvements, this session is intended to give managers, software engineers, and developers enough information and references to add proper Persian support to their products.