UAX #44, Table 13 ("Bidi_Class Values") includes the following
descriptions:
R - Right_To_Left - any strong right-to-left (non-Arabic-type) character
AL - Arabic_Letter - any strong right-to-left (Arabic-type) character
But I can't find any definition, here or elsewhere, of what constitutes
an "Arabic-type" or a "non-Arabic-type" letter.
Looking in UnicodeData.txt, I see that Arabic, Syriac, and Thaana
letters are assigned a value of 'AL', while other RTL letters, including
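Hebrew, N'Ko, Samaritan, Mandaic, and some archaic scripts in plane 1,
are 'R'. Clearly, shaping behavior and ligation aren't what make a
letter or script "Arabic-type" or "non-Arabic-type": Thaana letters do
not join, yet they are 'AL', while N'Ko and Mandaic letters do join and
are 'R'.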
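
For anyone who wants to repeat this check without reading UnicodeData.txt
by hand, here is a minimal sketch using Python's standard unicodedata
module. The sample letters (one per script) and the names SAMPLES and
bidi_classes are illustrative choices, not part of the original
observation, and the values reported depend on the version of the Unicode
Character Database bundled with the Python build in use.

```python
import unicodedata

# One sample letter per script; the choice of letters is illustrative only.
SAMPLES = {
    "Arabic": "\u0627",     # ARABIC LETTER ALEF
    "Syriac": "\u0710",     # SYRIAC LETTER ALAPH
    "Thaana": "\u0780",     # THAANA LETTER HAA
    "Hebrew": "\u05D0",     # HEBREW LETTER ALEF
    "N'Ko": "\u07CA",       # NKO LETTER A
    "Samaritan": "\u0800",  # SAMARITAN LETTER ALAF
    "Mandaic": "\u0840",    # MANDAIC LETTER HALQA
}


def bidi_classes(samples):
    """Map each script name to the Bidi_Class of its sample letter."""
    return {script: unicodedata.bidirectional(ch) for script, ch in samples.items()}


if __name__ == "__main__":
    for script, bidi_class in bidi_classes(SAMPLES).items():
        print(f"{script:10} {bidi_class}")
```

This only reports the values already assigned; it says nothing about the
criterion behind them, which is the open question.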
How would I make this distinction for an arbitrary letter or script,
other than by association with an existing letter or script already
designated as one or the other? (Yes, I admit, I am thinking about RTL
scripts in CSUR.)
--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Received on Wed Aug 24 2011 - 10:39:27 CDT