arabic: taa marbuta

From: Reynolds, Gregg (greynolds@datalogics.com)
Date: Mon Jul 19 1999 - 13:27:45 EDT


You might want to clarify the semantics of taa marbuta. The name denotes a
grammatical category that might be called "binding tee" in English. The
Unicode code points that use the name "teh marbuta" or some variant show
the dotted heh as the representative glyph. However, taa marbuta in medial
position takes the medial teh (taa) letterform. This reflects its dual
role: in some contexts it is written as a dotted heh and pronounced either
like heh or like taa; in other contexts it is written and pronounced like
taa.

Obviously, this is a case where the name used for Unicode code points may
be misleading.

The semantics of taa marbuta matter in Arabic for searching and sorting. A
search for a word ending in taa marbuta should also match words in which it
appears as a taa, and such words should sort together with the forms
written with the dotted heh. For example, using hash "#" to indicate the
taa marbuta,

        risAla# arabiyya# is pronounced risaala arabiyya
        risAlatuhum is pronounced as written, but the t is a taa marbuta

Using Unicode as it stands now, forms like risAlatuhum would presumably be
encoded with U+062A ARABIC LETTER TEH rather than U+0629, so searching and
sorting would require special logic to get them right; standard software
wouldn't suffice. On the other hand, if "risAlatuhum" were encoded as
"risAla#uhum", only display logic would be affected, and in that case the
Right Thing To Do is simple and unambiguous.
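Under the current encoding, the "special logic" might look something like
the minimal Python sketch below. It assumes the suffixed form is stored with
U+062A and uses a deliberately crude heuristic: fold a word-final U+0629 to
U+062A, then prefix-match. The names search_key and matches are
hypothetical; the heuristic over-matches (any word whose stem is followed by
a genuine teh), and plain code point order is not a real Arabic collation.

```python
# Hypothetical search/sort helper for taa marbuta under the current encoding.
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA (dotted heh glyph)
TEH = "\u062A"          # ARABIC LETTER TEH

def search_key(word: str) -> str:
    """Fold a word-final taa marbuta to teh, so risAla# keys like risAlat."""
    if word.endswith(TEH_MARBUTA):
        return word[:-1] + TEH
    return word

def matches(query: str, candidate: str) -> bool:
    """True if candidate is the query word itself or its stem plus a suffix.

    Crude heuristic: any candidate that starts with the folded stem matches,
    including words where the teh is a genuine teh.
    """
    return search_key(candidate).startswith(search_key(query))

risala = "\u0631\u0633\u0627\u0644\u0629"                   # risAla# (ends in U+0629)
risalatuhum = "\u0631\u0633\u0627\u0644\u062A\u0647\u0645"  # risAlatuhum (U+062A)
rasail = "\u0631\u0633\u0627\u0626\u0644"                   # rasA'il, a different word

print(matches(risala, risalatuhum))  # True
print(matches(risala, rasail))       # False
# Reusing the key for sorting keeps risAla# and risAlatuhum adjacent
# (code point order only, not a proper Arabic collation).
print(sorted([risalatuhum, rasail, risala], key=search_key))
```

Every piece of this exists only because the suffixed form uses a different
code point; with "risAla#uhum" an ordinary prefix match would do.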

If U+0629 must retain the name ARABIC LETTER TEH MARBUTA, then a property
should be defined indicating that the medial form of U+062A ARABIC LETTER
TEH is also used to denote taa marbuta.

Otherwise, U+0629 should be called ARABIC LETTER DOTTED HEH or the like.

While we're at it, one could argue that the presentation forms U+FE93 and
U+FE94, the isolated and final forms of taa marbuta respectively, should be
augmented by a code point for the medial form, which is identical to
U+FE98 ARABIC LETTER TEH MEDIAL FORM.
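A minimal Python sketch of the display-only logic the proposal would need,
assuming the encoding "risAla#uhum": the renderer picks a presentation form
for U+0629 from its joining context. The function and its
joins_prev/joins_next flags are hypothetical, and the initial-form case
(U+FE97, after a letter such as alef that does not join forward) is an
extrapolation from the medial case described above. Real shaping engines
derive these flags from the Unicode joining types rather than taking them as
arguments.

```python
import unicodedata

# Arabic Presentation Forms-B code points used below.
TEH_MARBUTA_ISOLATED = 0xFE93
TEH_MARBUTA_FINAL = 0xFE94
TEH_INITIAL = 0xFE97
TEH_MEDIAL = 0xFE98

def taa_marbuta_form(joins_prev: bool, joins_next: bool) -> int:
    """Choose a glyph for U+0629 under the proposed risAla#uhum encoding.

    joins_prev: the preceding letter connects to this one.
    joins_next: a suffix follows and connects to it; standard U+0629 is
                right-joining only, so this case arises only under the proposal.
    """
    if joins_next:
        # Assumption: a non-final taa marbuta borrows the teh letterforms.
        return TEH_MEDIAL if joins_prev else TEH_INITIAL
    return TEH_MARBUTA_FINAL if joins_prev else TEH_MARBUTA_ISOLATED

# risAla#uhum: the lam joins forward, so the taa marbuta is medial.
print(unicodedata.name(chr(taa_marbuta_form(True, True))))
# ARABIC LETTER TEH MEDIAL FORM
```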

Sincerely.

Gregg Reynolds


