Sorting Pali in Tibetan Script

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sat, 7 Jul 2012 18:14:28 +0100

Can someone please advise me as to the sorting of Pali as Pali in
Tibetan script? I need a prompt response rather than a complete
treatment. It is possible that I have misunderstood what I have
been able to pull together.

What I understand is the following:

(a) The retroflex lateral ('LLA' in most Unicode encodings) is written
<U+0F63 TIBETAN LETTER LA, U+0F39 TIBETAN MARK TSA -PHRU>, as at
http://www.tipitaka.org/tibt/ .

(b) For Pali, the retroflex lateral should be sorted as though a full
letter, rather than as letter plus subscript. This is general
international practice, embodied in scripts that have LLA encoded as an
independent letter, such as Sinhalese (backed up by SLS 1134:2004) and
Thai (many dictionaries).

(c) The long vowel II sorts at a primary level between the short vowel I
and the short vowel U - general practice in Indic scripts, and captured
by ISO 14651 and the Default Unicode Collation Element Table (DUCET).

Now if I am correct, this does have an interesting processing effect.
The syllable LLII, in NFD, will be written <U+0F63, U+0F71 TIBETAN
VOWEL SIGN AA, U+0F72 TIBETAN VOWEL SIGN I, U+0F39>, so to collate LLII
on the basis of the constituent consonant and vowel requires the
discontiguous contraction <U+0F63, U+0F39> and then the contraction
<U+0F71, U+0F72> from the skipped characters. Version 6.1.0 of the
Unicode Collation Algorithm requires the ability to do exactly this.
However, it has been proposed that Version 6.2.0 *prohibit* this
ability.
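
The NFD order quoted above can be checked with a short Python sketch. It
uses only the standard unicodedata module; the variable and function names
are illustrative, and the starting sequence (LA, TSA -PHRU, then the
precomposed U+0F73 TIBETAN VOWEL SIGN II) is an assumption about how the
syllable might be typed, not something taken from any particular text.

```python
import unicodedata

# Code points discussed in the message; the names are illustrative only.
LA = "\u0f63"        # TIBETAN LETTER LA
TSA_PHRU = "\u0f39"  # TIBETAN MARK TSA -PHRU
AA = "\u0f71"        # TIBETAN VOWEL SIGN AA
I = "\u0f72"         # TIBETAN VOWEL SIGN I
II = "\u0f73"        # TIBETAN VOWEL SIGN II (decomposes to AA + I)


def nfd_codepoints(s):
    """Return the NFD form of s as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]


# Canonical combining classes decide the order of the marks in NFD.
for ch in (AA, I, TSA_PHRU):
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")

# Assumed input order: consonant, TSA -PHRU, then the long vowel sign.
print(nfd_codepoints(LA + TSA_PHRU + II))
```

Because TSA -PHRU has a higher combining class than either vowel sign,
normalization moves it to the end, which is what separates U+0F63 from
U+0F39 and makes the contraction discontiguous.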

The treatment of LL.HA could be interesting, but is not of urgent
interest.

Doubts are cast on my analysis by the rules for Tibetan collation
given, for example, at
http://developer.mimer.com/collations/tibetan/Chilton_slides.pdf ,
which state that U+0F71 is given a secondary weight, make no
mention of the long vowels, and certainly make no mention of any LLA.

If the description there is correct and complete, it seems that I should
see a sort order

LI <U+0F63, U+0F72> << LLI <U+0F63, U+0F72, U+0F39> <<
LII <U+0F63, U+0F71, U+0F72> << LLII <U+0F63, U+0F71, U+0F72, U+0F39>.

Is this the correct order for sorting as Tibetan? The diacritics do
seem to apply back-to-front.
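
As a rough illustration of why that order would fall out, here is a toy
two-level comparison in Python. The numeric weights are hypothetical,
chosen only so that U+0F71 and U+0F39 differ at the secondary level; they
are not taken from DUCET, from the slides, or from any real tailoring, and
giving U+0F39 a secondary-only weight is itself an assumption.

```python
# Hypothetical weights for illustration only.
PRIMARY = {"\u0f63": 1, "\u0f72": 2}    # LA, VOWEL SIGN I
SECONDARY = {"\u0f71": 2, "\u0f39": 3}  # VOWEL SIGN AA, TSA -PHRU
COMMON = 1                              # secondary weight of a primary element


def sort_key(s):
    """Toy key for NFD input: primary weights first, then secondary weights."""
    primary = [PRIMARY[c] for c in s if c in PRIMARY]
    # Every character contributes one secondary weight, in string order.
    secondary = [COMMON if c in PRIMARY else SECONDARY[c] for c in s]
    return (primary, secondary)


# All four strings are already in NFD.
LI = "\u0f63\u0f72"
LLI = "\u0f63\u0f72\u0f39"
LII = "\u0f63\u0f71\u0f72"
LLII = "\u0f63\u0f71\u0f72\u0f39"

names = {LI: "LI", LLI: "LLI", LII: "LII", LLII: "LLII"}
ordered = sorted([LLII, LII, LLI, LI], key=sort_key)
print(" << ".join(names[s] for s in ordered))
```

In this toy model the four strings tie at the primary level, and the
secondary weight of U+0F71 is met before that of U+0F39 when the secondary
levels are compared left to right, so vowel length outranks the
plain/retroflex distinction.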

Richard.
Received on Sat Jul 07 2012 - 12:22:54 CDT
