From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Mon Oct 25 2004 - 11:42:12 CST
Sorry if you receive this twice: I posted it in the Indic list (appropriate
AFAIK) but copied the general list since experts not reading the first might
help. Please answer only on the Indic list to avoid more duplicates; thanks
in advance.
Following a recent thread, I am trying to understand the minutes of the June
meeting. I read there
[99-C37] Consensus: The UTC recommends that "right-side" forms
of conjuncts in Sinhala be represented by a sequence of <zwj,
virama, consonant>. [L2/04-131]
I am not allowed to retrieve L2/04-131 itself through
http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/04-131, but an
equivalent copy is publicly available at
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2737.pdf (I guess it is the same
because the latter explicitly says "L2/04-131" ;-)). This is a committee
draft, released for public comment on 2004-04-15, of revision 2 of SLS 1134
(the encoding of Sinhala, a Sri Lanka standard).
I am very interested to learn about the "zwj,vir,cons" sequence, and not
only because I spent a few hours at the end of July analysing this very
sequence (in response to http://www.unicode.org/review/pr-37.pdf), while it
appears from the minutes that a few weeks earlier the committee had decided
to bring this very sequence into general use, but for yet another
use...
What is really "interesting" (so I think) is that this sequence
(zwj,vir,cons, really 200D 0DCA) does not appear in the said document;
nor does the expression "right-side"... So what is happening here?
A bit of context is probably needed here, so I refer everyone to
Michael's http://www.evertype.com/standards/si/iso10646-to-sls1134.html
(thanks Michael!), written in 1997 (so anything there should be taken with a
pinch of salt, particularly the use of the joiners), which describes, near
the end, the problems with conjuncts in Sinhala (script).
If I read correctly:
-- the usual case, i.e. in Sinhala language (Elu), is to use explicit
virama (al-lakuna, 0DCA); it is BUD-DHO in Michael's example; it does not
need any joiner (<0DB6, 0DD4, 0DAF, 0DCA, 0DB0, 0DDC>);
-- when a Brahmi-style ligature conjunct is requested, ZWJ/200D is put
_after_ the virama; this also happens for rakaransaya (subjoined ra),
yansaja (post-base ya) and repaya (similar to Nagari's repha), common in
Sinhala; to stay with Michael's example, this one is BU-DDHO, and would be
encoded <0DB6, 0DD4, 0DAF, 0DCA, 200D, 0DB0, 0DDC>.
Up to this point, I believe this is exactly what L2/04-131 / N2737 spells
out (particularly §§ 5.6 to 5.8).
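To make the two encodings above concrete, here is a minimal Python sketch
that builds both strings for Michael's example and prints their code points.
The constant and function names are my own, chosen only for illustration;
the code points are the ones quoted in the two list items.

```python
# Code points quoted in the list above; all names here are illustrative.
AL_LAKUNA = "\u0DCA"   # Sinhala virama (al-lakuna)
ZWJ = "\u200D"         # ZERO WIDTH JOINER

BA_U = "\u0DB6\u0DD4"  # BU: 0DB6 + 0DD4
DA = "\u0DAF"          # D:  0DAF
DHA_O = "\u0DB0\u0DDC" # DHO: 0DB0 + 0DDC


def hex_seq(text):
    """Return the code points of text as space-separated 4-digit hex."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# BUD-DHO: explicit al-lakuna, no joiner.
explicit = BA_U + DA + AL_LAKUNA + DHA_O

# BU-DDHO: ligature conjunct, ZWJ placed after the al-lakuna.
ligature = BA_U + DA + AL_LAKUNA + ZWJ + DHA_O

print(hex_seq(explicit))
print(hex_seq(ligature))
```

How either string actually renders depends on the font and shaping engine,
which this sketch does not address.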
If we study Michael's document, we can see that the so-called Pali
"kerned" conjuncts, BU-[DDH]O, are not addressed.
So my educated guess (helped by documents recently made available in Sri
Lanka) is that the cons/200D/0DCA/cons sequence is used to encode these
"kerned" conjuncts or "touching letters". As a result it ought to be encoded
<0DB6, 0DD4, 0DAF, 200D, 0DCA, 0DB0, 0DDC>.
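The three candidate encodings differ only in where (and whether) 200D sits
next to 0DCA, so they can be told apart mechanically. The sketch below is
only an illustration of that distinction: the function name and the labels
are mine, and the "touching" label reflects my guess above, not anything
confirmed by the standard.

```python
AL_LAKUNA = "\u0DCA"  # Sinhala virama (al-lakuna)
ZWJ = "\u200D"        # ZERO WIDTH JOINER


def classify_al_lakuna(text):
    """Return one label per al-lakuna found in text.

    Labels (hypothetical names, for illustration only):
      "explicit" - no ZWJ next to the al-lakuna
      "ligature" - ZWJ immediately after the al-lakuna
      "touching" - ZWJ immediately before it (the guessed reading)
    """
    labels = []
    for i, ch in enumerate(text):
        if ch != AL_LAKUNA:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before == ZWJ:
            labels.append("touching")
        elif after == ZWJ:
            labels.append("ligature")
        else:
            labels.append("explicit")
    return labels


bud_dho = "\u0DB6\u0DD4\u0DAF\u0DCA\u0DB0\u0DDC"
bu_ddho = "\u0DB6\u0DD4\u0DAF\u0DCA\u200D\u0DB0\u0DDC"
bu_ddh_o = "\u0DB6\u0DD4\u0DAF\u200D\u0DCA\u0DB0\u0DDC"

for word in (bud_dho, bu_ddho, bu_ddh_o):
    print(classify_al_lakuna(word))
```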
Can someone confirm this?
Also, can someone confirm that what is described here is actually what will
be put in SLS 1134 rev. 2 (or the best approximation of it)?
Antoine