From: Cibu C J (cibucj@gmail.com)
Date: Fri Jan 05 2007 - 16:43:08 CST
Sorry for cross-posting this to the indic list as well.
The relevant text is below:
-----------------------------
B. ZWJ in the following contexts: In a conjunct context. That is, a
sequence of the form:
* A Letter, followed by zero or more combining marks, followed by
a Virama, followed by a ZWJ, followed by zero or more combining marks,
followed by a Letter.
* As a regular expression:
/$L $M* $V ZWJ $M* $L/
where:
o $L = [:General_Category=Letter:]
o $M = [:General_Category=Mark:]
o $V = [:Canonical_Combining_Class=Virama:]
--------------------------------
This does not cover the case of a Chillu letter at the end of a word,
where no Letter follows the ZWJ. So the B1 regular expression should be
more inclusive:
/$L $M* $V ZWJ $M*/
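
To illustrate the difference between the two expressions, here is a rough Python sketch. It is only an approximation under stated assumptions: Python's re module has no Unicode property classes, so each character is first mapped to a one-letter class (L, M, V, J, or X for anything else) and the patterns are run over that class string. A Virama is classified as V only, even though it also has General_Category=Mark. The names classify, class_string and zwj_allowed are made up for this example, and the test strings assume a Chillu written as consonant + Virama + ZWJ.

```python
import re
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def classify(ch):
    """Map one character to a class letter (illustrative helper)."""
    if ch == ZWJ:
        return "J"
    # Canonical_Combining_Class=Virama has the numeric value 9.
    if unicodedata.combining(ch) == 9:
        return "V"
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return "L"  # General_Category=Letter
    if cat.startswith("M"):
        return "M"  # General_Category=Mark (Viramas were handled above)
    return "X"      # anything else

def class_string(text):
    """Turn a string into its sequence of class letters."""
    return "".join(classify(ch) for ch in text)

# /$L $M* $V ZWJ $M* $L/ as quoted from the draft
B1_ORIGINAL = re.compile(r"LM*VJM*L")
# /$L $M* $V ZWJ $M*/ as proposed in this message
B1_PROPOSED = re.compile(r"LM*VJM*")

def zwj_allowed(text, pattern):
    """True if the pattern matches somewhere in the text's class string."""
    return pattern.search(class_string(text)) is not None

# Chillu N at the end of a word: A, VA, NA, Virama, ZWJ
word_final = "\u0d05\u0d35\u0d28\u0d4d\u200d"
# Same sequence followed by another letter: NA, Virama, ZWJ, RRA
word_medial = "\u0d28\u0d4d\u200d\u0d31"

for label, text in (("final", word_final), ("medial", word_medial)):
    print(label,
          zwj_allowed(text, B1_ORIGINAL),
          zwj_allowed(text, B1_PROPOSED))
```

With the word-final string, only the proposed pattern finds a match, because nothing follows the ZWJ to satisfy the trailing $L.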
BTW, I don't know of any combining marks in Malayalam. Does more
than zero $M make sense in the case of Malayalam? I agree this is a
general regular expression and may be applicable to other scripts. I
was just wondering which ones they are.
Thanks
Cibu