From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Jul 31 2010 - 09:48:54 CDT
In order to avoid confusion with the modes named "Blanked" or
"Separating", maybe we could adopt a clearer general syntax for them:
- "Blanked" -> "[]"
- "Separating" -> "[.0201*]"
- "Shifted" -> "[.0000.0000.0000*]"
This syntax explicitly states the collation weights that are inserted
in variable elements, and the "*" is a placeholder stating where the
default weights from the DUCET are inserted in variable elements. Its
absence means that no insertion occurs; instead, all the remaining
weights are filled with [.0000], as needed for the current collation
level.
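
A minimal sketch of how such a pattern could be read and applied to a
variable element, in Python. The function names, the sample DUCET
weights for the variable element, and the choice to truncate the result
to the current number of levels are all assumptions made for
illustration; they are not part of the proposal itself.

```python
import re

# Hypothetical grammar for the proposed syntax: "[" (".XXXX")* "*"? "]"
_PATTERN = re.compile(r"\[((?:\.[0-9A-Fa-f]{4})*)(\*?)\]")

def parse_pattern(pattern):
    """Return (inserted_weights, keep_ducet) for a pattern like '[.0201*]'."""
    m = _PATTERN.fullmatch(pattern)
    if m is None:
        raise ValueError("bad weight pattern: %r" % pattern)
    inserted = [int(w, 16) for w in m.group(1).split(".") if w]
    return inserted, m.group(2) == "*"

def apply_to_variable(pattern, ducet_weights, levels):
    """Build the weights of a variable element under the given pattern.

    Inserted weights come first; the DUCET weights follow only if the
    pattern contains '*'. The result is cut or padded with 0000 to
    `levels` weights (the truncation is an assumption of this sketch).
    """
    inserted, keep_ducet = parse_pattern(pattern)
    weights = inserted + (list(ducet_weights) if keep_ducet else [])
    weights = weights[:levels]
    return weights + [0x0000] * (levels - len(weights))

def fmt(weights):
    """Format weights in the DUCET-like notation used in this message."""
    return "[" + "".join(".%04X" % w for w in weights) + "]"

# Illustrative weights for one variable element (not taken from a real DUCET).
sample = [0x0209, 0x0020, 0x0002]
print(fmt(apply_to_variable("[]", sample, 3)))                  # Blanked
print(fmt(apply_to_variable("[.0201*]", sample, 4)))            # Separating
print(fmt(apply_to_variable("[.0000.0000.0000*]", sample, 4)))  # Shifted
```

With the sample weights above, "[]" leaves only [.0000] fillers,
"[.0201*]" pushes the DUCET weights one level down behind 0201, and
"[.0000.0000.0000*]" leaves the old primary weight at the fourth level.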
This effectively expresses what happens to the weights and how new
weights are inserted in collation elements (at the beginning for
variable elements, or at the end, with weight FFFF, for non-variable
elements).
Nothing is needed for other collation elements, but if needed we could
specify that they use a specific weight, for tailoring purposes, in a
syntax like:
Variable:[.0201*]; [\p{Bidi:R}]:[.FFFF*]
This would mean that Variable elements are shifted one level up by
inserting primary weight [.0201] followed by all weights from the
DUCET, and that collation graphemes starting with a strong RTL character
are shifted one level up by giving them primary weight [.FFFF].
All the other ignorable characters are filled by implicitly
appending [.0000], and all the other non-ignorable and non-variable
characters are filled by appending weight [.FFFF]. In all cases, the
last implicit level (displayed in the DUCET) should still be the code
point scalar value (or 00000 if the collation element is not the first
one in an expansion, or if it was inserted from a contraction by
inserting an ignorable filler to hold the other code point scalar
values not representable in the first collation element).
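
The rule list and the implicit fillers described above could be
sketched as follows, in Python. Everything here is hypothetical: the
function names, the representation of an element's classes as a set of
selector strings (a real implementation would have to evaluate
selectors such as [\p{Bidi:R}] against Unicode properties), the
"first matching rule wins" order, and the test "primary weight is zero"
for ignorable elements. The sample weights are illustrative, and the
final implicit code point level is not modelled.

```python
import re

_PATTERN = re.compile(r"\[((?:\.[0-9A-Fa-f]{4})*)(\*?)\]")

def parse_rules(spec):
    """Parse 'Selector:[weights]; Selector:[weights]' into a rule list.

    Each rule is (selector, inserted_weights, keep_ducet). The selector
    is split off at the LAST ':' so that '[\\p{Bidi:R}]' stays intact.
    """
    rules = []
    for part in spec.split(";"):
        selector, _, pattern = part.strip().rpartition(":")
        m = _PATTERN.fullmatch(pattern.strip())
        if m is None:
            raise ValueError("bad weight pattern: %r" % pattern)
        inserted = [int(w, 16) for w in m.group(1).split(".") if w]
        rules.append((selector.strip(), inserted, m.group(2) == "*"))
    return rules

def extend_element(weights, classes, rules, levels):
    """Bring one collation element to `levels` weights.

    A matching rule inserts its weights in front (first match wins, an
    assumption). Otherwise ignorable elements are padded with 0000 and
    all other elements with FFFF.
    """
    for selector, inserted, keep_ducet in rules:
        if selector in classes:
            out = (inserted + (list(weights) if keep_ducet else []))[:levels]
            return out + [0x0000] * (levels - len(out))
    filler = 0x0000 if weights[0] == 0 else 0xFFFF
    out = list(weights)[:levels]
    return out + [filler] * (levels - len(out))

def fmt(weights):
    return "[" + "".join(".%04X" % w for w in weights) + "]"

rules = parse_rules(r"Variable:[.0201*]; [\p{Bidi:R}]:[.FFFF*]")
# (weights, classes) pairs; the weights are made up for the example.
samples = [
    ([0x0209, 0x0020, 0x0002], {"Variable"}),        # a variable element
    ([0x1A00, 0x0020, 0x0002], {r"[\p{Bidi:R}]"}),   # a strong RTL letter
    ([0x15A3, 0x0020, 0x0002], set()),               # any other letter
    ([0x0000, 0x0032, 0x0002], set()),               # an ignorable element
]
for weights, classes in samples:
    print(fmt(weights), "->", fmt(extend_element(weights, classes, rules, 4)))
```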
Philippe.