-----BEGIN PGP SIGNED MESSAGE-----
Kenneth Whistler wrote:
> Kent Karlsson's suggestion:
>
> > I vaguely suggested adding
> > an enclosing (in some sense) invisible combining character to
> > solve this: <o, CGJ, o, invisible-enclosing, combining breve>.
> > No character has been designated for such use, though. And I
> > haven't made a formal proposal yet.
>
> (i.e. create a generic way to make a non-enclosing combining mark
> apply to a grapheme cluster, by encoding an invisible enclosing
> combining mark)

For this approach to work, <invisible-enclosing> must have combining
class 0 and general category Mn, and must be in Grapheme_Extend.
Because it involves a new character, it can't be included in the
standard until Unicode 3.3. Existing implementations will not have
that character in any of the Grapheme_* classes, so they will treat
the sequence as *three* grapheme clusters.

An alternative is to use CGJ itself for <invisible-enclosing>, i.e.
<o, CGJ, o, CGJ, combining breve>. This works because:
- CGJ has combining class 0, so it prevents the breve from composing
with the second o.
- CGJ has general category Mn and is invisible, as required.
- it is straightforward to modify the grapheme breaking rules to
treat this as a single cluster, by adding the rule "Link × Extend".
(This assumes the corrections to the other rules that I described
in my comments.)
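
The first two points can be checked against the character database
that ships with Python. This is only a rough sketch: it assumes the
unicodedata module of whatever Python version is in use (which reflects
a later Unicode version than the one under discussion), and the names
CGJ and BREVE are just local constants.

```python
import unicodedata

CGJ = "\u034f"    # COMBINING GRAPHEME JOINER
BREVE = "\u0306"  # COMBINING BREVE

# Properties of CGJ as recorded in the interpreter's Unicode data.
cgj_class = unicodedata.combining(CGJ)
cgj_category = unicodedata.category(CGJ)

# Without CGJ, NFC composes <o, breve> into a single precomposed letter.
plain = unicodedata.normalize("NFC", "o" + BREVE)

# With CGJ in front of the breve, the breve has no base letter directly
# before it to compose with, so the sequence is left unchanged.
joined = unicodedata.normalize("NFC", "o" + CGJ + "o" + CGJ + BREVE)

print(cgj_class, cgj_category)
print([hex(ord(c)) for c in plain])
print([hex(ord(c)) for c in joined])
```

This shows only the normalization side of the argument; whether the
sequence is treated as one grapheme cluster depends on the breaking
rules.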

I also considered <o, CGJ, combining breve, o> (which encodes the
breve in the same position that a double diacritic would occupy). The
disadvantage is that it requires the more complicated rule
"Link × *Extend (Precede / Base)". If only one combining
mark is allowed to apply to a cluster using CGJ, then
"Link × (Precede / Extend) Base" would probably suffice, but I
still prefer adding "Link × Extend" to the rules that I suggested
in part 1 of my comments, since they are defined only in terms of
adjacent character pairs, without any lookahead.

Here is what I'm suggesting written out in full:

When a sequence of combining diacritical marks immediately follows
CGJ, apply them to the whole preceding grapheme cluster.

Use the following breaking rules, with Precede = Join_Control:

  CR × LF

  Base    × Extend   }
  Extend  × Extend   }  equivalent to
  Base    × Link     }    (Base / Extend) × (Extend / Link)
  Extend  × Link     }

  Precede × Precede  }
  Precede × Base     }  equivalent to
  Link    × Precede  }    (Link / Precede) × (Precede / Base)
  Link    × Base     }

  Link × Extend

  L × (L / V / LV / LVT)
  (LV / V) × (V / T)
  (LVT / T) × T

  Any ÷

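
Because every rule above looks only at a pair of adjacent characters,
the whole rule set can be sketched as a lookup table of class pairs.
The following is a minimal illustration, not a real implementation: it
works on class labels rather than characters, the function names
forbid and clusters are made up for the example, and it assumes that
CGJ is in class Link, letters like "o" are Base, and the combining
breve is Extend.

```python
# Set of (left class, right class) pairs between which no break occurs.
NO_BREAK = set()

def forbid(lefts, rights):
    """Record 'left × right' for every combination of the given classes."""
    for left in lefts:
        for right in rights:
            NO_BREAK.add((left, right))

forbid({"CR"}, {"LF"})
forbid({"Base", "Extend"}, {"Extend", "Link"})
forbid({"Link", "Precede"}, {"Precede", "Base"})
forbid({"Link"}, {"Extend"})
forbid({"L"}, {"L", "V", "LV", "LVT"})
forbid({"LV", "V"}, {"V", "T"})
forbid({"LVT", "T"}, {"T"})

def clusters(classes):
    """Split a list of class labels into clusters; 'Any ÷' is the default."""
    result = []
    for label in classes:
        if result and (result[-1][-1], label) in NO_BREAK:
            result[-1].append(label)   # no break: extend the current cluster
        else:
            result.append([label])     # break: start a new cluster
    return result

# <o, CGJ, o, CGJ, combining breve> under the assumed class assignment.
proposed = clusters(["Base", "Link", "Base", "Link", "Extend"])

# A Hangul syllable written as leading consonant, vowel, trailing consonant.
hangul = clusters(["L", "V", "T"])

print(len(proposed), len(hangul))
```

Removing the forbid({"Link"}, {"Extend"}) line makes the first
sequence split before the breve, which is the case the extra rule is
meant to cover.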
[Since it is harmless to have "Precede × Extend", another possibility
would be to change the third block of rules to:
(Link / Precede) × (Precede / Base / Extend)
In other words, a break can only occur after Link or Join_Control if
they are followed by a control character. It would not be a good idea to
further simplify this to "(Link / Precede) × Any", since we always want
there to be a break at the end of a line, for example.]
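
To make the difference between the three candidate forms of that block
concrete, here is a small sketch that compares them pair by pair. It
models only the block in question, and the class labels (including
"Control" and "LF") and the helper names are assumptions made for the
example.

```python
def make_block(rights):
    """Build '(Link / Precede) × rights' as a set of class pairs."""
    return {(left, right) for left in ("Link", "Precede") for right in rights}

def breaks_between(block, left, right):
    """True if this block does not forbid a break between the two classes."""
    return (left, right) not in block

ALL_CLASSES = ("CR", "LF", "Control", "Base", "Extend", "Link", "Precede")

original = make_block(("Precede", "Base"))
variant = make_block(("Precede", "Base", "Extend"))
too_simple = make_block(ALL_CLASSES)   # "(Link / Precede) × Any"

# The variant additionally keeps Precede together with a following Extend.
print(breaks_between(original, "Precede", "Extend"))
print(breaks_between(variant, "Precede", "Extend"))

# The variant still breaks before a line feed; the over-simplified form
# would not, which is the end-of-line problem.
print(breaks_between(variant, "Link", "LF"))
print(breaks_between(too_simple, "Link", "LF"))
```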
- --
David Hopwood <david.hopwood@zetnet.co.uk>
Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv
iQEVAwUBPIRlEzkCAxeYt5gVAQGAKQf/af5ePbLyscgW4sPhPaDdZYtAwygjO6n9
BaMFPED/i/GLiFzXNDMVJV7+PcDMOxKEq6sSHb66j5dpjpOt/PBZsrwd/ywGJuVs
0ehX54NsGYG7A9TiIRJcBGpXWapKjbupyjD0O+DdwWWmzpWmygEXDbOemjU8g6L9
Su0cl/grd2bFCokVKmHrQWoTY+GYUpByDZ388uWmX7ydaLWd4j4fvct/cBXa8Kls
Uwv8bsj7iz8TC/vAKy3r55Xll3ZPL2vLm+v82nIugCIuYxfJRRfHqXPSXMDoKOs2
GodsjLhHamDUpeGs9pTtojRTEFdGfkhMNs+fpecN3b0yNfHGFa5HEw==
=/7Ml
-----END PGP SIGNATURE-----
This archive was generated by hypermail 2.1.2 : Tue Mar 05 2002 - 02:26:41 EST