Re: How to make "oo" with combining breve/macron over pair?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Tue Mar 05 2002 - 01:27:45 EST


-----BEGIN PGP SIGNED MESSAGE-----

Kenneth Whistler wrote:
> Kent Karlsson's suggestion:
>
> > I vaguely suggested adding
> > an enclosing (in some sense) invisible combining character to
> > solve this: <o, CGJ, o, invisible-enclosing, combining breve>.
> > No character has been designated for such use, though. And I
> > haven't made a formal proposal yet.
>
> (i.e. create a generic way to make a non-enclosing combining mark
> apply to a grapheme cluster, by encoding an invisible enclosing
> combining mark)

For this approach to work, <invisible-enclosing> must have combining
class 0 and general category Mn, and be in Grapheme_Extend. Because it
is a new character, it can't be included in the standard until
Unicode 3.3, and since existing implementations will not have it in
any of the Grapheme_* classes, they will then treat the sequence as
*three* grapheme clusters.

An alternative is to use CGJ itself for <invisible-enclosing>, i.e.
<o, CGJ, o, CGJ, combining breve>. This works because:

 - CGJ has combining class 0, so it prevents the breve from composing
   with the second o.
 - CGJ has general category Mn and is invisible, as required.
 - it is straightforward to modify the grapheme breaking rules to
   treat this as a single cluster, by adding the rule "Link × Extend".
   (This assumes the corrections to the other rules that I described
   in my comments.)
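
As a rough check of the first two points, here is a minimal Python sketch
(not part of the proposal). It assumes the standard unicodedata module,
which reflects whatever Unicode version the Python build bundles rather
than the version under discussion; the variable names are only
illustrative.

```python
import unicodedata

CGJ = "\u034f"    # COMBINING GRAPHEME JOINER
BREVE = "\u0306"  # COMBINING BREVE

# CGJ is a nonspacing mark (Mn) with canonical combining class 0.
cgj_category = unicodedata.category(CGJ)
cgj_ccc = unicodedata.combining(CGJ)
print(cgj_category, cgj_ccc)

# Without CGJ, NFC composes <o, breve> into the single character U+014F.
plain = unicodedata.normalize("NFC", "o" + BREVE)

# With CGJ in between, the breve is blocked from the second o,
# so normalization leaves the sequence unchanged.
seq = "o" + CGJ + "o" + CGJ + BREVE
joined = unicodedata.normalize("NFC", seq)
print(plain == "\u014f", joined == seq)
```

The grapheme-breaking point is not covered here, since it depends on the
modified rules rather than on existing character properties.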

I also considered <o, CGJ, combining breve, o> (which encodes the
breve in the same position that a double diacritic would be), but
that has the disadvantage of requiring the more complicated rule
"Link × *Extend (Precede / Base)". If only one combining
mark is allowed to apply to a cluster using CGJ, then
"Link × (Precede / Extend) Base" would probably suffice, but I
still prefer adding "Link × Extend" to the rules that I suggested
in part 1 of my comments, since they are defined only in terms of
character pairs without any lookahead.

Here is what I'm suggesting written out in full:

  When a sequence of combining diacritical marks immediately follows
  CGJ, apply them to the whole preceding grapheme cluster.

  Use the following breaking rules, with Precede = Join_Control:

                CR × LF

              Base × Extend   }
            Extend × Extend   }  equivalent to
              Base × Link     }  (Base / Extend) × (Extend / Link)
            Extend × Link     }

           Precede × Precede  }
           Precede × Base     }  equivalent to
              Link × Precede  }  (Link / Precede) × (Precede / Base)
              Link × Base     }
              Link × Extend

                 L × (L / V / LV / LVT)
          (LV / V) × (V / T)
         (LVT / T) × T

               Any ÷
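
Because every rule above looks only at a pair of adjacent characters, the
rules can be applied in a single pass with no lookahead. The following
Python sketch is one way to try that out. The classifier is a toy and an
assumption on my part: it treats CGJ as the only Link character,
approximates Extend by general categories Mn and Me, and assumes Hangul
characters are also Base. The names classes, BASE_RULES, PROPOSED_RULES
and clusters are hypothetical, not from any library.

```python
import unicodedata

def classes(ch):
    """Toy classifier (an assumption, not the real property data)."""
    cp = ord(ch)
    if ch == "\r":
        return {"CR"}
    if ch == "\n":
        return {"LF"}
    if ch == "\u034f":                  # CGJ is the only Link here
        return {"Link"}
    if ch in ("\u200c", "\u200d"):      # Precede = Join_Control
        return {"Precede"}
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):             # rough stand-in for Grapheme_Extend
        return {"Extend"}
    if cat in ("Cc", "Cf", "Zl", "Zp", "Cn"):
        return {"Control"}
    found = {"Base"}                    # Hangul is assumed to be Base too
    if 0x1100 <= cp <= 0x115F:
        found.add("L")
    elif 0x1160 <= cp <= 0x11A7:
        found.add("V")
    elif 0x11A8 <= cp <= 0x11FF:
        found.add("T")
    elif 0xAC00 <= cp <= 0xD7A3:
        found.add("LV" if (cp - 0xAC00) % 28 == 0 else "LVT")
    return found

# Each entry is (left classes, right classes): no break between them.
BASE_RULES = [
    ({"CR"}, {"LF"}),
    ({"Base", "Extend"}, {"Extend", "Link"}),
    ({"Link", "Precede"}, {"Precede", "Base"}),
    ({"L"}, {"L", "V", "LV", "LVT"}),
    ({"LV", "V"}, {"V", "T"}),
    ({"LVT", "T"}, {"T"}),
]
PROPOSED_RULES = BASE_RULES + [({"Link"}, {"Extend"})]

def clusters(text, rules=PROPOSED_RULES):
    """Split text into clusters; anything not joined by a rule breaks."""
    out = []
    for ch in text:
        if out:
            prev, cur = classes(out[-1][-1]), classes(ch)
            if any(prev & left and cur & right for left, right in rules):
                out[-1] += ch
                continue
        out.append(ch)
    return out

seq = "o\u034fo\u034f\u0306"            # <o, CGJ, o, CGJ, combining breve>
print(len(clusters(seq)), len(clusters(seq, BASE_RULES)))
```

With PROPOSED_RULES the whole sequence stays in one cluster; with
BASE_RULES (that is, without "Link × Extend") the breve is split off.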

[Since it is harmless to have "Precede × Extend", another possibility
would be to change the third block of rules to:

  (Link / Precede) × (Precede / Base / Extend)

In other words, a break can only occur after Link or Join_Control if
they are followed by a control character. It would not be a good idea to
further simplify this to "(Link / Precede) × Any", since we always want
there to be a break at the end of a line, for example.]
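
The difference between the two right-hand sides can be seen on a single
pair of classes. This is only a sketch working on class labels, not on
real characters; the "Control" label and the names JOINERS,
VARIANT_RIGHT, ALL_CLASSES and joins are my own assumptions for
illustration.

```python
JOINERS = {"Link", "Precede"}
VARIANT_RIGHT = {"Precede", "Base", "Extend"}
ALL_CLASSES = {"CR", "LF", "Control", "Base", "Extend", "Link", "Precede"}

def joins(prev, nxt, right):
    """True if the rule "(Link / Precede) x right" forbids a break."""
    return prev in JOINERS and nxt in right

line_end = ("Link", "LF")   # e.g. a CGJ that happens to end a line

variant_joins = joins(*line_end, VARIANT_RIGHT)  # break is still allowed
any_joins = joins(*line_end, ALL_CLASSES)        # break would be forbidden
print(variant_joins, any_joins)
```

Under the variant rule the break before LF survives, whereas
"(Link / Precede) × Any" would glue the line terminator to the cluster.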

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPIRlEzkCAxeYt5gVAQGAKQf/af5ePbLyscgW4sPhPaDdZYtAwygjO6n9
BaMFPED/i/GLiFzXNDMVJV7+PcDMOxKEq6sSHb66j5dpjpOt/PBZsrwd/ywGJuVs
0ehX54NsGYG7A9TiIRJcBGpXWapKjbupyjD0O+DdwWWmzpWmygEXDbOemjU8g6L9
Su0cl/grd2bFCokVKmHrQWoTY+GYUpByDZ388uWmX7ydaLWd4j4fvct/cBXa8Kls
Uwv8bsj7iz8TC/vAKy3r55Xll3ZPL2vLm+v82nIugCIuYxfJRRfHqXPSXMDoKOs2
GodsjLhHamDUpeGs9pTtojRTEFdGfkhMNs+fpecN3b0yNfHGFa5HEw==
=/7Ml
-----END PGP SIGNATURE-----


