From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 26 2004 - 13:13:46 CST
From: "Mark Davis" <mark.davis@jtcsv.com>
>I want to correct some misperceptions about CGJ; it should not be used for
> ligatures.
True. CGJ is a combining character that extends the grapheme cluster started
before it, but it does not imply any linking with the next grapheme cluster
starting at a base character.
So even if one encodes A+CGJ+E, there will still be two distinct grapheme
clusters, A+CGJ and E. The trailing CGJ in A+CGJ is probably just
pollution: it has no influence on the collation order, so the sequence
A+CGJ+E will collate like A+E, and it does not influence the rendering
either.
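As a rough illustration of the point about CGJ, the Python sketch below
looks up U+034F with the standard unicodedata module and compares A+CGJ+E
with A+E once the CGJ is dropped. The helper name comparison_key is made up
for this example, and deleting the character is only a stand-in for a
collation that ignores it; it is not an implementation of the Unicode
Collation Algorithm.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

def comparison_key(text):
    """Hypothetical key: compare text as if CGJ were not there."""
    return text.replace(CGJ, "")

sequence = "A" + CGJ + "E"

# Character name and general category as recorded in the Unicode database.
print(unicodedata.name(CGJ))
print(unicodedata.category(CGJ))

# Under the stated assumption, A+CGJ+E and A+E get the same key.
print(comparison_key(sequence) == comparison_key("AE"))
```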
A "correct" ligature encoding would be A+ZWJ+E, which creates three
default grapheme clusters that can be rendered as a single ligature, or as
separate A and E glyphs if the ZWJ is ignored.
For example, a ligaturing opportunity can be encoded explicitly in the
French word "efficace":
"ef"+ZWJ+"f"+ZWJ+"icace".
Note, however, that the ZWJ prohibits breaking, even though French allows
hyphenation at the first occurrence, which is also a syllable break, but
not at the second occurrence, which falls in the middle of the second
syllable.
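To make the two ZWJ positions concrete, here is a small Python sketch that
counts how many letters precede each ZWJ and checks that offset against the
syllable boundaries of "ef-fi-ca-ce". The function names joiner_offsets and
syllable_boundaries are invented for this illustration, and the syllable
list is simply typed in by hand rather than computed by any hyphenation
engine.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

def joiner_offsets(marked):
    """Return, for each ZWJ, how many visible letters precede it."""
    offsets = []
    letters = 0
    for ch in marked:
        if ch == ZWJ:
            offsets.append(letters)
        else:
            letters += 1
    return offsets

def syllable_boundaries(syllables):
    """Return the letter offsets where one syllable ends and the next begins."""
    boundaries = set()
    position = 0
    for syllable in syllables[:-1]:
        position += len(syllable)
        boundaries.add(position)
    return boundaries

marked = "ef" + ZWJ + "f" + ZWJ + "icace"
boundaries = syllable_boundaries(["ef", "fi", "ca", "ce"])  # hand-written syllables

for offset in joiner_offsets(marked):
    kind = "syllable break" if offset in boundaries else "inside a syllable"
    print(offset, kind)
```

The first ZWJ sits exactly on a syllable boundary, so it competes with a
legitimate hyphenation point; the second does not.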
I don't know how one can encode an explicit ligaturing opportunity while
also encoding the possibility of hyphenation (where the sequence above
would be rendered as if the first ZWJ had been replaced by a hyphen
followed by a newline).
To encode the hyphenation opportunity, normally I would use the SHY format
control (soft hyphen):
"ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"
If I also want to encode explicit ligatures for the "ffi" cluster when it
is not hyphenated, I need to add ZWJ:
"ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce" (1)
The problem is whether ZWJ will have the expected role of enabling a
ligature if it is inserted between a letter and a SHY, rather than directly
between the two glyphs to be ligated. In any case, the ligature should not
be rendered if hyphenation does occur; otherwise the SHY should be ignored.
So two renderings are to be generated, depending on the presence or absence
of the conditional syllable break:
- syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a
ligature only for the "fi" pair, but not for the "ff" pair and not even for
the generated "f"+hyphen...
- syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e.
with the 3-letter "ffi" ligature...
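The two cases above can be sketched in Python as a toy function that turns
the marked string into the character sequence one would hope a renderer
works from. The name render and its break_index parameter are hypothetical,
and the rule "drop a ZWJ that immediately precedes a taken SHY" is an
assumption chosen to reproduce the desired output; nothing here claims that
an actual rendering engine behaves this way.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER
SHY = "\u00ad"  # SOFT HYPHEN

def render(marked, break_index=None):
    """Resolve soft hyphens in `marked`.

    break_index is the 0-based SHY occurrence at which the line breaks,
    or None if no break is taken. Unused SHYs are dropped. At the taken
    SHY, a directly preceding ZWJ is removed (assumed rule) and the SHY
    becomes a visible hyphen followed by a newline.
    """
    out = []
    shy_seen = 0
    for ch in marked:
        if ch == SHY:
            if shy_seen == break_index:
                if out and out[-1] == ZWJ:
                    out.pop()  # no ligature across the line break
                out.append("-\n")
            shy_seen += 1
        else:
            out.append(ch)
    return "".join(out)

# The string labelled (1) in the message.
marked = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"

no_break = render(marked)        # "ef"+ZWJ+"f"+ZWJ+"icace"
first_break = render(marked, 0)  # "ef-"+NL+"f"+ZWJ+"icace"
print(repr(no_break))
print(repr(first_break))
```

Whether a ZWJ placed next to a SHY is actually honoured as a ligature
request is exactly the open question; the sketch only spells out the
intended result.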
I am not sure if the string coded as (1) above has the expected behavior,
including for collation where it should still collate like the unmarked word
"efficace"...