Re: CGJ, RLM

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 29 2004 - 16:04:36 CST

    Mark Davis said (in reference to a long set of comments by
    Philippe Verdy on this thread):

    > The statements below are incorrect

    And Philippe asked:

    > Which "statements"? My message is mostly to be read as a question, not as an
    > affirmation...

    And I will attempt the fact-finding...

    > CGJ is a combining character that extends the grapheme cluster started
    > before it,

    True but misleading. CGJ is a combining character, and like *all*
    other nonspacing combining characters it has the property
    Grapheme_Extend=True. CGJ's *function* is not to extend the grapheme
    cluster before it; that just happens automatically, as for any
    character with gc=Mn.
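
    The property values can be checked with a minimal Python sketch,
    assuming the standard-library unicodedata module (which reflects
    whatever Unicode version ships with the interpreter). The helper
    name describe() is hypothetical, and Grapheme_Extend itself is not
    exposed by that module, so only gc and the combining class are shown:

```python
import unicodedata

CGJ = "\u034f"    # COMBINING GRAPHEME JOINER
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, for comparison


def describe(ch):
    """Return (name, general category, canonical combining class)."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# Both are gc=Mn; CGJ differs only in having combining class 0.
print(describe(CGJ))
print(describe(ACUTE))
```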

    And that was a statement.

    > but it does not imply any linking with the next grapheme cluster
    > starting at a base character.

    True. Another statement.

    > So, even if one encodes, A+CGJ+E, there will still be two distinct grapheme
    > clusters A+CGJ and E, and the exact role of the trailing CGJ in the A+CGJ is
    > probably just a pollution, given that this CGJ has no influence on the
    > collation order, so that the sequence A+CGJ+E will collate like A+E,

    Misconstrued. Whether CGJ influences the collation order or not
    depends on how it is weighted in a tailored collation table. And
    the main *point* of having a CGJ is to provide a target for tailored
    collation, so that it *can* make a difference. Statements, by the way.
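
    To illustrate the difference a tailoring can make, here is a toy
    sketch in Python. It is not the Unicode Collation Algorithm: the
    function sort_key(), its single-level weights, and the tailoring
    dictionary are all hypothetical, and the sketch simply assumes that
    CGJ is ignorable unless a tailoring gives it a weight:

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def sort_key(text, tailoring=None):
    """Toy one-level sort key. A weight of None means 'ignorable'."""
    tailoring = tailoring or {}
    key = []
    for ch in text:
        if ch in tailoring:
            weight = tailoring[ch]      # tailored weight wins
        elif ch == CGJ:
            weight = None               # assumed default: ignorable
        else:
            weight = ord(ch.lower())    # crude stand-in for a real weight
        if weight is not None:
            key.append(weight)
    return tuple(key)


# Untailored: A+CGJ+E gets the same key as A+E.
default_same = sort_key("A" + CGJ + "E") == sort_key("AE")

# Tailored: giving CGJ a weight makes the two strings sort differently.
tailored = {CGJ: 1}
tailored_same = sort_key("A" + CGJ + "E", tailored) == sort_key("AE", tailored)

print(default_same, tailored_same)
```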

    > and it
    > does not influence the rendering as well.

    True. Another statement.

    > A "correct" ligaturing would be A+ZWJ+E,

    A matter of opinion, neither obviously true nor false. And a statement.

    > with the effect of creating three
    > default grapheme clusters,

    False. The correct value is 2.
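
    The count can be reproduced with a deliberately simplified sketch
    in Python. It approximates Grapheme_Extend as "gc=Mn or gc=Me, plus
    ZWNJ and ZWJ" and ignores the other default cluster rules (CR LF,
    Hangul syllables, and so on); the function names extends() and
    clusters() are hypothetical:

```python
import unicodedata

CGJ = "\u034f"   # COMBINING GRAPHEME JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def extends(ch):
    """Rough stand-in for Grapheme_Extend=True (an approximation)."""
    return unicodedata.category(ch) in ("Mn", "Me") or ch in (ZWNJ, ZWJ)


def clusters(text):
    """Split text into clusters: each extender joins the preceding cluster."""
    out = []
    for ch in text:
        if out and extends(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


# Two clusters in both cases: the joiner attaches to the A, and E starts anew.
print(len(clusters("A" + CGJ + "E")))
print(len(clusters("A" + ZWJ + "E")))
```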

    > that can be rendered as a single ligature, or as
    > separate A and E glyphs if the ZWJ is ignored.

    True. And a statement.

    > For example, a ligaturing opportunity can be encoded explicitly in the
    > French word "efficace":
    > "ef"+ZWJ+"f"+ZWJ+"icace".

    True (although superfluous). And a statement.

    > Note however that the ZWJ prohibits breaking,

    False. ZWJ is lb=CM, which prevents a break *before*, but not
    a break *after*.

    > despite in French there's a
    > possible hyphenation at the first occurrence, where it is also a syllable
    > break, but not for the second occurrence that occurs in the middle of the
    > second syllable.

    True (I assume) statements about French.

    > I don't know how one can encode an explicit ligaturing opportunity, while
    > also encoding the possibility of a hyphenation (where the sequence above
    > would be rendered as if the first ZWJ had been replaced by a hyphen
    > followed by a newline.)

    True (I assume) statements about Philippe's state of knowledge.

    > To encode the hyphenation opportunity, normally I would use the SHY format
    > control (soft hyphen):
    > "ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"

    True (I assume) statements about Philippe's practice in text representation.

    >
    > If I want to encode explicit ligatures for the "ffi" cluster, if it is not
    > hyphenated, I need to add ZWJ:

    False (at least existentially, although I cannot comment on
    your personal wants and needs). And a statement.

    > "ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce" (1)

    And as Doug pointed out, this is an incredibly baroque (and obtuse)
    way of attempting to represent the word "efficace" in plain text.

    >
    > The problem is whether ZWJ will have the expected role of enabling a ligature
    > if it is inserted between a letter and a SHY, instead of the two ligated
    > glyphs. In any case, the ligature should not be rendered if hyphenation does
    > occur, else the SHY should be ignored. So two renderings are to be generated
    > depending on the presence or absence of the conditional syllable break:
    > - syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a
    > ligature only for the "fi" pair, but not for the "ff" pair and not even for
    > the generated "f"+hyphen...
    > - syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e.
    > with the 3-letter "ffi" ligature...

    A whole series of statements. Together somewhat of a muddle for the
    simple observation that "ffi" is not rendered with a single ligature
    if there is a line break in the middle of it.

    >
    > I am not sure if the string coded as (1) above has the expected behavior,
    > including for collation where it should still collate like the unmarked word
    > "efficace"...

    True (I assume) statement about Philippe's state of knowledge.
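
    One part of this can be checked mechanically: that string (1)
    carries the same letters as the unmarked word once the format
    controls are dropped. A minimal Python sketch, assuming the
    standard-library unicodedata module; strip_format_controls() is a
    hypothetical helper, and whether a given collation implementation
    actually ignores SHY and ZWJ is an assumption that this does not
    test:

```python
import unicodedata

SHY = "\u00ad"  # SOFT HYPHEN
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# String (1) from the quoted message.
marked = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"


def strip_format_controls(text):
    """Drop every character whose General_Category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Both SHY and ZWJ are gc=Cf, so only the eight letters remain.
print(strip_format_controls(marked) == "efficace")
```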

    Reading to the end, I find *only* statements here, and no question
    actually posed.

    In the future, if you want a message to be taken *as* a question,
    it would be best to 1. Make it short, and 2. Actually pose a
    question in it, preferably terminating the sentence to be so
    interpreted with a "?"

    --Ken
