Re: Formal alias for U+034F COMBINING GRAPHEME JOINER (CGJ)?

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Mar 12 2008 - 09:52:26 CST


    Karl,

    the name for 034F is indeed somewhat unfortunate, but it's not a
    "mistake" in the usual sense.

    Normally, adding a formal alias would therefore not be a proper remedy,
    but in this case, your proposal has some hidden virtues. If one were to
    add your proposed alias, then the character would be named *both*
    separator and joiner, indicating perhaps that its true name should have
    been "COMBINING CHARACTER THAT DOES SOMETHING SPECIAL".

    Another nice alias might have been "COMBINING CHARACTER WITH UNUSUAL
    PROPERTIES".

    Jokes aside, there *is* a strong element of a "separator" functionality
    inherent in the CGJ. For example, the fact that it has canonical
    combining class 0 makes it a separator between other combining marks in
    the same combining sequence (each side of the CGJ gets ordered
    separately in canonical reordering).
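
    As a rough illustration of that reordering effect, here is a minimal
    Python sketch using the standard unicodedata module. The choice of
    base letter and of the two marks (U+0307, ccc=230, and U+0323,
    ccc=220) is only an assumption made for the example; any pair of
    marks with different non-zero combining classes should behave the
    same way.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, ccc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc=220

# The formal properties under discussion: general category and combining class.
cgj_category = unicodedata.category(CGJ)
cgj_ccc = unicodedata.combining(CGJ)

# Without CGJ, canonical reordering sorts the marks by ascending combining class.
plain = "a" + DOT_ABOVE + DOT_BELOW
plain_nfd = unicodedata.normalize("NFD", plain)

# With CGJ in between, each side is ordered separately, so the marks stay put.
blocked = "a" + DOT_ABOVE + CGJ + DOT_BELOW
blocked_nfd = unicodedata.normalize("NFD", blocked)

print(cgj_category, cgj_ccc)
print([f"U+{ord(c):04X}" for c in plain_nfd])
print([f"U+{ord(c):04X}" for c in blocked_nfd])
```

    In the first sequence the dot below moves ahead of the dot above; in
    the second, the CGJ keeps the original order intact.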

    That effect of the CGJ is the most definite, most normative effect it
    has, because it derives from the formal properties of the character. The
    effects that it *might* have in sorting, on the other hand, are all a
    matter of convention: you need to tailor your sort tables so that they
    recognize the CGJ by giving it a special sort weight, or by giving
    sequences without the CGJ a special sorting behavior. Cases like the
    "ch" for Slovak example that you cite are relatively "automatic" because
    usually, the mere presence of a character not specifically accounted for
    in the sorting tables would interrupt the treatment of "ch" as a
    contraction. If that character is invisible, you get the correct effect.
    (Danes have long used SHY to separate "aa", but that's because a syllable
    boundary is usually present there anyway).
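
    To make the "automatic" case concrete, here is a toy sketch in Python.
    It is not the Unicode Collation Algorithm and not any real sort table:
    the alphabet, the WEIGHT table and the function toy_slovak_key are all
    hypothetical names invented for this example. It only shows how a
    character with no entry in the table can interrupt a "ch" contraction
    while contributing no weight of its own.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Hypothetical toy alphabet: "ch" is one collation element sorting after "h".
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
WEIGHT = {element: rank for rank, element in enumerate(ALPHABET, start=1)}


def toy_slovak_key(word):
    """Return a tuple of weights, treating an adjacent 'c' + 'h' as a contraction."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        if text.startswith("ch", i):
            # The two letters are directly adjacent: one combined weight.
            key.append(WEIGHT["ch"])
            i += 2
        elif text[i] in WEIGHT:
            key.append(WEIGHT[text[i]])
            i += 1
        else:
            # Not in the table (e.g. CGJ): no weight, but the contraction
            # match above can no longer see 'c' and 'h' as adjacent.
            i += 1
    return tuple(key)


words = ["chata", "hora", "c" + CGJ + "hata"]
print(sorted(words, key=toy_slovak_key))
```

    With this table, the word containing the CGJ sorts under "c", ahead
    of "hora", while plain "chata" sorts after it, under "ch".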

    As a mental shorthand, to remind myself of the properties of the CGJ, I
    think of it as "INVISIBLE ENCLOSING MARK". (Although it's gc=Mn, for
    whatever reason, so it should have been "invisible nonspacing mark with
    ccc=0")

    Whether any of these suggestions would make a good alias (or even formal
    alias) I'll let others decide.

    A./

    On 3/11/2008 9:59 PM, Karl Pentzlin wrote:
    > Following the description in p.542 of TUS 5.0, the CGJ
    > (i.e. U+034F COMBINING GRAPHEME JOINER) separates graphemes,
    > e.g. in Slovak, it prevents a "ch" from being interpreted as a grapheme.
    > Thus, the CGJ splits or separates, but does not "join" in any case.
    >
    > In the code table, the character has an informative note
    > "The name of this character is misleading, it does not actually join
    > graphemes", without giving more information.
    >
    > Is it appropriate to propose a formal alias like
    > "COMBINING GRAPHEME SEPARATOR"?
    >
    > - Karl Pentzlin


