Description of CGJ

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed Jun 14 2006 - 17:50:24 CDT

    The proposed summary of CGJ (U+034F COMBINING GRAPHEME JOINER) in the
    Unicode 5.0 glyph charts still says, 'indicates that adjoining characters
    are to be treated as a graphemic unit'. This has been completely wrong
    since TUS 4.1. What it now means is something like, 'indicates that
    adjoining characters do not interact non-graphically as one would
    otherwise expect'. The three examples I can think of are:

    (i) The characters do not swap places under normalisation (e.g. Hebrew
    <metheg, hiriq> v. <hiriq, metheg>).
    (ii) A following U+0308 COMBINING DIAERESIS forms a diaeresis, not an
    umlaut, in Fraktur.
    (iii) The characters do not form a normal 'contraction' for collation
    (e.g. CH in Slovak or NG in Welsh).
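
    Example (i) can be checked mechanically. The following is only a
    minimal sketch, assuming Python's standard unicodedata module; the base
    letter (lamed) and the helper names nfd and show are arbitrary choices
    made for illustration, not anything taken from TUS. It normalises the
    two marks with and without an intervening CGJ and prints the resulting
    code points.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0
LAMED = "\u05DC"  # HEBREW LETTER LAMED, an arbitrary base letter (assumption)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
METEG = "\u05BD"  # HEBREW POINT METEG, canonical combining class 22


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def show(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Without CGJ, canonical reordering sorts the marks by combining class,
# so <meteg, hiriq> comes out as <hiriq, meteg>.
plain = LAMED + METEG + HIRIQ
reordered = nfd(plain)

# With CGJ (class 0) between them, the two marks are no longer in the
# same reorderable run, so the original order survives normalisation.
with_cgj = LAMED + METEG + CGJ + HIRIQ
preserved = nfd(with_cgj)

print(show(reordered))  # U+05DC U+05B4 U+05BD
print(show(preserved))  # U+05DC U+05BD U+034F U+05B4
```

    The same holds for NFC here, since these Hebrew points do not compose
    with the base letter.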

    In particular, the graphemic unit, if any, is not tight enough for enclosing
    diacritics to treat it as a unit. <X, CGJ, Y, U+20DD COMBINING ENCLOSING
    CIRCLE> is (in general) X followed by circled Y, not encircled XY. On the
    other hand, a Fraktur vowel with diaeresis remains a default grapheme
    cluster.

    Richard.


