From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed Jun 14 2006 - 17:50:24 CDT
The proposed summary of CGJ (U+034F COMBINING GRAPHEME JOINER) in the Unicode 5.0 glyph charts still says,
'indicates that adjoining characters are to be treated as a graphemic unit'.
This has been completely wrong since TUS 4.1. What it now means is
something like, 'indicates that adjoining characters do not interact
non-graphically as one would otherwise expect'. The three examples I can
think of are:
(i) Do not swap places under normalisation (e.g. Hebrew metheg hiriq v.
hiriq metheg)
(ii) A U+0308 following the CGJ forms a diaeresis, not an umlaut, in Fraktur.
(iii) Do not form a normal 'contraction' for collation (e.g. CH in Slovak or
NG in Welsh).
In particular, the graphemic unit, if any, is not tight enough for enclosing
diacritics to treat it as a unit. <X, CGJ, Y, U+20DD COMBINING ENCLOSING
CIRCLE> is (in general) X followed by circled Y, not encircled XY. On the
other hand, a Fraktur vowel with diaeresis remains a default grapheme
cluster.
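
Example (i) above can be checked with a short sketch using Python's standard unicodedata module. The base letter (alef) and the variable names are arbitrary choices for illustration, not part of the original argument, and the combining classes in the comments are those of the Unicode Character Database as I understand it. Without CGJ, canonical reordering swaps metheg (class 22) and hiriq (class 14); with CGJ (class 0) between them, the order is preserved.

```python
import unicodedata

# Illustrative code points; ALEF is an arbitrary base letter chosen for the demo.
ALEF = "\u05D0"    # HEBREW LETTER ALEF
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, canonical combining class 14
METEG = "\u05BD"   # HEBREW POINT METEG (metheg), canonical combining class 22
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, canonical combining class 0


def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# metheg before hiriq, first without and then with an intervening CGJ
plain = ALEF + METEG + HIRIQ
blocked = ALEF + METEG + CGJ + HIRIQ

plain_nfd = unicodedata.normalize("NFD", plain)
blocked_nfd = unicodedata.normalize("NFD", blocked)

print("without CGJ:", codepoints(plain), "->", codepoints(plain_nfd))
print("with CGJ:   ", codepoints(blocked), "->", codepoints(blocked_nfd))
```

In the first case the two points come out as hiriq, metheg; in the second the sequence is left unchanged, because a class-0 character stops canonical reordering from moving marks across it.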
Richard.