L2/03-026
Re: Scope of enclosing marks
From: Mark Davis
Date: 2002-01-29
As the editorial committee was going through the text to implement the latest UTC decisions, we reached a passage where there was not enough direction to make an (editorial) decision, so we need the UTC to decide.
Unicode 3.2 has the following text (I added paragraph numbers):
1. Enclosing Combining Marks. These marks enclose the entire preceding grapheme cluster. For example, in the following sequence the entire Hangul syllable is circled, not just part of it:
- U+1100 HANGUL CHOSEONG KIYEOK
- U+1161 HANGUL JUNGSEONG A
- U+20DD COMBINING ENCLOSING CIRCLE
2. This is also true of grapheme clusters composed of elements linked by a Grapheme_Link or combining grapheme joiner. For example, the entire conjunct is circled in the following sequence:
- U+0915 DEVANAGARI LETTER KA
- U+094D DEVANAGARI SIGN VIRAMA
- U+0922 DEVANAGARI LETTER DDHA
- U+20DD COMBINING ENCLOSING CIRCLE
3. On the other hand, where elements are linked by a Grapheme_Link or combining grapheme joiner, non-enclosing combining marks only apply to the last base character. For example, in the following sequence the nukta applies to the immediately preceding ddha, not to the entire cluster:
- U+0915 DEVANAGARI LETTER KA
- U+094D DEVANAGARI SIGN VIRAMA
- U+0922 DEVANAGARI LETTER DDHA
- U+093C DEVANAGARI SIGN NUKTA
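The sketch below models the behavior that paragraphs 1 through 3 describe. It is a minimal Python illustration, not code from the Unicode Standard, and it rests on several assumptions. The property tables are hypothetical and cover only the code points in the examples. Hangul syllables are simplified to runs of conjoining jamo (L* V* T*), using approximate block ranges. U+034F COMBINING GRAPHEME JOINER is treated as a linker alongside the virama. The names `clusters_32` and `mark_scope` are illustrative. In this model, a linker pulls the following base into the same cluster. An enclosing mark then applies to the whole cluster, while a non-enclosing mark applies only to the last base character.

```python
# Minimal model of the Unicode 3.2 text quoted above (paragraphs 1-3).
# Property tables are hypothetical and cover only the example code points.
LINKERS = {
    0x094D,  # DEVANAGARI SIGN VIRAMA (Grapheme_Link)
    0x034F,  # COMBINING GRAPHEME JOINER
}
ENCLOSING = {0x20DD}  # COMBINING ENCLOSING CIRCLE
MARKS = {0x093C, 0x094D, 0x034F} | ENCLOSING  # marks that attach to a preceding base
HANGUL_L = range(0x1100, 0x1160)  # conjoining leading consonants (approximate)
HANGUL_V = range(0x1160, 0x11A8)  # conjoining vowels (approximate)
HANGUL_T = range(0x11A8, 0x1200)  # conjoining trailing consonants (approximate)


def consume_core(cps, i):
    """Return the index just past a jamo syllable (simplified to L* V* T*) or a single base at i."""
    j = i
    for block in (HANGUL_L, HANGUL_V, HANGUL_T):
        while j < len(cps) and cps[j] in block:
            j += 1
    return j if j > i else i + 1


def clusters_32(cps):
    """Split code points into (start, end) clusters, letting a linker join the next base."""
    spans = []
    i = 0
    while i < len(cps):
        start = i
        i = consume_core(cps, i)
        while i < len(cps) and cps[i] in MARKS:
            if cps[i] in LINKERS and i + 1 < len(cps) and cps[i + 1] not in MARKS:
                i = consume_core(cps, i + 1)  # the link pulls the next base into this cluster
            else:
                i += 1
        spans.append((start, i))
    return spans


def mark_scope(cps, i):
    """Return the (start, end) span of text that the mark at index i applies to."""
    start = next(s for s, e in clusters_32(cps) if s <= i < e)
    if cps[i] in ENCLOSING:
        return (start, i)  # paragraphs 1-2: the whole cluster before the enclosing mark
    j = i - 1
    while j > start and cps[j] in MARKS:  # paragraph 3: skip back to the last base
        j -= 1
    return (j, j + 1)
```

With these tables, the three examples produce the following scopes:

- In paragraph 1's example, the circle's scope is (0, 2), the whole syllable.
- In paragraph 2's example, the circle's scope is (0, 3), the whole conjunct.
- In paragraph 3's example, the nukta's scope is only (2, 3), the DDHA.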
In the meantime, however, the UTC has decided to narrow the scope of grapheme clusters to a clear core, basically:
(<hangul syllable> | <base>) <non-spacing mark>*
[and the name is changed to "default grapheme cluster"]
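The narrowed rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation from the standard:

- The property tables are hypothetical and limited to the code points in the examples.
- The enclosing circle (general category Me) is assumed to extend a cluster just as a non-spacing mark does.
- Hangul syllables are simplified to runs of conjoining jamo (L* V* T*).
- `default_clusters` is an illustrative name.

Nothing in the rule links one base to the next, so a virama simply ends its cluster.

```python
# Minimal sketch of the narrowed rule: (<hangul syllable> | <base>) <non-spacing mark>*
# Property tables are hypothetical and cover only the example code points.
MARKS = {
    0x093C,  # DEVANAGARI SIGN NUKTA (Mn)
    0x094D,  # DEVANAGARI SIGN VIRAMA (Mn)
    0x20DD,  # COMBINING ENCLOSING CIRCLE (Me), assumed to extend like a non-spacing mark
}
HANGUL_L = range(0x1100, 0x1160)  # conjoining leading consonants (approximate)
HANGUL_V = range(0x1160, 0x11A8)  # conjoining vowels (approximate)
HANGUL_T = range(0x11A8, 0x1200)  # conjoining trailing consonants (approximate)


def consume_core(cps, i):
    """Return the index just past a jamo syllable (simplified to L* V* T*) or a single base at i."""
    j = i
    for block in (HANGUL_L, HANGUL_V, HANGUL_T):
        while j < len(cps) and cps[j] in block:
            j += 1
    return j if j > i else i + 1


def default_clusters(cps):
    """Split code points into (start, end) spans; nothing links one base to the next."""
    spans = []
    i = 0
    while i < len(cps):
        start = i
        i = consume_core(cps, i)
        while i < len(cps) and cps[i] in MARKS:  # trailing marks extend the cluster
            i += 1
        spans.append((start, i))
    return spans
```

Applied to the sequences from paragraphs 1 and 2, the two examples behave differently:

- The Hangul syllable still forms a single cluster, [(0, 3)].
- KA + VIRAMA + DDHA + ENCLOSING CIRCLE splits into [(0, 2), (2, 4)]. The circle now falls in a cluster containing only DDHA.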
That means that paragraphs #2 and #3 above no longer really work. The UTC has to decide how to fix them. We have broken the question into two parts, because the answer might conceivably be different for a virama than for the combining grapheme joiner.