From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Sep 14 2004 - 17:41:24 CDT
On 14/09/2004 22:44, Andy Heninger wrote:
> Peter Kirk wrote:
> > I have in mind certain situations found in Hebrew (Ketiv/Qere blended
> > forms) in which anomalous (but quite frequently found) word forms begin
> > with a spacing combining character. The currently specified way of
> > supporting this situation is to use SPACE or NBSP followed by the
> > combining character (as these combining characters do not have
> > non-spacing clones). It would be highly undesirable to make a change
> > here which would allow word breaks, line breaks etc. after the combining
> > character but before the rest of the word.
>
> The proposed change to word boundaries would have no effect on the
> case you describe, but word boundaries may already not be doing what
> you want. If you have a SPACE or NBSP preceding the combining
> character, the grapheme cluster composed of the space plus the
> combining char will behave as just a space, and be split off from the
> remainder of the word.
>
> I found 16 Hebrew characters that would be affected by the change,
> \u05B0 HEBREW POINT SHEVA through
> \u05C2 HEBREW POINT SIN DOT
> with a couple of holes in the middle of the range.
>
> To have these characters attach to a following word, an alphabetic
> base character is needed.
>
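Andy's point about SPACE or NBSP followed by a combining character can be illustrated with a rough Python sketch. It is an approximation, not the UAX #29 algorithm: a "cluster" here is assumed to be a base character plus any following characters of General_Category Mn, Me or Mc, and the hypothetical `words` helper treats any cluster whose base is U+0020 or U+00A0 as a separator.

```python
import unicodedata

COMBINING = ("Mn", "Me", "Mc")

def clusters(text):
    """Rough grapheme clustering (assumption, not full UAX #29):
    attach each combining mark to the preceding cluster."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in COMBINING:
            out[-1] += ch
        else:
            out.append(ch)
    return out

def words(text):
    """Hypothetical word splitter: a cluster whose base is SPACE or NBSP
    is handled as a separator, marks and all."""
    result, current = [], ""
    for cl in clusters(text):
        if cl[0] in (" ", "\u00a0"):
            if current:
                result.append(current)
            current = ""
        else:
            current += cl
    if current:
        result.append(current)
    return result

sample = "\u00a0\u05b0\u05d0\u05d1"  # NBSP + SHEVA + ALEF + BET
print(clusters(sample))
print(words(sample))
```

Under this model the NBSP + SHEVA cluster behaves as just a space, so the sheva is split off from the letters that follow it, and `words(sample)` contains only ALEF + BET.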
These are the Hebrew characters I had in mind. But then wouldn't the
Hebrew accents 0591-05AF also be affected in the same way? If these
don't have Grapheme_Extend = true, why not?
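Both ranges can be inspected by General_Category. Python's `unicodedata` module does not expose Grapheme_Extend, which is derived from Mn and Me plus a small Other_Grapheme_Extend list, so this sketch assumes Mn/Me is a close enough stand-in. It also uses whatever Unicode version ships with the interpreter, which is much newer than the data discussed here, so the count need not match the 16 mentioned above.

```python
import unicodedata

def nonspacing_marks(start, end):
    """Code points in [start, end] with General_Category Mn or Me.

    Approximation of Grapheme_Extend: the Other_Grapheme_Extend part
    cannot be queried through unicodedata.
    """
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) in ("Mn", "Me")]

points = nonspacing_marks(0x05B0, 0x05C2)   # HEBREW POINT SHEVA .. SIN DOT
accents = nonspacing_marks(0x0591, 0x05AF)  # Hebrew cantillation accents

for cp in points + accents:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")

# Code points in the points range that are not Mn/Me (the "holes")
holes = [cp for cp in range(0x05B0, 0x05C3) if cp not in points]
print("not Mn/Me:", [f"U+{cp:04X}" for cp in holes])
```

In current data U+05BE HEBREW PUNCTUATION MAQAF and U+05C0 HEBREW PUNCTUATION PASEQ are punctuation rather than marks, which would account for at least some of the holes in the range.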
Well, all of this rather surprises me, because we have been through this
one on this list before, and others have assured me that there is a
special rule under which spaces with combining marks are treated
differently. But I see that this rule is in TR14 under line breaking, not
in TR29 under word breaking: "If U+0020 SPACE is used as a base
character, it is treated as ID instead of SP." Well, it is perhaps more
critical that there should be no line break in these situations than
that there should be no word break. I must say I am confused as to why
line breaking and word breaking are considered such different issues that
they are dealt with entirely separately when, at least in the scripts I
am familiar with, the rules should be almost identical.
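The TR14 sentence quoted above amounts to reclassifying the space when it carries combining marks. A minimal Python sketch of just that sentence follows; the helper name and the `None` result for other bases are hypothetical, real classes come from LineBreak.txt, and it does not model which breaks the full line-breaking algorithm then allows.

```python
import unicodedata

COMBINING = ("Mn", "Me", "Mc")

def base_line_break_class(cluster):
    """Hypothetical helper modelling only the quoted TR14 rule:
    U+0020 SPACE used as a base for combining marks is classed ID,
    a bare SPACE stays SP. Any other base returns None (a real
    implementation would look the class up in LineBreak.txt)."""
    if not cluster:
        raise ValueError("empty cluster")
    base, marks = cluster[0], cluster[1:]
    if base != " ":
        return None
    if marks and all(unicodedata.category(m) in COMBINING for m in marks):
        return "ID"
    return "SP"

print(base_line_break_class(" "))        # bare space
print(base_line_break_class(" \u05b0"))  # SPACE + HEBREW POINT SHEVA
```

Note that the rule as quoted covers only U+0020; an NBSP base falls through to `None` here.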
But the fact that SPACE, or even NBSP, with a combining character is
treated as not part of a word for word boundary calculation is another
strong argument that INVISIBLE LETTER is necessary; cf. Public Review
Issue #41.
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/