Re: [hebrew] Re: Hebrew composition model, with cantillation marks

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 29 2003 - 15:58:47 CST


From: "Peter Kirk" <peterkirk@qaya.org>

> On 29/10/2003 10:46, John Hudson wrote:
>
> > While we're about it, we could propose a spacing, non-breaking ELIDED
> > CHARACTER for use in ketiv/qere where combining marks need to be
> > applied to empty space within a word.
>
> How would this differ from NBSP? Now if it were a right-to-left
> character specifically for RTL scripts, that would help. But failing
> that one can safely use <RLM, NBSP>.

Isn't NBSP neutral for directionality (like SPACE)?
Maybe the issue is with word breaks, but the UTR (proposed UTS) defining
text boundaries explicitly states that word breaks should not occur in the
middle of a combining sequence (more exactly, in the middle of a grapheme
cluster, if we consider Hangul syllables). So any diacritic added on top of
a space or NBSP must stay attached to it. What the text boundaries report
does not say clearly is which breaking category is given to a space or NBSP
that carries diacritics.
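
For anyone who wants to check the directionality question directly, here is
a minimal sketch that reads the general category and bidi class of the
characters under discussion from Python's unicodedata module. U+05B4 HEBREW
POINT HIRIQ is only my stand-in for "some Hebrew combining mark"; the names
CHARS and properties are illustrative, not taken from any specification.

```python
import unicodedata

# Characters discussed in this thread. HIRIQ is only an illustrative
# stand-in for an arbitrary Hebrew combining mark.
CHARS = {
    "SPACE": "\u0020",
    "NBSP": "\u00A0",
    "RLM": "\u200F",
    "HIRIQ": "\u05B4",
}

def properties(ch):
    """Return (general category, bidi class) as recorded in the
    Unicode Character Database shipped with Python."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

for name, ch in CHARS.items():
    category, bidi_class = properties(ch)
    print(f"{name}: category={category} bidi={bidi_class}")
```

In the character database, SPACE is listed as WS and NBSP as CS (common
separator), so neither is a strong type, while RLM is strong R and the
combining mark is NSM.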

My opinion is that a sequence made of a space character and modifiers
(category M) adopts the behavior of the "Lo" general category for the
purpose of determining text boundaries. Its category remains neutral for
directionality, unless there's a first diacritic that has an explicit
directionality.
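
The boundary half of that opinion can be sketched as follows. This is only
an illustration of the rule as I state it, not anything the text boundaries
report defines: the functions combining_sequences and effective_category are
hypothetical, and the splitting is a simplification (base plus following M
characters), not the full grapheme cluster rules.

```python
import unicodedata

def combining_sequences(text):
    """Split text into a base character plus the marks (category M*)
    that follow it. Simplified: this is not the full grapheme
    cluster algorithm."""
    sequences = []
    for ch in text:
        if sequences and unicodedata.category(ch).startswith("M"):
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences

def effective_category(sequence):
    """Hypothetical rule: a space-like base (Zs) carrying at least one
    mark behaves like 'Lo' for text boundaries; anything else keeps
    the general category of its base."""
    base_category = unicodedata.category(sequence[0])
    if base_category == "Zs" and len(sequence) > 1:
        return "Lo"
    return base_category

# ALEF, NBSP + HIRIQ, BET, plain SPACE, GIMEL
sample = "\u05D0\u00A0\u05B4\u05D1 \u05D2"
for seq in combining_sequences(sample):
    print([f"U+{ord(c):04X}" for c in seq], effective_category(seq))
```

Under this rule the marked NBSP is classified like the letters around it,
while the bare SPACE keeps its Zs category and its word boundary.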

This means that the sequence <NBSP, KETIV> is a "Lo" for text boundaries, it
adopts the directionality of <KETIV>, and its minimum glyphic width becomes
0, its minimum height becomes the x-height of the font used to render it,
and further diacritics are laid out / centered around this zero-width base,
possibly extending the glyph positioning box with the minimum layout box of
each diacritic (note that some diacritics may have their minimum layout box
smaller than their effective bounding box, notably if they create ligatures
or are kerned within the surrounding base characters; this is true for
"double diacritics" whose final layout depends on the next combining
sequence).

It would be equivalent to <SPACE, KETIV>, but I prefer keeping the <SPACE>
free of any diacritic, as it has a strongly implied word boundary before and
after it, and a candidate line boundary after it. Also, many algorithms are
set to ignore all SPACEs at end of lines when determining line widths,
including for the full justification of paragraphs; if there were diacritics
on these spaces, they would be rendered within the final margin, or not
rendered at all. (NBSP does not have this problem: if it does not fit in the
line, or occurs at end of a line, it is still rendered and its width is
taken into account when determining where to actually insert line breaks.)
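
To make the end-of-line difference concrete, here is a toy model of a
justifier that ignores trailing SPACEs. It is a sketch under loud
assumptions: every base character advances one unit, marks advance zero, and
the functions visible_part and line_width are hypothetical names, not the
behavior of any particular layout engine.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (general category Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def visible_part(line):
    """Drop trailing U+0020 SPACEs together with any marks attached to
    them, as a justifier that ignores end-of-line spaces might do.
    NBSP (U+00A0) is deliberately not stripped."""
    end = len(line)
    while end > 0:
        # Walk back over marks to find the base they attach to.
        base = end - 1
        while base > 0 and is_mark(line[base]):
            base -= 1
        if line[base] != " ":
            break
        end = base
    return line[:end]

def line_width(line):
    """Toy width: one unit per base character, zero per mark."""
    return sum(1 for ch in visible_part(line) if not is_mark(ch))

with_space = "\u05D0 \u05B4"      # ALEF, SPACE + HIRIQ at end of line
with_nbsp = "\u05D0\u00A0\u05B4"  # ALEF, NBSP + HIRIQ at end of line

print(line_width(with_space), len(visible_part(with_space)))
print(line_width(with_nbsp), len(visible_part(with_nbsp)))
```

In this model the mark on the trailing SPACE disappears along with the
space, while the NBSP and its mark survive and count toward the line width.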



This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:25 CST