From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Jul 22 2007 - 13:47:18 CDT
Asmus Freytag wrote:
> > If this is something else, which options do we have to explicitly mark
> > syllable breaks without ligatures, with or without a visible hyphen?
> >
> > What will happen with joining scripts (e.g. Arabic, Devanagari...) or
> > cursive styles of alphabetic scripts? Does a prohibition of ligatures also
> > prohibit the usual joining?
> >
> >
> If you had read the standard, before creating your own alternate
> reality, you wouldn't need to ask that question. The role of ZWNJ in
> joining is explicitly described.
You don't need to rant about my reading of the standard. I said in my
message that ZWNJ is used to control ligation/joining during rendering. I
was talking about something else.
You state that Unicode does not encode syllable breaks, but that is simply
wrong. SHY is a perfect example of an explicit syllable break.
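The SHY behaviour relied on here (invisible unless a line break is taken at it, then shown as a hyphen) can be sketched as a toy line breaker. The function `break_at_shy` and its character-count width rule are hypothetical illustrations, not the UAX #14 line-breaking algorithm.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def break_at_shy(word: str, width: int) -> list:
    """Toy line breaker (hypothetical): SHY marks stay invisible unless a
    break is taken at one of them, in which case a hyphen ends the line."""
    parts = word.split(SHY)
    whole = "".join(parts)                # unbroken rendering: SHYs vanish
    if len(whole) <= width:
        return [whole]
    # Try the rightmost SHY first so the first line is as full as possible.
    for k in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:k]) + "-"   # SHY rendered as a visible hyphen
        if len(head) <= width:
            return [head, "".join(parts[k:])]
    return [whole]                        # no SHY fits: overflow unbroken

print(break_at_shy("Schilf" + SHY + "insel", 8))
print(break_at_shy("Schilf" + SHY + "insel", 20))
```

With a width of 8 the word is split as `Schilf-` / `insel`; with a width of 20 it stays whole and the SHY simply disappears.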
I was asking how syllable breaks and ligature control combine or interact
with each other. My question is still unanswered.
What I have observed is that a word joiner does in practice prevent a
ligature, although this is not specified anywhere. If WJ is used as an
invisible syllable break (one that is never rendered as a hyphen when a line
break occurs) in compound words that are normally written without a space or
hyphen but may still be split at line ends when needed, then I think it is
normal that it prevents the formation of a ligature.
So the question remains: what is the effective difference between WJ and
ZWNJ? I can't see any, either on the morphological-analysis side or on the
rendering side.
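To make the comparison concrete, the Unicode Character Database can be queried with Python's standard `unicodedata` module; the helper `describe` is a hypothetical name for illustration. At this level ZWNJ, WJ and SHY look alike: all three are format characters (General_Category Cf) with bidi class BN. Their differences live in properties the module does not expose: WJ has Line_Break class WJ (no break before or after it), while ZWNJ is listed in ArabicShaping.txt as non-joining (Joining_Type U).

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point, name, general category, bidi class) from the UCD."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch),
            unicodedata.category(ch), unicodedata.bidirectional(ch))

for ch in ("\u200c", "\u2060", "\u00ad"):  # ZWNJ, WJ, SHY
    print(describe(ch))
```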
If WJ is not expected to break a ligature, this should be specified, so that
ZWNJ is used explicitly to control ligation (WJ would still be used to
control word breaks, mostly in scripts that do not require word separation
by spaces or other punctuation marks).
I raised this concern when replying to Karl Pentzlin's message about the
compound word "Schilfinsel" ("Schilf" + "Insel", without an "fi" ligature),
which he wants to encode as "Schilf<ZWNJ>insel", where the absence of the
ligature is meant to mark the internal syllable break.
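A minimal sketch of why "Schilf<ZWNJ>insel" avoids the ligature, assuming a toy substitution that fires only when "f" and "i" are directly adjacent in the string. `toy_fi_ligature` is hypothetical and is not how a real shaping engine or font works.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

def toy_fi_ligature(text: str) -> str:
    """Hypothetical stand-in for a font's "fi" ligature rule: it only fires
    when "f" and "i" are directly adjacent code points."""
    return text.replace("fi", "\ufb01")  # U+FB01 LATIN SMALL LIGATURE FI

plain = "Schilfinsel"
karl = "Schilf" + ZWNJ + "insel"  # Karl Pentzlin's proposed encoding

print(toy_fi_ligature(plain))  # ligature forms across the morpheme boundary
print(toy_fi_ligature(karl))   # ZWNJ separates f and i: no ligature
print([f"U+{ord(c):04X}" for c in karl[5:8]])
```

In this toy any intervening character, WJ included, would block the substitution, which mirrors what most renderers are observed to do; whether that is specified is exactly the open question.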
In my opinion, German compound words carry more than just a rendering hint
(ZWNJ), and WJ is certainly more meaningful for expressing that. So there are
two situations in which an author tunes the rendering of a text and uses a
hyphenation algorithm to mark explicitly where syllable breaks will occur:
(1) Either the syllable break is wanted at this position, so the author
inserts a SHY between the two parts of the word; but SHY does not prevent a
ligature, so BOTH are needed, ZWNJ (against the "fi" ligature) followed by
SHY: the resulting string is "Schilf<ZWNJ><SHY>insel";
(2) Or the syllable break is not wanted, and WJ is used to say so
explicitly (preventing an automatic hyphenator from inserting a line break
there); but since WJ is not specified to prevent the ligature (even though
most renderers do avoid it in practice), BOTH are needed again, ZWNJ
(against the "fi" ligature) followed by WJ (to disable any hyphenating line
break): the resulting string is "Schilf<ZWNJ><WJ>insel", rather than just
"Schilf<WJ>insel".
I am not inventing things. This is a "grey area" where something is not
clearly specified, and given current implementations I still see no clear
difference between the effects of ZWNJ and WJ, how to use them, or what each
actually prevents or enforces. If WJ is supposed to prevent a ligature, then
this should be specified (and ZWNJ would then NEVER be needed in alternative
(2) above).
Nothing in my message suggested that it was a "recommendation" or an
interpretation. You should have read it as a QUESTION open for discussion.