From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 18 2007 - 11:36:24 CDT
Karl Pentzlin
> Sent: Wednesday, July 18, 2007 13:08
> To: Behnam
> Cc: Unicode List
> Subject: Orthographies using ZWNJ (was: Displaying control characters)
>
> On Wednesday, July 18, 2007 at 12:46, Behnam wrote:
>
> B> I know of at least two languages that use ZWNJ on the keyboard and
> B> ZWNJ (and ZWJ to a lesser extent) are within text encoding: Persian
> B> and Kurdish (Sorani)
>
> ZWNJ is also needed for German (in advanced typography and when using
> Fraktur), as typesetting rules prohibit visible ligatures e.g. at the
> border of constituents of compound nouns. E.g., "Schilfinsel" (island
> full of reed, compound of "Schilf" + "Insel") needs a ZWNJ between the
> f and i to prevent a visible fi ligature there.
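To make the encoding Karl describes concrete, here is a minimal sketch in Python (chosen only for illustration; the helper name `join_constituents` is hypothetical, not part of any library). It only builds the code point sequence; whether the ligature is actually suppressed depends on the font and the renderer.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

def join_constituents(first: str, second: str) -> str:
    """Join two compound constituents, placing a ZWNJ at the boundary.

    Hypothetical helper: the second constituent is lowercased because it
    no longer starts the word.
    """
    return first + ZWNJ + second.lower()

word = join_constituents("Schilf", "Insel")

# List the code points so that the invisible ZWNJ between f and i shows up.
print(" ".join(f"U+{ord(c):04X}" for c in word))
```

Removing the ZWNJ again (`word.replace(ZWNJ, "")`) gives back the plain spelling "Schilfinsel", so the control character does not change the letters of the word.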
Can someone explain the effective difference between WORD JOINER (U+2060)
(that also prohibits ligatures) and ZERO WIDTH NON-JOINER (U+200C), given
that they are both intended to be zero-width and invisible, and they are
both format controls?
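As a quick check of the "both format controls" part, the following sketch uses Python's standard `unicodedata` module to print the character name and General_Category of the controls discussed here (the helper name `describe` is mine). Note that this module does not expose the Line_Break or Join_Control properties, so it cannot by itself show where the two characters differ.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# WJ, ZWNJ, ZWJ and SHY, in that order.
for ch in ("\u2060", "\u200c", "\u200d", "\u00ad"):
    print(*describe(ch))
```

All four are reported with General_Category Cf (format), so the category alone does not distinguish them.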
I suspect that:
* ZERO WIDTH NON-JOINER (ZWNJ) is used only to prevent ligatures (i.e. it
is just a hint that helps renderers choose between ligated and non-ligated
forms), but it does not explicitly mark that a syllable break or
hyphenation is prohibited (i.e. it may occur in the middle of a syllable,
or at any place in a word where a syllable break might occur).
* WORD JOINER (WJ) explicitly marks a syllable break that must not be
ligated because it joins two words (and it is then explicitly a syllable
break candidate by itself).
If this is correct, then WJ is like a combination of ZWNJ and a sort of
invisible soft hyphen (SHY): it marks a syllable break. The differences are
that when an actual line break occurs at a SHY, the SHY turns into a visible
hyphen whereas WJ does not, and that SHY does not prohibit ligatures. In a
word like "effect" encoded as "ef<SHY>fect", the rendering should be either
"ef-<line-break>fect" or "e<ff-ligature>ect"; the presence of a SHY does not
prohibit the ligature here.
So "ef<WJ>fect" would prohibit the ligature and would always be rendered
either as "ef<no-ligature>fect" or as "ef<line-break>fect". The same holds
for "dif<WJ>ference", where both renderings are possible, but always without
a ligature, and without a visible hyphen when a line break occurs at the
syllable break.
And "ef<ZWNJ>fect" would prohibit the ligature (as explicitly documented in
the Unicode standard) but would not explicitly be a candidate syllable break
(one should not occur there in Latin typography, given that the first
syllable is too short with only 2 letters), so it would always be rendered
as "ef<no-ligature>fect". If the author does expect such a break, they can
encode "ef<ZWNJ><SHY>fect", which would be rendered either as
"ef<no-ligature>fect" or as "ef-<line-break>fect".
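To avoid any ambiguity about the sequences written above with <WJ>, <ZWNJ> and <SHY> placeholders, here is a small Python sketch that expands that notation into the actual code points. The `expand` helper and the `CONTROLS` table are hypothetical names of mine; the sketch only shows what is encoded and says nothing about how a renderer will treat it.

```python
import re

# Placeholder names used in this message, mapped to the real characters.
CONTROLS = {
    "WJ": "\u2060",    # WORD JOINER
    "ZWNJ": "\u200c",  # ZERO WIDTH NON-JOINER
    "SHY": "\u00ad",   # SOFT HYPHEN
}

def expand(notation: str) -> str:
    """Replace <WJ>, <ZWNJ> and <SHY> placeholders with the characters."""
    return re.sub(r"<(WJ|ZWNJ|SHY)>", lambda m: CONTROLS[m.group(1)], notation)

for notation in ("ef<SHY>fect", "ef<WJ>fect", "ef<ZWNJ>fect", "ef<ZWNJ><SHY>fect"):
    encoded = expand(notation)
    # Print code points, since the controls themselves are invisible.
    print(notation, "->", " ".join(f"U+{ord(c):04X}" for c in encoded))
```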
If this is not the case, which options do we have to explicitly mark
syllable breaks without ligatures, with or without a visible hyphen?
What will happen with joining scripts (i.e. Arabic, Devanagari...) or
cursive styles of alphabetic scripts? Does a prohibition of ligatures also
prohibit the usual joining?
Do any of these allow avoiding the aggregation into a Hangul cluster? (I
suspect none of them is designed to control that, given that Hangul has a
specific way to mark syllable breaks between jamos, but this may occur
sometimes between two leading consonant jamos considered as a single
syllable, in historical texts where there may be more than just leading
"double" consonants, i.e. SANG-letters.)