Re: Orthographies using ZWNJ (was: Displaying control characters)

From: Asmus Freytag (
Date: Wed Jul 18 2007 - 12:41:26 CDT

    On 7/18/2007 9:36 AM, Philippe Verdy wrote:
    > Karl Pentzlin
    >> Sent: Wednesday, 18 July 2007 13:08
    >> To: Behnam
    >> Cc: Unicode List
    >> Subject: Orthographies using ZWNJ (was: Displaying control characters)
    >> On Wednesday, 18 July 2007 at 12:46, Behnam wrote:
    >> B> I know of at least two languages that use ZWNJ on the keyboard and
    >> B> ZWNJ (and ZWJ to a lesser extent) are within text encoding: Persian
    >> B> and Kurdish (Sorani)
    >> ZWNJ is also needed for German (in advanced typography and when using
    >> Fraktur), as typesetting rules prohibit visible ligatures e.g. at the
    >> border
    >> of constituents of compound nouns. E.g., "Schilfinsel" (island full of
    >> reed, compound of "Schilf" + "Insel") needs a ZWNJ between the f and i
    >> to prevent a visible fi ligature there.
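The "Schilfinsel" case above can be sketched in Python. This is only an illustration of the encoding, not of rendering: the helper name `join_compound` is hypothetical, and the split into constituents is supplied by hand, since finding compound boundaries automatically would need a dictionary.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def join_compound(parts):
    """Join the constituents of a compound with ZWNJ at each internal boundary.

    Hypothetical helper: the caller decides where the boundaries are.
    """
    return ZWNJ.join(parts)


# "Schilf" + "insel": the ZWNJ sits between the f and the i,
# so a font's fi ligature should not form across the boundary.
word = join_compound(["Schilf", "insel"])

# Removing the ZWNJ recovers the plain spelling, e.g. for searching.
plain = word.replace(ZWNJ, "")
```

Whether the ligature is actually suppressed depends on the font and the rendering engine; the code only shows where the character goes in the text.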
    > Can someone explain the effective difference between WORD JOINER (U+2060)
    > (that also prohibits ligatures) and ZERO-WIDTH NON-JOINER (U+200C), given
    > that they are both intended to be zero-width invisible, and they are both
    > format controls?
    > I suspect that:
    > * ZERO-WIDTH NON-JOINER (ZWNJ) is used only to prevent ligatures
    > (i.e. it is just a hint for renderers to help choose between
    > ligated and non-ligated forms), but it does not explicitly mark that a
    > syllable break or hyphenation is prohibited (i.e. it may occur in the middle
    > of a syllable or at any place in a word where syllable breaks may eventually
    > occur)
    It also affects cursive connection, which was its main reason for encoding.
    > * WORD JOINER (WJ) marks explicitly a syllable break that must not be
    > ligated because it joins two words (and it is then explicitly a syllable
    > break candidate by itself).
    No. Word joiner merely prevents a line break. You don't use it between
    parts of a compound word, since in most scripts, words are kept on the
    same line by default. But you can use it between ideographs, and in
    other situations where line breaks would otherwise be allowed. It has no
    effect on ligation, other than (likely) to interrupt it, if it isn't
    filtered out before ligation rules are applied. (As it should never need
    to be present where ligation is expected, it's rare to see implementations
    that filter it, but doing so would be correct.)
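As a rough illustration of the distinction, the following Python sketch looks up both characters in the standard `unicodedata` module and shows the kind of filtering mentioned above. The helper names `describe` and `strip_word_joiners` are hypothetical, and the filter is a plain string replacement standing in for whatever a real text-layout engine would do before applying ligation rules.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
WJ = "\u2060"    # WORD JOINER


def describe(ch):
    """Return the Unicode name and general category of a character."""
    return unicodedata.name(ch), unicodedata.category(ch)


def strip_word_joiners(text):
    """Hypothetical pre-pass: remove WJ before ligation rules are applied.

    ZWNJ is deliberately left alone, since it is the character that is
    meant to affect ligation and cursive joining.
    """
    return text.replace(WJ, "")


# Both are format controls (general category Cf), which is why they are
# easy to confuse, but only ZWNJ is intended to affect ligation.
zwnj_info = describe(ZWNJ)
wj_info = describe(WJ)

filtered_wj = strip_word_joiners("ef" + WJ + "fect")
filtered_zwnj = strip_word_joiners("ef" + ZWNJ + "fect")
```

The general category alone does not capture the difference; that lives in the line-breaking and joining behavior described in the standard, which `unicodedata` does not expose.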

    > If this is correct,
    it's not.
    > then WJ is just like a combination of ZWNJ and a sort of
    > invisible soft hyphen (SHY): it marks a syllable break, except that when a
    > SHY falls at an actual line break, SHY transforms into a visible hyphen but
    > WJ does not, and SHY does not prohibit ligatures in words like "effect", where
    > it would be encoded as "ef<SHY>fect" and should be rendered as
    > "ef-<line break>fect" or as "e<ff-ligature>ect". Note that the presence of a
    > SHY does not prohibit a ligature here.

    > So "ef<WJ>fect" would prohibit the ligature and will always be rendered as
    > "ef<no-ligature>fect" or as "ef<line-break>fect". Same thing in
    > "dif<WJ>ference" where the two options are possible, but always without
    > ligatures, and without a visible hyphen when a line-break occurs on a
    > syllable break.
    The preceding is a pure flight of fancy and is best disregarded.
    > And "ef<ZWNJ>fect" will prohibit the ligature (as explicitly documented in
    > the Unicode standard)
    > but will not be explicitly a candidate syllable break
    > (it should not occur in Latin typography given that the first syllable is
    > too short with only 2 letters), so it will always be rendered as "ef<no
    > ligature>fect", unless such break is expected by the author using
    > "ef<ZWNJ><SHY>fect" and in that case it will be rendered either as
    > "ef<no-ligature>fect" or as "ef-<linebreak>fect"
    In German, typeset in Fraktur, contrary to what you write here, you
    would definitely expect the ZWNJ to occur at syllable boundaries. But
    that follows from the rules on ligation, not from the nature of the

    There is no 'syllable delimiter' in Unicode.


    > If this is something else, which options do we have to explicitly mark
    > syllable breaks without ligatures, with or without a visible hyphen?
    > What will happen with joining scripts (i.e. Arabic, Devanagari...) or
    > cursive styles of alphabetic scripts? Does a prohibition of ligature also
    > prohibit the usual joining?
    If you had read the standard, before creating your own alternate
    reality, you wouldn't need to ask that question. The role of ZWNJ in
    joining is explicitly described.
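For the joining case, a small Python sketch of how ZWNJ appears in Persian text, one of the languages mentioned earlier in the thread. The example word (the prefix "mi" followed by "khaham") is my own choice, not taken from the messages above, and the variable names are hypothetical. The code shows only the encoded sequence; the visual effect, where the yeh does not connect to the following letter, depends on the renderer.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Persian prefix "mi" (meem, Farsi yeh) and stem "khaham"
# (khah, waw, alef, heh, meem), written with escapes for clarity.
prefix = "\u0645\u06CC"
stem = "\u062E\u0648\u0627\u0647\u0645"

# Conventional spelling: ZWNJ between prefix and stem, so the yeh takes
# its non-connecting final form although no space separates the parts.
word = prefix + ZWNJ + stem

# List the code points to make the invisible character visible.
names = [unicodedata.name(ch) for ch in word]
```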
    > Does any of these allow avoiding the aggregation into a Hangul cluster? (I
    > suspect none of them are designed to control that, given that Hangul has a
    > specific way to mark syllable breaks between jamos, but this may occur
    > sometimes between two leading consonant jamos considered as a single
    > syllable, in historical texts where there may be more than just leading
    > "double" consonants, i.e. SANG-letters)
