From: verdy_p (verdy_p@wanadoo.fr)
Date: Tue Dec 15 2009 - 06:51:40 CST
"Jeroen Ruigrok van der Werven" wrote:
> Actually ij is unbreakable from a language point of view. You cannot
> hyphenate any words using it like blijdschap into bli-jdschap. I think the
> Dutch problem of using ij/IJ/y/Y for the ij comes from the fact we have been
> using US English keyboards for a long time now.
Wasn't it "ÿ/Y" rather than "y/Y" (keeping the two soft dots of the lowercase part of that ligature, as they
are present in the handwritten form)?
And isn't there also "Ÿ", due to the default case mapping of "ÿ" in Unicode (which turns the two soft dots into
hard dots), even though "Ÿ" was not available in ISO 8859-1?
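The case-mapping point can be checked quickly. The following is only a minimal Python sketch: it relies on Python's built-in str.upper() following the default Unicode case mapping, and the helper name in_latin1 is made up for the illustration.

```python
# Default Unicode case mapping of U+00FF (ÿ) and availability in ISO 8859-1.

def in_latin1(ch):
    """Return True if the character can be encoded in ISO 8859-1 (hypothetical helper)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

lower = "\u00FF"          # ÿ LATIN SMALL LETTER Y WITH DIAERESIS
upper = lower.upper()     # default (language-neutral) uppercase mapping

print(f"U+{ord(lower):04X} -> U+{ord(upper):04X}")
print("lowercase in ISO 8859-1:", in_latin1(lower))
print("uppercase in ISO 8859-1:", in_latin1(upper))
```

The lowercase letter sits at 0xFF in ISO 8859-1, while its uppercase counterpart U+0178 lies outside that character set, which is the asymmetry the question refers to.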
And wasn't there a recent orthographic reform in Belgium or the Netherlands to allow writing words without the
ligature? (Even though there is no hyphenation at that position, which ZWJ would prohibit anyway in a Dutch-aware
implementation, i+ZWJ+j could still be perfect for Dutch, suggesting that this is an optional ligature.)
Are there cases in some languages where letters linked with ZWJ to form a suggested ligature can still be separated
by hyphenation? Yes:
In the purely typographical ligatures such as "ff"/"ffl"/"ffi" where the suggested ligature between the two "f" is
still possible, and even in the case of "ffi"/"fi"/"ffl"/"fl" where a syllable break is still possible after the
last "f" (in English or German, but not in French), or "ct"/"st"/"ſt" where syllable breaks exist in most latinized
languages.
There are other common ligatures: "ſl"/"ſi".
In handwritten forms or in cursive typography, the links also have contextual positions and can even alter the glyph
of the preceding or following letters (notably with lowercase letters that have descenders, but also some pairs like
"bi" or "oi" or "ol", where the "i" or "l" connects on the x-height line instead of the baseline). These are
generally not real ligatures, but contextual forms (much like in Arabic, where they are required), some of them
depending on whether the letters are initial or not, others depending on whether they are final or not, and some
depending on specific pairs.
ZWJ is then not an indicator of word hyphenation positions, but this is also true for all other letters. These
positions are simply not encoded in Unicode: by default, if you don't know the language precisely, there's no
hyphenation at all in the middle of words, except if you use the soft hyphen (SHY) control explicitly. To
apply hyphenation effectively you need to know the language, its general rules, and a dictionary for lists of
exceptions.
The presence or absence of ZWJ in ligatures (or of ZWNJ in non-ligatures) will have no effect (hyphenators should
ignore ZWJ and ZWNJ when they are present, and MUST effectively drop them before rendering lines if a hyphenation
occurs at their position).
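To make that rule concrete, here is a rough Python sketch of how a line breaker might treat the joiners. It assumes the break offset comes from some language-aware hyphenator working on the joiner-free word; that hyphenator is not shown, and the function names strip_joiners and break_word are invented for the example.

```python
ZWJ = "\u200D"    # ZERO WIDTH JOINER
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
JOINERS = {ZWJ, ZWNJ}

def strip_joiners(word):
    """Return the word without ZWJ/ZWNJ, plus a map from each kept
    character's offset in the stripped word to its offset in the original."""
    kept, index_map = [], []
    for i, ch in enumerate(word):
        if ch not in JOINERS:
            kept.append(ch)
            index_map.append(i)
    return "".join(kept), index_map

def break_word(word, stripped_break):
    """Split `word` at an offset computed on the joiner-free text.

    `stripped_break` is assumed to come from a language-aware hyphenator
    that never saw the joiners. Any ZWJ/ZWNJ sitting exactly at the break
    position is dropped; joiners elsewhere are left untouched."""
    stripped, index_map = strip_joiners(word)
    if not 0 < stripped_break < len(stripped):
        raise ValueError("break offset must fall strictly inside the word")
    left_end = index_map[stripped_break - 1] + 1   # just after the last letter kept on line 1
    right_start = index_map[stripped_break]        # first letter of line 2
    # Everything between left_end and right_start can only be joiners.
    return word[:left_end] + "-", word[right_start:]

# English "selfish" with a suggested "fi" ligature, broken as self-ish.
print(break_word("self" + ZWJ + "ish", 4))
# German "Auflage" with the "fl" ligature prevented, broken as Auf-lage.
print(break_word("Auf" + ZWNJ + "lage", 3))
```

In both examples the joiner disappears from the output because the break falls on it; a break anywhere else in the word leaves the joiner in place for the shaping step.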
This just demonstrates that it's up to renderers to manage ZWJ and ZWNJ correctly, and not up to fonts to include
support for them. Fonts, on the other hand, MAY map glyphs for them, which will be displayed only when using the
"show hidden controls" mode (in which ligatures suggested by ZWJ should probably NOT be honored).
When this mode is off, none of these "visible control" glyphs should ever be displayed, even if they are mapped in
the main mapping table (codepoint to glyphID), and renderers should just use or not use the ligatures found in the
font's "ligature" tables (which may also be tuned for some languages where they are forbidden): in OpenType, this
means using font "features" and putting all substitution/positioning rules within the features table.
But anyway, I still think that it's best to use the glyphs mapped in the font for ZWJ and ZWNJ, instead of a generic
glyph from a default internal font of the renderer, notably because the glyph forced by the renderer could have very
bad metrics, not aligning correctly (possibly contextually) with the other glyphs coming from the font: a font may
then still need to use substitution/positioning rules to properly position (and possibly reorder...) the control
glyph within the text flow, when the "visible controls" mode is enabled in the renderer.
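As an illustration of the behaviour described above, a renderer's glyph selection could look roughly like the Python sketch below. It is not taken from any real rendering engine: the cmap is modelled as a plain dict from code point to glyph ID, FALLBACK_GLYPH is a made-up ID standing for the renderer's generic control picture, and ligature substitution and positioning are left out entirely.

```python
ZWJ = "\u200D"
ZWNJ = "\u200C"
JOIN_CONTROLS = {ZWJ, ZWNJ}

NOTDEF = 0            # glyph 0 is .notdef in OpenType fonts
FALLBACK_GLYPH = -1   # hypothetical ID for the renderer's own generic control glyph

def select_glyphs(text, cmap, show_controls):
    """Map characters to glyph IDs, treating ZWJ/ZWNJ specially.

    cmap: dict {code point: glyph ID}, a stand-in for the font's main mapping table.
    show_controls: True when the "show hidden controls" mode is enabled."""
    glyphs = []
    for ch in text:
        if ch in JOIN_CONTROLS:
            if not show_controls:
                # Never displayed in normal mode, even if the font maps a glyph.
                continue
            # Prefer the font's own glyph so that metrics match the rest of the text.
            glyphs.append(cmap.get(ord(ch), FALLBACK_GLYPH))
        else:
            glyphs.append(cmap.get(ord(ch), NOTDEF))
    return glyphs

font_cmap = {ord("f"): 10, ord("i"): 11, ord(ZWJ): 99}   # toy font that maps ZWJ
bare_cmap = {ord("f"): 10, ord("i"): 11}                 # toy font without ZWJ

print(select_glyphs("f" + ZWJ + "i", font_cmap, show_controls=False))
print(select_glyphs("f" + ZWJ + "i", font_cmap, show_controls=True))
print(select_glyphs("f" + ZWJ + "i", bare_cmap, show_controls=True))
```

The fallback branch is the case I would prefer to avoid, since the generic glyph has no reason to align with the font's own metrics.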
Philippe.