Support for Latin ligature IJ

Philippe Verdy verdy_p at
Thu Mar 31 09:40:37 CDT 2016

2016-03-31 6:04 GMT+02:00 Marcel Schneider <charupdate at>:

> On Wed, 30 Mar 2016 23:42:20 +0200, Philippe Verdy  wrote:
> > > In such a case, the "ij" letter is soft-dotted also in Dutch and the
> > > two dots disappear when it has diacritics above.
> > >
> > > For Lithuanian, the "ij" letter is not soft-dotted, but effectively
> > > hard-coded (meaning also that it is really a ligature, and that the
> > > single-letter should not be used at all, but encoded as i+j with a
> > > possible joiner...). In such a case, using the single letter "IJ/ij"
> > > meant only for Dutch is also an orthographic fault. But this also
> > > means that when you add diacritics in Lithuanian, you'll need to
> > > encode explicit dots (like in Turkish) to keep these dots!
> The oopsie is that in some implementations, this way you get two stacked
> dots plus the other diacritic…
> We can only hope that this is now fixed.

True, but the combining diacritic cannot be the standard dot above: either
the dots would stack vertically, or the single combining dot above would
remove the implicit dots and leave just one dot centered somewhere between
the two parts of the letter.

Maybe in this case it should be the diaeresis (so "soft-dotted" could
also apply to the implicit diaeresis...). Well, semantically this is not
strictly a diaeresis but two dots above, side by side, one over each part
of the letter. But it is not so stupid after all, for that specific letter,
to consider that these two horizontal dots are the same as a diaeresis.

So let's say we want to add an acute accent above the Lithuanian "ij", we
would encode "ij"+"combining diaeresis"+"combining acute accent" to
explicitly encode the two dots and avoid their removal from the soft-dotted
"ij" caused by the acute accent.

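A minimal sketch of that sequence in Python, assuming the single-character
letter U+0133 is used as the base (the variable names are only illustrative,
and this encoding is the proposal above, not an established convention):

```python
import unicodedata

# Hypothetical encoding discussed above: ij letter, explicit dots, accent.
IJ = "\u0133"          # LATIN SMALL LIGATURE IJ
DIAERESIS = "\u0308"   # COMBINING DIAERESIS
ACUTE = "\u0301"       # COMBINING ACUTE ACCENT

sequence = IJ + DIAERESIS + ACUTE

# List the code points and their Unicode names.
names = [unicodedata.name(ch) for ch in sequence]
for ch, name in zip(sequence, names):
    print(f"U+{ord(ch):04X} {name}")

# Canonical combining classes: the base is a starter (0) and both marks
# are "above" marks (230), so normalization never swaps the two marks.
classes = [unicodedata.combining(ch) for ch in sequence]

# No precomposed character exists for this sequence, so NFC leaves it as is.
nfc_unchanged = unicodedata.normalize("NFC", sequence) == sequence
```

Since both marks share the same combining class, the diaeresis stays before
the acute accent under normalization, which is what the proposal relies on.
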
Hmmm... not perfect semantically, but this could work, provided that
fonts correctly interpret "ij"+"combining diaeresis" as meaning that they
must just preserve the existing dots over the isolated "ij", instead of
dropping them and placing the dots of the diaeresis at a random position
over the undotted "ij". In other words, the renderings of "ij" and of
"ij"+"combining diaeresis" would be indistinguishable even though they
are not canonically equivalent (exactly like in Turkish for the renderings
of isolated "i" and of "i"+"combining dot above", which should also be
indistinguishable even though they are not canonically equivalent).
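
The lack of canonical equivalence can be checked with a short Python sketch.
The helper name canonically_equivalent is made up for this example; it simply
compares the NFD forms of two strings:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# "ij" letter with and without an explicit combining diaeresis.
ij_pair = canonically_equivalent("\u0133", "\u0133\u0308")

# "i" with and without an explicit combining dot above.
i_pair = canonically_equivalent("i", "i\u0307")

# For contrast: precomposed e-acute versus "e" + combining acute accent.
e_pair = canonically_equivalent("\u00E9", "e\u0301")

print(ij_pair, i_pair, e_pair)  # False False True
```

So any identical rendering of these pairs has to come from the font or the
renderer, not from normalization.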

There's a caveat: this creates two confusable encodings for the isolated
"ij" (with or without the combining dots). But this is also true for "i"
(with or without the combining dot), or for the isolated "j" letter in a
few other Turkic/Altaic languages.
  - One way to avoid the confusion is in fact to use distinct placements of
the dots (over the Dutch/Lithuanian "ij" letter or over the Turkish "i"
letter) if there's no other diacritic above, and for fonts to use the
standard placement of these dots (same as the isolated letter) **only** if
there's another combining diacritic above.
  - Otherwise, the alternate placement could use larger dots, or dots
slightly shifted horizontally if they are explicitly encoded where they
should not be encoded at all over the isolated letter.
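
The two rules above could be sketched as a small decision function. This is
only an illustration of the idea, not how any existing font or shaping engine
works: the function name dot_placement, the EXPLICIT_DOTS table and the four
result labels are all assumptions made for this example, and the input is
assumed to be a non-empty cluster of one base letter followed by its marks.

```python
import unicodedata

# Explicit-dot mark assumed for each base letter (assumption of this sketch).
EXPLICIT_DOTS = {
    "i": "\u0307",       # COMBINING DOT ABOVE
    "j": "\u0307",       # COMBINING DOT ABOVE
    "\u0133": "\u0308",  # ij letter: COMBINING DIAERESIS, as proposed above
}

def dot_placement(cluster: str) -> str:
    """Classify how the dots of a base letter plus marks would be drawn."""
    base, marks = cluster[0], cluster[1:]
    # Only marks placed above the base (combining class 230) matter here.
    above = [m for m in marks if unicodedata.combining(m) == 230]
    explicit = bool(above) and above[0] == EXPLICIT_DOTS.get(base)
    others = above[1:] if explicit else above
    if not explicit:
        # Soft-dotted behaviour: another mark above removes the dots.
        return "removed" if others else "implicit"
    # Explicit dots: standard placement only under another diacritic,
    # otherwise a visibly different (alternate) placement.
    return "standard" if others else "alternate"

print(dot_placement("\u0133"))              # implicit
print(dot_placement("\u0133\u0301"))        # removed
print(dot_placement("\u0133\u0308\u0301"))  # standard
print(dot_placement("\u0133\u0308"))        # alternate
```

The "alternate" case is where a font could use larger or slightly shifted
dots to make the redundant explicit encoding visible.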
