The kind of ligature you are requesting, Adam, would be very bad for your
language in the long run, because you'd never know when something was
spelled <ch> and when <c><h>. Unicode chooses not to encode these things
for that reason. Many of the digraphs already there either come from legacy
character sets (such as a lot of the Arabic presentation forms, as far as I
know) or serve specific odd practices, like those Croatian digraphs, which
are there only to give one-to-one transliteration with Serbian Cyrillic.
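
For anyone who wants to inspect those Croatian digraphs, here is a minimal
Python sketch using the standard unicodedata module. It assumes the code
points U+01C4 through U+01CC (the Latin DŽ, LJ and NJ digraph characters);
the variable names are illustrative only.

```python
import unicodedata

# The nine Latin digraph characters at U+01C4..U+01CC
# (DŽ, Dž, dž, LJ, Lj, lj, NJ, Nj, nj).
digraphs = [chr(cp) for cp in range(0x01C4, 0x01CD)]

for ch in digraphs:
    # Compatibility normalization (NFKC) replaces the digraph with ordinary
    # letters; canonical normalization (NFC) leaves it untouched.
    plain = unicodedata.normalize("NFKC", ch)
    unchanged = unicodedata.normalize("NFC", ch) == ch
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {plain!r}, "
          f"stable under NFC: {unchanged}")
```

Since the decomposition is only a compatibility one, the digraph character
and the two-letter spelling remain distinct under canonical normalization,
which is the same <ch> versus <c><h> ambiguity described above.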
Sometimes there may be a real advantage in processing if a ligature is
encoded, even when it is canonically equivalent to a string of other
characters. Mark Shoulson and I believe this is true about the HEBREW
TETRAGRAMMATON. Apparently some Hangul processing algorithms work better
with precomposed syllables (though historical syllables have to be
processed using a different model).
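
The Hangul case can be sketched in Python as follows. This is only an
illustration of canonical equivalence between a precomposed syllable and
its jamo sequence, not a description of any particular processing
algorithm; the helper compose_syllable is a hypothetical name, and the
constants are the ones used for arithmetic Hangul composition in the
Unicode Standard.

```python
import unicodedata

# Base code points and counts for arithmetic Hangul composition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_syllable(lead, vowel, trail=None):
    """Compose modern conjoining jamo into one precomposed syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    # A missing trailing consonant is index 0.
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

syllable = "\uD55C"                             # one precomposed syllable
jamo = unicodedata.normalize("NFD", syllable)   # its canonical decomposition

print([f"U+{ord(c):04X}" for c in jamo])
print(compose_syllable(*jamo) == syllable)
print(unicodedata.normalize("NFC", jamo) == syllable)
```

The arithmetic only covers modern syllables; sequences containing
historical jamo have no precomposed form and stay as jamo strings, which
is why they need a different model.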
Welsh, Spanish, Irish, and English all use the digraph <c><h> to
represent a single sound, just as Slovak does. Welsh and Spanish sort it
as a separate letter too. But it would be bad to encode <ch>.
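
To illustrate that sorting a digraph as its own letter does not require a
dedicated code point, here is a rough Python sketch. The function
digraph_sort_key and the SPANISH_TRADITIONAL list are hypothetical names,
and the alphabet is a simplified traditional Spanish order assumed for the
example; a real system would use proper collation tailoring instead.

```python
import re

# Simplified traditional Spanish alphabet, with "ch" and "ll" as letters.
SPANISH_TRADITIONAL = (
    list("abc") + ["ch"] + list("defghijkl") + ["ll"]
    + list("mn") + ["ñ"] + list("opqrstuvwxyz")
)

def digraph_sort_key(word, alphabet=SPANISH_TRADITIONAL):
    """Split a word into alphabet units and map each unit to its rank."""
    # Try longer units first so that "ch" wins over "c" followed by "h".
    units = sorted(alphabet, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(u) for u in units))
    rank = {letter: i for i, letter in enumerate(alphabet)}
    # Characters outside the alphabet are simply skipped in this sketch.
    return [rank[m.group()] for m in pattern.finditer(word.lower())]

words = ["cuna", "chico", "dama", "cama"]
print(sorted(words))                        # plain code point order
print(sorted(words, key=digraph_sort_key))  # "ch" sorted after all of "c"
```

The text stays spelled with plain <c><h>; only the comparison treats the
pair as one unit.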
-- Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478-2597 ** Facsa: +353 1 478-2597 (by arrangement)
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire