RE: Latin ligatures and Unicode

From: Michael Everson (everson@egt.ie)
Date: Tue Dec 28 1999 - 08:45:31 EST


At 20:46 +0000 1999-12-27, Marco.Cimarosti@icl.com wrote:
>Michael Everson wrote:
>
>>I don't think so. The ZWJ does something else. It's a subtle difference,
>>but it is a good one.
>
>I think the details will come with your paper, so I won't ask you to do the
>work twice. I hope it will be a public paper for all of us outsiders to peep
>at.

Well, I was not planning on redefining ZWJ, which is well-defined. I am
defining ZWL.

>As an end-user, I don't want to waste my time entering information that
>the software can infer. It is a different matter when the software cannot
>infer it, or when its inference does not comply with *my* rules. In those
>cases I can afford to spend a little of my time fixing it. When I write in
>English (whose hyphenation rules are a mystery to me!), I want the software
>to hyphenate properly for me.

Then you must wish to be limited to using large economically-successful
languages where 1) such rules exist and 2) people make software to support
them.

>Similarly, every system should use rules to produce ligatures, when these
>rules exist, and only ask me to bother about it when these rules don't exist
>or don't satisfy me.

Only a very few ligatures can be said to be "universal" and rule-governed.
As I pointed out in N2141, even "fi" isn't universal e.g. in Turkish and
Azerbaijani.
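
To illustrate why a "universal" ligature rule is hard to state, here is a
minimal Python sketch of a language-conditional "fi" rule. The table
NO_FI_LIGATURE and the function apply_fi_ligature are hypothetical names
invented for this illustration, and replacing the letters with U+FB01 is only
a stand-in for the glyph substitution a font or renderer would really perform.
The Turkish and Azerbaijani exception is the one mentioned above: those
orthographies distinguish dotted i from dotless ı, which an fi ligature can
obscure.

```python
# Hypothetical rule table (an assumption for this sketch): language codes
# for which the "fi" ligature should be suppressed.
NO_FI_LIGATURE = {"tr", "az"}

FI_LIGATURE = "\uFB01"  # LATIN SMALL LIGATURE FI


def apply_fi_ligature(text: str, lang: str) -> str:
    """Replace "fi" with the ligature unless the language forbids it."""
    if lang in NO_FI_LIGATURE:
        # Leave the letters untouched so the dot on the i stays visible.
        return text
    return text.replace("fi", FI_LIGATURE)
```

The rule needs a language tag that plain text does not carry, and a table
that someone has to write for each language. For languages with no such
table, the author has to be able to mark the ligature explicitly.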

>I am not sure who "we" refers to... If you mean the Unicode Consortium, my
>understanding of the Indic encoding must be completely wrong, because it
>seems to me rather similar to what I described: I just add the viramas (i.e.
>phonetic information!, that simply indicates that the inherent "a" vowel is
>not to be pronounced) and the software also uses this information to
>correctly choose ligatures or contextual glyphs.

The presence of VIRAMA between two Brahmic letters selects a particular
glyph cell in the font. That is what ZWL does. The phonetics of the letters
is irrelevant.
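
As a small illustration of the VIRAMA mechanism, the Python sketch below
spells out the code points of one Devanagari conjunct. The choice of KA +
VIRAMA + SSA and the helper name describe are assumptions made for this
example; the stored text is three characters, and it is the font that maps
the sequence to a single conjunct glyph.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
SSA = "\u0937"     # DEVANAGARI LETTER SSA

# The plain-text sequence that a font typically renders as one conjunct glyph.
conjunct = KA + VIRAMA + SSA


def describe(text: str) -> list[str]:
    """List each code point in the string with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(conjunct):
    print(line)
```

Nothing in the stored sequence records how the letters are pronounced; the
VIRAMA is simply a character between two letters that the rendering process
can key on.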

>Adding purely graphical information such as "ligate these two letters" only
>makes sense when this is really a purely graphical choice, with no other
>meaning.

In Fraktur ligature selection may be purely graphical, or it may
distinguish meaning. The point is that we need a consistent mechanism for
doing this and what we have now is ad-hoc.

>I understood that your examples with the Runic scripts were of this kind:
>"using a ligature or not has absolutely no meaning here, but I want anyway to
>show you exactly what the scribe wrote on that inscription, because this
>information is relevant for us".

That is one use of ligature display, yes. It is a particular use which is
not supported productively and usefully yet.

>And indeed, when we are dealing with extinct languages, or with texts that
>may possibly contain hidden messages, we cannot be totally sure that what
>seems to be an arbitrary graphical choice isn't really a meaningful feature.

I'm also talking about Irish Gaelic texts printed in 1850. Which is recent!

>So it makes sense to have a device to encode the graphic difference, just to
>be as literal as possible. And it makes sense to have it in plain text,
>because a character set is a character set, not a word processor, and it
>should not rely too much on font technologies... Who said that the primary
>thing I want to do with my text is to display or print it, rather than, say,
>store it in a database for statistical research?

I don't care whether the language is extinct or not. But of course I agree
with you. It makes sense to code ligation in plain text, and with ZWL too.
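
The database scenario can be sketched in Python as follows. This is only an
illustration under loud assumptions: ZWL is a proposal in this thread and has
no code point here, so the private-use character U+E000 is used as a
hypothetical stand-in, and the function names are invented for the example.

```python
from collections import Counter

# Hypothetical stand-in: a private-use code point playing the role of the
# proposed ZWL. This is NOT an assigned Unicode character for that purpose.
ZWL = "\uE000"


def letter_counts(text: str) -> Counter:
    """Count letters, ignoring the ligation marker."""
    return Counter(ch for ch in text if ch != ZWL and ch.isalpha())


def ligated_pairs(text: str) -> list[str]:
    """Return each pair of characters joined by the ligation marker."""
    return [
        text[i - 1] + text[i + 1]
        for i, ch in enumerate(text)
        if ch == ZWL and 0 < i < len(text) - 1
    ]
```

Because the marker lives in the plain text, the same stored string serves
both purposes: letter statistics that ignore ligation, and a record of exactly
which pairs the scribe or printer ligated, with no font involved.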

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire


