RE: Latin ligatures and Unicode

From: Gary Roberts (gar@sparc.sandiegoca.ncr.com)
Date: Wed Dec 29 1999 - 14:52:16 EST


Yes. There is definitely an issue of how to accomplish what one wants in
a way that will be implemented. For example, if the solution relies on
language tags (e.g. dictionary based solutions), then it is of little use
if companies don't provide support for your language. On the other hand,
the soft hyphen is generally implemented, and supports languages that
haven't even been invented yet. Now, one could argue whether soft hyphen
is best implemented as markup or as the addition of a new character. I
tend to read and create markup files by hand. My tendency is to prefer
markup when there is some span to the markup. The more characters the
markup is likely to affect, the more I prefer it to adding a character.
Soft hyphen is an example where there is no span at all, and it makes
sense to solve the issue with a soft hyphen character. I see ZWL
as a substitute for markup having a span of two or three characters, which
still makes it attractive as a new character solution. It also seems
more flexible. Say that I often deal with fonts that have only ligature
pairs. Given the choice of ff i or f fi, I always prefer ff i,
but my colleague prefers f fi. We both prefer ffi as a single ligature
if it exists in the font. What markup gives each of us the results we
prefer? For &=ZWL, the answer is f&fi for me, and ff&i for my colleague.
Note that ZWNL is not useful for this case. I can speculate at the
appropriate markup language, but I'd rather hear how others have actually
solved this problem.
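
To make the ff i / f fi case concrete, here is a minimal Python sketch of
one way a renderer might treat ZWL as a hint. Everything in it is an
assumption for illustration: the shape() function, the rule "use the fewest
glyphs, and break ties by honoring the most ZWL marks", and the use of & as
a visible stand-in for ZWL are not taken from any existing implementation
or proposal text.

```python
ZWL = "&"  # visible stand-in for the proposed zero-width ligator (assumption)


def shape(marked_text, ligatures):
    """Split marked_text into glyphs, treating ZWL marks as ligation hints.

    ligatures is the set of ligature strings the (hypothetical) font offers.
    Assumed rule: prefer the fewest glyphs; among equally short results,
    prefer the one whose ligatures cover the most ZWL-marked joins.
    """
    letters = []
    hinted = set()  # i is in hinted when a ZWL sits between letters[i-1] and letters[i]
    for ch in marked_text:
        if ch == ZWL:
            hinted.add(len(letters))
        else:
            letters.append(ch)
    text = "".join(letters)

    def best(start):
        # Returns (glyph_count, -hints_honored, glyphs) for text[start:].
        # Plain recursion: fine for single words, not tuned for long runs.
        if start == len(text):
            return (0, 0, [])
        options = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if len(piece) > 1 and piece not in ligatures:
                continue  # the font has no such ligature
            honored = sum(1 for i in range(start + 1, end) if i in hinted)
            count, neg_hints, rest = best(end)
            options.append((count + 1, neg_hints - honored, [piece] + rest))
        return min(options)

    return best(0)[2]


pairs_only = {"ff", "fi"}
full_font = {"ff", "fi", "ffi"}

print(shape("f&fi", pairs_only))  # my preference: ff + i
print(shape("ff&i", pairs_only))  # my colleague's preference: f + fi
print(shape("f&fi", full_font))   # single ffi ligature when the font has it
```

Under these assumptions the same marked text degrades gracefully: each of
us gets the preferred pair from a pairs-only font, and both spellings
collapse to the single ffi glyph when the font provides one.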
                                *

On Wed, 29 Dec 1999, Asmus Freytag wrote:

> What is at the heart of this recurring request is that support for many
> scripts (or older typographies) is incomplete without an *interchangeable*
> method of indicating the presence or absence of ligatures.
>
> Plain text used to be the *only* medium with near universal
> interchangeability. With the web, this has changed. It is now appropriate
> to move this discussion on a higher plane and consider the question
> differently:
>
> What is the best way to interchange text containing ligatures on the web?
>
> Posing this question allows us to consider the full-featured typographic
> and aesthetic requirements for ligation - as well as any inherent
> regularities. Once we have a design in place for interchanging ligatures
> with marked up text, we can revisit that and see whether replacing markup
> instructions by character codes gives better results.
>
> I feel we have explored the semantic aspects of this long enough to
> conclude that there is some evidence that a ZWNL is linked slightly more to
> the underlying semantic content of the text than a ZWL, but that for
> neither case we have enough to settle the argument in favor of making them
> characters today.
>
> Both concepts ('ligate here', 'don't ligate here') can in principle be
> expressed with HTML or XML style markup - I have seen too little discussion
> of what this markup should be like, and what the consequences are of it
> being present in the middle of words. Is that something that the HTML/XML
> community wants to deal with?
>
> The next question, assuming that we agree on what ligation commands look
> like in markup, concerns interchange between parts of a program, e.g. text
> processor to rendering engine. Is it meaningful to have character codes at
> that level, or is it more typical that each ligature is its own little
> style run?
>
> The strongest arguments in favor of character codes come from those who
> have for a long time needed to 'trick' various applications into
> supporting languages that they were not explicitly designed for. If
> character codes would result
> in 'enabling' many of these implementations, by letting the author
> communicate with the rendering engine, so to speak, that is itself a valid
> argument to consider. (It would need some actual case studies where this
> approach is shown to work).
>
> Still, even that would need to be contrasted with the cost to applications
> that do not know about these as characters and end up showing 'boxes'.
>
> A./


