Re: Ligatured characters

From: John Cowan (jcowan@reutershealth.com)
Date: Thu Sep 07 2000 - 16:15:15 EDT


William Overington wrote:

> Suppose that one is producing a program, such as a Java applet, to display
> pages of printed text, and one wishes to encode text that contains
> ligatures.

Unicode is a plain text encoding standard. Fonts can and should supply the
ligatures which are appropriate to the language variety being displayed.
Thus an "fi" ligature is desirable in English, but forbidden in Turkish.

The character ZWJ (U+200D) may be inserted between two characters to
indicate a desire for ligaturing, and similarly ZWNJ (U+200C) may be
inserted between two characters to indicate a desire for *not* ligaturing.
Since this convention is very recent, however, few if any fonts support
it at this time.
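
As a rough illustration of the convention just described, here is a minimal
Python sketch that inserts the joiner or non-joiner between the two letters
of a pair. The helper names (request_ligature, forbid_ligature,
strip_joiners) are made up for this example and are not part of any
standard API. Whether a ligature is actually drawn still depends on the
font and the renderer.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER: requests a ligature
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: requests no ligature


def request_ligature(text, pair):
    """Insert ZWJ between the two characters of every occurrence of pair."""
    return text.replace(pair, pair[0] + ZWJ + pair[1])


def forbid_ligature(text, pair):
    """Insert ZWNJ between the two characters of every occurrence of pair."""
    return text.replace(pair, pair[0] + ZWNJ + pair[1])


def strip_joiners(text):
    """Recover the plain text, e.g. before searching or comparing."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


english = request_ligature("final office", "fi")  # ask for "fi" ligatures
turkish = forbid_ligature("fikir", "fi")          # ask for none
```

Since the joiners are invisible characters in the text stream, code that
searches or compares strings will probably want to strip them first, as
strip_joiners does here.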

> [H]ave unicode characters been defined for ligatured characters?

For compatibility with existing standards, some ligatures are encoded
in Unicode, such as U+FB01 (the "fi" ligature). Also, in some cases it is
language-specific whether a particular glyph is a ligature: "ae" is a
ligature in English but a letter in Danish. Such cases are encoded as
distinct letters in Unicode (here U+00E6).
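
The difference between a compatibility ligature and a distinct letter shows
up under compatibility normalization. The sketch below uses Python's
standard unicodedata module. The choice of U+FB01 and U+00E6 as examples,
and the helper name fold_compat, are mine.

```python
import unicodedata

FI_LIGATURE = "\uFB01"  # LATIN SMALL LIGATURE FI (compatibility character)
AE_LETTER = "\u00E6"    # LATIN SMALL LETTER AE (a letter in its own right)


def fold_compat(text):
    """Apply NFKC, which replaces compatibility characters by their
    ordinary equivalents and leaves distinct letters alone."""
    return unicodedata.normalize("NFKC", text)


folded_fi = fold_compat(FI_LIGATURE)  # becomes the two letters "f" + "i"
folded_ae = fold_compat(AE_LETTER)    # stays the single letter U+00E6
```

So text containing the "fi" compatibility ligature can be folded back to
plain "fi", while U+00E6 survives normalization as one letter.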

> Is there a control character to
> mean THE NEXT CHARACTER IS A SWASH CHARACTER or something like
> that?

No, that is the province of markup systems.

> What about the long s character, how is it represented please?

U+017F. This is considered a distinct letter from "s", since (in Fraktur German,
specifically) its use cannot always be predicted mechanically.
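
For anyone handling long s in a program, a small Python sketch of how
U+017F behaves, using only the standard unicodedata module and str
methods. The variable names are just for illustration.

```python
import unicodedata

LONG_S = "\u017F"

long_s_name = unicodedata.name(LONG_S)  # official character name

# As encoded, long s and round s are different characters.
same_as_round_s = (LONG_S == "s")

# Case folding and compatibility normalization both map it to round s,
# which is useful for searching but discards the distinction.
folded = LONG_S.casefold()
normalized = unicodedata.normalize("NFKC", LONG_S)
uppercased = LONG_S.upper()
```

Because the choice between long s and round s cannot always be derived
mechanically, folding is a one-way step: the original spelling cannot be
reliably recovered from the folded text.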

> Am I right in thinking that it is
> permissible to use the private usage area for control commands such as TURN
> LIGATURING ON and TURN LIGATURING OFF rather than as codes of actual
> characters if I so choose?

Characters in the Private Use Area are Humpty-Dumpty-isms: they mean
whatever you want them to mean.
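
A sketch of what such a private agreement might look like in Python. The
two code points chosen here (U+E000 and U+E001), the names LIGATURES_ON and
LIGATURES_OFF, and the assumption that ligaturing starts out enabled are
all invented for this example. Nothing outside your own program will
interpret them this way.

```python
import unicodedata

# Hypothetical private assignments, meaningful only by private agreement.
LIGATURES_ON = "\uE000"
LIGATURES_OFF = "\uE001"


def is_private_use(ch):
    """True if ch has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def split_runs(text):
    """Split text into (ligaturing_enabled, run) pairs, removing the
    private control characters. Ligaturing is assumed to start enabled."""
    enabled = True
    run = []
    runs = []
    for ch in text:
        if ch in (LIGATURES_ON, LIGATURES_OFF):
            if run:
                runs.append((enabled, "".join(run)))
                run = []
            enabled = (ch == LIGATURES_ON)
        else:
            run.append(ch)
    if run:
        runs.append((enabled, "".join(run)))
    return runs


runs = split_runs("fine " + LIGATURES_OFF + "fikir" + LIGATURES_ON + " fish")
```

A renderer could then pass each run to its layout code with ligatures
switched on or off as indicated.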

> Also, if using the private use area are there any working
> conventions that have arisen [...]

The only convention is that the low end is generally assigned by users,
the high end by organizations.

-- 
There is / one art                   || John Cowan <jcowan@reutershealth.com>
no more / no less                    || http://www.reutershealth.com
to do / all things                   || http://www.ccil.org/~cowan
with art- / lessness                 \\ -- Piet Hein


