Re: Digraphs as Distinct Logical Units

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Sat Aug 03 2002 - 04:32:54 EDT


Sean Palmer raises some interesting matters.

>I came across the following point in the Unicode FAQ that explains why the
>Unicode standard does not contain any characters for digraphs:-
>
>http://www.unicode.org/unicode/faq/ligature_digraph.html#3
>

Is a digraph exactly the same as a ligature, or is there some difference,
please?

>In all practicality, I do not expect that writers in languages such as
>Spanish, Hungarian, and Welsh etc. where digraphs are used fairly commonly
>would immediately change all their texts to use the appropriate single
>Unicode characters, had they existed. Of course, it is also true that a
>decade or two ago, common substitutions such as i^ for i-with-circumflex
>had to be made. Since then, these characters appeared in various character
>"sets", and whilst it's still fairly common to leave the diacritics off,
>there are a great deal more people using the proper characters.

Could someone please give some examples of the use of digraphs in Spanish,
Hungarian and Welsh? I think this would interest people who care about
typography and Unicode but who may not be linguists (though they may have an
interest in languages and have studied a language other than their own as
part of their general education), and it would provide some documented
examples of what is being discussed.

>
>Since there are 676 possible digraph combinations, I endeavoured to come up
>with a simpler approach to marking the digraphs as a single character than
>simply creating a codepoint for each one. I have two ideas so far:-
>
>* Come up with a set of A-Za-z combining characters, such that c +
>combining-h would form a "ch" grapheme

Well, regular Unicode has the ZWJ character, ZERO WIDTH JOINER, U+200D, which
is placed between the characters. Thus the sequence of three code points c
ZWJ h would request the use of a ch ligature for display. I am not entirely
certain whether this is the end result which you are seeking, as that
depends upon whether a digraph is exactly the same as a ligature, which, as
I write this, I am uncertain about. The ch ligature is interesting because
the same encoded ligature could be used both for German Fraktur and for 18th
century style English printing, though the glyphs would be very different.
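As a minimal sketch (Python is used here only for illustration), the
three-code-point sequence can be built and inspected as follows. The helper
name ligature_request is hypothetical. Note that ZWJ only requests a
ligature: whether one appears depends on the font, and a font without a ch
ligature simply shows c and h.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

def ligature_request(first: str, second: str) -> str:
    """Hypothetical helper: join two letters with ZWJ to request a ligature."""
    return first + ZWJ + second

seq = ligature_request("c", "h")

# List each code point in the sequence with its Unicode character name.
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Stripping the ZWJ recovers the plain two-letter text "ch".
plain = seq.replace(ZWJ, "")
print(plain)
```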

>* Come up with a digraph combining character, such that c + h +
>digraph-combining-character forms the "ch" grapheme

Actually, I have suggested some code points in relation to this in the
following document. However, I put the operator before the characters upon
which it operates.

http://www.users.globalnet.co.uk/~ngo/courtlas.htm

If you are looking at that for the first time, it may help to look at the
index to the set of documents as well.

http://www.users.globalnet.co.uk/~ngo/court000.htm
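To illustrate the difference in ordering, here is a minimal Python sketch
contrasting a prefix operator (operator, then the two letters) with the
postfix form suggested in the quoted message (two letters, then operator).
The operator code point U+E000 is a hypothetical placeholder in the Private
Use Area, not the allocation given in courtlas.htm, and the function names
are likewise illustrative.

```python
# Hypothetical operator code point (placeholder only, not the courtlas.htm allocation).
DIGRAPH_OP = "\uE000"

def units_prefix(text: str) -> list[str]:
    """Group text into units when the operator precedes the pair: OP c h."""
    units, i = [], 0
    while i < len(text):
        if text[i] == DIGRAPH_OP and i + 2 < len(text):
            units.append(text[i + 1:i + 3])  # the next two characters form one unit
            i += 3
        else:
            units.append(text[i])
            i += 1
    return units

def units_postfix(text: str) -> list[str]:
    """Group text into units when the operator follows the pair: c h OP."""
    units: list[str] = []
    for ch in text:
        if ch == DIGRAPH_OP and len(units) >= 2:
            second = units.pop()  # merge the two units already emitted
            first = units.pop()
            units.append(first + second)
        else:
            units.append(ch)
    return units

print(units_prefix("no" + DIGRAPH_OP + "che"))
print(units_postfix("noch" + DIGRAPH_OP + "e"))
```

With the prefix form, a left-to-right reader learns that a digraph is coming
before it sees the letters; with the postfix form, it must go back and merge
two characters it has already emitted.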

>If anyone has any comments on this, or any references to previous
>discussions, they would be gladly received.
>

You might like to have a look at the following.

http://www.users.globalnet.co.uk/~ngo/golden.htm

That is an introduction and index to various documents which I have
published on the web about code points for ligature characters, including
specific code point allocations for various ligature characters within the
Private Use Area. I have named this set the golden ligatures collection.
Please note that my use of these code points for these allocations is not
exclusive: anyone can use or publish their own code point allocations for
Private Use Area codes, including, if they so choose, their own allocations
for ligatures. Please note also that allocating those code points to
ligature characters is not, how shall I put it, universally appreciated;
nevertheless I feel that publishing such a set of allocations is a useful
and worthwhile activity.

My main reasons for publishing code points for ligatures are that not
everyone has computer equipment which can handle the advanced font formats
needed to render ZWJ sequences, and that there is a certain artistic aspect
to having code points for the individual sorts of metal type that printers
set into composing sticks long ago, and that hobbyists still set today. I
have had considerable fun producing some experimental fonts using such code
points as U+E707 for a ct ligature. Indeed, by doing so I learned a lot
about manipulating contours in a glyph design, though I emphasise that that
was only because I am a beginner at producing fonts.

A useful feature of the published lists is that font designers who use the
ZWJ method in advanced format fonts may also, if they choose, provide an
alternative direct access route to the glyphs using the golden ligatures
collection code points. Naturally they need not provide any direct access
route, and if they do, they are in no way obliged to use the golden
ligatures collection code points for it. Also, the golden ligatures
collection does not provide code points for all of the ligatures that a
font designer might need; however, if anyone wants code points for other
ligatures, I will be interested to try to add them to the collection upon
request.
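A direct access route of this kind could be mirrored in text processing as a
simple two-way substitution. This minimal Python sketch assumes the U+E707
ct allocation mentioned above is the only entry; any others would need to be
copied from the published lists, and the function names are hypothetical.
Since Private Use Area code points carry meaning only by private agreement,
a tool like this would typically convert back to the ZWJ form before
exchanging text with others.

```python
import unicodedata

ZWJ = "\u200D"

# Only the ct entry comes from the post (U+E707). Further entries would
# have to be copied from the published golden ligatures lists.
GOLDEN = {
    "c" + ZWJ + "t": "\uE707",
}

def to_direct(text: str) -> str:
    """Replace each ZWJ ligature request with its direct-access PUA code point."""
    for zwj_seq, pua in GOLDEN.items():
        text = text.replace(zwj_seq, pua)
    return text

def to_zwj(text: str) -> str:
    """Expand direct-access PUA code points back into ZWJ sequences."""
    for zwj_seq, pua in GOLDEN.items():
        text = text.replace(pua, zwj_seq)
    return text

# Every allocated code point should lie in the Private Use Area (category "Co").
assert all(unicodedata.category(pua) == "Co" for pua in GOLDEN.values())

word = "stri" + "c" + ZWJ + "t"
direct = to_direct(word)
print(f"U+{ord(direct[-1]):04X}")
```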

In the mailing list archive at http://www.unicode.org there are various
discussions about ligatures. Recently there was some discussion about the
golden ligatures collection and about a rather fun occurrence, which is
archived as The Respectfully Experiment.

William Overington

3 August 2002

http://www.users.globalnet.co.uk/~ngo


