Re: Fraktur Ligatures (and ligatures for transcribing 18th Century English books)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Tue May 21 2002 - 07:09:33 EDT

Previous message: William Overington: "Re: Encoding of symbols and a "lock"/"unlock" pre-proposal"

There has been some discussion of ligatures previously, in an English
context.

As I understand the matter, if Unicode chose to encode the Fraktur ligatures
that you request, then they would not be encoded as Fraktur ligatures as
such, but simply as Alphabetic Presentation Forms. A "long s i" ligature,
say, would then have a single Unicode code point serving both Fraktur and
non-Fraktur use, such as transcribing a book from 18th century England.

Having had a look at a web page showing a Fraktur fount, I have come to the
initial conclusion that you are looking for the following ligatures.

ch, ck, ff, fi, fl, ll, long sch, long si, long sl, long s long s, long st,
long s s Eszett, tt, tz.

Please correct me if I have got the list wrong or if there are any other
ligatures that you would like included, so that the list is complete.

The one that I have referred to as "long s s Eszett" I have, with one
exception, only ever seen in German texts. The exception is the
reproduction of the title page of a contemporary edition of a sixteenth
century English play, "The Massacre at Paris", where it appears in the
title. This was an illustration in a book about English drama; I have not
seen the typography of the original printed text of the play. The use of
that form of double s in England so surprised (and delighted) me that it
has stuck in my mind.

The long s on its own is encoded in Unicode as U+017F and the "long s s
Eszett" as U+00DF.

Unicode currently has the ligatures ff, fi, fl, ffi, ffl, long s t and st as
U+FB00 through to U+FB06.
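
For anyone who wishes to check these values, here is a small Python sketch
using the standard unicodedata module. It only prints the name and the
compatibility decomposition recorded for each code point; the variable names
are my own choice for illustration, and the results reflect whatever version
of the Unicode data ships with the Python being used.

```python
import unicodedata

# Code points mentioned above: long s, Eszett, and the seven Latin
# ligatures at U+FB00..U+FB06.
code_points = [0x017F, 0x00DF] + list(range(0xFB00, 0xFB07))

existing = {}
for cp in code_points:
    ch = chr(cp)
    existing[cp] = unicodedata.name(ch)
    # decomposition() returns an empty string when there is no mapping.
    print(f"U+{cp:04X}  {existing[cp]:<32} {unicodedata.decomposition(ch)}")
```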

So, it would appear that Fraktur would need the following added.

ch, ck, ll, long sch, long si, long sl, long s long s, tt, tz.
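
The step from the full list to this shorter list is a simple set
difference, which can be sketched as follows. The spelling of each ligature
with "ſ" (U+017F) for long s is only a convenient notation for the sketch.

```python
# Ligatures seen in the Fraktur fount, with "ſ" (U+017F) standing for long s.
requested = ["ch", "ck", "ff", "fi", "fl", "ll", "ſch", "ſi", "ſl", "ſſ",
             "ſt", "ß", "tt", "tz"]

# Already in Unicode: the ligatures at U+FB00..U+FB06 plus Eszett at U+00DF.
encoded = {"ff", "fi", "fl", "ffi", "ffl", "ſt", "st", "ß"}

# Keep the original order so the result can be compared with the list above.
missing = [lig for lig in requested if lig not in encoded]
print(len(missing), missing)
```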

In relation to ligatures it would be helpful for the transcription of
English printed books of the 18th Century to add the following.

ct, long s b, long s h, long s k.

Also I suspect adding the following would be desirable.

long s long s i, long s long s l.

I am unsure whether long s f and f long s ever existed historically and
would welcome advice from participants in this forum, please. I would also
welcome advice as to any other long s ligatures, or indeed other ligatures
generally, that could reasonably be included.

This is a total of at least thirteen extra ligatures (the nine for Fraktur
and the four for English books), possibly seventeen with the four less
certain ones, and maybe more than seventeen.

As I understand it, the Unicode consortium and possibly the ISO body are
reluctant to encode any further ligatures.

My suggested solution is that these ligatures be encoded as U+E707 and
following, using the Private Use Area, with ct as U+E707, as I have already
suggested that one explicitly. The idea behind this is that U+E707 is
chosen so that ct could possibly be promoted to U+FB07 in time, if the
Unicode consortium and ISO so choose. I feel that keeping open the
possibility of a straightforward promotion would be a good idea, so I
suggest using U+E707 through to U+E70F for nine of the ligatures, then
continuing from U+E750 through to U+E75F, which would provide another 16
code points. That would allow 25 ligatures to be added.
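
The capacity of the two ranges can be checked by counting code points
inclusively, starting from ct at U+E707. A minimal sketch, with the ranges
written as (first, last) pairs:

```python
# Inclusive Private Use Area ranges: ct at U+E707 up to U+E70F, then
# U+E750 up to U+E75F.
ranges = [(0xE707, 0xE70F), (0xE750, 0xE75F)]

# An inclusive range first..last holds last - first + 1 code points.
sizes = [last - first + 1 for first, last in ranges]
total = sum(sizes)
print(sizes, total)
```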

So, which code point should represent which ligature?

I suggest that U+E707 be ct as I have already publicly suggested that
previously and some people may have made a note of that.

The rest I suggest could be discussed in this forum, as an interesting
experiment to observe whether people might like to agree amongst themselves
a set of Private Use Area encodings. If the encodings are then published on
various websites, other people may choose to use them and a workable set
may be achieved.

I wonder if I may open the discussion by suggesting that of the
approximately seventeen ligatures that are needed, a possibility would be to
encode all of those that include a long s in the U+E750 through to U+E75F
range and the others in the U+E707 through to U+E70F range. That would,
from my initial list of possible ligatures be six in the range U+E707
through to U+E70F, leaving three unused code points, and eleven in the range
U+E750 through to U+E75F, leaving five unused code points.
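
As a rough sketch of that split, the following Python assigns code points
in list order. Only ct = U+E707 comes from my earlier suggestion; the order
of everything else, and the function name allocate, are placeholders for
illustration and not a proposal.

```python
LONG_S = "\u017f"

# Candidate ligatures from the lists above; "ſ" (U+017F) stands for long s.
candidates = ["ct", "ch", "ck", "ll", "tt", "tz",
              "ſch", "ſi", "ſl", "ſſ", "ſb", "ſh", "ſk",
              "ſſi", "ſſl", "ſf", "fſ"]

def allocate(ligatures):
    """Assign long s ligatures from U+E750 and all others from U+E707.

    Returns the table plus the number of unused code points left in
    U+E707..U+E70F and in U+E750..U+E75F.
    """
    plain_next, long_s_next = 0xE707, 0xE750
    table = {}
    for lig in ligatures:
        if LONG_S in lig:
            table[lig] = long_s_next
            long_s_next += 1
        else:
            table[lig] = plain_next
            plain_next += 1
    return table, 0xE70F - plain_next + 1, 0xE75F - long_s_next + 1

table, plain_free, long_s_free = allocate(candidates)
for lig, cp in table.items():
    print(f"U+{cp:04X}  {lig}")
print(plain_free, long_s_free)
```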

This would enable some code points to exist for all of these ligatures, even
though they are only in the Private Use Area and are non-exclusive
definitions. The Unicode Consortium, by its own rules, will not endorse any
allocations in the Private Use Area. If the allocations become widely used,
then that will provide good evidence for the characters to be promoted to
regular Unicode status. Such promotion is in no way automatic and, if it
occurred, would mean that new code point values were assigned to the
characters. It would not be a matter of U+E707 and so on themselves
becoming regular Unicode code points, for that would be against the laid
down rules for the Private Use Area.

Using the Private Use Area is, however, a better choice than it might first
appear. Even if the Unicode Consortium immediately liked the idea of
including Fraktur ligatures, there would still be quite a time lag before
the code points were allocated. Using the Private Use Area has the
advantage that if some of us discuss the idea in this Unicode discussion
group for a few days then, by perhaps next Saturday, a list of code point
allocations can be produced, posted in this discussion group and hopefully
published on a few websites. As time goes on, web search engines will pick
up the pages and so the allocations will be able to be found by anyone who
looks up the word ligature on some of the major search engines.

Another aspect is that this discussion list gets sent to people in many of
the major organizations concerned with typography and computers. One never
knows whether such a list produced by a few interested people in this
newsgroup would be disregarded by major organizations or whether librarians
would carefully print it out and put it into the organization's internal
reference library. I have no direct evidence for it, yet my thoughts are
that any such list will, in fact, without any public comment, be carefully
filed by such librarians, just in case in a year or two someone asks the
librarian if any code points for Fraktur ligatures or other ligatures are
known to be in use. Voilà, the list is produced!

Now, certainly, the idea behind suggesting that U+E707 to U+E70F be used is
that promotion to U+FB07 through to U+FB0F would be as straightforward as
possible. The idea behind suggesting U+E750 through to U+E75F is also so
that promotion to U+FB50 through to U+FB5F would be as straightforward as
possible, so that all of the ligatures in the list that I am suggesting that
could be produced by discussion in this newsgroup could possibly be promoted
by adding the same constant to their code point. Now, as it happens, it is
clear from the code charts that U+FB07 through to U+FB0F are presently
unused, but I am unsure as to whether U+FB50 through to U+FB5F are being
used or whether there is some possible other use in mind.
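
The constant in question is 0xFB07 - 0xE707 = 0x1400, and it is the same
for both ranges. The sketch below computes the promotion targets and asks
Python's unicodedata module whether each target already has a character
name. The helper names are my own, and the answer reflects the Unicode
version bundled with the Python in use rather than any future allocation.

```python
import unicodedata

PROMOTION_OFFSET = 0xFB07 - 0xE707  # 0x1400, the same for both ranges

def promotion_target(pua_cp):
    """Return the code point a PUA value maps to under the fixed offset."""
    return pua_cp + PROMOTION_OFFSET

def is_assigned(cp):
    """True if this Python's Unicode database has a name for the code point."""
    return unicodedata.name(chr(cp), None) is not None

for pua_start in (0xE707, 0xE750):
    target = promotion_target(pua_start)
    print(f"U+{pua_start:04X} -> U+{target:04X} assigned: {is_assigned(target)}")
```

In the published Unicode data, U+FB50 is the first code point of the Arabic
Presentation Forms-A block, so the second check should report an assigned
character, whereas U+FB07 should report as unassigned.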

Now, I am aware that the Unicode Consortium cannot endorse any particular
Private Use Area code point, and I am not suggesting that allocations in the
Private Use Area effectively reserve "promotion space" in regular Unicode.
Yet the possibility of trying to get these characters promoted at some stage
clearly exists, and people making a formal proposal can suggest code points
in the proposal. Could anyone therefore say what uses, if any, presently
exist or are under consideration within the range U+FB50 through to U+FBFF
please? If there would be a clash over a straightforward promotion, the
suggested range of U+E750 through to U+E75F could then be changed within the
next few days, keeping open the possibility of a straightforward promotion
by a standard offset rather than immediately having problems due to lack of
forward planning.

In fairness I add that there are issues about encoding ligatures where some
people feel that the ligature should be signalled by having a code between
two ordinary letters indicating that using a ligature is desired. That
approach may well have its merits, yet I do feel that there is scope also to
encode ligature characters separately as they can be very useful for
encoding printed texts from long ago where one wishes to preserve the
typography. Hopefully, people will wish to discuss the issues fully in this
thread and everybody will end up having wide knowledge of the topic.
However, I do feel that these ligature characters should be encoded. It
need only take a little time over a few days and hopefully a result with
long lasting benefits will be achieved.
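
The two approaches need not exclude one another, as text in one form can be
converted to the other. The sketch below assumes that the code placed
between the two letters is ZERO WIDTH JOINER (U+200D); that is my
assumption for the illustration. The table is hypothetical too, holding
only ct = U+E707, the one value suggested so far.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER, assumed here to be the in-between code

# Hypothetical table: only ct = U+E707 has been suggested so far.
PUA_LIGATURES = {"ct": "\ue707"}

def to_pua(text):
    """Replace letters joined by ZWJ with the single PUA ligature character."""
    for letters, pua in PUA_LIGATURES.items():
        text = text.replace(ZWJ.join(letters), pua)
    return text

def to_zwj(text):
    """Expand PUA ligature characters back to letters joined by ZWJ."""
    for letters, pua in PUA_LIGATURES.items():
        text = text.replace(pua, ZWJ.join(letters))
    return text

marked = "respe" + ZWJ.join("ct") + "ed"
as_pua = to_pua(marked)
round_trip = to_zwj(as_pua)
print(round_trip == marked)
```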

William Overington

21 May 2002

This archive was generated by hypermail 2.1.2 : Tue May 21 2002 - 10:26:34 EDT