From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Mon Apr 21 2003 - 07:56:24 EDT
>Hi All,
>
>Can anyone please let me know why there is no
>equivalent Unicode value for the character &fjlig (small f j ligature) of
>iso-pub character set?
>
>Regards,
>Sourav
>
As far as I know this is because there is a desire to avoid using the code
in stored documents as that would then cause problems with spell checking
software, with software which searches for sequences of characters and for
software which places words into dictionary order.
For example, suppose that the fj ligature were encoded in Unicode at some
value such as U+FBXY for some values of X and Y where X and Y are each a
hexadecimal character. I include the U+FB.. part because seven such
ligatures, for ff, fi, fl, ffi, ffl, long s t and st are included in that
block, yet their use is discouraged for the reasons that fj is not included.
Why are some ligatures included yet not others? It appears to be a
historical legacy matter, some were included then a decision was made not to
include any more. The matter of ligature characters was recently discussed
again, because I had raised it and had asked for it to be considered, and
the Unicode Technical Committee made a decision.
http://www.unicode.org/consortium/utc-minutes/UTC-092-200208.html
For example, normally the sequence fj is U+0066 U+006A. So a word such as
fjord is encoded as U+0066 U+006A U+006F U+0072 U+0064 using five Unicode
characters. If U+FBXY were used for an fj ligature, then the word fjord
would be stored as U+FBXY U+006F U+0072 U+0064 using four Unicode
characters. So searching for a word like fjord in a document would need a
search for both formats.
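The search problem can be seen today with a ligature which is encoded,
U+FB01 LATIN SMALL LIGATURE FI, since U+FBXY for fj is only hypothetical.
The following is a minimal sketch in Python using the standard unicodedata
module. The helper name fold_ligatures is an illustrative assumption, and
NFKC folding is only one possible way of reconciling the two formats.

```python
import unicodedata

def fold_ligatures(text: str) -> str:
    """Fold compatibility characters such as U+FB01 into plain letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)

plain = "\u0066\u0069\u006E\u0064"   # "find" stored as four characters
ligated = "\uFB01\u006E\u0064"       # the same word stored with U+FB01, three characters

document_a = "we " + plain + " it"
document_b = "we " + ligated + " it"

print(len(plain), len(ligated))                   # 4 3
print(plain in document_a, plain in document_b)   # True False
print(plain in fold_ligatures(document_b))        # True
```

A plain substring search misses the second document unless the text is
folded first, or unless both formats are searched for.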
Suppose, however, hypothetically, that one is transcribing an old printed
book into a computer system and in some places individual f and j type sorts
have been used and in other places an fj ligature has been used and one
wishes to preserve the information about how the original book was printed
in the computer transcription. I have no knowledge as to whether such a
book with a mixed way of printing fj exists; I am simply suggesting a
scenario for a thought experiment about encoding.
In that circumstance one could use U+0066 U+200D U+006A to explicitly encode
an fj ligature. The U+200D character is the ZERO WIDTH JOINER.
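As a small sketch of that encoding in Python, the following builds the word
fjord with an explicit ligature request and shows that removing the joiner
gives back the ordinary spelling for searching. The function names are
illustrative assumptions, and stripping every U+200D is only reasonable for
Latin text like this example, as the joiner has other uses in other scripts.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

def request_ligature(first: str, second: str) -> str:
    """Join two letters with U+200D to ask for a ligature between them."""
    return first + ZWJ + second

def strip_joiners(text: str) -> str:
    """Remove U+200D so that the text matches the ordinary spelling."""
    return text.replace(ZWJ, "")

word = request_ligature("f", "j") + "ord"

print([f"U+{ord(c):04X}" for c in word])
print(unicodedata.name(ZWJ))
print(strip_joiners(word) == "fjord")
```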
One then has issues with displaying such text correctly. There are two
approaches. One could use a platform which supports an advanced font format
such as OpenType, together with a font which recognizes
U+0066 U+200D U+006A as an fj ligature yet does not turn a plain
U+0066 U+006A into an fj ligature. Alternatively, one could preprocess the
incoming text stream from the filing system so that a sequence such as
U+0066 U+200D U+006A is converted, for purposes of font access only, into a
Private Use Area character which accesses a glyph for an fj ligature. For
example, the eutocode typography file format mentioned in the following web
page could be used for such a purpose.
http://www.users.globalnet.co.uk/~ngo/ast03300.htm
One would need a font which supports such a Private Use Area character.
However, one would not be obliged to use one of those computer systems which
can support advanced format fonts.
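The preprocessing approach might be sketched as follows in Python. This is
not the eutocode format, only a plain string substitution under stated
assumptions: the Private Use Area code point is taken to be U+E70B, the
value given for fj further on in this message, and the function names are
hypothetical.

```python
FJ_REQUEST = "\u0066\u200D\u006A"  # f, ZERO WIDTH JOINER, j, as stored on disk
FJ_PUA = "\uE70B"                  # assumed Private Use Area slot for the fj glyph

def to_display(stored: str) -> str:
    """Convert stored text for font access only; plain fj is left alone."""
    return stored.replace(FJ_REQUEST, FJ_PUA)

def to_stored(display: str) -> str:
    """Undo the conversion so that the PUA character never reaches the file."""
    return display.replace(FJ_PUA, FJ_REQUEST)

stored = "f\u200Djord and fjell"
shown = to_display(stored)

print([f"U+{ord(c):04X}" for c in shown[:4]])
print(to_stored(shown) == stored)
```

Only the explicitly requested fj is converted, so the distinction between
the two ways of printing fj survives in the stored document.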
Suppose, however, in a different scenario, that one is not wishing to
produce an archive or a scholarly recording of an old printed book, but
simply wishing to produce a poster for an entertainment called "Music of the
fjords", where a band from Norway is to play their music, where one simply
wishes to key characters and produce a printed poster which looks stylish.
Supposing that one has a desktop publishing package which can accept Unicode
characters, a Unicode compatible way to achieve the result would be to use a
Private Use Area character for the fj ligature, together with a font which
implements that ligature within the Private Use Area.
I know that the Code2000 font produced by James Kass has an fj ligature
encoded within the Private Use Area. There may be other fonts which include
an fj ligature encoded within the Private Use Area. There may also be
non-Unicode fonts which include an fj ligature encoded somewhere between
hexadecimal 00 and hexadecimal FF, probably somewhere in the range
hexadecimal 80 to hexadecimal FF.
Some time ago I produced a collection of Private Use Area encodings for
ligatures, introduced and indexed from the following web page. I called
this the golden ligatures collection.
http://www.users.globalnet.co.uk/~ngo/golden.htm
The following web page has an fj ligature at U+E70B.
http://www.users.globalnet.co.uk/~ngo/ligature.htm
The following web page has an ffj ligature at U+E773.
http://www.users.globalnet.co.uk/~ngo/ligatur2.htm
I emphasise that the use of these particular code points for these ligature
characters is not part of the Unicode Standard and that the use is not an
exclusive use of those code points. However, the collection is a consistent
set and if people making fonts which include an fj ligature glyph choose to
have that ligature glyph accessible as character U+E70B then that is a
choice which is open to them to make, though they are perfectly entitled to
make a different choice if they so choose.
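Because such assignments are a private agreement, standard Unicode
normalization does nothing with them, and any software which searches or
sorts the text needs its own table. The following Python sketch checks that
the two code points quoted above are Private Use characters (general
category Co) and expands them to plain letters. The table name GOLDEN and
the function names are illustrative assumptions.

```python
import unicodedata

# Private agreement only: the two code points quoted in this message.
GOLDEN = {
    "\uE70B": "fj",
    "\uE773": "ffj",
}

def is_private_use(ch: str) -> bool:
    """True when the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

def expand_pua_ligatures(text: str) -> str:
    """Replace known PUA ligature characters with their plain letters."""
    return "".join(GOLDEN.get(ch, ch) for ch in text)

poster = "Music of the \uE70Bords"

print(all(is_private_use(ch) for ch in GOLDEN))
print(unicodedata.normalize("NFKC", poster) == poster)
print(expand_pua_ligatures(poster))
```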
Such a choice need not necessarily only be made in relation to fonts where
the intended use is for an fj ligature glyph to be accessed directly as a
Private Use Area character. Someone producing an advanced format font, such
as an OpenType font, might include an fj ligature glyph within it for access
using a character sequence such as U+0066 U+006A or U+0066 U+200D U+006A.
He or she could then, in addition and as a secondary matter, also map the
glyph to a Private Use Area character, so that people with equipment which
cannot access the glyph using a sequence of characters can nonetheless
display it using a Private Use Area code point.
The code U+200C ZERO WIDTH NON-JOINER is mentioned for completeness, as it
can be used to indicate that a ligature should not be used.
On the matter of whether an OpenType font would process a sequence to
produce an fj ligature, there was some discussion in this group a while ago
about the possibility of a convention as to how an OpenType font might or
might not process a sequence of characters into a ligature glyph, if the
font actually contained the ligature glyph within it, depending upon the
wishes of the author of the document which is to be displayed.
John Hudson made a specific suggestion in the thread "Proposal: Ligatures w/
ZWJ in OpenType" of Saturday 6 July 2002. It would be interesting to know
whether Mr Hudson's suggestion has been widely taken up. The suggestion is
a comprehensive opportunity to allow a document to specify any one of the
following to an OpenType font, for each potential ligature occurrence.
1. Use a ligature.
2. Do not use a ligature.
3. Use a ligature at your discretion.
Mr Hudson uses fj and ffj as two of the examples within his document.
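The three choices can be pictured at the character level with a short Python
sketch. It assumes that U+200D marks choice 1, that U+200C marks choice 2 and
that the plain sequence means choice 3. That mapping follows the use of the
two joiners described in this message and is not taken from the text of the
proposal itself; the names used are hypothetical.

```python
from enum import Enum

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

class LigatureWish(Enum):
    USE = "use a ligature"
    AVOID = "do not use a ligature"
    DISCRETION = "use a ligature at your discretion"

def encode_pair(first: str, second: str, wish: LigatureWish) -> str:
    """Encode two letters with the author's wish for a ligature between them."""
    if wish is LigatureWish.USE:
        return first + ZWJ + second
    if wish is LigatureWish.AVOID:
        return first + ZWNJ + second
    return first + second

def decode_pair(sequence: str) -> LigatureWish:
    """Read the wish back from an encoded pair."""
    if ZWJ in sequence:
        return LigatureWish.USE
    if ZWNJ in sequence:
        return LigatureWish.AVOID
    return LigatureWish.DISCRETION

for wish in LigatureWish:
    pair = encode_pair("f", "j", wish)
    print(wish.value, [f"U+{ord(c):04X}" for c in pair])
```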
My own font making adventures do not extend to OpenType at this time; I am
using the Softy shareware program to get started in font making.
Some of my initial fonts are available at the following web page.
http://www.users.globalnet.co.uk/~ngo/font7001.htm
Those are all small specialist fonts. My preparing of a Unicode compatible
letters font, Quest text, is proceeding and I have now produced all of the
26 uppercase and the 26 lowercase characters, digits and some punctuation
characters so that the font displays without any "not defined" glyphs
appearing on the page when the basic PC font viewer is used to display a
font synopsis. In addition I have produced AE, thorn and eth in both
uppercase and lowercase and a lowercase long s and a character for U+FFFD.
I am hoping to add the yogh character, U+021C and U+021D for uppercase and
lowercase, when I learn more about it as to how it should line up both in
uppercase and lowercase with respect to other uppercase and lowercase
letters. Quest text is intended to be a font which has some of the rarer
characters which are often not supported in Unicode compatible fonts other
than in general full Unicode fonts, such as characters for Old English and
for Esperanto. I am hoping to add Private Use Area characters for various
ligatures, including the fj and ffj ligatures, mapped as in the golden
ligatures collection above. Quest text is thus a font intended for niche
uses, though with the flexibility to have a few extra characters added
whenever the need arises. However, it could be used as a display font for
ordinary English text as it has a distinctive look which might be quite
useful in some circumstances, such as providing signs which combine style
with clarity. I am hoping to be able to learn how to make OpenType fonts in
due course.
William Overington
21 April 2003