From: Jim Allan (jallan@smrtytrek.com)
Date: Tue Oct 29 2002 - 20:53:59 EST
The Old Icelandic character ǫ (Unicode U+01EB: LATIN SMALL LETTER O
WITH OGONEK) is replaced in modern Icelandic by ö.
Would it be proper therefore to use U+00F6, the code point which
Marco Cimarosti wants to use for o with superscript e, also for o with
ogonek?
In Icelandic they could be called the same character. Of course that
only works for Icelandic. We could not use this font for German or
English or French, unless we build some kind of recognition of language
tags into it.
In French the circumflex accent indicates an earlier superscript s over
the vowel. So should we allow combining superscript s as a variant glyph
for the circumflex? But what of French text containing transliterated
Arabic names or Welsh names or transliterated classical Greek names
which use a circumflex which never had such a meaning? Again we would
need language tagging.
The Old English and Middle English letter thorn (þ) is replaced in Modern
English by the combination th. Would it make sense then for a modern
font to represent U+00FE by a glyph showing th? Would it also make sense
to replace the kinds of glyphs used for U+204A TIRONIAN SIGN ET with an
ampersand? The meaning is exactly the same. But what if we want to use
this font for Icelandic or Old English? Do we again need an intelligent
font that understands language tagging?
Do we now have different flavors of Unicode, one for English, one for
Icelandic, one for French, one for German ... ? What of other languages?
A diaeresis used in the transliterated Classical names Peirithoüs and
Menelaüs is not the same as a superscript e, though in German (and some
other languages) the superscript e once written over a vowel has been
replaced by a diaeresis over that vowel. So a font which rendered every
diaeresis over u or o or a as a superscript e would be incorrect for
classical names cited and also possibly for other foreign names. How
would J.R.R. Tolkien's name Eärendil be rendered by such a font, where
the diaeresis indicates separate pronunciation of a, not an umlauted a?
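To make the problem concrete, here is a rough Python sketch that imitates what such a font would do to the text. The function name is hypothetical, and the substitution rule (every diaeresis over a, o or u becomes U+0364 COMBINING LATIN SMALL LETTER E) is only my assumption about how a font of this kind might behave, not a description of any real font.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
COMBINING_E = "\u0364"  # COMBINING LATIN SMALL LETTER E

def render_with_superscript_e_font(text: str) -> str:
    """Hypothetical font behaviour: show each diaeresis over a, o, u as superscript e.

    The result is returned in decomposed (NFD) form.
    """
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for i, ch in enumerate(decomposed):
        # Only look at the character immediately before the diaeresis.
        if ch == COMBINING_DIAERESIS and i > 0 and decomposed[i - 1] in "aouAOU":
            out.append(COMBINING_E)
        else:
            out.append(ch)
    return "".join(out)

german = render_with_superscript_e_font("f\u00fcr")          # "für"
tolkien = render_with_superscript_e_font("E\u00e4rendil")    # "Eärendil"
classical = render_with_superscript_e_font("Menela\u00fcs")  # "Menelaüs"
```

The rule has no way to tell the German umlaut from the diaeresis in Eärendil or Menelaüs, so all three words come out with a superscript e. Only the first is what the writer meant.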
Surely it makes more sense for an author or advertising designer who
wishes to use u with superscript e to use the Unicode method of u
followed by a combining superscript e, so that it will appear as desired
in any font, rather than using a font change? Font changes should not
change the orthography or spelling of the original but should represent
transparently what the writer intended, and Unicode gives us a clear way
to distinguish combining superscript e from combining diaeresis and
combining superscript s from combining circumflex.
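The distinction can be checked with Python's standard unicodedata module. This is only a small sketch; I am assuming that the "combining superscript e" meant here is U+0364 COMBINING LATIN SMALL LETTER E, and the variable names are my own.

```python
import unicodedata

COMBINING_E = "\u0364"          # COMBINING LATIN SMALL LETTER E
COMBINING_DIAERESIS = "\u0308"  # COMBINING DIAERESIS

u_with_e = "u" + COMBINING_E
u_with_diaeresis = "u" + COMBINING_DIAERESIS

# Print the character names so the two sequences can be compared by eye.
for label, text in (("superscript e", u_with_e), ("diaeresis", u_with_diaeresis)):
    print(label, [unicodedata.name(ch) for ch in text])

# NFC composes u + diaeresis into the single code point U+00FC, but it
# leaves u + combining e alone, since no precomposed character exists.
composed_diaeresis = unicodedata.normalize("NFC", u_with_diaeresis)
composed_e = unicodedata.normalize("NFC", u_with_e)
```

The two spellings stay distinct in the text itself, whatever font is later chosen to display them.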
Using the Unicode method makes far more sense than creating fonts that
work for particular languages only, provided no foreign words or names
appear, or which require language tagging.
In most European languages æ and œ are ligatures at one time commonly
used in names and technical words of Latin origin. Modern stylistic
preference is to avoid these ligatures. However French uses œ for a
particular sound, though the use of that ligature instead of oe was not
considered important enough for œ to be generally available on French
typewriters. Also both digraphs were separate letters in Old English,
whence the use of æ still in modern Danish and Icelandic. Should this
modern convention be properly indicated in an intelligent font by using
unconnected ae and oe for these digraphs except where language
tagging indicates Danish, Icelandic, or older Scandinavian use or Old
English? Should we have to language tag Encyclopædia Britannica to be
sure that æ appears in the name properly connected?
In fact, the stylistic conventions are indicated not by font changes or
tagging but by typing the appropriate characters.
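As a small illustration that the choice is carried by the characters themselves, the sketch below (Python, standard unicodedata module; the variable names are mine) checks that none of the four Unicode normalization forms turns æ into the unconnected letters ae.

```python
import unicodedata

ligature = "Encyclop\u00e6dia"  # spelled with U+00E6 LATIN SMALL LETTER AE
unconnected = "Encyclopaedia"   # spelled with separate a and e

# Apply every normalization form to the ligature spelling.
forms = {
    form: unicodedata.normalize(form, ligature)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# True only if every form leaves the ligature spelling untouched.
ligature_survives = all(result == ligature for result in forms.values())

for ch in ("\u00e6", "\u0153"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Neither æ nor œ has a decomposition into two letters, so the writer's spelling survives normalization and no tagging is needed to preserve it.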
Should an English language font render ö as oe, so that Göthe appears
automatically in the more normal English form Goethe?
Marco's desire to use a font to indicate combining superscript e instead
of the way Unicode wants it done seems prompted by the fact that most
Unicode fonts do not currently support the combining superscript
characters and he wishes a fallback to normal diaeresis instead of to an
undefined character indicator.
This is a reasonable wish.
In light of current Unicode support, the hack of identifying diaeresis
with combining superscript e makes sense.
There has never been anything wrong with using a hack when required for
a task at hand. But hacks of this kind, if followed up widely in many
fonts in many languages, would produce a chaos of interpretations and
numerous fonts only suited for particular languages, filtering the
text and not presenting what is there, without complex and otherwise
unnecessary tagging.
Surely this is not what Unicode should be?
If a writer uses a long s in modern writing, whether quoting text of an
earlier era or purposely being archaic, normal fonts should display a
long s, not a short s on the grounds that long s happens not to be
normally used in modern writing in Antiqua fonts.
If a writer decides between using ü, ue, or uͤ (u with combining
superscript e), the font should leave the text alone.
If you have a newer version of the Code 2000 font on your machine which
contains the combining superscripts, then the superscript e appears
correctly in newer browsers, even if you are using a different font for
the base character. A diacritic from one font is placed over the base
character of another.
I can understand Marco not wishing to bother viewers with the demand to
load a particular font and also knowing that dynamic downloading of a
font will not work with every system or browser or with user settings of
browsers. So use the hack for now. In two or three years, hopefully, it
will not be necessary.
Generally a font should not be correcting the text.
The use of macron for diaeresis is somewhat a different matter. If a
particular style of German script uses a line for a diaeresis, then
indeed the diaeresis in that script has fallen together in appearance
with the macron. This would be especially so if a diaeresis was used
over e and i (in foreign words and names). Representing diaeresis by a
glyph of macron form would be no more of a hack than would be the use in
an English script font of a p with an ascender, though presumably an
Icelander would identify that as the letter þ, not p. (How þ itself
should be presented in such a script font is problematical!)
The main difficulty with identification of diaeresis and combining
superscript e is that the identification does not work universally, even
within German, if foreign names or words appear. Even in German text,
combining superscript e may not always correctly replace diaeresis.
Jim Allan