From: Kent Karlsson (kentk@md.chalmers.se)
Date: Thu Oct 31 2002 - 09:03:44 EST
Let me take a few comparable examples:
1. Some people (font makers, I think) argued a few
years ago that the Lithuanian i-dot-circumflex, and
a few other similar characters, were just
Lithuanian-specific glyph variants of i-circumflex.
Still, the Unicode standard now does not regard those as
glyph variants (anymore, if it ever did), and its casing
rules embody the fact that the Lithuanian i-dot-circumflex
is a different character (see SpecialCasing.txt).
There are special rules for inserting (when lowercasing)
or removing (when uppercasing) dot-aboves on i-s and I-s
for Lithuanian. I can only conclude that it would be
wrong even for a Lithuanian specific font to display an
i-circumflex character as an i-dot-circumflex glyph,
even though an i-circumflex glyph is never used for
Lithuanian.
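Those rules can be sketched in a few lines of Python. This is only an illustrative subset: the three mappings below are the Lithuanian (lt) full-lowercase entries from SpecialCasing.txt for the accented capital I characters; a real implementation would parse the whole file and handle its context conditions, and the helper name lt_lower is mine, not a standard API.

```python
# A tiny subset of the lt-tagged rules in SpecialCasing.txt:
# when lowercasing for Lithuanian, a COMBINING DOT ABOVE (U+0307)
# is inserted so the dot on the i stays visible under the accent.
LT_LOWER = {
    "\u00CC": "\u0069\u0307\u0300",  # I-grave -> i + dot above + grave
    "\u00CD": "\u0069\u0307\u0301",  # I-acute -> i + dot above + acute
    "\u0128": "\u0069\u0307\u0303",  # I-tilde -> i + dot above + tilde
}

def lt_lower(text: str) -> str:
    """Lowercase, applying (a subset of) the Lithuanian rules."""
    return "".join(LT_LOWER.get(ch, ch.lower()) for ch in text)

print(lt_lower("\u00CC"))  # i followed by two combining marks
```

The point is that the dot above is a character-level matter, handled by a character-to-character mapping, not something a font should add or remove on its own.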
2. The Khmer script got allocated a "KHMER SIGN BEYYAL".
It stands (stood...) for "any abbreviation of the
Khmer equivalent of etc."; there are at least four
different abbreviations, much like "etc", "etc.", "&c",
"et c.", ... It would be up to the font maker to decide
exactly which abbreviation to show, and that would vary by font.
However, it is now targeted for deprecation for precisely
that reason: it is *not* the font (maker) that should
decide which abbreviation convention to use in a document,
it is the *"author"* of the document who should decide.
Just as for the Latin script, the author decides how to
abbreviate "et cetera". The way of abbreviating should stay
the same *regardless of font*. Note that the font may be
chosen at a much later time, and not out of any wish to
change abbreviation convention. One may want that
convention to stay the same throughout a document even
when using several different fonts in it, without having
to carefully consider abbreviation conventions when
choosing fonts.
3. Marco would even allow (by default; I cannot get away
from that caveat since some (not all) font technologies
do what they do) displaying the ROMAN NUMERAL ONE THOUSAND
C D (U+2180) as an M, and it would be up to the font
designer. While the glyphs are informative, this glyphic
substitution definitely goes too far. If the author
chose to use U+2180, a glyph having at least some
similarity to the sample glyph should be shown, unless
and until someone makes a (permanent or transient)
explicit character change.
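The distinction the author makes by choosing U+2180 is visible at the character level. As a small illustration (using Python's standard unicodedata module; U+216F is the M-shaped ROMAN NUMERAL ONE THOUSAND), both characters carry the numeric value 1000, yet they are distinct code points, so displaying one as the other throws away an encodable choice:

```python
import unicodedata

# U+2180 (ROMAN NUMERAL ONE THOUSAND C D) and U+216F (ROMAN NUMERAL
# ONE THOUSAND) both mean 1000, but they are different characters;
# an author who picked one has made a character-level distinction.
for ch in ("\u2180", "\u216F"):
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.numeric(ch))
```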
4. Some people write è instead of é (I claim they cannot
spell...). So is it up to a font designer to display
é as è if the font is made for a context where many
people do not make the distinction? Can a correctly
spelled name (say) be turned into an apparent misspelling
just by choosing such a font? And that would be a Unicode
font?
5. I can't leave out the ö vs. ø case; these are just
different ways of writing "the same" letter, and it is
not the case that ø is used instead of ö for any
7-bit reasons. It is conventional to use ø for ö
in Norway and Denmark for any Swedish name (or
word) containing it. The same goes for ä vs. æ.
Why shouldn't this one be up to the font makers too?
If the font is made purely for Norwegian, why not
display ö as ø, as is the convention? This is
*exactly* the same situation as with ä vs. a^e.
I say, let the *"author"* decide in all these cases, and
let that decision stand, *regardless of font changes*.
[There is an implicit qualification there, but I'm
tired of writing it.]
> Kent Karlsson wrote:
> > > I insist that you can talk about character-to-character
> > > mappings only when
> > > the so-called "backing store" is affected in some way.
> >
> > No, why? It is perfectly permissible to do the equivalent
> > of "print(to_upper(mystring))" without changing the backing
> > store ("mystring" in the pseudocode); to_upper here would
> > return a NEW string without changing the argument.
>
> And that, conceptually, is a character-to-glyph mapping.
Now I have lost you. How can it be that? The "print"
part, yes. But not the to_upper part; that is a
character-to-character mapping, inserted between the
"backing store" and "mapping characters to glyphs".
It is still an (apparent) character-to-character
mapping even if it is not stored in the "backing store".
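The pseudocode above can be spelled out directly in Python, where strings are immutable, so to_upper (here the built-in str.upper) necessarily returns a new string and the "backing store" is left untouched:

```python
# A character-to-character mapping applied at display time:
# upper() returns a NEW string; mystring itself is unchanged.
mystring = "mixed Case text"
print(mystring.upper())               # rendered: MIXED CASE TEXT
assert mystring == "mixed Case text"  # backing store unchanged
```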
> In my mind, you are so much into the OpenType architecture,
> and so much used
> to the concept that glyphization is what a font "does", that
> you can't view the big picture.
Now I have lost you again. Some fonts (in some font
technologies) do more than "pure" glyphization. This
is why I have been putting in caveats, since many people
seem to think that all fonts *only* do glyphization,
which is not the case.
But to be general, I was referring to such mappings regardless
of whether they are built into a font (using character code
points or, as in OT/AAT, glyph indices) or (better) are
external to the font.
I was trying to use general formulations, but I cannot
avoid having caveats for certain mappings that certain
technologies do (since those are so popular). But I would
agree that those particular forms of mappings *should not*
be done by fonts (but they are), and should instead be done
externally to the fonts (even when transient, as part
of the "rendering"). An advantage would be that if
a particular (named) mapping was asked for (to_upper say),
it would be done the same way regardless of which font
is chosen. But alas...
Kind regards
/kent k
This archive was generated by hypermail 2.1.5 : Thu Oct 31 2002 - 09:57:02 EST