RE: Character identities

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Tue Oct 29 2002 - 12:40:03 EST

Next message: John Hudson: "Re: Unicode plane 14 language tags."

Previous message: Doug Ewell: "Re: Unicode plane 14 language tags."
In reply to: Marco Cimarosti: "RE: Character identities"
Next in thread: Marco Cimarosti: "RE: Character identities"
Maybe reply: starner@okstate.edu: "Re: RE: Character identities"

Marco,

Standard orthography, and orthography that someone may
choose to use on a sign, or in handwriting, are often not
the same.

And I did say that current font technologies (e.g. OT)
do not actually do character-to-character mappings,
but the net effect is *as if* they did (if, and I hope
only if, certain "features" are invoked, like "smallcaps").
It would be more honest to do them as character-to-character
mappings though, either inside (which OT does not support)
or outside of the font. Capital A, even at x-height, is not
a glyph variant of small a (even though, centuries ago, that
was the case, but then I and J were the same, and U and V,
et and &, ad and @, ...). But displaying U as V (in effect
doing a character replacement on a copy of the input) would
be ok in a non-default mode (using the "hist" feature, say).
My point here is that that replacement (effectively) should
not be done by default in a Unicode font (see Doug's explanation
for what a Unicode font is, if you don't like mine).
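
To make the "as if" concrete, here is a minimal Python sketch
of doing the replacement outside the font, on a copy of the
input, and only when a non-default feature is requested. The
names display_string and HIST_MAP are hypothetical and made up
for this illustration; they are not part of OT or of any real
shaping engine. Only the feature tag "hist" is taken from the
paragraph above.

```python
# Hypothetical illustration: a character replacement applied to a copy of
# the input, gated on an explicitly requested feature. Not a real OT API.

# Assumed substitution table for the example (illustrative only).
HIST_MAP = str.maketrans({"U": "V"})


def display_string(text, features=()):
    """Return the string that would be handed on for glyph selection.

    The stored text is never modified; the substitution happens on a
    copy, and only if the caller asked for the non-default "hist" feature.
    """
    shown = text  # default mode: every character is displayed as itself
    if "hist" in features:
        shown = shown.translate(HIST_MAP)
    return shown


plain = "MUSEUM"
default_view = display_string(plain)                          # no features
historical_view = display_string(plain, features=("hist",))   # opt-in
```

In this sketch the default call leaves the text alone, and
the U-to-V replacement appears only when "hist" is passed in,
which is the behaviour I am arguing a Unicode font should have.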

> [...] I never heard that U+0364
> (COMBINING LATIN SMALL LETTER E) is part of the spelling of
> modern German or Swedish.

True (that is not part of modern standard orthography),
but I don't see how that could imply some kind of support
for your (rather surprising and extreme) position.

If (and only if!) the author/editor of the text asks for an
overscript e should the font produce one. It is not up to
the font maker to make such substitutions without request,
either by the author/(human) editor changing the text, or by
the author/editor invoking a non-default font "feature"
(via some higher-level protocol, can't be done in plain text).
The "default mode" (for lack of a better term) would be the
one used, well, by default; e.g. on plain text.

> > Other characters have more glyphic variability
> > (informally) associated with them, like A, but some of them
> > have compatibility variants that have a somewhat more restricted
> > glyphic variability, like the Math Fraktur A in plane 1.
>
> More *symbol* characters which escape the general rule.

Math Fraktur A is a letter (of course!). Many letters,
including ordinary A, are used as symbols too.
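
For what it is worth, the general categories can be checked
directly. The following is a small sketch using Python's
standard unicodedata module; the choice of sample characters
is mine, and the results depend on the version of the Unicode
database bundled with the interpreter.

```python
import unicodedata

# Character names mentioned in this thread (sample selection is illustrative).
names = [
    "LATIN CAPITAL LETTER A",
    "MATHEMATICAL FRAKTUR CAPITAL A",
    "ESTIMATED SYMBOL",
    "COMBINING LATIN SMALL LETTER E",
]

# Map each name to the general category recorded in the Unicode database.
categories = {
    name: unicodedata.category(unicodedata.lookup(name)) for name in names
}

for name, cat in categories.items():
    print(f"{cat}  {name}")
```

Math Fraktur A should come out as Lu, the same category as
ordinary A, while the estimated sign is So and the overscript
e is Mn.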

You seem to argue that for "symbols" (whichever those are,
I'm sure you *don't* mean general categories S*...) there is
total rigidity, while for "non-symbols" (whichever those are)
there is near total anarchy and font makers can change glyphs
to something entirely different.

I claim that there are no characters for which there is total
anarchy (except possibly for "view invisibles" of normally
invisible characters), but that there are several degrees
of flexibility (I'm sure someone can list more than three,
but here is a coarse division):

        1. glyph (almost) fixed: Dingbats, estimated sign, ...
           [could possibly be given a rugged look, or texture
           if you want to mimic e.g. a typewriter look]

        2. "abstract" glyph is fixed but there can be minor
           shape variations: diacritics, math symbols (Sm),
           "math letters" (there are several Math Fraktur
           designs, several Math sans-serif designs, etc.
           that could suit), Arabic presentation forms (initial/
           medial/final/isolated decided but other aspects are
           not fixed, maybe this case is between 2 and 3), ...

        3. fairly free as long as (some) readers recognise
           the character from the glyph (modulo compatibility/
           canonical variants and what should have been
           compatibility/canonical variants...): "nominal"
           digits/letters/punctuation, ... [This, however,
           does NOT allow, e.g., the One Thousand C D character
           to be shown with an M glyph, nor display € as EUR, ...
           in a Unicode font in...; if it did so in default mode
           ["by default"], it would not be a Unicode font.]

[4. Near anarchy; you seem to argue that a large part of
case 2 and all of case 3 fall here...]
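
The "compatibility variants" mentioned above can be looked up
in the character database. Below is a minimal sketch, again
with Python's standard unicodedata module; the variable names
are mine. Note the assumption: a decomposition mapping is
normalisation data, not a rule about permitted glyphs, so this
only illustrates which characters the standard itself relates
to one another.

```python
import unicodedata

# Characters from the list above, plus ROMAN NUMERAL ONE THOUSAND
# (added here for contrast; it is not mentioned in the original text).
chars = {
    "fraktur_a": "\U0001D504",        # MATHEMATICAL FRAKTUR CAPITAL A
    "one_thousand_cd": "\u2180",      # ROMAN NUMERAL ONE THOUSAND C D
    "roman_one_thousand": "\u216F",   # ROMAN NUMERAL ONE THOUSAND
    "euro": "\u20AC",                 # EURO SIGN
}

# Decomposition field from the database; an empty string means "none".
mappings = {key: unicodedata.decomposition(ch) for key, ch in chars.items()}

# Compatibility normalisation (NFKC) applies those mappings to the text.
folded = {key: unicodedata.normalize("NFKC", ch) for key, ch in chars.items()}
```

Math Fraktur A carries a <font> mapping to A, and ROMAN
NUMERAL ONE THOUSAND a <compat> mapping to M, but One Thousand
C D and the euro sign have no mapping at all: nothing in the
data relates them to M or to EUR.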

Yes, you can have glyphic variation, but for the diacritics
there is (by design, but maybe not sufficiently explicitly
stated in the book) a limit to how much it can vary ("in
default mode"). There are limits also for, e.g., 'nominal'
letters and roman numeral characters, that are (by design)
somewhat less constrained. In addition you may note that
those who asked for the inclusion of overscript e do not
regard an overscript e glyph to be an acceptable way
of displaying a diaeresis [in a Uni..., you know].

These things come up quite often in discussions about
proposals to add characters, even though it is not formally
stated. If some of the "Unicode elders" care to elaborate,
please feel free.

Marco, I'm not sure it is of any use to try to explain in
more detail, since you don't appear to be listening. However,
I think I, Marc, Doug, and Mark (at the very least) seem
to be in approximate agreement on this (at least, I have
yet to see any major disagreement). I'm sure Michael
would agree too (at least I hope so), and many others.

Marco, please calm down and reread every sentence of my
previous message. You seem to have misread quite a few things,
but it is better you reread calmly before I try to clear
up any remaining misunderstandings.

Kind regards
/kent k

This archive was generated by hypermail 2.1.5 : Tue Oct 29 2002 - 13:28:14 EST