Re: Superscript and Subscript Characters in General Use

From: Frédéric Grosshans <frederic.grosshans_at_gmail.com>
Date: Tue, 10 Jan 2017 15:40:39 +0100

On 10/01/2017 at 12:03, Alastair Houghton wrote:
> That’s part of it, but I think also that the thread is increasingly verbose and hard to follow.
>
> I still think that the idea of adding U+???? SUPERSCRIPT and U+???? SUBSCRIPT might be worth contemplating; it would seem to provide a good answer to both Marcel’s and the French standards body’s concerns (wrt their proposed new ordinal indicator) while only using up two code points, and it’d be much easier to explain to people that superscripts and subscripts were a presentational matter and that they needed to talk to their font supplier. Plus it would work with existing platform rendering engines provided a font with an appropriate OpenType GSUB table was available.
>
> Does anyone besides Marcel have any input on that idea? Is it worth writing a proposal to add SUPERSCRIPT and SUBSCRIPT?

No! Long story short: encoding the {super,sub}script characters one by
one in Unicode is a choice that was made more than two decades ago, and
it is much too late to change it, even if changing it were a good idea.

One of the major problems with such a proposal is that it would be
incompatible (or ambiguous) with earlier versions of Unicode, since the
same character, let’s say “³”, could be encoded in two different
ways: SUPERSCRIPT + U+0033 DIGIT THREE vs. the current U+00B3
SUPERSCRIPT THREE, and such things are a big no-no. The same duplication
was problematic with accented characters and led to the definition of
NFC / NFD normalization, with strict stability policies enforced since
the 1990s.
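
To make the normalization point concrete, here is a minimal sketch in
Python, assuming only the standard unicodedata module (the variable and
function names are mine, purely for illustration). It shows how the two
encodings of an accented letter are reconciled by NFC / NFD, and how
U+00B3 is treated today.

```python
import unicodedata

E_ACUTE = "\u00e9"             # é as one precomposed code point
E_PLUS_ACUTE = "e\u0301"       # e followed by U+0301 COMBINING ACUTE ACCENT
SUPERSCRIPT_THREE = "\u00b3"   # ³, the existing U+00B3 SUPERSCRIPT THREE


def all_forms(text):
    """Return the four standard normalization forms of text."""
    names = ("NFC", "NFD", "NFKC", "NFKD")
    return {name: unicodedata.normalize(name, text) for name in names}


def code_points(text):
    """Render a string as a list of U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in text]


# Accented letters: the two spellings are canonically equivalent, so
# NFC maps both to the precomposed form and NFD maps both to the
# decomposed form.
accent_composed = all_forms(E_ACUTE)
accent_decomposed = all_forms(E_PLUS_ACUTE)

# U+00B3 has only a compatibility decomposition (tagged <super>), so the
# canonical forms NFC and NFD leave it untouched, while NFKC and NFKD
# fold it to a plain DIGIT THREE and lose the superscript.
superscript = all_forms(SUPERSCRIPT_THREE)
superscript_mapping = unicodedata.decomposition(SUPERSCRIPT_THREE)

for name, value in superscript.items():
    print(name, code_points(value))
print("decomposition:", superscript_mapping)
```

None of the existing forms could relate a hypothetical SUPERSCRIPT +
U+0033 sequence to U+00B3 without changing how already-encoded text
normalizes, which is exactly what the stability policies rule out.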

If you managed to convince the Unicode committee that such an encoding
would fit the plain-text model (good luck with that), without removing
all the previously encoded superscript/modifier letters (that is
forbidden), you would need to define what happens in the various
normalization forms NFC / NFD, and probably introduce a new one (NFE?
E for exponent), which would be, to say the least, a huge architectural
change to the Unicode model, for a modest gain, if any.
Received on Tue Jan 10 2017 - 08:41:45 CST
