Re: Character proposal: SUBSCRIPT TEN

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 16 2008 - 15:42:43 CST

Next message: Asmus Freytag: "Re: Latin J capital letter with caron"

Previous message: Kenneth Whistler: "Re: Latin J capital letter with caron"
Maybe in reply to: Leo Broukhis: "Character proposal: SUBSCRIPT TEN"
Next in thread: Leo Broukhis: "Re: Character proposal: SUBSCRIPT TEN"
Reply: Leo Broukhis: "Re: Character proposal: SUBSCRIPT TEN"

> The SUBSCRIPT TEN (== decimal exponent base) character remains the
> only character that is present in the GOST-10859 standard
> (http://en.wikipedia.org/wiki/GOST_10859) and in the character set of
> the Soviet ACPU-128 drum printer but is absent in Unicode.
> Also see the German character set ALCOR
> (http://en.wikipedia.org/wiki/ALCOR).

GOST 10859 and ALCOR were effectively dead encodings long before
Unicode even got started collecting repertoire, and were not
considered among the important initial set of character encodings
from which compatibility characters were culled for one-to-one
mapping in Unicode. If they had been, no doubt a subscript
numeral 10 character would have been included in Unicode 1.0,
along with all the square Japanese abbreviations, for example.

> It cannot be replaced by SUBSCRIPT ONE + SUBSCRIPT ZERO, because it
> has to occupy one character position for the sake of text aligned for
> a fixed-width font.

That's debatable. For transcoding obscure character encodings,
there really is no requirement that you have one-to-one
mappings for every character. You can certainly represent
the subscript 10 in GOST 10859 with <U+2081, U+2080> in Unicode
and convert it back losslessly with no problem.
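
A minimal sketch of such a round trip, in Python. The byte values in the
table are illustrative placeholders and have not been checked against the
GOST 10859 code chart; `decode` and `encode` are hypothetical helper names.
The sketch also assumes the legacy set has no separate subscript one or
subscript zero, so the two-character sequence can only have come from the
single subscript 10 code.

```python
# Toy code page. Byte values are placeholders, NOT the real GOST 10859 chart.
SUB10 = "\u2081\u2080"  # SUBSCRIPT ONE + SUBSCRIPT ZERO

DECODE = {0x00: "0", 0x01: "1", 0x02: "2", 0x10: SUB10}
ENCODE = {text: byte for byte, text in DECODE.items()}


def decode(data: bytes) -> str:
    """Legacy bytes -> Unicode; the subscript 10 byte becomes two code points."""
    return "".join(DECODE[b] for b in data)


def encode(text: str) -> bytes:
    """Unicode -> legacy bytes; the two-code-point sequence becomes one byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        # Try the sequence first so that it maps back to a single byte.
        if text.startswith(SUB10, i):
            out.append(ENCODE[SUB10])
            i += len(SUB10)
        else:
            out.append(ENCODE[text[i]])
            i += 1
    return bytes(out)


original = bytes([0x01, 0x10, 0x02])
assert encode(decode(original)) == original  # round trip is lossless
```

The mapping is one-to-many in one direction and many-to-one in the other,
but it stays reversible because the sequence is matched as a unit.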

> What should an emulator of a computer that used GOST 10859 or ALCOR
> produce, then?

For an emulator you would have various options, including
mapping of the sequence <U+2081, U+2080> to your fixed-width
ACPU-128 drum printer font glyph for a subscript 10. Or,
if your emulator is making one-to-one character to glyph
assumptions, then you use a PUA value to stand in for the
sequence, and map *that* to your fixed-width glyph.
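
Both options can be sketched in Python. Everything named here is
hypothetical: U+E000 is an arbitrary private-use choice, and `glyph_cells`,
`to_internal`, and `to_interchange` are illustrative helpers, not part of
any existing emulator.

```python
SUB10 = "\u2081\u2080"   # SUBSCRIPT ONE + SUBSCRIPT ZERO
PUA_SUB10 = "\ue000"     # hypothetical private-use stand-in, internal use only


def glyph_cells(text: str) -> list[str]:
    """Option 1: split text into fixed-width cells, one cell per glyph.

    The two-code-point sequence is treated as a single cell, so the font
    can draw it with one fixed-width subscript 10 glyph.
    """
    cells = []
    i = 0
    while i < len(text):
        if text.startswith(SUB10, i):
            cells.append(SUB10)
            i += len(SUB10)
        else:
            cells.append(text[i])
            i += 1
    return cells


def to_internal(text: str) -> str:
    """Option 2: replace the sequence with one PUA code point internally."""
    return text.replace(SUB10, PUA_SUB10)


def to_interchange(text: str) -> str:
    """Restore the standard sequence before text leaves the emulator."""
    return text.replace(PUA_SUB10, SUB10)


line = "1.5" + SUB10 + "3"
assert len(glyph_cells(line)) == 5   # five printer columns
assert len(to_internal(line)) == 5   # one code point per column
assert to_interchange(to_internal(line)) == line
```

With option 2 the one-to-one character-to-glyph assumption holds inside the
emulator, while anything exported still uses only standard characters.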

> Is there a chance for this character (or a way to request halfwidth
> subscript/superscript characters) to appear in Unicode?

Of course. See:

http://www.unicode.org/pending/proposals.html

Unlike what Philippe suggested, there isn't any requirement
that such a proposal come in from the German National Body
for ALCOR or the Russian National Body on behalf of an old
Soviet standard. If it *did* come from a national body, the
proposal would no doubt carry more weight in WG2 for
consideration for ISO/IEC 10646, but in any case such
sponsorship is not required to simply add one more symbol
for compatibility with an old encoding to the standard.

However, justification in terms of emulation of long-unused
character sets and computing machinery isn't a very strong
case, since emulation software is *software*, after all, and
always has plenty of options to deal with such problems
creatively, as long as all the component pieces needed for
character representation are present in Unicode.

--Ken

>
> Thanks,
>
> Leonid Broukhis
>
>


This archive was generated by hypermail 2.1.5 : Wed Jan 16 2008 - 15:44:48 CST