Re: Character proposal: SUBSCRIPT TEN

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 16 2008 - 15:42:43 CST


    > The SUBSCRIPT TEN (== decimal exponent base) character remains the
    > only character that is present in the GOST-10859 standard
    > (http://en.wikipedia.org/wiki/GOST_10859) and in the character set of
    > the Soviet ACPU-128 drum printer but is absent in Unicode.
    > Also see the German character set ALCOR
    > (http://en.wikipedia.org/wiki/ALCOR).

    GOST 10859 and ALCOR were effectively dead encodings long before
    Unicode even got started collecting repertoire, and were not
    considered among the important initial set of character encodings
    from which compatibility characters were culled for one-to-one
    mapping in Unicode. If they had been, no doubt a subscript
    numeral 10 character would have been included in Unicode 1.0,
    along with all the square Japanese abbreviations, for example.

    > It cannot be replaced by SUBSCRIPT ONE + SUBSCRIPT ZERO, because it
    > has to occupy one character position for the sake of text aligned for
    > a fixed-width font.

    That's debatable. For transcoding obscure character encodings,
    there really is no requirement that you have one-to-one
    mappings for every character. You can certainly represent
    the subscript 10 in GOST 10859 with <2081, 2080> in Unicode
    and convert it back losslessly with no problem.
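
    A minimal sketch of that round trip, in Python. The table is a
    hypothetical fragment: the byte values are illustrative
    placeholders, not verified against GOST 10859, and the function
    names are made up for this example. It assumes the legacy set has
    no separate subscript one or subscript zero, which is what keeps
    the reverse mapping unambiguous.

```python
# Hypothetical partial table: legacy byte -> Unicode string.
# Byte values are placeholders for illustration, not checked against GOST 10859.
LEGACY_TO_UNICODE = {
    0x00: "0",
    0x01: "1",
    0x0E: ".",
    0x10: "\u2081\u2080",  # subscript ten as SUBSCRIPT ONE + SUBSCRIPT ZERO
}

# Reverse table, with the longest Unicode strings tried first so the
# two-code-point sequence is matched as a unit.
UNICODE_TO_LEGACY = {text: byte for byte, text in LEGACY_TO_UNICODE.items()}
KEYS_LONGEST_FIRST = sorted(UNICODE_TO_LEGACY, key=len, reverse=True)


def decode(data: bytes) -> str:
    """Convert legacy bytes to a Unicode string, one table entry per byte."""
    return "".join(LEGACY_TO_UNICODE[b] for b in data)


def encode(text: str) -> bytes:
    """Convert a Unicode string back to legacy bytes using longest match."""
    out = bytearray()
    i = 0
    while i < len(text):
        for key in KEYS_LONGEST_FIRST:
            if text.startswith(key, i):
                out.append(UNICODE_TO_LEGACY[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no legacy mapping at index {i}: {text[i]!r}")
    return bytes(out)
```

    One legacy byte becomes two code points on the way in and is
    folded back into one byte on the way out, so nothing is lost.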

    > What should an emulator of a computer that used GOST 10859 or ALCOR
    > produce, then?

    For an emulator you would have various options, including
    mapping of the sequence <2081, 2080> to your fixed-width
    ACPU-128 drum printer font glyph for a subscript 10. Or,
    if your emulator is making one-to-one character to glyph
    assumptions, then you use a PUA value to stand in for the
    sequence, and map *that* to your fixed-width glyph.
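
    The PUA option could be sketched like this in Python. The choice
    of U+E000 is arbitrary and purely an assumption, as are the
    function names; an emulator would pick whatever private-use value
    its font actually maps to the subscript 10 glyph.

```python
# Interchange form: SUBSCRIPT ONE + SUBSCRIPT ZERO.
SUBSCRIPT_TEN_SEQ = "\u2081\u2080"
# Internal stand-in: an arbitrary private-use code point (assumption).
PUA_SUBSCRIPT_TEN = "\ue000"


def to_display(text: str) -> str:
    """Replace the two-code-point sequence with the single PUA stand-in."""
    return text.replace(SUBSCRIPT_TEN_SEQ, PUA_SUBSCRIPT_TEN)


def from_display(text: str) -> str:
    """Restore the standard sequence before text leaves the emulator."""
    return text.replace(PUA_SUBSCRIPT_TEN, SUBSCRIPT_TEN_SEQ)


def column_count(text: str) -> int:
    """Number of fixed-width cells, assuming one cell per display code point."""
    return len(to_display(text))
```

    Internally every printed position is then exactly one code point,
    while anything exported from the emulator still uses only
    standard characters.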

    > Is there a chance for this character (or a way to request halfwidth
    > subscript/superscript characters) to appear in Unicode?

    Of course. See:

    http://www.unicode.org/pending/proposals.html

    Contrary to what Philippe suggested, there isn't any requirement
    that such a proposal come in from the German National Body
    for ALCOR or the Russian National Body on behalf of an old
    Soviet standard. If it *did* come from a national body, the
    proposal would no doubt carry more weight in WG2 for
    consideration for ISO/IEC 10646, but in any case such
    sponsorship is not required to simply add one more symbol
    for compatibility with an old encoding to the standard.

    However, justification in terms of emulation of long-unused
    character sets and computing machinery isn't a very strong
    case, since emulation software is *software*, after all, and
    always has plenty of options to deal with such problems
    creatively, as long as all the component pieces needed for
    character representation are present in Unicode.

    --Ken

    >
    > Thanks,
    >
    > Leonid Broukhis
    >
    >


