Re: U+00BA and U+00AA (was: "Re: Public Review Issue Unicode Technical Report #25, "Unicode Support for Mathematics"")

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 25 2007 - 16:50:17 CST

    Jukka said:

    > The identity of U+00BA and U+00AA is somewhat vague. Their names suggest
    > very specific usage. If they are meant for more general use, I think notes
    > about this should be added, perhaps to the code chart, perhaps to the list
    > of misleading character names. If, on the other hand, their intended usage
    > is limited as suggested by their names, I think this should be mentioned
    > too, in the standard.

    Don't lose sight of the fact that U+0020..U+007F, U+00A0..U+00FF
    can be considered ISO/IEC 8859-1 compatibility characters.

    The identity of U+00BA and U+00AA is *exactly* what they were/are
    in Latin-1, because that is where they came from. That is also
    where the *names* came from.

    ISO/IEC 8859-1 says precisely zilch about U+00BA and U+00AA (and
    never has said anything about them), other than what you could
    infer from the stated intent for language coverage, to include
    Spanish and Portuguese.

    So in practice, U+00BA and U+00AA are whatever 0xBA and 0xAA
    in ISO/IEC 8859-1 and the corresponding code points in
    Windows 1252 (the two most widely implemented 8-bit character
    encodings) have been used for over the past 20-some years. (And if
    you want, you can dig into the history of character encoding
    to find the pre-existing character encodings that 8859-1 itself
    got them from.)
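
    As a rough illustration of that correspondence, the following
    Python sketch decodes the two byte values with Python's built-in
    "latin-1" (ISO/IEC 8859-1) and "cp1252" (Windows 1252) codecs.
    The variable names are made up for this example; only the codec
    names come from Python itself.

```python
# Bytes 0xAA and 0xBA as they would appear in an 8-bit text file.
raw = bytes([0xAA, 0xBA])

# "latin-1" is Python's codec for ISO/IEC 8859-1; "cp1252" is Windows 1252.
from_latin1 = raw.decode("latin-1")
from_cp1252 = raw.decode("cp1252")

# Both codecs map these two bytes to the same Unicode code points.
code_points = [f"U+{ord(ch):04X}" for ch in from_latin1]
print(code_points)                 # ['U+00AA', 'U+00BA']
print(from_latin1 == from_cp1252)  # True
```

    The two encodings differ elsewhere (notably in 0x80..0x9F), but
    not at these two positions.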

    So unless folks are all hung up and confused about what 0xBA and
    0xAA are used for in the continuing widespread usage of
    8859-1 and Windows 1252, I don't see any great need for
    annotating them further in the Unicode Standard.

    I suppose the issue is that there are so many *other* things
    to contrast them with in the Unicode Standard. So to get
    everything on the table, folks need to be considering not
    only:

    00AA;FEMININE ORDINAL INDICATOR;Ll;0;L;<super> 0061;;;;N;;;;;
    00BA;MASCULINE ORDINAL INDICATOR;Ll;0;L;<super> 006F;;;;N;;;;;

    but also:

    1D43;MODIFIER LETTER SMALL A;Lm;0;L;<super> 0061;;;;N;;;;;
    1D52;MODIFIER LETTER SMALL O;Lm;0;L;<super> 006F;;;;N;;;;;
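
    For anyone who wants to compare these four entries locally, here
    is a minimal sketch using Python's standard unicodedata module.
    The helper name describe() is hypothetical. The general category
    is printed rather than asserted, since the value reported depends
    on the version of the Unicode data bundled with the Python in use.

```python
import unicodedata

# The two pairs under discussion: ordinal indicators and modifier letters.
PAIRS = {
    "ordinal indicators": "\u00AA\u00BA",
    "modifier letters": "\u1D43\u1D52",
}

def describe(ch):
    """Collect a few properties of one character from unicodedata."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }

for label, chars in PAIRS.items():
    for ch in chars:
        print(label, describe(ch))
```

    Each pair shares the same <super> compatibility decomposition, so
    NFKC normalization folds U+00AA and U+1D43 alike to a plain "a",
    and U+00BA and U+1D52 alike to a plain "o". The distinction between
    the pairs survives only in unnormalized (or NFC/NFD) text.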

    If you are writing traditional Spanish and Portuguese
    abbreviations, you'd use the same characters that you'd write
    the same text with in 8859-1, namely U+00AA and U+00BA.

    If you are writing UPA transcriptions, with superscript
    modifier letters, and you don't want to be at the mercy of
    font design for what style of "a" is used or whether the
    "a" or "o" is displayed with underscores, etc., then you
    would use U+1D43 and U+1D52.
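
    One practical consequence of that choice can be sketched in
    Python: text written with U+00AA and U+00BA still fits in 8859-1,
    while text using the modifier letters does not. The function name
    fits_latin1() and both sample strings are invented for this
    example; the second string is not a real UPA transcription, just
    letters followed by U+1D43 and U+1D52.

```python
def fits_latin1(text):
    """Return True if every character of text is encodable in ISO/IEC 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Spanish-style abbreviations written with the ordinal indicators.
spanish_abbrev = "1.\u00AA, n.\u00BA"

# Made-up sample using the superscript modifier letters (not real UPA data).
upa_fragment = "t\u1D43 k\u1D52"

print(fits_latin1(spanish_abbrev))  # True
print(fits_latin1(upa_fragment))    # False
```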

    --Ken


