Improper grounds for rejection of proposal N2677

From: Andrew S (asunic@mail.ru)
Date: Tue Oct 25 2005 - 04:35:25 CST

    The report at http://www.unicode.org/reports/tr27/ states:
    "Mathematical notation requires a number of Latin and Greek alphabets that initially appear to be mere font variations of one another. For example, the letter H can appear as plain, or upright (H), bold (H), italic (H) and script. However, in any given document, these characters have distinct, and usually unrelated mathematical semantics. For example, a normal H represents a different variable from a bold H, etc. If these attributes are dropped in plain text, the distinctions are lost and the meaning of the text is altered... By encoding a separate set of alphabets, it is possible to preserve such distinctions in plain text."

    Note that this is "disruptive" in the sense that it breaks with current standard usage in most, if not all, math systems, which express such distinctions through markup over an underlying character set (e.g. ASCII) that lacks separately encoded alphabets. Existing math systems can simply ignore the new characters and continue to use markup, or they can be modified to use the new characters instead of markup, but such a modification is definitely disruptive.
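
    As an illustration of the distinction the quoted report describes, here is a minimal Python sketch using the standard unicodedata module. The variable names are illustrative only, and the code points are assumed to be the bold and italic capital H of the Mathematical Alphanumeric Symbols. It shows that the styled letters are separate characters in plain text, and that compatibility normalization (NFKC) folds them back to a plain H, which is the kind of loss of distinction the report warns about.

```python
import unicodedata

# Plain Latin H and two separately encoded mathematical variants.
plain_h = "H"
bold_h = "\U0001D407"    # MATHEMATICAL BOLD CAPITAL H
italic_h = "\U0001D43B"  # MATHEMATICAL ITALIC CAPITAL H

variants = (plain_h, bold_h, italic_h)

# Each variant is a distinct character with its own code point and name.
for ch in variants:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility normalization drops the styling, so all three collapse
# to the same plain letter and the mathematical distinction is lost.
folded = [unicodedata.normalize("NFKC", ch) for ch in variants]
print(folded)
```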

    The proposal at http://std.dkuug.dk/jtc1/sc2/WG2/docs/n2677.pdf would use U+218A through U+218F as the hex digits "A" through "F", assigning them the numeric values ten through fifteen. New systems, and modified existing systems, could then use characters that are unambiguously defined to be digits with specific numeric values, instead of using the Latin letters "A" through "F" as digits, which requires markup (e.g. "0x" prefixes on hexadecimal numbers) to distinguish letters from numbers. Systems that use both decimal and hexadecimal number representations would still need markup to distinguish the two, but that is a separate issue from distinguishing numbers from letters. (The alternative, disunifying new hexadecimal digits 0 through 9 from the decimal digits, is not recommended, because it would in general require a complete set of new characters for every radix.) The author of the proposal cites other advantages as well, such as using the letter-vs-digit semantic distinction to enable distinct display of hexadecimal numbers; such display advantages apply to the aforementioned new math alphabets as well. Such use is of course optional: existing systems can simply ignore the new hex characters, just as existing math systems can ignore the new math characters, so the new hex characters are disruptive only for systems that choose to use them.
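
    The letter-versus-digit ambiguity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token "BEEF" and the helper parse_marked are hypothetical, and the "0x" convention stands in for whatever markup a given system uses.

```python
import unicodedata

token = "BEEF"

# The same four characters are a valid word and a valid hexadecimal number.
looks_like_word = token.isalpha()
as_hex = int(token, 16)

# The character data alone does not settle it: "9" is a decimal digit (Nd),
# while "F" is classified as an uppercase letter (Lu).
categories = [unicodedata.category(c) for c in "9F"]


def parse_marked(text):
    """Hypothetical helper: treat a token as a number only if it is
    marked up with a '0x' prefix; otherwise leave it as text."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    return text


print(looks_like_word, as_hex, categories)
print(parse_marked("BEEF"), parse_marked("0xBEEF"))
```

    Without the prefix, the helper has no basis for deciding which reading is intended, and that decision is what the markup currently carries.
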
    Yet WG2 says at http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2754.pdf:
    "WG2 rejects the proposal for six hexadecimal digits in document N2677 for the reason that the proposed disunification from Latin Letters A to F is disruptive to all existing implementations which use the current encoding of these letters to represent Hexadecimal Digits."
    But it's not disruptive to any existing implementations at all; WG2's claim otherwise is patently false. All existing implementations have the option of simply ignoring the new characters. WG2's justification for rejection of N2677 is itself irrational, and furthermore is inconsistent with the approval of new math alphabets.

    The possible justification for rejecting N2677 that "Unicode exists only to encode the characters of existing scripts" would also be inconsistent with WG2's approval of the math alphabets: if an italic "H" is considered to be of a different script than a plain "H" in the same math formula, then certainly the digit "F" in a hexadecimal number is of a different script than the letter "F" in an English word.

    The possible justification for rejecting N2677 that "Unicode does not specify interpretations of characters" would be invalid, both because Unicode already does specify numeric values (note that numeric values are interpretations) of some characters, and more importantly because N2677's distinction between hexadecimal digits and Latin letters is not an interpretation of characters but a classification of homographic characters into distinct scripts, a distinction which is in accordance with Unicode's stated design of encoding characters by script rather than by homographic isomorphism.
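
    The point that Unicode already specifies numeric values for some characters can be checked with Python's standard unicodedata module. The sketch below is illustrative only; the choice of U+216B (ROMAN NUMERAL TWELVE) as the example character is an assumption made here, not something taken from N2677.

```python
import unicodedata

roman_twelve = "\u216B"  # ROMAN NUMERAL TWELVE

# Some characters carry a numeric value as part of their character data.
twelve_value = unicodedata.numeric(roman_twelve)
seven_value = unicodedata.digit("7")

# The Latin letter F carries no numeric value; the default is returned.
letter_f_value = unicodedata.numeric("F", None)

# Reading "F" as fifteen is a convention applied by the parser,
# selected here by passing the radix explicitly.
parsed_f = int("F", 16)

print(twelve_value, seven_value, letter_f_value, parsed_f)
```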

    The possible justification for rejecting N2677 that the proposed characters are not already in wide use would be inconsistent with the fact that the characters are actually already in wide use; the fact that the characters presently happen to be normally encoded by markup of Latin letters rather than by codes in the character set is irrelevant.

    If the rejection is to stand, WG2 should provide a rational justification.


