Re: Origin of the U+nnnn notation

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Nov 09 2005 - 13:18:24 CST


    Well, since this wildly OT thread has now plunged off the
    embankment...

    > ObUnicode: At http://www.unicode.org/Public/UNIDATA/UCD.html
    > we read the description of field 8 in UnicodeData.txt:
    >
    > (8) If the character has the numeric property, as specified in
    > Chapter 4 of the Unicode Standard, the value of that character is
    > represented with an positive or negative integer or rational
    > number in this field.
    >
    > [Yes, "an" positive, sic.]

    Gotta love the nitpickers! Eventually I'm sure we'll manage to
    remove every defect from the documentation of the standard... as
    long as we don't keep adding more text. :-)

    > The author of this text seems to assume that
    > "positive or negative" includes zero, inasmuch as U+0030, for example,
    > has the value 0 in field 8.

    Actually, the author of that text has a Ph.D. in logic, and I'm
    reasonably sure that he would disagree with your assumption
    that he would assume that "positive or negative" includes zero.

    If you wanted to bandy logic with him, he would probably counter
    that rational numbers *do* include zero, so that the inclusion
    of zero values in field 8 does not render the description
    invalid.

    But if you want to engage in textual archaeology, the correct thing
    to do is to first research how such things come to be. Compare
    the current version of UCD.html with some of its textual
    antecedents, such as:

    http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html

    You'll see there that the corresponding explanation then read:

    "... the value of that character is represented with an integer
    or rational number in this field."

    At some point, the authors (that would be Mark Davis and myself)
    stuck a "positive or negative" into that text to cover the
    fact that some characters had started showing up with *negative*
    values, so that people didn't assume that "integer" meant
    nonnegative integer -- which actually would have been a correct
    assumption for earlier versions of Unicode. Clearly not enough
    attention was paid to the actual editing of the text when the
    "positive or negative" was stuck in, and nobody has caught the
    result until now.

    > (I would have written "with a (possibly signed)
    > integer or rational number" and then given the three examples of
    > U+0035 DIGIT FIVE having value 5, U+1946 LIMBU DIGIT ZERO having
    > value 0, and U+0F33 TIBETAN DIGIT HALF ZERO having value -1/2.)

    I'll pass your suggestion along to the editors, as I agree that
    the description could be improved. ;-)
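
For anyone who wants to check the three cases mechanically, here is a minimal sketch in Python. It assumes the usual semicolon-separated layout of UnicodeData.txt, with the numeric value in field 8 (counting from 0). The three sample records are typed in by hand for illustration and should be checked against the actual data file; `numeric_value` is a hypothetical helper name, not part of any Unicode tooling.

```python
from fractions import Fraction

# Sample records in the UnicodeData.txt layout (semicolon-separated fields).
# Assumption: typed in by hand for illustration; verify against the real file.
SAMPLE_RECORDS = [
    "0035;DIGIT FIVE;Nd;0;EN;;5;5;5;N;;;;;",
    "1946;LIMBU DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;;",
    "0F33;TIBETAN DIGIT HALF ZERO;No;0;L;;;;-1/2;N;;;;;",
]


def numeric_value(record):
    """Return field 8 as an exact Fraction, or None if the field is empty."""
    fields = record.split(";")
    raw = fields[8]
    # Fraction parses "5", "0" and "-1/2" alike, so integers, zero and
    # signed rationals all go through the same path.
    return Fraction(raw) if raw else None


for record in SAMPLE_RECORDS:
    code = record.split(";")[0]
    print(f"U+{code}", numeric_value(record))
```

Parsing into `Fraction` rather than `float` keeps a value such as -1/2 exact, and it makes plain that zero needs no special treatment: it is just another rational number.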

    --Ken

    >
    > --Guy Steele



    This archive was generated by hypermail 2.1.5 : Wed Nov 09 2005 - 13:19:42 CST