Re: Hexadecimal digits

From: Robert Abel (freakrob@googlemail.com)
Date: Tue Jun 08 2010 - 15:20:47 CDT


    On 2010/06/08 21:43, John Dlugosz wrote:
    > So, our unique digits are grandfathered in. They were in ASCII and in EBCDIC, so they're in Unicode. Sometime later, assemblers and compilers came along. The writers of these tools had little trouble using context or strict rules to distinguish A-F in their role as digits from their role as letters. We could do without separate U+0030 and U+0031 today just as well: make O a reserved word in C, deem identifiers that look like numbers (beginning with O or l and containing only characters used to form numeric literals) to be parsed as numbers, or use a special mark, or whatever.
    >
    > So it's only history that some glyphs used as digits are separate and others (for Computer Science work, anyway) are not. In practice, we don't need unique assignments in general. The characters that are used in numeric literals are a subset of those used for words in general.
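
    John is right that the tokenizing side is easy. A minimal sketch of
    the idea (a toy C classifier with a made-up 0x-prefix rule, not
    anything John actually specified) shows how context alone decides
    whether A-F act as digits or as letters:

        #include <ctype.h>
        #include <stdio.h>

        /* Toy rule: after a "0x" prefix, A-F count as hex digits;
           elsewhere the same letters simply start an identifier. */
        static const char *classify(const char *tok)
        {
            if (tok[0] == '0' && (tok[1] == 'x' || tok[1] == 'X')) {
                for (const char *p = tok + 2; *p; p++)
                    if (!isxdigit((unsigned char)*p))
                        return "malformed numeric literal";
                return "hexadecimal literal";
            }
            return isdigit((unsigned char)tok[0]) ? "numeric literal"
                                                  : "identifier";
        }

        int main(void)
        {
            printf("%s\n", classify("0xCAFE")); /* A-F treated as digits */
            printf("%s\n", classify("CAFE"));   /* same letters, an identifier */
            return 0;
        }
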
    From a parsing point of view this might not matter; however, for
    distinguishing characters by glyph it matters a lot:

    0O
    1l

    Were these pairs the same code point, the text would be pretty hard to
    read, because we know from handwriting that these characters do look
    different. The fixed-width fonts that programmers tend to use usually
    make these glyphs distinguishable, because they have to be.

    Even worse, some cursive/"handwriting" fonts style digits and their
    confusable letter counterparts quite differently. You couldn't do that
    if they were encoded as the same character, short of preparing
    contextual glyphs and a text engine that supports them (and even then
    you couldn't type a zero inside a word, because it would render as a
    capital O). Letter styles usually differ from digit styles because
    digits are not written as one unbroken string, whereas the letters of
    a word usually are. Sure, you could use a single glyph for both and
    make it look good, but it would neither look natural nor be practical.

    So I don't think that we _could do without_ those characters having
    different code points today. Even back then it must have seemed like a
    hack to type a lowercase L instead of a 1.
    I think this is a neat example of why Unicode encodes a character's
    abstract identity rather than its shape. That's why we have Han
    unification, after all: some characters share an abstract identity,
    which was preserved, while others, such as our digits, do not share
    an identity with the Latin letters they resemble.
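
    A quick way to see the distinct identities (a small C program, purely
    for illustration, assuming an ASCII-compatible execution character
    set; the values printed are the standard code points):

        #include <stdio.h>

        int main(void)
        {
            /* Visually confusable pairs, yet four distinct code points. */
            /* U+0030 vs U+004F */
            printf("'0' = U+%04X, 'O' = U+%04X\n", (unsigned)'0', (unsigned)'O');
            /* U+0031 vs U+006C */
            printf("'1' = U+%04X, 'l' = U+%04X\n", (unsigned)'1', (unsigned)'l');
            return 0;
        }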

    Robert


