Clones (was RE: Hexadecimal)

From: Jill.Ramonsky@Aculab.com
Date: Mon Aug 18 2003 - 06:17:37 EDT

    All of this makes sense to me, apart from one or two tiny niggling points...

    I confess, I hadn't read ch14.pdf, and I probably should have done. My
    fault. But I still believe that there should be something in the
    machine-readable code charts themselves that says, of the Roman numerals,
    "Don't use these characters - use the normal Latin letters instead". If
    they really are there _SOLELY_ for round trip compliance with East Asian
    standards, then, if I wish to put the year MMIII in a web page, I should
    _NOT_ use the Roman letters. Furthermore, if I write software to interpret
    Roman numerals, I only need to interpret the Basic Latin letters, not the
    Roman ones. My life as a webmaster and programmer is made so much SIMPLER by
    not having to use the Roman letters. I would really like it if these, and
    every single other character which is "only there for reasons of round trip
    compatibility" with something else, were explicitly marked in the
    machine-readable charts with something meaning "Don't introduce this
    character, at all, ever. Don't try to interpret it. Just preserve it, in
    case it ever gets turned back to its original character set".
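
    A rough sketch of what "only interpret the Basic Latin letters" could look
    like in practice, assuming Python and its standard unicodedata module. The
    compatibility mappings of the Roman numeral characters are already exposed
    through NFKC normalization, so software can fold them to Latin letters
    before interpreting anything. The names ROMAN_VALUES, fold_compat and
    roman_to_int are made up for this illustration, and the parser does no
    validation beyond rejecting unknown letters.

```python
import unicodedata

# Values of the Basic Latin letters used as Roman numerals.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def fold_compat(text: str) -> str:
    """Replace compatibility characters with their NFKC equivalents."""
    return unicodedata.normalize("NFKC", text)


def roman_to_int(text: str) -> int:
    """Interpret a Roman numeral after folding it to Basic Latin letters."""
    letters = fold_compat(text).upper()
    total = 0
    for i, ch in enumerate(letters):
        value = ROMAN_VALUES[ch]  # KeyError for anything that is not a numeral letter
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(letters) and value < ROMAN_VALUES[letters[i + 1]]:
            total -= value
        else:
            total += value
    return total
```

    With this, roman_to_int("MMIII") and roman_to_int("\u216F\u216F\u2162")
    (ROMAN NUMERAL ONE THOUSAND twice, then ROMAN NUMERAL THREE) both go
    through the same Basic Latin code path.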

    Secondly, I believe that the code charts SHOULD provide machine-readable
    information about the hexadecimal values of the letters "A" to "F".
    Codepoint FF21, for example, has the property "Hex_Digit". Now, I _could_
    parse the textual description in the rest of the line ("FULLWIDTH LATIN
    CAPITAL LETTER A"), deduce that this can be replaced by "A", and then use
    the ASCII algorithm to convert this to ten ... but it would be SO MUCH NICER
    if _every_ character (or range of characters) which had the "Hex_Digit"
    property ALSO had a simple, straightforward, lookup table, which immediately
    told me that, when interpreted as hex, this symbol means ten.
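
    For illustration, the kind of lookup table meant here can be sketched in
    Python under some assumptions. The code point ranges below are typed in by
    hand (ASCII and fullwidth digits and letters A to F in both cases) rather
    than read from the Unicode data files, and the names hex_value, _RANGES and
    HEX_VALUES are invented for the example. NFKC folding does the step of
    turning FULLWIDTH LATIN CAPITAL LETTER A into "A".

```python
import unicodedata


def hex_value(ch: str) -> int:
    """Return 0..15 for a character that NFKC-folds to an ASCII hex digit."""
    folded = unicodedata.normalize("NFKC", ch)
    if len(folded) != 1 or folded not in "0123456789abcdefABCDEF":
        raise ValueError(f"not a hexadecimal digit: {ch!r}")
    return int(folded, 16)


# Hand-written ranges (an assumption, not read from the Unicode data files):
# ASCII 0-9, A-F, a-f and their fullwidth counterparts.
_RANGES = [(0x30, 0x39), (0x41, 0x46), (0x61, 0x66),
           (0xFF10, 0xFF19), (0xFF21, 0xFF26), (0xFF41, 0xFF46)]

# The table itself: character -> numeric value when read as a hex digit.
HEX_VALUES = {chr(cp): hex_value(chr(cp))
              for lo, hi in _RANGES for cp in range(lo, hi + 1)}
```

    The table is the part to rely on. hex_value on its own is too permissive,
    since other characters (a superscript two, for instance) also fold to an
    ASCII digit under NFKC without being hex digits in any useful sense.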

    Thirdly, as Jim pointed out, specialist disciplines should not expect
    characters to be cloned all over the place just because they have a
    different meaning in their particular discipline. I do agree with this, but
    what confuses me is what APPEAR to be the large number of violations of this
    rule already present in Unicode. For example:
            U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus).
              Who uses this?
            U+2217 (asterisk operator) - an equally obvious clone of U+002A
              (asterisk)
            U+223C (tilde operator) - a clone of U+007E (tilde)
    and then there's:
            U+2223 (divides) - hell, that looks to me remarkably like U+007C
              (vertical line)
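
    One way to see how the standard itself treats these pairs is to ask whether
    normalization folds the operator onto its Basic Latin lookalike, the way it
    folds the Roman numeral characters onto Latin letters. A small sketch,
    assuming Python's unicodedata module; PAIRS and folds_to are names invented
    for the example.

```python
import unicodedata

# (operator, Basic Latin lookalike) pairs taken from the list above.
PAIRS = [("\u2212", "\u002D"), ("\u2217", "\u002A"),
         ("\u223C", "\u007E"), ("\u2223", "\u007C")]


def folds_to(ch: str, target: str) -> bool:
    """True if NFKC normalization turns ch into target."""
    return unicodedata.normalize("NFKC", ch) == target


for op, ascii_ch in PAIRS:
    print(f"U+{ord(op):04X} {unicodedata.name(op)}: "
          f"folds to U+{ord(ascii_ch):04X}? {folds_to(op, ascii_ch)}")
```

    None of the four operators folds to its lookalike, so as far as the
    normalization data goes they are separate characters rather than
    compatibility clones, whatever they look like on the page.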

    Conversely, there are also things that look different, but mean the same.
    For example:
            U+2264 (less than or equal to) - compare with U+2A7D (less than or
              slanted equal to)

    The last example is interesting (to me) because the difference between the
    two seems like a font difference - like the difference between "g" with a
    tail and "g" with a loop. In defence of this argument, I point out that the
    complementary relation, NOT less than or equal to, has codepoint U+2270, and
    this is represented in the code charts as having a slanted equal to, so it
    OUGHT to be the complement of U+2A7D. (Unless I've missed it, there appears
    to be no "not less than or equal to with horizontal equals" character).

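    The machine-readable data can be consulted on this point too. The sketch
    below, again assuming Python's unicodedata module, prints the decomposition
    field for the three characters involved. U+2270 is recorded there as U+2264
    followed by U+0338 (combining long solidus overlay), which would suggest
    that the slanted glyph in the chart is a matter of font design rather than
    a tie to U+2A7D.

```python
import unicodedata

# Print the name and decomposition mapping of each character under discussion.
for cp in (0x2264, 0x2A7D, 0x2270):
    ch = chr(cp)
    decomp = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(ch)}: decomposition = {decomp}")
```
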
    So, yes, I agree with Jim. Let's not have too many duplicates. But I still
    have to ask why there are so many already?

    -----Original Message (1)-----
    From: Doug Ewell [mailto:dewell@adelphia.net]
    Sent: Saturday, August 16, 2003 9:14 PM
    To: Unicode mailing list
    Cc: Pim Blokland
    Subject: Re: Hexadecimal

    Not exactly. The character U+216E ROMAN NUMERAL FIVE HUNDRED came from
    an East Asian double-byte character set, and was carried over into
    Unicode for round-tripping reasons. It is a compatibility equivalent of
    U+0044.

    AND...
    -----Original Message (2)-----
    From: Jim Allan [mailto:jallan@smrtytrek.com]
    Sent: Saturday, August 16, 2003 9:13 PM
    To: unicode@unicode.org
    Subject: Re: Hexadecimal

    .... from an explanation as to why Unicode
    coded Roman numerals separately. See 14.3 at
    http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:

    << Number form characters are encoded solely for compatibility with
    existing standards. >>

    Also

    << Roman Numerals. The Roman numerals can be composed of sequences of
    the appropriate Latin letters. Upper- and lowercase variants of the
    Roman numerals through 12, plus L, C, D, and M, have been encoded for
    compatibility with East Asian standards. >>

    AND FINALLY...
    -----Original Message (3)-----
    From: Jim Allan [mailto:jallan@smrtytrek.com]
    Sent: Saturday, August 16, 2003 9:13 PM
    To: unicode@unicode.org
    Subject: Re: Hexadecimal

    Anyone at any time in any discipline can assign a special meaning to a
    Latin letter without waiting for this meaning to be encoded in Unicode
    and should not expect that a clone of the character with that special
    meaning would ever be encoded in Unicode.
