Re: Surrogate points

From: Hans Aberg (haberg@math.su.se)
Date: Mon Jan 31 2005 - 12:24:13 CST

    At 14:18 -0800 2005/01/30, Doug Ewell wrote:
    >In any case, it is incorrect to state that the choice of this block was
    >due to "failure to give UTF-16 a proper design." Other blocks, such as
    >the "obvious" 0xF800 through 0xFFFF, were already occupied.

    It makes the character number allocations dependent on a particular
    encoding, which is wholly unnecessary.

    >2. Noncharacters 0xFFFE and 0xFFFF
    >
    >The designation of 0xFFFE and 0xFFFF as "noncharacters" goes back to
    >Unicode 1.0 (1991), although that term was not used at the time. The
    >numeric value -1 has a long history of being used as a "sentinel" value,
    >to indicate the end of a series of real values. This works fine for
    >non-negative numeric data, such as inventory counts, but caused problems
    >in existing 8-bit character sets where the value 0xFF might have a real
    >meaning.
    >
    >To solve this problem, Unicode 1.0 set aside the value 0xFFFF as NOT
    >corresponding to an actual character. This way, programs that used
    >16-bit values (i.e. all Unicode programs at the time) could safely use
    >it as a sentinel without fear of colliding with a real character
    >assignment. This was completely intentional.

    Again, one sets these values aside in the encoding, if necessary, not in the
    character model.
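
    To make the sentinel usage described in the quoted text concrete, here is
    a minimal sketch in Python. It is only an illustration: the names SENTINEL
    and read_units are hypothetical, and the buffer is assumed to hold plain
    16-bit values, as programs of that era would have used.

```python
# Hypothetical illustration: 0xFFFF is a noncharacter, so a program working
# with 16-bit values can use it as an end marker without clashing with text.
SENTINEL = 0xFFFF


def read_units(buffer):
    """Return the 16-bit values that precede the first sentinel."""
    units = []
    for unit in buffer:
        if unit == SENTINEL:
            break  # end of the real data
        units.append(unit)
    return units


# In an 8-bit set such as Latin-1 the analogous value 0xFF is a real
# character (y with diaeresis), so it cannot double as a sentinel.
latin1_byte = "\u00ff".encode("latin-1")

buffer = [0x0048, 0x0069, SENTINEL, 0x0041]
text = "".join(chr(u) for u in read_units(buffer))
print(text)
```

    Whether such a reservation belongs in the character numbering or only in
    a given encoding is exactly the point under dispute here; the sketch shows
    only the mechanism.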

    >Claiming that either of these features of Unicode is the result of poor
    >design of UTF-16 is simply wrong. It is an uninformed opinion based on
    >inadequate consideration of the facts.

    So obviously, the people who did this design did not understand how to
    clearly separate the character model from the encoding.

    >Hans, I don't know how long you spent on this list as a silent observer
    >("lurker") before you began posting, but evidently not long enough.
    >
    >When I joined this list, I spent almost a year lurking before I made my
    >first post. I listened to the experts. I made plenty of wrong
    >statements of my own, but accepted the criticisms and corrections of
    >those who obviously knew more than I did. I learned the history of why
    >things are, and perhaps most importantly, I learned the importance of
    >Unicode's stability policies, which explain why it is TOO LATE to make
    >major architectural changes that would invalidate all existing
    >implementations.
    >
    >While I admit a year may be excessive, I strongly suggest you take some
    >time off to READ the list, read the FAQ's, read the book (on-line or
    >hardcover), read the UAX's and UTS's and UTR's, and THINK about why the
    >Unicode Standard is the way it is, and what can -- and cannot -- be done
    >to change it. The choice is entirely up to you, but if you do not do
    >the necessary homework to draw reasonable conclusions and ask reasonable
    >questions, your posts will continue to reflect your lack of
    >understanding, and will be ignored by more and more people.

    I was, more or less, aware of the facts you put up at some length before
    I posted. The idea was that the intelligent reader would notice this
    before replying.

    So, evidently, your one year of lurking didn't help you.

      Hans Aberg


