Re: marks (2 new symbols)

From: Mark E. Shoulson (mark@kli.org)
Date: Tue Oct 02 2007 - 06:55:23 CST

    Has anyone mentioned the information-theoretic implications of this
    scheme? By using a whole *character* to mark the following one as
    capitalized, it essentially devotes one whole character, say eight or
    sixteen bits or whatever it takes, to transmit *ONE* bit of information
    (is it capital or not?). The other variants, "abbreviation" or whatever,
    don't change this much; it's still a whole character to handle two or
    three bits of information.
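
    The arithmetic above can be sketched roughly as follows. This is only
    an illustration under stated assumptions: CAP_MARK is a private-use
    code point standing in for the proposed mark (it is not a real
    assignment), and to_prefix_scheme and overhead_bits are hypothetical
    helper names made up for this example.

```python
# Hypothetical prefix scheme: a marker character precedes each capital letter.
# Assumption: U+E000 (private use) stands in for the proposed capitalization mark.
CAP_MARK = "\uE000"


def to_prefix_scheme(text: str) -> str:
    """Rewrite each uppercase letter as CAP_MARK followed by its lowercase form."""
    out = []
    for ch in text:
        low = ch.lower()
        # Only handle simple one-to-one case pairs; leave everything else alone.
        if ch.isupper() and len(low) == 1 and low != ch:
            out.append(CAP_MARK + low)
        else:
            out.append(ch)
    return "".join(out)


def overhead_bits(text: str, encoding: str = "utf-16-le") -> tuple[int, int]:
    """Return (extra bits spent on markers, bits of case information carried).

    Each marker answers a single yes/no question ("is the next letter
    capital?"), so it is counted as carrying one bit.
    """
    marked = to_prefix_scheme(text)
    extra = 8 * (len(marked.encode(encoding)) - len(text.encode(encoding)))
    capitals = marked.count(CAP_MARK)
    return extra, capitals


sample = "Has Anyone Mentioned This?"
print(overhead_bits(sample))           # UTF-16: 16 bits per marker
print(overhead_bits(sample, "utf-8"))  # UTF-8: 24 bits per marker (U+E000 is 3 bytes)
```

    With four capitals in the sample, the markers cost 64 bits in UTF-16
    and 96 bits in UTF-8, to carry four bits of information.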

    What's more, the burden of the extra information falls on the text
    itself, not on the encoding tables. We would hope that the volume of
    text encoded in Unicode is hundreds of times larger than the tables,
    even counting every copy of them in all the software that uses them.
    After all, the point of Unicode is to encode the information, not the
    other way around. So the extra bits are repeated in every document that
    uses capital letters, rather than just living in the much smaller
    encoding tables and software. Admittedly the marker scheme would be an
    advantage, I guess, for documents in languages that don't use capitals,
    but (a) so what, and (b) let's face it, Latin is far and away the most
    used writing system in computer storage.

    As regards Unicode II, it should be noted that there are lots of things
    that really are wrong with Unicode, mistakes that shouldn't have been
    made but that can't be changed (as opposed to capitalization, which
    isn't a mistake). There's the famous case of the character name
    containing FHTORA, which is known to be misspelled and cannot be
    changed. Or the annoyance of the Hebrew vowel combining classes being
    set up wrong. And nobody is seriously proposing a
    Next-Generation Unicode in which the so-called "Cleanicode" (Unicode
    where everything is done *right*) is implemented from scratch. Such a
    radical change would not be worth the pain of implementing it.
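
    Both frozen properties mentioned above can be inspected with Python's
    standard unicodedata module. A small sketch, assuming that the FHTORA
    in question is U+1D0C5 (my identification, the message does not give
    a code point) and picking a few Hebrew points as examples; it only
    prints the assigned values and makes no claim about what the right
    ones would have been.

```python
import unicodedata

# Assumption: the misspelled name referred to is that of U+1D0C5.
fthora = "\U0001D0C5"
print(unicodedata.name(fthora))  # the formal name keeps the "FHTORA" spelling

# The corrected spelling exists only as a formal name alias;
# lookup() accepts aliases in Python 3.3 and later.
corrected = "BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS"
print(unicodedata.lookup(corrected) == fthora)

# Canonical combining classes of a few Hebrew points (sheva, hiriq,
# patah, dagesh). These values are fixed by the stability policy.
hebrew_points = (0x05B0, 0x05B4, 0x05B7, 0x05BC)
for cp in hebrew_points:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")
```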

    ~mark


