RE: Least used parts of BMP.

From: John Dlugosz (JDlugosz@tradestation.com)
Date: Fri Jun 04 2010 - 12:03:06 CDT


    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of Doug Ewell
    >
    > That said, if Kannan were to go with the alternative format suggested on
    > this list:
    >
    > 0xxxxxxx
    > 1xxxxxxx 0yyyyyyy
    > 1xxxxxxx 1yyyyyyy 0zzzzzzz
    >
    > then he would at least have this one feature of UTF-8, at no additional
    > cost in bits compared to the format he is using today.

    Yes, the cost in bits is the same. Fast gulping of stream input is one difference, but having the encoded form literally contain the unencoded form makes decoding particularly simple: just read the bytes into a variable and mask off the high bits. With the distributed bit, each byte has to be masked, shifted, and OR'ed in, 7 bits at a time. In the general form for very large words (e.g. cryptographic keys), the encoded form is just a prefix followed by the _same_ bytes that you want stored in the memory block, whose length you already know.
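To make the difference concrete, here is a rough sketch of both decoders. The first follows the quoted layout, assuming the x bits are the most significant. The second is only a guess at what a prefix-style layout might look like (0xxxxxxx, 10xxxxxx yyyyyyyy, 110xxxxx yyyyyyyy zzzzzzzz); it is not necessarily the format Kannan is actually using, and the function names are made up for illustration.

```python
def decode_distributed(data: bytes):
    """Decode one value in the quoted layout: every byte carries 7 payload
    bits, and a set high bit means another byte follows.
    Assumes most-significant group first. Returns (value, bytes_consumed)."""
    value = 0
    for i, b in enumerate(data):
        # Mask off the flag bit, shift the accumulator, OR in 7 bits.
        value = (value << 7) | (b & 0x7F)
        if b & 0x80 == 0:
            return value, i + 1
    raise ValueError("truncated input")


def decode_prefixed(data: bytes):
    """Decode one value in a hypothetical prefix layout, where the leading
    bits of the first byte give the total length (1 to 3 bytes).
    Returns (value, bytes_consumed)."""
    if not data:
        raise ValueError("truncated input")
    first = data[0]
    if first & 0x80 == 0:
        n = 1          # 0xxxxxxx
    elif first & 0x40 == 0:
        n = 2          # 10xxxxxx yyyyyyyy
    elif first & 0x20 == 0:
        n = 3          # 110xxxxx yyyyyyyy zzzzzzzz
    else:
        raise ValueError("length not covered by this sketch")
    if len(data) < n:
        raise ValueError("truncated input")
    # Read the bytes as they are, then mask off the prefix bits in one step.
    raw = int.from_bytes(data[:n], "big")
    return raw & ((1 << (7 * n)) - 1), n
```

Both layouts carry 7, 14, or 21 payload bits in 1, 2, or 3 bytes, so the cost is identical. For the value 300, the distributed form would be the bytes 82 2C and the prefixed form 81 2C; only the second contains the value 012C unchanged apart from its prefix bits.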

    >
    > Of course, he will not have other UTF-8-like features, such as avoidance
    > of ASCII values in the final trail byte, and "fast forward parsing" by
    > looking at the first byte. He may not care. One thing I've noted about
    > descriptions of UTF-8, in the context of alternative formats for private
    > protocols, is that they always assume these features are important to
    > everyone, when they may not be.

    For sure. A compressed stream will not be random-access and will include other mechanisms such as checksums. A storage format can be very different from the manipulation format.



