Re: UNICODE-non-clashing ASCII character needed

From: John Cowan (cowan@mercury.ccil.org)
Date: Mon Apr 07 2003 - 07:20:47 EDT

    Abdij Bhat scripsit:

    > For most of the communication we translate the UNICODE bytes into a BYTE
    > stream and push it into the Camera for storage (this is the way it has to
    > be). This works for most of the features.

    Since you are dealing with a byte-oriented device, you should be using the
    UTF-8 encoding, which guarantees that bytes in the range 00-7F are used
    only to represent ASCII characters. Unfortunately, the tradeoff is that
    many Unicode characters (basically those from U+0800 up) will require
    three bytes rather than just two. I don't know how seriously this will
    impact your markets.
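
As an illustration of the two properties described above, here is a minimal Python sketch. It is not part of the original message, and the helper names `utf8_len` and `low_bytes_are_ascii` are hypothetical, chosen only for this example. It checks how many bytes a character needs in UTF-8 and confirms that bytes in the range 00-7F appear only when the character itself is ASCII.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def low_bytes_are_ascii(text: str) -> bool:
    """Return True if no non-ASCII character in `text` produces a byte
    below 0x80 when encoded as UTF-8."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) >= 0x80 and any(b < 0x80 for b in encoded):
            return False
    return True


# Sample characters on either side of the U+0800 boundary.
for ch in ["A", "\u00e9", "\u07ff", "\u0800", "\u4e2d"]:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s) in UTF-8")

print(low_bytes_are_ascii("Abc\u00e9\u0800\u4e2d"))
```

Under these assumptions, a device that treats one ASCII byte as a delimiter or command code could scan a UTF-8 stream for that byte without hitting it inside a multi-byte character.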

    > What we want to know
    > is, IS there a ASCII BYTE (SINGLE BYTE) that does not correspond to any of
    > the UNICODE BYTE set (all languages included) (FIRST or SECOND). Is there
    > some kind of a reservation on BYTE, say for example, FF which will not be
    > used by the UNICODE BYTE character set?

    No, there definitely isn't. In the 2-byte representation, every possible
    byte value (00 through FF) is used at some point or another.
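
The point about the 2-byte representation can be checked with a short Python sketch. It is not part of the original message; the helper `bytes_used` is a hypothetical name, and Python's `utf-16-be` codec is used here as a stand-in for the 2-byte form. For contrast, it also collects the byte values that occur in UTF-8, where FE and FF never appear.

```python
def bytes_used(encoding: str, code_points) -> set:
    """Collect every byte value produced when the given code points
    are encoded one at a time with `encoding`."""
    used = set()
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded on their own
        used.update(chr(cp).encode(encoding))
    return used


# U+0100..U+01FF alone (high byte 01, low bytes 00..FF) already
# covers all 256 byte values in the 2-byte form.
two_byte = bytes_used("utf-16-be", range(0x0100, 0x0200))
print(len(two_byte))

# A byte of FF also occurs as a high byte, e.g. U+FF21.
print(chr(0xFF21).encode("utf-16-be").hex())

# In UTF-8, by contrast, some byte values never occur at all.
utf8 = bytes_used("utf-8", range(0x110000))
print(sorted(hex(b) for b in set(range(256)) - utf8))
```

The last line lists the byte values that UTF-8 leaves unused, which is the closest thing to a "reserved" byte available to a byte-oriented device, assuming the data is stored as UTF-8 rather than in the 2-byte form.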

    -- 
    John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
    To say that Bilbo's breath was taken away is no description at all.  There
    are no words left to express his staggerment, since Men changed the language
    that they learned of elves in the days when all the world was wonderful.
            --_The Hobbit_
    

