Re: FW: Subj: Amount of Space Unicode Takes

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jul 16 2007 - 13:56:44 CDT

    On 7/16/2007 10:24 AM, Magda Danish (Unicode) wrote:
    > Daniel,
    > I am forwarding your question to the Unicode mailing list http://www.unicode.org/consortium/distlist.html for possible help from list subscribers.
    > Regards,
    >
    > ---------------------------
    > Magda Danish
    > Sr. Administrative Director
    > The Unicode Consortium
    > 650-693-3921
    > magda@unicode.org
    >
    >
    >
    > -----Original Message-----
    > Date/Time: Fri Jul 13 12:58:18 CDT 2007
    > Contact: dbjohnson88@hotmail.com
    > Name: Daniel Johnson
    > Report Type: Other Question, Problem, or Feedback
    > Opt Subject: Amount of Space Unicode Takes
    >
    > I have a question about how much space Unicode takes up. I am working on a HTML project in multiple languages. Each of these web pages have to be stored on a chip with limited space. Is there any way to "compact" the HTML scripts in order to save space on the chip? Or is there a different call number for a character which will take up less space in hex? It would be greatly appreciated if the email was answered.
    >
    If your project allows you to insert your own decompression layer
    between the on-chip storage and the HTML, then you can use SCSU, the
    Standard Compression Scheme for Unicode. SCSU is intended for
    applications where the goal of compression is to arrive at about the
    same size as a traditional 8-bit encoding for the *same* text. It also
    preserves ASCII, so it only compresses the text data in your HTML, not
    the syntax characters or element names, etc. If needed, you can delay
    the decoding until the time you actually need to display a given bit of
    text, since you can parse the HTML syntax w/o decoding.

    See http://www.unicode.org/reports/tr6/ for the full details.
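
    To give a feel for how little a decoder needs, here is a minimal
    sketch in Python. It is not a complete implementation: it assumes the
    data stays in SCSU single-byte mode and rejects the SCU and SDX tags.
    The window tables are the ones given in TR6; the names scsu_decode
    and window_offset are made up for this example.

```python
# Minimal SCSU decoder sketch (single-byte mode only; see TR6 for the full scheme).
STATIC_WINDOWS = [0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000]
DEFAULT_DYNAMIC_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SPECIAL_OFFSETS = {0xF9: 0x00C0, 0xFA: 0x0250, 0xFB: 0x0370, 0xFC: 0x0530,
                   0xFD: 0x3040, 0xFE: 0x30A0, 0xFF: 0xFF00}


def window_offset(x):
    """Map the argument byte of an SDn tag to a window start offset."""
    if 0x01 <= x <= 0x67:
        return x * 0x80
    if 0x68 <= x <= 0xA7:
        return x * 0x80 + 0xAC00
    if x in SPECIAL_OFFSETS:
        return SPECIAL_OFFSETS[x]
    raise ValueError("reserved window offset byte: 0x%02X" % x)


def scsu_decode(data):
    """Decode SCSU bytes to a str. Hypothetical helper, single-byte mode only."""
    dynamic = list(DEFAULT_DYNAMIC_WINDOWS)
    active = 0  # dynamic window 0 is active at the start of the stream
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b >= 0x80:
            # Upper half: character in the active dynamic window.
            out.append(chr(dynamic[active] + (b - 0x80)))
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and the four pass-through controls stand for themselves.
            out.append(chr(b))
        elif 0x01 <= b <= 0x08:
            # SQn: quote one character from window n without switching.
            n = b - 0x01
            q = data[i]
            i += 1
            if q < 0x80:
                out.append(chr(STATIC_WINDOWS[n] + q))
            else:
                out.append(chr(dynamic[n] + (q - 0x80)))
        elif b == 0x0E:
            # SQU: quote one 16-bit code unit, big-endian.
            out.append(chr((data[i] << 8) | data[i + 1]))
            i += 2
        elif 0x10 <= b <= 0x17:
            # SCn: make dynamic window n active.
            active = b - 0x10
        elif 0x18 <= b <= 0x1F:
            # SDn: redefine dynamic window n and make it active.
            active = b - 0x18
            dynamic[active] = window_offset(data[i])
            i += 1
        else:
            # SCU (0x0F), SDX (0x0B) and the reserved 0x0C are out of scope here.
            raise NotImplementedError("unsupported SCSU tag: 0x%02X" % b)
    return "".join(out)
```

    For instance, scsu_decode(b"<p>\x12\x9c\xbe\xc1\xba\xb2\xb0</p>")
    leaves the markup bytes alone and expands the middle seven bytes to
    six Cyrillic letters.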

    Decoders are very small and easy to write yourself. Encoders can be
    more complex, as there are sometimes multiple ways to compress a
    string, each producing a different length. If you try for the absolute
    'best' case, your encoder can get tricky, but, as experience has shown,
    a 'reasonable' effort will deliver good results at very moderate code
    complexity.
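
    As one illustration of a 'reasonable effort' encoder, the following
    Python sketch never looks ahead: it reuses the active window if it
    can, switches to another existing window if one fits, redefines a
    window otherwise, and falls back to quoting a single code unit. The
    name scsu_encode and the round-robin choice of which window to
    redefine are assumptions made for this example, not something TR6
    prescribes, and characters beyond U+FFFF are not handled.

```python
# Greedy SCSU encoder sketch (single-byte mode only, no look-ahead).
DEFAULT_DYNAMIC_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]


def scsu_encode(text):
    """Encode a str as SCSU bytes. Hypothetical helper, BMP characters only."""
    windows = list(DEFAULT_DYNAMIC_WINDOWS)
    active = 0   # dynamic window 0 is active at the start of the stream
    victim = 0   # next window slot to overwrite (round-robin, an arbitrary policy)
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7F or cp in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and the pass-through controls are stored as-is.
            out.append(cp)
        elif cp < 0x20:
            # Other control codes collide with tag bytes: quote them with SQ0.
            out += bytes([0x01, cp])
        elif windows[active] <= cp < windows[active] + 0x80:
            # Already covered by the active window: one byte.
            out.append(0x80 + cp - windows[active])
        else:
            for n, base in enumerate(windows):
                if base <= cp < base + 0x80:
                    # Another window covers it: SCn, then one byte.
                    active = n
                    out += bytes([0x10 + n, 0x80 + cp - base])
                    break
            else:
                if cp < 0x3400:
                    # Redefine a window on the 0x80-aligned block holding cp (SDn).
                    x = cp >> 7
                    windows[victim] = x * 0x80
                    active = victim
                    out += bytes([0x18 + victim, x, 0x80 + cp - windows[victim]])
                    victim = (victim + 1) % 8
                elif cp <= 0xFFFF:
                    # Fallback: quote a single 16-bit code unit (SQU).
                    out += bytes([0x0E, cp >> 8, cp & 0xFF])
                else:
                    raise NotImplementedError("supplementary characters not handled")
    return bytes(out)
```

    With this sketch, six Cyrillic letters such as U+041C U+043E U+0441
    U+043A U+0432 U+0430 come out as 7 bytes, against 12 bytes in UTF-8.
    Text that falls back to quoting (CJK ideographs, for example) costs
    3 bytes per character here; a fuller encoder would switch to SCSU's
    Unicode mode for such runs.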

    Sample code exists or can be found easily for a number of approaches,
    and you don't need to use an external library, making it ideal for
    on-chip solutions.

    A./
    > Thank you
    >
    > Daniel Johnson
    >
    > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- (End of Report)


