Re: Name Compression. Comparison and Tweaks

From: Torsten Mohrin (mohrin@sharmahd.com)
Date: Sat May 13 2000 - 11:57:37 EDT

Next message: RWhizz12@aol.com: "Re: dozenal and hexadecimal digits"
Previous message: Torsten Mohrin: "Re: Compression and Unicode [was: Name Compression]"
Maybe in reply to: Kenneth Whistler: "Name Compression. Comparison and Tweaks"
Next in thread: Juliusz Chroboczek: "Re: Name Compression. Comparison and Tweaks"

Kenneth Whistler wrote:

>Approach 2: Torsten Mohrin
>[...]
>Claimed compression: 262,000 bytes ==> 59,000 bytes for the recoded
>names + 29,000 bytes for the code->word lookup table. Total size,
>around 88,000 bytes.
>[...]
>My analysis of the Unicode 3.0.0 names list shows the names alone
>as comprising 278,346 bytes (counting a one-byte delimiter between
>each) for 10,538 names (omitting the control codes and the ranges
>of characters with algorithmically derivable names).

I just want to clarify the reasons for this difference:

1. The names of C0 and C1 controls (from Unicode 1.0) are also
encoded.

2. Names of CJK COMPATIBILITY IDEOGRAPHs (F900 -> FA2D) are derived
algorithmically, saving 9966 bytes.

3. Names of BRAILLE PATTERNs are also derived algorithmically, saving
6656 bytes. The compiled code for this algorithm takes less than 1 KB,
so the net saving is about 5.5 KB. That is not much, but I couldn't
resist :)
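
For illustration, here is a minimal sketch of how the names in points 2 and 3 can be derived from the code point alone. It is not the compiled code mentioned above; the function names are hypothetical, the CJK range is the one quoted in point 2, and the Braille rule assumes the standard naming in which bit i of the offset from U+2800 corresponds to dot i+1.

```python
def cjk_compat_name(cp):
    """Derive the name of a CJK compatibility ideograph (hypothetical helper).

    Assumes the range U+F900..U+FA2D given in the message.
    """
    if not 0xF900 <= cp <= 0xFA2D:
        raise ValueError("not in the CJK compatibility range used here")
    # The name is a fixed prefix followed by the code point in hex.
    return "CJK COMPATIBILITY IDEOGRAPH-%04X" % cp


def braille_name(cp):
    """Derive the name of a Braille pattern in U+2800..U+28FF (hypothetical helper)."""
    if not 0x2800 <= cp <= 0x28FF:
        raise ValueError("not a Braille pattern")
    bits = cp - 0x2800
    if bits == 0:
        # The empty cell has its own name.
        return "BRAILLE PATTERN BLANK"
    # Bit 0 is dot 1, bit 1 is dot 2, ... bit 7 is dot 8.
    dots = "".join(str(i + 1) for i in range(8) if bits & (1 << i))
    return "BRAILLE PATTERN DOTS-" + dots


def stored_size(names):
    """Bytes needed to store the names with a one-byte delimiter after each."""
    return sum(len(name) + 1 for name in names)


if __name__ == "__main__":
    print(cjk_compat_name(0xF900))
    print(braille_name(0x2813))
    print(stored_size(cjk_compat_name(cp) for cp in range(0xF900, 0xFA2E)))
    print(stored_size(braille_name(cp) for cp in range(0x2800, 0x2900)))
```

Counting one delimiter byte per name, as in Kenneth's analysis, the two generated name sets come to 9966 and 6656 bytes, which matches the savings quoted above.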

--Torsten

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT