From: Doug Ewell (doug@ewellic.org)
Date: Mon Feb 21 2011 - 15:43:52 CST
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
> And anyway it is also much simpler to understand and easier to
> implement correctly (not like the sample code given here) than SCSU,
I don't buy this. A simple SCSU encoder, which achieves most of the
benefits of a complex one, is nearly as simple as Cropley's algorithm.
Both the complexity of SCSU and the importance of that complexity
continue to be highly overrated.
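To make "simple SCSU encoder" concrete, here is a minimal sketch in
Python. It is an illustration only, not a complete implementation of
UTS #6 and not any particular published encoder. It stays in single-byte
mode, switches to or redefines a dynamic window when a character falls
outside the active one, and quotes any other character from U+3400
upward with SQU. The name scsu_encode_simple and the round-robin window
replacement policy are assumptions made for this example; static windows
(other than quoting C0 controls), Unicode mode, and supplementary
characters are deliberately left out.

```python
# Single-byte-mode tag values and initial dynamic windows, as given in UTS #6.
SQ0, SQU, SC0, SD0 = 0x01, 0x0E, 0x10, 0x18
INITIAL_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}  # C0 controls that are not tags


def scsu_encode_simple(text: str) -> bytes:
    """Hypothetical minimal SCSU encoder: single-byte mode, BMP only."""
    windows = list(INITIAL_WINDOWS)
    active = 0   # dynamic window 0 is active at the start of a stream
    victim = 0   # next window slot to redefine (round-robin, an assumption)
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("this sketch handles BMP scalar values only")
        if cp in PASS_THROUGH or 0x20 <= cp <= 0x7F:
            out.append(cp)                       # ASCII passes through
        elif cp < 0x20:
            out += bytes([SQ0, cp])              # quote a C0 control from static window 0
        elif windows[active] <= cp < windows[active] + 0x80:
            out.append(0x80 + cp - windows[active])
        else:
            for n, base in enumerate(windows):
                if base <= cp < base + 0x80:     # already defined: switch to it
                    active = n
                    out += bytes([SC0 + n, 0x80 + cp - base])
                    break
            else:
                if cp < 0x3400:                  # definable with offset byte 0x01..0x67
                    x = cp >> 7
                    windows[victim] = x << 7
                    active = victim
                    victim = (victim + 1) % 8
                    out += bytes([SD0 + active, x, 0x80 + cp - windows[active]])
                else:                            # Han, Hangul, etc.: quote as UTF-16
                    out += bytes([SQU, cp >> 8, cp & 0xFF])
    return bytes(out)
```

With the initial window settings, "Москва" comes out as
12 9C BE C1 BA B2 B0: seven bytes, against twelve in UTF-8. Han text
costs three bytes per character in this sketch, which is where a fuller
encoder would switch to Unicode mode instead.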
Part of the apparent simplicity of Cropley's algorithm, as viewed from
his "Preliminary Proposal" HTML page, is that it omits a proper
description of the code-page switching mechanism, as well as the "magic
number" definitions of the code pages and the control bytes needed to
introduce them. These are present in the sample code, but to see them,
you have to paw through the UTF-8 conversion code and UI.
> and it is still very highly compressible with standard compression
> algorithms while still allowing very fast processing in memory in its
> decompressed encoded form :
I see no metrics or sample data to back this up. How does Cropley's
algorithm perform with mixed scripts (say Greek and Cyrillic), with
embedded punctuation in the U+2000 block, with Deseret and other
alphabets omitted from the Alphabet table, with larger alphabets where
multiple 64-blocks are needed, with Han and Hangul?
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s