From: Doug Ewell (dewell@adelphia.net)
Date: Sun Dec 05 2004 - 02:16:53 CST
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>> I appreciate Philippe's support of SCSU, but I don't think *even I*
>> would recommend it as an internal storage format. The effort to
>> encode and decode it, while by no means Herculean as often perceived,
>> is not trivial once you step outside Latin-1.
>
> I said: "for immutable strings", which means that these Strings are
> instantiated for the long term and for multiple reuses. In that sense, what
> is really significant is its decoding, not the effort to encode it
> (which is minimal for ISO-8859-1 encoded source texts, or Unicode
> UTF-encoded texts that only use characters from the first page).
>
> Decoding SCSU is very straightforward, even if this is stateful (at
> the internal character level). But for immutable strings, there's no
> need to handle various initial states, and the states associated with
> each component character of the string have no importance (strings
> being immutable, only the decoding of the string as a whole makes
> sense).
Here is a string, expressed as a sequence of bytes in SCSU:

05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E

See how long it takes you to decode this to Unicode code points. (Do
not refer to UTN #14; that would be cheating. :-)

It may not be rocket science, but it is not trivial.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
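
For readers who want to check their answer to the challenge above, here
is a minimal sketch of a decoder, not a full SCSU implementation. It
assumes the tag values and default window offsets given in the SCSU
specification (UTS #6) and handles only what this particular string
needs: single-byte mode, the SQn quote tags and the SCn window-switch
tags. The names decode_scsu_subset, STATIC_OFFSETS, DYNAMIC_OFFSETS and
SAMPLE are made up for illustration; every other tag (SDn, SDX, SQU, SCU
and all of Unicode mode) simply raises an error.

```python
# Partial SCSU decoder: single-byte mode, SQn and SCn tags only.
# Offsets and tag values are assumed from UTS #6; names are illustrative.

# Static windows, reachable only through SQn followed by a byte < 0x80.
STATIC_OFFSETS = [0x0000, 0x0080, 0x0100, 0x0300,
                  0x2000, 0x2080, 0x2100, 0x3000]

# Default dynamic windows (never redefined here, since SDn is unsupported).
DYNAMIC_OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600,
                   0x0900, 0x3040, 0x30A0, 0xFF00]


def decode_scsu_subset(data: bytes) -> str:
    active = 0          # the active dynamic window starts out as window 0
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b >= 0x80:
            # High byte: offset into the active dynamic window.
            out.append(chr(DYNAMIC_OFFSETS[active] + (b - 0x80)))
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and the pass-through control characters.
            out.append(chr(b))
        elif 0x01 <= b <= 0x08:
            # SQn: quote one character from window n, state unchanged.
            if i >= len(data):
                raise ValueError("truncated SQn sequence")
            n = b - 0x01
            q = data[i]
            i += 1
            if q < 0x80:
                out.append(chr(STATIC_OFFSETS[n] + q))
            else:
                out.append(chr(DYNAMIC_OFFSETS[n] + (q - 0x80)))
        elif 0x10 <= b <= 0x17:
            # SCn: make dynamic window n the active one.
            active = b - 0x10
        else:
            raise NotImplementedError(f"tag 0x{b:02X} not handled in this sketch")
    return "".join(out)


SAMPLE = bytes.fromhex(
    "05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E"
)
print(decode_scsu_subset(SAMPLE))   # “Moscow” is Москва.
```

Even this cut-down version carries two window tables and a piece of
decoder state, and it still leaves out window redefinition and Unicode
mode entirely.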