From: D. Starner (shalesller@writeme.com)
Date: Wed Nov 26 2003 - 10:05:03 EST
> I see no reason why you accept some limitations for this
> encapsulation, but not ALL the limitations.
Because I can convert the data from binary to Unicode text in UTF-16
in a few lines of code if I don't worry about normalization. The rules
become much more complex as soon as I do have to worry about it.
The simple fact is I can change UTF-8 to UTF-16 to UTF-32 with several
utilities on my system, but none of them handle normalization. I don't
know of any basic text tools that do, so if I edit a source file
and email it to someone (through a system that compresses and decompresses it automatically),
they're going to have trouble running diff on the code.
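
To make "a few lines" concrete, here is a minimal sketch (not any real tool):
it just maps each input byte to U+0000-U+00FF and writes it out as UTF-16LE,
with no normalization step anywhere.

#include <stdio.h>

/* Wrap arbitrary bytes as UTF-16 text: byte 0xNN becomes code point U+00NN.
 * UTF-16LE encodes U+0000-U+00FF as the original byte followed by 0x00. */
int main(void)
{
    int c;
    putchar(0xFF); putchar(0xFE);   /* byte order mark, so readers see UTF-16LE */
    while ((c = getchar()) != EOF) {
        putchar(c);                 /* low byte: the original octet */
        putchar(0x00);              /* high byte: zero, stays in U+0000-U+00FF */
    }
    return 0;
}

Decoding is just the reverse (drop the high bytes), which is the sense in which
the conversion itself is trivial.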
> If you don't want such "denormalisation" to occur during the compression,
> don't claim that your 9-bit encapsulator produces Unicode text (so don't
> label it with a UTF-* encoding scheme or even a BOCU-* or SCSU character
> encoding scheme, but use your own charset label)!
The whole point of such a tool would be to send binary data over a transport that
only allows Unicode text. In practice, you'd also have to remap the C0 and C1
controls, but even then mapping 0x00-0x1F to U+0250-U+026F and 0x80-0x9F to U+0270-U+028F
wouldn't be too complex. Unless you've already added a Unicode library to what could
otherwise be coded in 4k, normalization would add a lot of complexity.
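
For what it's worth, that remap is about this much code (a sketch under the obvious
reading: C0 goes to U+0250-U+026F, C1 to U+0270-U+028F, every other byte keeps its
Latin-1 code point; it drops into the UTF-16 loop above in place of the raw byte):

static unsigned remap_byte(unsigned b)
{
    if (b <= 0x1F)              return 0x0250 + b;           /* C0 -> U+0250-U+026F */
    if (b >= 0x80 && b <= 0x9F) return 0x0270 + (b - 0x80);  /* C1 -> U+0270-U+028F */
    return b;                                                /* everything else as-is */
}

Every result still fits in a single UTF-16 code unit, so the writer just emits
(u & 0xFF) and then (u >> 8) instead of the byte and a zero.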