RE: PDUTR #26 posted

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Sep 17 2001 - 14:08:25 EDT


Doug,

> But if people start compromising their UTF-8 parsers to accommodate CESU-8
> "adaptively," it would be a great blow to UTF-8. It would essentially undo
> all the tightening-up that was accomplished by the Corrigendum, and it would
> revive all the old Bruce Schneier-style skepticism about the "security" of
> Unicode.

You are right. Eliminating non-shortest forms, where a space, for example,
could be written as \xC0\xA0 instead of \x20, ensures that text-screening
programs have only one form of space to check for. CESU-8 reopens this
security hole.
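
To make the point concrete, here is a minimal sketch of a strict decoder that
refuses non-shortest forms and encoded surrogates. The function name
utf8_decode_strict and its return convention are my own, made up for
illustration; they are not taken from any particular library.

```c
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with n
   bytes available. Returns the code point and stores the sequence length
   in *used, or returns -1 if the sequence is truncated, malformed, not in
   shortest form, or encodes a surrogate. */
long utf8_decode_strict(const unsigned char *s, size_t n, size_t *used)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    long cp;

    if (n == 0) return -1;
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return -1;                 /* stray trail byte or invalid lead */

    if (n < len) return -1;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;    /* not a trail byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[len]) return -1;             /* non-shortest, e.g. C0 A0 */
    if (cp >= 0xD800 && cp <= 0xDFFF) return -1; /* surrogate, as in CESU-8 */
    if (cp > 0x10FFFF) return -1;
    *used = len;
    return cp;
}
```

With a decoder like this, \x20 is the only byte sequence that can ever come
out as a space, so a screening program needs just one comparison.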

Processing MBCS code like UTF-8 depends on checking each character for its
byte length, and this is done throughout the code. A simple length retrieval
such as LenChar = bytesFromUTF8[*pointer]; is very fast, so the extra code to
check for surrogates can double the overhead of processing typical text.
This might be something that we can live with. The real problem is making
sure that every place in an application has been changed. If not, you have
subtle bugs introduced into your code.
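
As a rough sketch of the difference, compare a plain table lookup with what a
CESU-8 aware length check has to do. Only the table name bytesFromUTF8 comes
from the snippet above; the table contents, init_table, len_char_utf8 and
len_char_cesu8 are assumptions made up for this illustration.

```c
#include <stddef.h>

/* Length of a sequence, indexed by its lead byte. */
static unsigned char bytesFromUTF8[256];

/* Fill the table once at startup. */
void init_table(void)
{
    int b;
    for (b = 0; b < 256; b++) {
        if (b < 0x80)                    bytesFromUTF8[b] = 1;
        else if (b >= 0xC2 && b <= 0xDF) bytesFromUTF8[b] = 2;
        else if (b >= 0xE0 && b <= 0xEF) bytesFromUTF8[b] = 3;
        else if (b >= 0xF0 && b <= 0xF4) bytesFromUTF8[b] = 4;
        else bytesFromUTF8[b] = 1;  /* trail or invalid byte: skip one byte */
    }
}

/* Plain UTF-8: one table lookup per character. */
size_t len_char_utf8(const unsigned char *p)
{
    return bytesFromUTF8[*p];
}

/* CESU-8: a supplementary character is two 3-byte encoded surrogates,
   so every length check also has to look ahead for that pattern. */
size_t len_char_cesu8(const unsigned char *p, const unsigned char *end)
{
    if (*p == 0xED && end - p >= 6 &&
        (p[1] & 0xF0) == 0xA0 && (p[2] & 0xC0) == 0x80 &&  /* D800..DBFF */
        p[3] == 0xED &&
        (p[4] & 0xF0) == 0xB0 && (p[5] & 0xC0) == 0x80)    /* DC00..DFFF */
        return 6;
    return bytesFromUTF8[*p];
}
```

Any call site that still uses the plain lookup on CESU-8 data will treat one
supplementary character as two 3-byte characters, which is the kind of subtle
bug I mean.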

It would seem to me that if you have to either change the UTF-8 code to
support CESU-8 or change the UTF-16 compare logic, then changing the UTF-16
logic to do code point order compares is a much more containable change with
a much lower processing impact.
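
One way such a compare could look is sketched below. It uses the well-known
trick of remapping code units at the first point of difference so that
surrogates sort above U+E000..U+FFFF. The names fixup and
utf16_cmp_codepoint_order are hypothetical, and the sketch assumes
well-formed UTF-16 input.

```c
#include <stddef.h>
#include <stdint.h>

/* Remap a code unit so that surrogates (D800..DFFF) sort above the
   rest of the BMP (E000..FFFF); everything below D800 is unchanged. */
static unsigned fixup(uint16_t u)
{
    if (u >= 0xE000) return u - 0x800u;   /* E000..FFFF -> D800..F7FF */
    if (u >= 0xD800) return u + 0x2000u;  /* D800..DFFF -> F800..FFFF */
    return u;
}

/* Compare two UTF-16 strings in code point order.
   Returns -1, 0 or 1. Assumes well-formed UTF-16. */
int utf16_cmp_codepoint_order(const uint16_t *a, size_t alen,
                              const uint16_t *b, size_t blen)
{
    size_t i, n = alen < blen ? alen : blen;

    for (i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            /* The fix-up only runs at the first difference. */
            unsigned x = fixup(a[i]), y = fixup(b[i]);
            return x < y ? -1 : 1;
        }
    }
    if (alen == blen) return 0;
    return alen < blen ? -1 : 1;
}
```

The extra work is confined to one routine and to the first differing code
unit, so the common path is still a plain 16-bit comparison.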

Carl



This archive was generated by hypermail 2.1.2 : Mon Sep 17 2001 - 13:17:52 EDT