"G. Adam Stanislav" <adam@whizkidtech.net> wrote:
>> 1. The Unicode Technical Committee has modified the definition of
>> UTF-8 to forbid conformant implementations from interpreting non-
>> shortest forms for BMP characters,
>
> I find this silly. That creation of such forms would be forbidden I
> can see and agree to. But interpretation? I understand the reasoning
> when security is an issue. But why make it flat illegal? There are
> many applications where such a sequence poses no security danger.
I used to be concerned about that. I think I cited the example of an
encyclopedia on CD-ROM with text in UTF-8. Obviously this text is all
internal and almost certainly valid, and there are no security holes
involved, so the UTF-8 decoder can take certain shortcuts.

But this is now covered in the corrigendum:
> Internally, a particular function might be used that does not check
> for illegal code unit sequences. However, a conformant process can
> use that function _only_ on data that has already been certified to
> not contain any illegal code unit sequences.
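
As a rough illustration of the arrangement the corrigendum describes,
here is a minimal Python sketch. All three function names are
hypothetical, invented for this example; they do not come from the
corrigendum or from any library. The check relies on Python's built-in
strict UTF-8 decoder, which rejects non-shortest forms.

```python
def decode_unchecked(data: bytes) -> str:
    """Shortcut decoder (hypothetical): assumes well-formed UTF-8.

    It does the bit arithmetic only and checks nothing, so it will
    happily interpret non-shortest forms.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:            # 1-byte sequence (ASCII)
            cp, size = b, 1
        elif b < 0xE0:          # treated as a 2-byte sequence
            cp = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F)
            size = 2
        elif b < 0xF0:          # treated as a 3-byte sequence
            cp = (((b & 0x0F) << 12)
                  | ((data[i + 1] & 0x3F) << 6)
                  | (data[i + 2] & 0x3F))
            size = 3
        else:                   # treated as a 4-byte sequence
            cp = (((b & 0x07) << 18)
                  | ((data[i + 1] & 0x3F) << 12)
                  | ((data[i + 2] & 0x3F) << 6)
                  | (data[i + 3] & 0x3F))
            size = 4
        out.append(chr(cp))
        i += size
    return "".join(out)


def is_well_formed(data: bytes) -> bool:
    """The 'certifying' step: Python's strict decoder does the checking."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_conformant(data: bytes) -> str:
    """Use the shortcut decoder only on data that has passed the check."""
    if not is_well_formed(data):
        raise ValueError("ill-formed UTF-8")
    return decode_unchecked(data)
```

With these definitions, the two-byte sequence C0 AF (a non-shortest
form of U+002F) comes out of decode_unchecked() as "/", while
decode_conformant() refuses it. For the CD-ROM case, the check could
presumably be run once when the disc is mastered, leaving the reader
free to use the shortcut decoder.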
The word "certified" did make me chuckle, though. Who would do the
certifying? Katherine Harris?
-Doug Ewell
Fullerton, California