From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 04 2005 - 12:02:42 CST
> >They're new characters, Philippe. They weren't encoded until 4.1.
> >
Peter Kirk continued:
> In that case these character allocations seem perverse, given that both
> of these characters could have been assigned to the BMP, or both to
> outside it
Perverse it may be, but there is no point in casting implied
asperversions at the UTC.
It was perverse of the DPRK standards body to add them to
PKS C-5700 in the first place.
It was perverse of the DPRK to insist that they be encoded
in 10646 in the BMP.
It was perverse of WG2 to assign them to the BMP.
But it was not perverse of the UTC to acquiesce in that assignment,
to guarantee continued synchronization between the standards.
> It could also be a serious
> security hole, as hackers try sending U+FACF to various implementations
> in an attempt to crash them.
Crying "security hole!" seems to be the Fad Of The Month on the
Unicode list, but this isn't one.
In any conformant Unicode 4.0.1 (or earlier) version of normalization,
U+FACF normalizes to (tada!) U+FACF. If it doesn't, the normalizer
isn't conformant. If sending U+FACF to such a normalizer crashes
an application, then shame on the programmer.
In any conformant Unicode 4.1.0 version of normalization, U+FACF
normalizes to U+2284A. If it doesn't, the normalizer isn't
conformant. If sending U+FACF to such a normalizer crashes
an application, then shame on the programmer.
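For anyone who wants to see the two cases side by side, here is a minimal
Python sketch. It assumes the standard unicodedata module, whose bundled
data in any Python 3 is later than Unicode 4.1.0, so it should show the
4.1.0 behavior. The module's ucd_3_2_0 object is used only as a stand-in
for a pre-4.1 normalizer, since 4.0.1 data is not available there. The
helper names normalize_all_forms and show are made up for this example.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")
FACF = "\uFACF"         # compatibility ideograph first encoded in Unicode 4.1
TARGET = "\U0002284A"   # its canonical mapping as of Unicode 4.1.0


def normalize_all_forms(text, ucd=unicodedata):
    """Return {form: normalized text} using the given Unicode database."""
    return {form: ucd.normalize(form, text) for form in FORMS}


def show(label, results):
    """Print each normalization result as a list of code points."""
    for form, value in results.items():
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in value)
        print(f"{label:8} {form:5} {codepoints}")


# Database bundled with the running Python (4.1.0 or later).
current = normalize_all_forms(FACF)

# Assumption: the 3.2.0 view stands in for "4.0.1 or earlier", where
# U+FACF was still unassigned and so should come back unchanged.
legacy = normalize_all_forms(FACF, unicodedata.ucd_3_2_0)

show("current", current)
show("3.2.0", legacy)
```

Neither call should do anything more dramatic than return a string, which
is the point: the input is an ordinary code point in both versions.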
There is a very good set of normalization test data available for
both Unicode 4.0.0 and now for Unicode 4.1.0. Anyone who puts
out an implementation of normalization that cannot pass the
appropriate version test deserves what they get.
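As a rough sketch of what passing the test data involves: each data line
in NormalizationTest.txt carries five semicolon-separated columns of hex
code points (c1 through c5), and the file header lists the invariants a
normalizer must satisfy for them. The Python below checks those invariants
for one line. The function names parse_fields and check_line are invented
for this example, and the sample lines are written out by hand in the
file's format rather than copied from any particular version of the file.

```python
import unicodedata


def parse_fields(line):
    """Split one data line into its five columns, decoded to strings."""
    data = line.split("#", 1)[0].strip()      # drop the trailing comment
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]


def check_line(line, normalize=unicodedata.normalize):
    """Return a list of failed invariants for one test line (empty = pass)."""
    c1, c2, c3, c4, c5 = parse_fields(line)
    failures = []

    def expect(form, expected, sources):
        for src in sources:
            if normalize(form, src) != expected:
                failures.append((form, src, expected))

    expect("NFC", c2, (c1, c2, c3))
    expect("NFC", c4, (c4, c5))
    expect("NFD", c3, (c1, c2, c3))
    expect("NFD", c5, (c4, c5))
    expect("NFKC", c4, (c1, c2, c3, c4, c5))
    expect("NFKD", c5, (c1, c2, c3, c4, c5))
    return failures


# Hand-written sample lines in the file's format (hypothetical excerpts).
SAMPLE_FACF = "FACF;2284A;2284A;2284A;2284A; # sample"
SAMPLE_A_RING = "00C5;00C5;0041 030A;00C5;0041 030A; # sample"

for sample in (SAMPLE_FACF, SAMPLE_A_RING):
    print(sample.split(";")[0], "failures:", len(check_line(sample)))
```

A real run would loop over every data line of the file for the matching
Unicode version, skipping blank lines, comment lines, and the "@Part"
headers, which this sketch does not handle.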
In neither case is this a security hole *caused* by the allocation.
--Ken