Re: UTF-17

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jun 21 2001 - 20:45:21 EDT


Markus,

Thank you for your comment.

> Nice, but you have the same kind of shortest-form problem as in UTF-8:
> <38 30 30 30 30 30 30 30> could be mis-interpreted by a lenient decoder as U+0000.

Well, actually, that is not technically a "shortest-form problem". All
UTF-17 forms are exactly 8 bytes long, so any valid form is automatically
also a shortest form.

Furthermore, lenient decoders are not allowed for UTF-17.

<38 30 30 30 30 30 30 00> is specified as the *unique* representation
of U+0000. That means that <38 30 30 30 30 30 30 30> is ill-formed,
and therefore disallowed.
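
A minimal sketch, in Python, of what "no lenient decoders" amounts to for
this one case. The name strict_decode_form is hypothetical, and the sketch
assumes only the two facts stated above (every form is exactly 8 bytes, and
U+0000 has the single form shown); the rest of the mapping is not described
in this message, so it is deliberately left out.

```python
# Assumption: only the one mapping stated in this message is known here.
FORM_LENGTH = 8
U0000_FORM = bytes([0x38, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x00])


def strict_decode_form(form: bytes) -> int:
    """Decode a single 8-byte form, refusing anything not exactly specified."""
    if len(form) != FORM_LENGTH:
        raise ValueError("ill-formed: a form must be exactly 8 bytes")
    if form == U0000_FORM:
        return 0x0000
    # A lenient decoder might guess at a near-match; a strict one must not.
    raise ValueError("not the form specified for U+0000 (no other mapping known to this sketch)")
```

With this, the look-alike <38 30 30 30 30 30 30 30> is rejected with a
ValueError rather than being quietly read as U+0000.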

>
> Ts, ts...
>
> At least it sorts binary in code point order.

Yes, good point. Rick and I have added that to the Internet Draft
for UTF-17.

--Ken

>
> markus
>
>


