Re: UTF-8 validation rules

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Sep 10 2001 - 21:26:48 EDT


> Also, if you're converting to, say, UTF-16, then non-character sequences
> like \xEF\xBF\xBE and \xEF\xBF\xBF should probably be converted to the
> corresponding UTF-16 non-characters (\uFFFE and \uFFFF), rather than being
> rejected. (Note: Unicode 3.1 and ISO/IEC 10646-1:2000 differ on this point;
> 10646 requires them to be rejected.)

This discrepancy has been noted by the relevant committees and is
the subject of a ballot comment on the current amendment to 10646.
It should be fixed soon.
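
As a rough illustration of the two behaviours described in the quoted
text, here is a minimal sketch. It is not taken from any particular
converter: the function name and the reject_noncharacters flag are made
up for this example, it relies on Python's built-in codecs, and it only
looks at the two noncharacters mentioned above.

```python
from typing import List

# Only the two noncharacters named in the quoted text (assumption for this sketch).
NONCHARACTERS = {0xFFFE, 0xFFFF}


def utf8_to_utf16_units(data: bytes, reject_noncharacters: bool = False) -> List[int]:
    """Convert UTF-8 bytes to a list of UTF-16 code units.

    reject_noncharacters is a hypothetical switch: False passes U+FFFE and
    U+FFFF through to the output, True rejects them with ValueError.
    """
    # Strict decoding raises UnicodeDecodeError on ill-formed UTF-8.
    text = data.decode("utf-8")
    if reject_noncharacters:
        for ch in text:
            if ord(ch) in NONCHARACTERS:
                raise ValueError("noncharacter U+%04X in input" % ord(ch))
    # Big-endian without a BOM, so each pair of bytes is one code unit.
    encoded = text.encode("utf-16-be")
    return [int.from_bytes(encoded[i:i + 2], "big") for i in range(0, len(encoded), 2)]


# Pass-through behaviour: the noncharacters survive the conversion.
print([hex(u) for u in utf8_to_utf16_units(b"\xef\xbf\xbe\xef\xbf\xbf")])
# prints ['0xfffe', '0xffff']
```

With reject_noncharacters=True the same input raises ValueError instead,
which corresponds to the rejecting behaviour.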

--Ken


