From: Hans Aberg (haberg@math.su.se)
Date: Mon Jun 01 2009 - 02:21:39 CDT
On 1 Jun 2009, at 00:25, Doug Ewell wrote:
>> I think also strictly speaking there are two UTF-8s: one which does
>> not have the integer limitations that are used in Unicode. This
>> could be used to convert integer sequences into byte sequences
>> which then do not have Unicode character interpretation.
>
> There is only one UTF-8, the one defined by Unicode and ISO/IEC
> 10646, which maps valid Unicode/10646 scalar values to sequences of
> bytes. Anything else is not UTF-8. Keep repeating this to yourself.
I was just reading the successive RFCs that define UTF-8:
http://tools.ietf.org/html/rfc2044
http://tools.ietf.org/html/rfc2279
http://tools.ietf.org/html/rfc3629
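The last one, RFC 3629, restricts UTF-8 to the Unicode range
U+0000..U+10FFFF, which is the limit imposed by UTF-16, and excludes
the surrogates U+D800..U+DFFF. The two earlier ones do not: they allow
sequences of up to six bytes, covering values up to 0x7FFFFFFF.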
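To make the difference concrete, here is a minimal Python sketch. The function names encode_rfc2279 and encode_rfc3629 are made up for illustration. The bit layout is the same in all three RFCs; the older scheme accepts any integer up to 0x7FFFFFFF, while the RFC 3629 version rejects anything that is not a Unicode scalar value.

```python
def encode_rfc2279(n: int) -> bytes:
    """Encode a non-negative integer below 2**31 with the original
    RFC 2044/2279 UTF-8 bit layout (1 to 6 bytes)."""
    if not 0 <= n < 2**31:
        raise ValueError("RFC 2279 UTF-8 covers only 0..0x7FFFFFFF")
    if n < 0x80:
        return bytes([n])  # ASCII range: a single byte
    # (exclusive upper bound, total byte count, lead-byte marker)
    for limit, count, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if n < limit:
            break
    out = []
    # Emit continuation bytes 10xxxxxx, six bits each, low bits first.
    for _ in range(count - 1):
        out.append(0x80 | (n & 0x3F))
        n >>= 6
    out.append(lead | n)  # remaining high bits go into the lead byte
    return bytes(reversed(out))


def encode_rfc3629(n: int) -> bytes:
    """RFC 3629 UTF-8: only Unicode scalar values are allowed."""
    if not 0 <= n <= 0x10FFFF or 0xD800 <= n <= 0xDFFF:
        raise ValueError(f"{n:#x} is not a Unicode scalar value")
    return encode_rfc2279(n)  # same layout, never more than 4 bytes here
```

For example, encode_rfc2279(0x110000) yields b"\xf4\x90\x80\x80", a byte sequence that an RFC 3629 decoder must reject, and encode_rfc3629(0x110000) raises ValueError.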
Hans
This archive was generated by hypermail 2.1.5 : Mon Jun 01 2009 - 02:24:55 CDT