Dear Mr. Mark Davis,
I believe I have found an error in Unicode Technical Report #17.
http://www.unicode.org/unicode/reports/tr17/
> UTF-8 provides a good example:
> ...
> 0x80..0x3FF ---> 2 bytes
> 0x400..0xD7FF, 0xE000..0xFFFF ---> 3 bytes
> ...
However, RFC 2279 (UTF-8) gives the following ranges:
> 0000 0080-0000 07FF 110xxxxx 10xxxxxx
> 0000 0800-0000 FFFF 1110xxxx 10xxxxxx 10xxxxxx (excluding surrogates)
Should the report be corrected as follows?
> 0x80..0x7FF ---> 2 bytes
> 0x800..0xD7FF, 0xE000..0xFFFF ---> 3 bytes
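As a quick cross-check of the boundaries, here is a minimal sketch in Python. The function name utf8_length is my own, chosen only for illustration, and it covers only the BMP ranges discussed in this message. It compares the ranges from RFC 2279 against Python's built-in UTF-8 encoder at each boundary value.

```python
def utf8_length(code_point: int) -> int:
    """Return the UTF-8 byte count for a BMP code point (hypothetical helper)."""
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are excluded from the 3-byte range.
        raise ValueError("surrogate code points are not encoded in UTF-8")
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point < 0x80:
        return 1  # 0x00..0x7F
    if code_point < 0x800:
        return 2  # 0x80..0x7FF
    if code_point <= 0xFFFF:
        return 3  # 0x800..0xD7FF, 0xE000..0xFFFF
    raise ValueError("outside the BMP ranges discussed here")


# Compare each boundary value with the built-in encoder.
for cp in (0x7F, 0x80, 0x3FF, 0x400, 0x7FF, 0x800, 0xD7FF, 0xE000, 0xFFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

In particular, 0x400 and 0x7FF both encode to 2 bytes, and 0x800 is the first value that needs 3 bytes, which is what the proposed correction states.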
Best regards,
Masahiko