From: Yung-Fong Tang (ftang@netscape.com)
Date: Tue Feb 18 2003 - 15:10:07 EST
I read RFC 2279 again (http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2279.txt):
1. I cannot find any text in it saying that a non-shortest form is
invalid UTF-8,
2. It describes UTF-8 sequences of 1-6 octets, and
3. It describes how to encode a surrogate pair in UTF-8, but it does
not say that UTF-8 sequences mapping directly to high or low
surrogates are illegal.
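To make points 1 and 3 concrete, here is a minimal Python sketch of the RFC 2279 encoding table as I read it. The function name `rfc2279_encode` is hypothetical, not from any library. It produces 1-6 octets for any value up to 0x7FFFFFFF and never checks for surrogates, so U+D800 comes out as ED A0 80.

```python
def rfc2279_encode(cp: int) -> bytes:
    """Encode a UCS-4 value (0..0x7FFFFFFF) in shortest form using the 1-6 octet table."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # 1 octet: plain ASCII
    # (exclusive upper bound, total octets, lead-byte marker bits)
    table = ((0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
             (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))
    for limit, n, lead in table:
        if cp < limit:
            tail = []
            for _ in range(n - 1):
                tail.append(0x80 | (cp & 0x3F))  # continuation octet: 10xxxxxx
                cp >>= 6
            # Remaining high bits go into the lead octet; no surrogate check here.
            return bytes([lead | cp] + tail[::-1])
    raise AssertionError("unreachable: range already checked")
```

Nothing in the table stops a decoder from accepting a longer spelling of the same value: for example, C1 81 carries the bits of U+0041 ('A') in two octets, which is exactly the non-shortest form that point 1 asks about.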
I remember that over the last couple of years the definition of UTF-8
changed from 1-6 octets to 1-4 octets because of decisions about the
future roadmap of Unicode/ISO 10646.
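The 1-4 octet limit falls out of the arithmetic, assuming the code space is capped at U+10FFFF (the highest value reachable through UTF-16 surrogate pairs). A small Python sketch, with a hypothetical helper `octets_needed`, counts octets using the original six-row RFC 2279 table:

```python
# Exclusive upper bounds for 1..6 octet sequences in the RFC 2279 table.
LIMITS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)

UNICODE_MAX = 0x10FFFF  # assumed cap on the code space (17 planes)


def octets_needed(cp: int) -> int:
    """Return the shortest-form octet count for cp under the 1-6 octet table."""
    for n, limit in enumerate(LIMITS, start=1):
        if 0 <= cp < limit:
            return n
    raise ValueError("outside the 31-bit UCS-4 range")


# The largest code point in the capped range still fits in the 4-octet row.
assert octets_needed(UNICODE_MAX) == 4
```

U+10FFFF needs 21 bits, which fits the 4-octet form (values up to 0x1FFFFF), so under that cap the 5- and 6-octet rows can never be reached.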
Here are my questions:
1. Is there an updated RFC that obsoletes RFC 2279? (I cannot find one.
If there is, what is its number and URL?)
2. Is there a formal specification stating that non-shortest forms are
illegal in UTF-8 (RFC 2279 mentions this only lightly and does not
formally specify it as illegal; it only mentions the security concern)
and that directly encoded surrogates are illegal? Or is the language in
RFC 2279 good enough?
3. Is there a formal specification stating that UTF-8 uses only 1-4
octets, updating the part of RFC 2279 that describes 1-6 octets?
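For discussion, here is a minimal Python sketch of a decoder that applies all three restrictions asked about in questions 2 and 3: it rejects non-shortest forms, encoded surrogates (U+D800-U+DFFF), and anything longer than 4 octets or above U+10FFFF. The name `strict_utf8_decode` is hypothetical, and the rules it enforces are my assumption about what "strict" should mean, not text quoted from any RFC.

```python
from typing import List

# Smallest code point each sequence length may carry; anything lower is non-shortest form.
MIN_CP = {2: 0x80, 3: 0x800, 4: 0x10000}


def strict_utf8_decode(data: bytes) -> List[int]:
    """Decode UTF-8 to code points, rejecting overlongs, surrogates, and >4-octet forms."""
    out: List[int] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # 1 octet: ASCII
            out.append(b)
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            n, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            n, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            n, cp = 4, b & 0x07
        else:
            # 0x80-0xBF: stray continuation; 0xF8-0xFF: old 5/6-octet leads.
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n):
            c = data[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        if cp < MIN_CP[n]:
            raise ValueError(f"non-shortest form at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point beyond U+10FFFF at offset {i}")
        out.append(cp)
        i += n
    return out
```

Checking the minimum value after assembling the code point, rather than special-casing lead bytes such as C0 and C1, keeps the shortest-form test in one place for every sequence length.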
Thanks
This archive was generated by hypermail 2.1.5 : Tue Feb 18 2003 - 15:54:02 EST