From: Yung-Fong Tang (ftang@netscape.com)
Date: Tue Feb 18 2003 - 15:10:07 EST
I read RFC 2279 again (
http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2279.txt )
1. I cannot find any text in it saying that non-shortest-form UTF-8 is
invalid.
2. It describes UTF-8 sequences of 1-6 octets.
3. It describes how to encode a surrogate pair in UTF-8, but it does not
say that UTF-8 sequences mapping directly to a high or low surrogate are
illegal.
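To make points 2 and 3 concrete, here is a minimal Python sketch of the
1-6 octet scheme as I read it in RFC 2279. It is my own illustration,
not code from the RFC; the names `encode_rfc2279` and
`combine_surrogates` are hypothetical. It assumes a surrogate pair is
first combined into one scalar value before encoding. Note that nothing
in the encoder stops a lone surrogate from being emitted as a 3-octet
sequence, which is exactly the case I am asking about.

```python
# (upper bound of value, sequence length, lead-byte marker) for the
# 1-6 octet scheme covering the 31-bit UCS-4 range.
RANGES = [
    (0x7F, 1, 0x00),
    (0x7FF, 2, 0xC0),
    (0xFFFF, 3, 0xE0),
    (0x1FFFFF, 4, 0xF0),
    (0x3FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
]


def encode_rfc2279(cp: int) -> bytes:
    """Encode one UCS-4 value in the 1-6 octet scheme.

    Always emits the shortest form, but (like the RFC text) does not
    reject surrogate values U+D800..U+DFFF.
    """
    if cp < 0:
        raise ValueError("negative value")
    for limit, length, lead_marker in RANGES:
        if cp <= limit:
            break
    else:
        raise ValueError("value exceeds 31 bits")
    if length == 1:
        return bytes([cp])
    out = []
    # Emit continuation bytes (10xxxxxx) from the low bits upward.
    for _ in range(length - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Remaining high bits go into the lead byte.
    out.append(lead_marker | cp)
    return bytes(reversed(out))


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

With this sketch the pair D800 DC00 encodes as F0 90 80 80, while
encoding each half separately gives ED A0 80 ED B0 80.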
I remember that over the last couple of years the definition of UTF-8
has been changing from 1-6 octets to 1-4 octets because of decisions
about the future roadmap of Unicode/ISO 10646.
Here are my questions:
1. Is there an updated RFC that obsoletes RFC 2279? (I cannot find one.
If there is one, what is its number and URL?)
2. Is there a formal specification saying that non-shortest form is
illegal in UTF-8 (RFC 2279 mentions this only lightly and does not
formally specify that it is illegal; it only mentions that it is a
security concern), and that directly encoding surrogates is illegal? Or
is the language in RFC 2279 good enough?
3. Is there a formal specification stating that UTF-8 is only 1-4
octets, and therefore updating the part of RFC 2279 that mentions 1-6
octets?
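For comparison, here is a minimal Python sketch of a strict decoder that
enforces all three restrictions I am asking about: shortest form only,
no encoded surrogates, and at most 4 octets (nothing above U+10FFFF).
The function name `strict_utf8_decode` is hypothetical; this is an
illustration of the rules, not a reference implementation.

```python
from typing import List

# Smallest value that needs a sequence of each length; anything smaller
# encoded at that length is a non-shortest (overlong) form.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def strict_utf8_decode(data: bytes) -> List[int]:
    """Decode UTF-8 to code points, raising ValueError on the first
    overlong form, encoded surrogate, or sequence beyond U+10FFFF."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, cp = 4, lead & 0x07
        else:
            # 0x80-0xBF: stray continuation byte; 0xF8-0xFD: 5- and
            # 6-octet lead bytes; 0xFE/0xFF: never valid.
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < MIN_FOR_LENGTH[length]:
            raise ValueError(f"non-shortest form at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"value above U+10FFFF at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

Under these rules C1 81 (an overlong "A"), ED A0 80 (a lone high
surrogate) and F8 88 80 80 80 (a 5-octet form) are all rejected.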
Thanks