ISO/IEC JTC1/SC2/WG2
N2369
Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation internationale de normalisation
Doc Type: Working Group Document
Title: Request to allow FFFF, FFFE in UTF-8 in the text of ISO/IEC 10646
Source: Unicode Technical Committee
Status: Liaison Statement
Action: For adoption by JTC1/SC2/WG2
Date: 2001-09-26
The Unicode Technical Committee requests that WG2 change its definition of UTF-8 to allow the representation of the code points U+FFFF and U+FFFE. ISO/IEC 10646 currently disallows these two code points in UTF-8, but this restriction is clearly an anomaly: other non-characters (U+1FFFE, U+1FFFF, etc.), as well as the new non-characters U+FDD0..U+FDEF, are allowed.
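To illustrate why the restriction is anomalous, the following minimal sketch applies the generic UTF-8 bit layout to U+FFFE and U+FFFF and to several non-characters that are already allowed. The function name `utf8_encode` is hypothetical, and the sketch assumes the Unicode code space U+0000..U+10FFFF with only surrogates excluded; it is not taken from the text of ISO/IEC 10646.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the generic UTF-8 bit layout.

    Assumption for this sketch: the code space is U+0000..U+10FFFF and
    only surrogate code points are rejected. Non-characters get no
    special treatment.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not encodable in this sketch: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# U+FFFE and U+FFFF next to non-characters that are already allowed.
for cp in (0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0xFDD0, 0xFDEF):
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ').upper()}")
```

Under these assumptions, U+FFFE and U+FFFF come out as the ordinary three-byte sequences EF BF BE and EF BF BF, produced by exactly the same rule that yields F0 9F BF BE for U+1FFFE. Nothing in the bit layout itself distinguishes them.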
Moreover, these code points are all legal in HTML: see the SGML declaration (http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html).
The 10646 definition of UTF-8 should be amended as soon as possible to allow all non-characters to be represented in UTF-8.
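As a rough illustration of what "all non-characters" covers, the sketch below enumerates them and round-trips each through Python's built-in UTF-8 codec. It assumes the Unicode code space of 17 planes (U+0000..U+10FFFF); the helper name `noncharacters` is hypothetical, and the behaviour shown is that of Python's codec, not of the ISO/IEC 10646 text.

```python
def noncharacters():
    """Yield every non-character code point, assuming 17 planes."""
    # The contiguous block U+FDD0..U+FDEF (32 code points).
    yield from range(0xFDD0, 0xFDF0)
    # The last two code points of each plane: U+nFFFE and U+nFFFF.
    for plane in range(17):
        base = plane << 16
        yield base | 0xFFFE
        yield base | 0xFFFF


code_points = list(noncharacters())

# Each non-character should survive an encode/decode round trip.
for cp in code_points:
    encoded = chr(cp).encode("utf-8")
    assert encoded.decode("utf-8") == chr(cp)

print(len(code_points))  # 32 + 2 * 17 = 66
```

A codec that treats every non-character uniformly, as this one does, needs no special case for U+FFFE and U+FFFF.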