I propose that the distinction between illegal and irregular UTF-8
code sequences (D36bc) be eliminated. Since there are no code points
between U+D7FF and U+E000 (the values that appear to intervene
are UTF-16 code units, not Unicode code points),
the corresponding UTF-8 code sequences should be illegal.
This can be achieved by replacing the U+1000..U+FFFF row in
Table 3.1B as follows:
U+1000..U+CFFF   E1..EC   80..BF   80..BF
U+D000..U+D7FF   ED       80..9F   80..BF   [9F underscored]
U+E000..U+FFFF   EE..EF   80..BF   80..BF
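
As a rough illustration only (this is not part of the proposal text), the
three replacement rows can be checked mechanically. The sketch below
assumes the last row's lead byte spans EE..EF, which is what the range
U+E000..U+FFFF requires, and it covers only these three rows, not the rest
of Table 3.1B. The names PROPOSED_ROWS, is_legal_three_byte and
decode_three_byte are hypothetical, chosen for this example.

```python
# Hypothetical sketch: names are illustrative, not from the Unicode Standard.
# Each row gives inclusive (low, high) ranges for the 1st, 2nd and 3rd byte.
PROPOSED_ROWS = [
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),  # U+1000..U+CFFF
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),  # U+D000..U+D7FF
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),  # U+E000..U+FFFF (assumes EE..EF)
]


def is_legal_three_byte(seq: bytes) -> bool:
    """Return True if seq is a 3-byte sequence permitted by the rows above."""
    if len(seq) != 3:
        return False
    return any(
        all(lo <= b <= hi for b, (lo, hi) in zip(seq, row))
        for row in PROPOSED_ROWS
    )


def decode_three_byte(seq: bytes) -> int:
    """Decode a permitted 3-byte sequence to its code point, else raise."""
    if not is_legal_three_byte(seq):
        raise ValueError("illegal UTF-8 sequence under the proposed rows")
    # 4 payload bits from the lead byte, 6 from each trail byte.
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)
```

With these rows, ED 9F BF decodes to U+D7FF, while ED A0 80 (which would
otherwise yield D800) is rejected outright because its second byte falls
outside 80..9F.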
--
There is / one art    || John Cowan <jcowan@reutershealth.com>
no more / no less     || http://www.reutershealth.com
to do / all things    || http://www.ccil.org/~cowan
with art- / lessness  \\ -- Piet Hein