Unicode 3.1: UTF-8

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jan 31 2001 - 14:40:15 EST


I propose that the distinction between illegal and irregular UTF-8
code sequences (D36bc) be eliminated. Since there are no code points
between U+D7FF and U+E000 (the apparently intervening values are
UTF-16 code units, not Unicode code points), the corresponding UTF-8
code sequences should be illegal.

This can be achieved by replacing the U+1000..U+FFFF row in
Table 3.1B as follows:

U+1000..U+CFFF  E1..EC  80..BF  80..BF
U+D000..U+D7FF  ED      80..9F  80..BF  [9F underscored]
U+E000..U+FFFF  EE..EF  80..BF  80..BF
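
To make the effect of these rows concrete, here is a minimal sketch in
Python of a check for three-byte sequences against them. The names
THREE_BYTE_ROWS and is_legal_three_byte are hypothetical and are not
part of the proposal or of the standard. The sketch assumes the last
row is meant to cover lead bytes EE..EF, since U+F000..U+FFFF is
encoded with lead byte EF. It covers only the rows above, so sequences
with lead byte E0 (U+0800..U+0FFF) are outside its scope and are
reported as not matching.

```python
# Sketch only: the three proposed rows, each as
# (lead-byte range, second-byte range, third-byte range).
# Assumption: the last row spans lead bytes EE..EF.
THREE_BYTE_ROWS = [
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),  # U+1000..U+CFFF
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),  # U+D000..U+D7FF
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),  # U+E000..U+FFFF
]


def is_legal_three_byte(seq: bytes) -> bool:
    """Return True if seq is exactly three bytes matching one of the rows."""
    if len(seq) != 3:
        return False
    return any(
        all(lo <= b <= hi for b, (lo, hi) in zip(seq, row))
        for row in THREE_BYTE_ROWS
    )


if __name__ == "__main__":
    # U+D7FF is the last code point below the gap; U+D800 would be the
    # first value inside it, and its would-be encoding is rejected.
    print(is_legal_three_byte(b"\xed\x9f\xbf"))  # ED 9F BF (U+D7FF)
    print(is_legal_three_byte(b"\xed\xa0\x80"))  # ED A0 80 (would be U+D800)
```

Capping the second byte at 9F after a lead byte of ED is what excludes
the gap: ED 9F BF encodes U+D7FF, and the next sequence, ED A0 80,
would encode the value D800.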

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein


