Unicode 3.1: UTF-8

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jan 31 2001 - 14:40:15 EST

Next message: John Cowan: "Unicode 3.1: Georgian (editorial)"
Previous message: Michael Everson: "Re: ConScript registry?"
Next in thread: David Starner: "Re: Unicode 3.1: UTF-8"
Maybe reply: David Starner: "Re: Unicode 3.1: UTF-8"
Maybe reply: Mark Davis: "Re: Unicode 3.1: UTF-8"

I propose that the distinction between illegal and irregular UTF-8
code sequences (D36bc) be eliminated. Since there are no code points
between U+D7FF and U+E000 (the apparently intervening values
are UTF-16 code units, not Unicode code points),
the corresponding UTF-8 code sequences should be illegal.

This can be achieved by replacing the U+1000..U+FFFF row in
Table 3.1B as follows:

U+1000..U+CFFF   E1..EC   80..BF   80..BF
U+D000..U+D7FF   ED       80..9F   80..BF   [9F underscored]
U+E000..U+FFFF   EE..EF   80..BF   80..BF
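
To make the effect of these rows concrete, here is a minimal Python sketch of a check for three-byte sequences against the proposed table. The function names are my own, chosen for illustration, and the sketch covers only the three rows above; the other rows of Table 3.1B (including the E0 lead byte) are deliberately out of scope.

```python
# Illustrative sketch only: function names are hypothetical, and only the
# three proposed rows (lead bytes E1..EF) are modeled here.

def second_byte_range(lead):
    """Return the (low, high) range allowed for the second byte, or None."""
    if 0xE1 <= lead <= 0xEC:      # U+1000..U+CFFF
        return (0x80, 0xBF)
    if lead == 0xED:              # U+D000..U+D7FF: second byte stops at 9F
        return (0x80, 0x9F)
    if 0xEE <= lead <= 0xEF:      # U+E000..U+FFFF
        return (0x80, 0xBF)
    return None                   # lead byte not covered by these rows


def is_legal_three_byte(seq):
    """True if seq is a three-byte sequence permitted by the proposed rows."""
    if len(seq) != 3:
        return False
    allowed = second_byte_range(seq[0])
    if allowed is None:
        return False
    low, high = allowed
    return low <= seq[1] <= high and 0x80 <= seq[2] <= 0xBF


if __name__ == "__main__":
    for raw in (b"\xed\x9f\xbf", b"\xed\xa0\x80", b"\xee\x80\x80"):
        print(raw.hex(), is_legal_three_byte(raw))
```

Under this check ED 9F BF (U+D7FF) and EE 80 80 (U+E000) are accepted, while ED A0 80 (which would otherwise stand for D800) is rejected. That rejection is the whole point of restricting the second byte after ED to 80..9F.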

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein
