Re: Unicode 3.1: UTF-8

From: Mark Davis (markdavis34@home.com)
Date: Wed Jan 31 2001 - 22:23:27 EST


This is not an omission. This issue was debated at great length in the
Unicode Technical Committee, and the precise wording was agreed to by the
committee.

Mark
----- Original Message -----
From: "John Cowan" <jcowan@reutershealth.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Wednesday, January 31, 2001 11:18
Subject: Unicode 3.1: UTF-8

> I propose that the distinction between illegal and irregular UTF-8
> code sequences (D36bc) be eliminated. Since there are no code points
> between U+D7FF and U+E000 (the apparently intervening code points
> are UTF-16 code units, but not Unicode code points)
> the corresponding UTF-8 code sequences should be illegal.
>
> This can be achieved by replacing the U+1000..U+FFFF row in
> Table 3.1B as follows:
>
> U+1000..U+CFFF  E1..EC  80..BF  80..BF
> U+D000..U+D7FF  ED      80..9F  80..BF  [9F underscored]
> U+E000..U+FFFF  EE..EF  80..BF  80..BF
>
> --
> There is / one art || John Cowan <jcowan@reutershealth.com>
> no more / no less || http://www.reutershealth.com
> to do / all things || http://www.ccil.org/~cowan
> with art- / lessness \\ -- Piet Hein
>
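
For readers who want to see what the proposed rows would mean in practice, here is a minimal sketch in Python of a checker for three-byte sequences. It is only an illustration, not text from the standard or from either message: the names PROPOSED_ROWS and matches_proposed_rows are made up for this example, it covers only the rows that replace U+1000..U+FFFF (the separate E0 row for U+0800..U+0FFF is left out), and it assumes lead bytes EE..EF for U+E000..U+FFFF, since code points from U+F000 up encode with lead byte EF.

```python
# Hypothetical table modeled on the proposed replacement rows.
# Each entry: (lead-byte range, second-byte range, third-byte range).
# Assumption: the last row uses lead bytes EE..EF, because
# U+F000..U+FFFF encode with lead byte EF.
PROPOSED_ROWS = [
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),  # U+1000..U+CFFF
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),  # U+D000..U+D7FF
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),  # U+E000..U+FFFF
]


def matches_proposed_rows(seq: bytes) -> bool:
    """Return True if seq is a three-byte sequence allowed by PROPOSED_ROWS."""
    if len(seq) != 3:
        return False
    for row in PROPOSED_ROWS:
        # Every byte must fall inside the range given for its position.
        if all(lo <= b <= hi for b, (lo, hi) in zip(seq, row)):
            return True
    return False


if __name__ == "__main__":
    samples = [
        bytes([0xED, 0x9F, 0xBF]),  # U+D7FF, last code point before the gap
        bytes([0xED, 0xA0, 0x80]),  # would map to D800, a UTF-16 code unit
        bytes([0xEE, 0x80, 0x80]),  # U+E000, first code point after the gap
    ]
    for s in samples:
        print(s.hex(" "), matches_proposed_rows(s))
```

Under these rows, ED A0 80 through ED BF BF simply match no row, so the sequences become illegal without needing a separate "irregular" category.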


