Re: FSS-UTF, UTF-2, UTF-8, and UTF-16

From: DougEwell2@cs.com
Date: Wed Jun 20 2001 - 01:37:18 EDT


In a message dated 2001-06-19 10:36:40 Pacific Daylight Time,
cbrown@xnetinc.com writes:

> I agree with you, the problem is that the D800 to DFFF codes were never
> defined as valid Unicode characters.

True; there were never characters assigned into these positions.

> Encoding these into ED xx xx codes has
> never produced valid Unicode code points in UTF-8.

False; prior to Unicode 2.0 they were perfectly valid code points (not
characters) in the so-called O-zone. The addition of surrogate code points
created a hole in the O-zone (sorry, I couldn't resist :) and removed them
from the realm of valid code points.
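The "ED xx xx" point can be made concrete with a small sketch. It assumes a hypothetical helper, encode_3byte, which applies the generic 3-byte UTF-8 bit pattern with no surrogate check. Every value from D800 through DFFF then comes out as ED followed by a second byte in A0..BF. A strict modern codec, here Python's built-in "utf-8", refuses both to produce and to accept such sequences.

```python
def encode_3byte(cp: int) -> bytes:
    """Apply the 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx to a value in
    U+0800..U+FFFF. Hypothetical helper: it deliberately does NOT reject
    surrogates, mimicking pre-2.0 (or careless) encoders."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("not a 3-byte value")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation: middle 6 bits
        0x80 | (cp & 0x3F),         # continuation: low 6 bits
    ])

print(encode_3byte(0xD800).hex(" "))  # ed a0 80
print(encode_3byte(0xDFFF).hex(" "))  # ed bf bf

# Python's strict UTF-8 codec rejects surrogates in both directions.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError:
    print("encoder refused U+D800")
try:
    b"\xed\xa0\x80".decode("utf-8")
except UnicodeDecodeError:
    print("decoder refused ED A0 80")
```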

> Therefore any of these
> codes in the database were never valid Unicode characters at any point in
> the Unicode standard. As a consequence there is no backwards compatibility
> issue.

True; there should be no such data in any database anywhere. But be careful
about "characters" vs. "code points." The values in question were never
characters, but they were once valid code points. My favorite example,
U+0220, has always been a valid code point but not an assigned character
(until Unicode 3.2).
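The code point vs. character distinction can be probed with Python's standard unicodedata module. The classify function and its labels below are hypothetical and only illustrative. Note that unicodedata reflects whatever Unicode version ships with the interpreter, so any current Python reports U+0220 as an assigned character, not as the unassigned code point it was before Unicode 3.2.

```python
import unicodedata

def classify(cp: int) -> str:
    """Rough, hypothetical classification of a code point using the Unicode
    database bundled with this Python build (not historical tables)."""
    if not 0 <= cp <= 0x10FFFF:
        return "not a code point"
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate code point"
    if category == "Cn":
        # Also covers noncharacters such as U+FFFE.
        return "unassigned code point"
    if category == "Co":
        return "private-use code point"
    return "assigned character"

for cp in (0x0041, 0x0220, 0x0378, 0xD800, 0xE000):
    print(f"U+{cp:04X}: {classify(cp)}")
```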

The only backwards compatibility issue comes from software written in the
UCS-2 days that disregarded the UTF-8 specification, which requires a value
above U+FFFF to be encoded as a single 4-byte sequence, and instead encoded
each UTF-16 surrogate separately as its own 3-byte sequence.
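The difference between the two byte forms can be sketched as follows. The helpers (encode_3byte, to_surrogates, encode_per_surrogate) are hypothetical names used only for illustration. encode_per_surrogate mimics the UCS-2-era behaviour of splitting a supplementary character into surrogates and encoding each one as 3 bytes. The conformant 4-byte form comes from Python's own "utf-8" codec.

```python
def encode_3byte(cp: int) -> bytes:
    # Generic 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, no surrogate check.
    return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def to_surrogates(cp: int) -> tuple[int, int]:
    # Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
    # high/low surrogate pair.
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

def encode_per_surrogate(cp: int) -> bytes:
    # Hypothetical UCS-2-era behaviour: two 3-byte sequences (6 bytes)
    # instead of one 4-byte UTF-8 sequence.
    high, low = to_surrogates(cp)
    return encode_3byte(high) + encode_3byte(low)

clef = 0x1D11E  # MUSICAL SYMBOL G CLEF
print(chr(clef).encode("utf-8").hex(" "))   # f0 9d 84 9e  (conformant UTF-8)
print(encode_per_surrogate(clef).hex(" "))  # ed a0 b4 ed b4 9e

# A strict UTF-8 decoder refuses the 6-byte form.
try:
    encode_per_surrogate(clef).decode("utf-8")
except UnicodeDecodeError:
    print("6-byte form rejected")
```

Data written in the 6-byte form therefore does not round-trip through a conformant UTF-8 decoder, which is where the compatibility concern arises.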

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT