ISO 2022, ISO 10646, and Unicode

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jan 17 1997 - 17:56:48 EST


Frank Tang notes:

> Because ISO-2022 is a code switch mechanism, the thing that makes it work
> is that all the graphic code sets it can switch to use only 0x20-0x7F and
> 0xA0-0xFF.
> Since Unicode itself (ISO-10646) does not follow ISO-2022, it will be
> difficult to add Unicode to the ISO-2022 scheme.

It is correct to state that ISO 2022 is a code switch mechanism. But
it is also important to be clear about the fact that ISO 10646 *already*
has a specified relation to ISO 2022. In case my earlier note on this
was unclear:

Clause 17.2 of 10646 specifies the 2022 escape sequences
for identifying various forms of 10646.

ESC 02/05 02/15 04/05 (i.e. ESC %/E) specifies 10646 UCS-2, implementation
        level 3 (i.e. basically Unicode).
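The "02/05" style is ISO 2022 column/row notation: the column is the high four bits of the byte and the row is the low four bits, so 02/05 is 0x25 ('%'). The sketch below, using hypothetical helper names, turns that notation into the actual bytes of the designation sequence above.

```python
def cr_to_byte(notation: str) -> int:
    """Convert ISO 2022 column/row notation such as '02/05' to a byte value.

    Column = high 4 bits, row = low 4 bits, so '02/05' -> 0x25.
    """
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"invalid column/row notation: {notation!r}")
    return (column << 4) | row


def escape_sequence(*notations: str) -> bytes:
    """Build an escape sequence: ESC (01/11, i.e. 0x1B) followed by the given bytes."""
    return bytes([0x1B] + [cr_to_byte(n) for n in notations])


# ESC 02/05 02/15 04/05: identifies UCS-2, implementation level 3
ENTER_UCS2_LEVEL3 = escape_sequence("02/05", "02/15", "04/05")
print(ENTER_UCS2_LEVEL3)  # b'\x1b%/E'
```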

Clause 17.5 of 10646 specifies the 2022 escape sequence for
return to 2022 from 10646.

ESC 02/05 04/00 (i.e. ESC %@) specifies return to 2022 from 10646. If using
        UCS-2 (i.e. Unicode), this is padded to 16 bits: U+001B U+0025 U+0040.
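To make the padding concrete: each octet of ESC 02/05 04/00 becomes one 16-bit UCS-2 code unit. This is a minimal sketch with a hypothetical helper; the octet order within each code unit is an assumption here (big-endian, most significant octet first, by default).

```python
# ESC 02/05 04/00 (ESC % @): return from 10646 to ISO 2022
RETURN_8BIT = bytes([0x1B, 0x25, 0x40])


def pad_to_ucs2(seq: bytes, byteorder: str = "big") -> bytes:
    """Pad each octet of an escape sequence to one 16-bit UCS-2 code unit.

    The octet order ("big" by default) is an assumption, not taken
    from the post.
    """
    return b"".join(b.to_bytes(2, byteorder) for b in seq)


RETURN_UCS2 = pad_to_ucs2(RETURN_8BIT)
print(RETURN_UCS2.hex(" "))  # 00 1b 00 25 00 40
```

Read back as 16-bit code units, those six octets are exactly U+001B U+0025 U+0040.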

So *if* you are using 2022, there is already a specified way to get into
and out of the 16-bit version of 10646. When you are in a stretch of data
specified as UCS-2, implementation level 3 (or 1 or 2, for that matter),
then effectively you are using Unicode at that point. [The problem is not
getting the standards to include the specification; rather, the problem is
extending existing 2022 implementations so that they handle embedded
10646 data correctly.]

It is the Unicode Standard, per se, which does not have any mechanism for
escape to other encodings. 2022 is the superstructure which makes it possible
to do that with 10646 (and thereby with Unicode data), if you so desire.

--Ken Whistler



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT