Re: Coding Systems Different from ISO 2022

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Oct 12 1998 - 18:01:12 EDT


Frank asked:

>
> Can anybody tell me where to find out what ISO means when it assigns an ISO
> 2022 escape sequence for a "coding system different from ISO 2022" (such as,
> for example, NAPLPS, or UCS-4, or UTF-8)? Is the intention to identify the
> coding system to the recipient, so it can switch to it, and also disable
> ISO-2022 character-set designation and invocation from that moment onwards,
> since we have now switched to a new coding system in which we will not
> necessarily be able to recognize escape sequences for further switching?
>
> In particular, I'm curious about an environment in which the host switches
> the terminal to the UTF-8 coding system. Since Unicode includes ASCII as
> well as C0 and C1 controls (and so UTF-8 can include both sets of controls
> too), should it be possible to switch out of UTF-8 coding once having
> switched into it? (I know, why would anybody ever want to switch out of
> UTF-8? :-)

This stuff is all laid out in excruciating detail in 10646 as regards
10646 and its encoding forms in particular.

Amendment 2 to 10646 (UTF-8) states (among other things):

"When the escape sequences from ISO/IEC 2022 are used, the
identification of a return, or transfer, from UTF-8 to the
coding system of ISO/IEC 2022 shall be as specified in
17.5 for a return or transfer from UCS."

And clause 17.5 of 10646 states:

"When the escape sequences from ISO/IEC 2022 are used, the
identification of a return, or transfer, from UCS to the coding
system of ISO/IEC 2022 shall be by the escape sequence ESC 02/05 04/00.
If such an escape sequence appears within a CC-data-element conforming
to ISO/IEC 10646, it shall be padded in accordance with clause 16.
If such an escape sequence appears within a CC-data-element conforming to
ISO/IEC 2022, it shall consist only of the sequences of bit combinations
as shown above."

In other words, to get from UCS-2 (or UTF-16) to 2022, you
use U+001B U+0025 U+0040. To get from UTF-8 to 2022, you
use 0x1B 0x25 0x40 (ESC "%@"). For UCS-4, it would be
U-0000001B U-00000025 U-00000040.
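
As a rough illustration of the three forms above, here is a minimal
Python sketch that spells out the octets of the return sequence in each
encoding. The names (column_row, RETURN_TO_2022, return_sequence) are
made up for this example, and big-endian octet order is an assumption
for the UCS-2 and UCS-4 cases; it is not stated in the text quoted from
the standard.

```python
def column_row(notation: str) -> int:
    """Convert ISO 2022 column/row notation such as '02/05' to a code value."""
    column, row = notation.split("/")
    return int(column) * 16 + int(row)


# ESC 02/05 04/00, i.e. ESC "%" "@"
RETURN_TO_2022 = "\x1b" + chr(column_row("02/05")) + chr(column_row("04/00"))


def return_sequence(encoding: str) -> bytes:
    """Octets of the return sequence, padded to the code unit size of `encoding`."""
    return RETURN_TO_2022.encode(encoding)


# Assumption: big-endian octet order, no byte order mark.
utf8 = return_sequence("utf-8")      # one octet per character
ucs2 = return_sequence("utf-16-be")  # padded to two octets per character
ucs4 = return_sequence("utf-32-be")  # padded to four octets per character

for name, octets in (("UTF-8", utf8), ("UCS-2", ucs2), ("UCS-4", ucs4)):
    print(name, octets.hex(" "))
```

The point of the sketch is only that the same three characters are
sent in every case; what changes is the padding of each one to the
width of the coding system being left.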

--Ken


