Re: ISO 2022

From: Martin J. Dürst (mduerst@ifi.unizh.ch)
Date: Thu Oct 23 1997 - 11:35:02 EDT

Next message: Mark Leisher: "Re: ECU and euro - both exist"
Previous message: Jonathan Rosenne: "Re: Caring about European requirements sensitively!"
Maybe in reply to: John Cowan: "Re: ISO 2022"
Next in thread: Glenn Adams: "Re: ISO 2022"

On Wed, 22 Oct 1997, John Cowan wrote:

> With six requests for my text in a few hours, I have decided to
> risk the Wrath of Svarasati and post it here. Please note -- as
> I forgot to mention last time -- that replies must be sent to
> cowan@ccil.org; direct responses will reach me but are unreplyable-to
> by me (the From: header gets corrupted).

Hello John,

I think it is very important to say that ISO 2022 is a toolbox
out of which individual things, with more or less variability,
can be composed. ISO 2022 has actually more variability than
what you describe. A few examples (there are more):

> ISO 2022 text is specified using a mixture of registered character sets.
> At any time, up to four character sets may be available. Character sets
> have one of three sizes: single-byte character sets with 94 characters
> (e.g. ASCII), single-byte character sets with 96 characters (e.g. the top
> halves of ISO Latin-1 to Latin-5), or double-byte character sets with
> 94 x 94 characters (e.g. JIS 0208X-1983).

There can also be triple-byte (and quadruple-byte, ...) sets; together
with the double-byte sets, these are called multibyte sets. Multibyte
sets can also be 96 x 96 (x 96 ...).
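
To make the sizes concrete, here is a minimal Python sketch. The function
name set_capacity is hypothetical and not taken from the standard; it only
restates the arithmetic above (94 or 96 positions per byte, raised to the
number of bytes per character).

```python
def set_capacity(positions_per_byte: int, num_bytes: int) -> int:
    """Number of code positions in a graphic set (hypothetical helper).

    positions_per_byte is 94 or 96; num_bytes is 1 for single-byte sets
    and 2 or more for multibyte sets.
    """
    if positions_per_byte not in (94, 96):
        raise ValueError("expected 94 or 96 positions per byte")
    if num_bytes < 1:
        raise ValueError("num_bytes must be at least 1")
    return positions_per_byte ** num_bytes
```

For example, set_capacity(94, 2) gives the 94 x 94 size mentioned in the
quoted text, and set_capacity(96, 2) the 96 x 96 variant.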

> Each registered character set has
> a standard designating byte in the range 48 to 125; the bytes are unique within
> character set sizes, but may be reused across sizes.

Some sets can have two bytes for designation, or more in the future.

> The four available character sets

Your use of "character set" above is already rather questionable;
it should be "coded character set". But here it gets more complicated.
It's not really the sets that are labeled G0, ..., G3; it's the slots
that they can be assigned (designated) to.

> are labeled G0, G1, G2, and G3. Initially,
> G0 is the 94-character set ASCII, and G1 is the 96-character set ISO Latin-1
> (top half).

This is the case in some places, but not at all in general.
To use ISO 2022, you always need a common agreement, or some
additional introductory ESC sequences.
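
The distinction between slots and sets, and the need for an agreed initial
state, can be sketched in Python as follows. This is only an illustration
under stated assumptions: the class Slots and the function
quoted_initial_state are hypothetical names, and a set is identified here
simply by its size and designating byte ("B" for ASCII as a 94-set, "A" for
the top half of ISO Latin-1 as a 96-set).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A coded character set, identified here by (size, designating byte).
SetId = Tuple[str, str]


@dataclass
class Slots:
    """The four slots G0..G3. None means nothing has been designated yet."""
    g: Dict[str, Optional[SetId]] = field(
        default_factory=lambda: {"G0": None, "G1": None, "G2": None, "G3": None}
    )

    def designate(self, slot: str, set_id: SetId) -> None:
        """Assign (designate) a coded character set to a slot."""
        if slot not in self.g:
            raise KeyError(f"unknown slot {slot!r}")
        self.g[slot] = set_id


def quoted_initial_state() -> Slots:
    """One possible agreement (assumption): the initial state from the quote."""
    s = Slots()
    s.designate("G0", ("94", "B"))  # ASCII
    s.designate("G1", ("96", "A"))  # ISO Latin-1, top half
    return s
```

A bare Slots() starts with every slot empty, which reflects the general
case: nothing is available until an agreement or an ESC sequence says so.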

> The other character sets are unassigned. The following escape
> sequences (where ESC = the byte 27) specify changes to the available
> character sets:
>
> ESC ( <D> Set G0 to the 94-character set <D>
> ESC ) <D> Set G1 to the 94-character set <D>
> ESC * <D> Set G2 to the 94-character set <D>
> ESC + <D> Set G3 to the 94-character set <D>
> ESC - <D> Set G1 to the 96-character set <D>
> ESC . <D> Set G2 to the 96-character set <D>
> ESC / <D> Set G3 to the 96-character set <D>
> ESC $ <D> Set G0 to the 94 x 94 character set <D>
> ESC $ ( <D> Set G0 to the 94 x 94 character set <D>
> ESC $ ) <D> Set G1 to the 94 x 94 character set <D>
> ESC $ * <D> Set G2 to the 94 x 94 character set <D>
> ESC $ + <D> Set G3 to the 94 x 94 character set <D>
>
>
> Note that G0 may not be a 96-character set, and that there are two ways to
> specify a 94 x 94 character set in G0, of which the first is deprecated.

It is not really deprecated. It applies to an exactly specified
set of sets, those that have been registered under the previous
version of the standard. For these, it is the one that has to
be used.
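
As a rough illustration of the quoted table, here is a minimal Python
sketch that recognizes those designation sequences. The function name
parse_designation is hypothetical, the range 48 to 125 for the designating
byte is taken from the quoted text, and designations with more than one
designating byte are deliberately not handled.

```python
ESC = 0x1B

# Intermediate byte -> (slot, set size), following the quoted table.
SINGLE_BYTE = {
    ord("("): ("G0", "94"), ord(")"): ("G1", "94"),
    ord("*"): ("G2", "94"), ord("+"): ("G3", "94"),
    ord("-"): ("G1", "96"), ord("."): ("G2", "96"), ord("/"): ("G3", "96"),
}
MULTI_BYTE = {ord("("): "G0", ord(")"): "G1", ord("*"): "G2", ord("+"): "G3"}


def is_designator(b: int) -> bool:
    """Designating bytes lie in 48..125 according to the quoted text."""
    return 48 <= b <= 125


def parse_designation(seq: bytes):
    """Return (slot, size, designating byte) or None if seq is not recognized."""
    if len(seq) < 3 or seq[0] != ESC:
        return None
    if seq[1] == ord("$"):
        # Short form ESC $ <D>: always G0, used for the earlier registrations.
        if len(seq) == 3 and is_designator(seq[2]):
            return ("G0", "94x94", chr(seq[2]))
        # Long form ESC $ ( <D>, ESC $ ) <D>, ...
        if len(seq) == 4 and seq[2] in MULTI_BYTE and is_designator(seq[3]):
            return (MULTI_BYTE[seq[2]], "94x94", chr(seq[3]))
        return None
    if len(seq) == 3 and seq[1] in SINGLE_BYTE and is_designator(seq[2]):
        slot, size = SINGLE_BYTE[seq[1]]
        return (slot, size, chr(seq[2]))
    return None
```

Keeping the short form ESC $ <D> as its own branch mirrors the point above:
it is not a deprecated alias but the required form for a fixed group of
sets.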

> ISO 2022 decoding affects input bytes in the ranges 33 to 126 and 160 to 255,
> known as "the left half" and "the right half" respectively. All other bytes,
> unless they belong to a control sequence shown in this document, remain
> unchanged.

No. There is a quite similar, although somewhat simpler, system
for control character blocks.
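
A small Python sketch of how the 8-bit byte space divides up may help here.
The function name classify_byte is hypothetical, and the labels C0, C1, GL
and GR are the usual names for the two control blocks and the left and
right graphic halves; they are supplied for illustration and do not appear
in the quoted text.

```python
def classify_byte(b: int) -> str:
    """Return the area of the 8-bit code table that byte value b falls in."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value in 0..255")
    if b <= 0x1F:
        return "C0"   # first control block, includes ESC (27)
    if b == 0x20:
        return "SP"   # space
    if b <= 0x7E:
        return "GL"   # left half, 33..126
    if b == 0x7F:
        return "DEL"  # delete
    if b <= 0x9F:
        return "C1"   # second control block, 128..159
    return "GR"       # right half, 160..255
```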

> This rich schema may be used in various ways. In ISO-2022-JP, the Japanese
> flavor of ISO 2022, only the bytes 33-126 and the G0 character set is used,
> and escape sequences are used to switch between ASCII, ISO-646-JP (the
> Japanese national variant of ASCII), and JIS 0208X-1983.

And older and newer versions of JIS X 0208. Also, please note that
ISO-2022-JP is not in fact conformant to ISO 2022, despite its name,
because it uses designations only, made anew on each line, whereas
the basic idea of ISO 2022 is to use designations once, or once on
each line, and then only invocations. This is clearly stated in
the new version of JIS X 0208, namely JIS X 0208:1997, Appendix 2
(normative), Note to item 1.
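
The "designations only" style can be seen in a minimal Python sketch that
splits an ISO-2022-JP byte string into runs. The function name
split_segments is hypothetical; the four escape sequences are the ones
listed for ISO-2022-JP in RFC 1468, and the sketch assumes the text starts
in ASCII and contains no other escape sequences.

```python
# Designations used by ISO-2022-JP (per RFC 1468); all of them target G0.
JP_DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-Roman (ISO-646-JP)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_segments(data: bytes):
    """Split data into (set name, payload bytes) runs, one per designation."""
    current = "ASCII"  # assumption: text starts in ASCII
    segments = []
    start = 0
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            esc = data[i:i + 3]
            if esc not in JP_DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence {esc!r}")
            if i > start:
                segments.append((current, data[start:i]))
            current = JP_DESIGNATIONS[esc]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        segments.append((current, data[start:]))
    return segments
```

Every switch in this sketch is a new designation to G0; no invocation
(shift) of G1, G2 or G3 ever occurs. For actual decoding, Python's standard
library already provides an "iso2022_jp" codec.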

I still think that for understanding ISO 2022, it's best to look
at the standard itself. Once you have understood that it's a
toolbox (out of which you can never use all the options at the
same time, because some combinations are explicitly forbidden),
the rest is no longer that difficult.

Regards, Martin.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:37 EDT