Richard Wordingham wrote:
> Just to complicate matters, most documents encoded using ISO/IEC 2022
> rely on default initial settings, and so to interpret them it is not
> enough to say it is in an ISO/IEC 2022 encoding, but instead one must
> specify the particular encoding, which then defines the initial
> states.

ISO 2022 does require a particular initial state, but the ones Richard
is talking about are specific to ISO 2022-based encodings, such as
ISO-2022-CN or ISO-2022-JP. Those are really different encodings from
generic ISO 2022; in addition to the secret magic initial state, they
may also allow certain shortcuts in the switching characters which
aren't allowed in fully conformant 2022.
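To make the initial-state point concrete, here is a minimal sketch using Python's built-in "iso2022_jp" codec (my choice of tool for illustration, not something from the thread). It assumes the ISO-2022-JP convention that text starts in ASCII, with ESC $ B designating JIS X 0208 and ESC ( B returning to ASCII. The same two-byte payload decodes differently depending on whether a designation precedes it:

```python
# Sketch only: Python's "iso2022_jp" codec stands in for an ISO-2022-JP decoder.
ESC = b"\x1b"
TO_JISX0208 = ESC + b"$B"   # designate JIS X 0208 (double-byte set)
TO_ASCII = ESC + b"(B"      # return to ASCII

payload = b"$3$s"           # four bytes, all in the 7-bit range

# No designation: the decoder is in its initial state (ASCII),
# so the bytes are read as four ASCII characters.
plain = payload.decode("iso2022_jp")

# Same bytes after ESC $ B: read as two double-byte JIS X 0208 characters.
switched = (TO_JISX0208 + payload + TO_ASCII).decode("iso2022_jp")

print(repr(plain))
print(repr(switched))
```

A decoder told only "this is ISO 2022" would have no basis for picking the first interpretation; it is the ISO-2022-JP profile that supplies it.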
Asmus Freytag wrote:
> ISO 2022 allows switching among sets in mid stream, but as far as I
> remember (haven't had to think about this since Unicode came around)
> the code unit is still a byte, except that sometimes pairs of bytes
> are used. As I remember, ISO 2022 was still far from widely supported
> in the late 80's and practically not at all on the fast growing PC
> sector.

ISO 2022 code units are indeed bytes, even for the double- or
(theoretical) triple-byte sets, and it was indeed almost never used on
PCs.
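As a rough illustration of the byte-oriented code units, the sketch below encodes a mixed ASCII/Japanese string with Python's "iso2022_jp" codec (again an assumption of mine, chosen because it is readily available). It checks that every byte stays in the 7-bit range and that each character from the double-byte set takes exactly two bytes between the escape sequences:

```python
# Sketch only: inspect the byte stream Python produces for ISO-2022-JP.
text = "abc \u3053\u3093\u306b\u3061\u306f"   # "abc " plus five hiragana
encoded = text.encode("iso2022_jp")

# Every code unit is a single byte, and here all of them are 7-bit.
all_seven_bit = all(b < 0x80 for b in encoded)

# Isolate the run between ESC $ B (enter JIS X 0208) and ESC ( B (back to ASCII).
double_byte_run = encoded.split(b"\x1b$B")[1].split(b"\x1b(B")[0]

print(encoded)
print(all_seven_bit, len(double_byte_run))
```

The five hiragana occupy ten bytes in the stream; there is no wider code unit involved, only pairs of bytes interpreted together.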
I think it's important to remember that Roger's original question to the
list was "Can a single text document use multiple character encodings?"
He didn't ask if such a practice was common, or confusing, or a good
idea, though perhaps those were underlying questions.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell

Received on Wed Aug 28 2013 - 20:33:36 CDT