From: Doug Ewell (doug@ewellic.org)
Date: Thu Nov 11 2010 - 11:26:08 CST
Mark Davis 😎 wrote:
> There are superset relations among some of the CJK character sets, and
> also -- practically speaking -- between some of the windows and
> ISO-8859 sets. I say practically speaking because in general
> environments, the C1 controls are really unused, so where a non
> ISO-8859 set is same except for 80..9F you can treat it pragmatically
> as a superset.
There was a time, about 10 years ago, when Frank da Cruz would have
replied almost immediately about the importance of C1 controls in
terminal environments, and the arguments about incompatibility between
8859-1 and Windows-1252 would have been off and running.
That was about the same time that people like Roman Czyborra were
complaining that their terminals were scrambled by text encoded in
UTF-8, because of its use of bytes in the 80..9F range, and people like
Jörg Knappen were creating alternative UTFs to get around this
perceived problem.
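To make the complaint concrete, here is a minimal sketch in Python of how UTF-8 text can end up containing bytes in the 80..9F range, which an 8-bit terminal may interpret as C1 controls. The helper name c1_range_bytes is hypothetical, chosen only for this illustration.

```python
def c1_range_bytes(text):
    """Return the bytes of text's UTF-8 encoding that fall in 0x80..0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# U+00C0 (A with grave) encodes as C3 80, so its continuation byte lands in the C1 range.
print([hex(b) for b in c1_range_bytes("\u00c0")])
# Pure ASCII text never produces such bytes.
print(c1_range_bytes("plain ASCII"))
```

Not every non-ASCII character triggers this: U+00E9 encodes as C3 A9, and neither byte is in 80..9F.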
Regarding the subset/superset terminology, we need to distinguish
between "encoding subsets" and "repertoire subsets":
* ASCII is both an encoding subset and a repertoire subset of 8859-1 and
Windows-1252 and UTF-8.
* 8859-1 is an encoding subset of Windows-1252, except for the 80..9F
range.
* 8859-1 and Windows-1252 are repertoire subsets, but not encoding
subsets, of UTF-8.
* 8859-15 is neither type of subset of 8859-1.
* Etc.
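The distinction can be checked mechanically. The following is a rough sketch, assuming Python's built-in codec tables as a stand-in for the character sets named above. The function names are hypothetical, and the byte-by-byte comparison is only meaningful when the first codec is a single-byte encoding. Note that Python's cp1252 codec leaves five bytes (81, 8D, 8F, 90, 9D) undefined, which this sketch treats as "no character".

```python
def decode_table(codec):
    """Map each single byte 0x00..0xFF to its character, skipping bytes the codec rejects."""
    table = {}
    for b in range(256):
        try:
            table[b] = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            pass  # byte is undefined, or not a complete sequence, in this codec
    return table


def mismatched_bytes(a, b):
    """Bytes that codec a defines but codec b decodes differently or not at all."""
    ta, tb = decode_table(a), decode_table(b)
    return [byte for byte, ch in ta.items() if tb.get(byte) != ch]


def is_encoding_subset(a, b):
    """True if every byte defined in a means the same character in b."""
    return not mismatched_bytes(a, b)


def is_repertoire_subset(a, b):
    """True if every character of a can be encoded in b, at whatever byte values."""
    for ch in decode_table(a).values():
        try:
            ch.encode(b)
        except UnicodeEncodeError:
            return False
    return True


if __name__ == "__main__":
    pairs = [("ascii", "latin-1"), ("ascii", "cp1252"), ("ascii", "utf-8"),
             ("latin-1", "utf-8"), ("cp1252", "utf-8"), ("iso8859_15", "latin-1")]
    for a, b in pairs:
        print(a, b, is_encoding_subset(a, b), is_repertoire_subset(a, b))
    # Where 8859-1 and Windows-1252 disagree:
    print([hex(x) for x in mismatched_bytes("latin-1", "cp1252")])
```

Under these assumptions, the bytes on which 8859-1 and Windows-1252 disagree are exactly 80..9F, matching the exception noted in the list.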
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s