RE: CP1252 under Unix

From: Frank da Cruz (fdc@columbia.edu)
Date: Thu Mar 30 2000 - 11:50:08 EST


Robert A. Rosenberg wrote:
> At 05:16 AM 03/29/2000 -0800, Robert Brady wrote:
> >On Tue, 28 Mar 2000, Robert A. Rosenberg wrote:
> >
> > > match MS's choice of mappings). For UNIX, just do this automatically even
> > > if the HTML says ISO-8859-1 since there should never be any control
> > > characters in that range and if the codepoints do occur then they are
> >
> >That would be acceptance of standards subversion. And it doesn't work.
> >(There's plenty of KOI8-R, ISO-8859-2, [insert other standard here] tagged
> >as ISO-8859-1, at least on Usenet.)
>
> While I will accept your claim (of lots of misID'ed charsets) for the sake
> of this discussion, I fail to see its relevance. If I create a
> message/whatever in ISO-8859-2 but mark it as ISO-8859-1, the high-ASCII
> xA0-xFF codepoint range will display incorrectly (You'll get the 8859-1 not
> the expected 8859-2 glyphs). Treating any ISO-8859-1 claim as if marked as
> Windows-1252 will CORRECTLY display all valid (x00-x7F+xA0-xFF codepoints
> only) ISO-8859-1 content while still allowing for the use of the extra 32
> glyphs (in particular the Typographic Quotes). For the other misID'ed cases
> that you refer to, you will get the same incorrect display as using
> ISO-8859-1 so you have no worse a display by treating ISO-8859-1 as
> Windows-1252.
>
All true, but only for the Web. It does not apply to ISO-2022-compliant
applications, which reserve 0x80-0x9F (C1) for control characters. The
beauty of standard character sets is that they work for all interchange
applications, not just certain kinds.
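
A small sketch of the quoted argument, assuming Python's built-in cp1252 and
iso8859_1 codecs stand in for the two encodings. The sample bytes and variable
names are made up for illustration. It checks that every valid ISO 8859-1 byte
(0x00-0x7F plus 0xA0-0xFF) decodes identically either way, and that only the
0x80-0x9F range differs:

```python
# Every byte that is valid in ISO 8859-1 text: 0x00-0x7F plus 0xA0-0xFF.
valid_latin1 = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))

# Decoding those bytes as CP1252 gives exactly the same characters.
same_on_valid = valid_latin1.decode("cp1252") == valid_latin1.decode("iso8859_1")

# Hypothetical sample: 0x93 and 0x94 are the CP1252 typographic quotes.
sample = b"\x93quoted\x94"

# As ISO 8859-1, the two bytes are C1 controls (U+0093, U+0094).
as_latin1 = sample.decode("iso8859_1")

# As CP1252, they are the curly quotes U+201C and U+201D.
as_cp1252 = sample.decode("cp1252")

if __name__ == "__main__":
    print("valid Latin-1 decodes identically:", same_on_valid)
    print("as ISO 8859-1:", [hex(ord(c)) for c in as_latin1])
    print("as CP1252:    ", [hex(ord(c)) for c in as_cp1252])
```

An ISO-2022-compliant application would treat 0x93 and 0x94 as control
characters, which is why the shortcut is safe only where C1 controls never
appear.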

By the way, I mentioned this once before, but just to be sure everybody is
aware: the fact that CP1252 0xA0-0xFF (GR) coincides with ISO 8859-1 does
not carry over to the other CP12xx's:

 Windows  ISO           GR Matches  Domain
 CP1250   8859-2        No          Latin-2 (East Europe Roman)
 CP1251   8859-5        No          Latin/Cyrillic
 CP1252   8859-1        Yes         Latin-1 (West Europe Roman)
 CP1253   8859-7        No          Latin/Greek
 CP1254   8859-3? -9?   No          Turkish
 CP1255   8859-8        No          Hebrew
 CP1256   8859-6        No          Arabic
 CP1257   8859-4        No          Latin-4 (Baltic)
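
One way to check the rows above is to compare the two code tables byte by byte
over 0xA0-0xFF. The following is a minimal sketch, assuming Python's bundled
codecs are faithful to the respective code pages and standards (they may
reflect later revisions than the ones discussed here). The helper names
gr_table, gr_matches and gr_differences are made up for illustration. Both
candidates from the Turkish row are included, since the table is unsure which
ISO set to pair with CP1254. Output is not reproduced here.

```python
# (Windows codec, ISO codec) pairs from the table, in Python's spellings.
PAIRS = [
    ("cp1250", "iso8859_2"),
    ("cp1251", "iso8859_5"),
    ("cp1252", "iso8859_1"),
    ("cp1253", "iso8859_7"),
    ("cp1254", "iso8859_3"),
    ("cp1254", "iso8859_9"),
    ("cp1255", "iso8859_8"),
    ("cp1256", "iso8859_6"),
    ("cp1257", "iso8859_4"),
]


def gr_table(codec):
    """Decode each byte 0xA0-0xFF on its own; None marks an unassigned byte."""
    table = []
    for byte in range(0xA0, 0x100):
        try:
            table.append(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            table.append(None)
    return table


def gr_matches(windows_codec, iso_codec):
    """True if both codecs assign the same character to every GR byte."""
    return gr_table(windows_codec) == gr_table(iso_codec)


def gr_differences(windows_codec, iso_codec):
    """Return the byte values in 0xA0-0xFF where the two codecs disagree."""
    win = gr_table(windows_codec)
    iso = gr_table(iso_codec)
    return [0xA0 + i for i in range(len(win)) if win[i] != iso[i]]


if __name__ == "__main__":
    for win, iso in PAIRS:
        diffs = gr_differences(win, iso)
        verdict = "Yes" if not diffs else "No (%d bytes differ)" % len(diffs)
        print("%-7s %-10s %s" % (win, iso, verdict))
```

Decoding one byte at a time keeps an unassigned position in either table from
aborting the whole comparison.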

- Frank


