Re: 8859-1, 8859-15, 1252 and Euro

From: Alain LaBonté  (alb@sct.gouv.qc.ca)
Date: Wed Feb 09 2000 - 09:58:53 EST


At 14:56 2000-02-07 -0800, A. Vine wrote:
>Tim Greenwood wrote:
> >
> > Pretty much all of the pages on the web, and the browsers, ignore the
> > differences between ISO-8859-1 and Windows code page 1252.
>
>I wish they would! I'm pretty sick of seeing question marks where there
>should be quotes, apostrophes, bullets, em-dashes, etc.
>
> >
> > So what is a system that stores all data in Unicode and converts for web
> > output to do with U+20AC? The formally correct process would seem to be to
> > convert to 0x80 only for CP1252 (and the other CP12xx sets) to 0xa4 for
> > ISO-8859-15 and to the 'not a character in this set' sign for ISO-8859-1.
> > This may be formally correct, but would not help the majority of users. For
> > that we would convert to 0x80 for ISO-8859-1 - it works even though
> > 'wrong'.
>
>Sure, if you don't care about Unix users.

[Alain] And if you don't care about Mac users too.

    This problem is not limited to the web. In integral French we have
tremendous email problems transmitting œ (oe), Œ (OE) and Ÿ (Y:) between
Windows users and Macs (CP 1252 also encodes these in the C1 zone, which
makes it unsuitable in the Unix world or in the IBM mainframe world).
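
    To make the byte positions concrete, here is a minimal sketch in Python.
It assumes Python's built-in codecs (cp1252, iso8859_15, latin_1, mac_roman)
as stand-ins for the character sets named in this thread; the helper names
byte_for and in_c1 are illustrative only.

```python
# Where the euro sign and the French letters land in each 8-bit set.
# Assumption: Python's bundled codecs stand in for the sets discussed here.
CHARS = ["\u20ac", "\u0153", "\u0152", "\u0178"]  # euro, oe, OE, Y-diaeresis
CODECS = ["cp1252", "iso8859_15", "latin_1", "mac_roman"]


def byte_for(char, codec):
    """Return the single byte for char in codec, or None if it is absent."""
    try:
        return char.encode(codec)[0]
    except UnicodeEncodeError:
        return None


def in_c1(byte):
    """True if the byte value falls in the range 0x80-0x9F (the C1 zone)."""
    return byte is not None and 0x80 <= byte <= 0x9F


for ch in CHARS:
    cells = []
    for codec in CODECS:
        b = byte_for(ch, codec)
        if b is None:
            cell = "none"
        else:
            cell = f"0x{b:02X}" + (" (C1 zone)" if in_c1(b) else "")
        cells.append(f"{codec}={cell}")
    print(f"U+{ord(ch):04X}: " + ", ".join(cells))
```

    Under these codecs, all four characters sit in the 0x80-0x9F range for
cp1252, above 0xA0 for iso8859_15, and are absent from latin_1.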

    It would have been simpler to modify 8859-1 once and for all, but we did
things in the right, civilized way (not in the way that Korean was
obsoleted in 10646-1). In other words, we produced International Standard
ISO/IEC 8859-15 in the hope that this would solve at once the problem of the
EURO, French and Finnish in the 8-bit world in a standard way, while
providing a means to convert back and forth to Unicode, also in a standard way.
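
    The back-and-forth conversion can be checked with a short Python sketch.
It assumes Python's iso8859_15 and latin_1 codecs follow the published
tables; round_trips and differences are hypothetical helper names.

```python
# Round-trip check between an 8-bit set and Unicode, plus a diff of two sets.
# Assumption: Python's bundled codecs mirror the standard mapping tables.
ALL_BYTES = bytes(range(256))


def round_trips(codec):
    """True if every byte value survives bytes -> Unicode -> bytes unchanged."""
    try:
        return ALL_BYTES.decode(codec).encode(codec) == ALL_BYTES
    except UnicodeError:
        return False


def differences(codec_a, codec_b):
    """List (byte, char_a, char_b) where the two codecs disagree."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        a = raw.decode(codec_a)
        b = raw.decode(codec_b)
        if a != b:
            diffs.append((value, a, b))
    return diffs


print("iso8859_15 round-trips:", round_trips("iso8859_15"))
for value, old, new in differences("latin_1", "iso8859_15"):
    print(f"0x{value:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

    With these codecs the two sets differ at only eight positions, which
include the EURO SIGN at 0xA4 and the three French letters at 0xBC-0xBE.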

    It should still be the preferred way.

    For email, for Western European countries and the Americas, the
solution seems to me typically to use Windows coding internally on a PC,
Mac coding internally on a Mac, 8859-15 coding under Unix, an 8859-15-like
EBCDIC repertoire on an IBM mainframe (I do not remember the code page),
Unicode under Unicode-conformant email software, and to do the exchanges
either in UTF-8 for email software that can handle this or in 8859-15 if
8-bit technology can't be bypassed internally at each end (the interim
solution for a few years, imho -- it is a harmonious solution), all this
MIME-tagged.
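
    As a rough sketch of the exchange step, the following Python uses the
standard email package to MIME-tag a message as UTF-8 or 8859-15. The
function name build_message and the peer_handles_utf8 flag are assumptions
made for illustration; how a sender learns what the other end can handle is
not covered here.

```python
from email.message import EmailMessage


def build_message(text, peer_handles_utf8):
    """Build a MIME-tagged message in UTF-8, or in ISO-8859-15 as a fallback.

    peer_handles_utf8 is a hypothetical flag supplied by the caller.
    Encoding raises UnicodeEncodeError if the text does not fit 8859-15.
    """
    charset = "utf-8" if peer_handles_utf8 else "iso-8859-15"
    msg = EmailMessage()
    msg["Subject"] = "charset test"
    # set_content encodes the text and records the charset in Content-Type.
    msg.set_content(text, charset=charset)
    return msg


sample = "L'\u0153uvre co\u00fbte 5 \u20ac"
for flag in (True, False):
    message = build_message(sample, flag)
    print(message["Content-Type"])
```

    The conversion between the internal coding (Windows, Mac, EBCDIC) and
the exchange coding would happen on each side before and after this step.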

    I believe the problem is the same for HTML. I would prefer the
quick-and-dirty solution that assumes 8859-15 even if it is tagged 8859-1,
rather than the one that assumes it is code page 1252 when tagged 8859-1.
This would allow for a harmonious operation between Macs, PCs, the Unix
world and IBM mainframes. Right now it is not clean and it does not work
(for the EURO sign and for French and Finnish), in addition to being a
quick-and-dirty solution (to which I, for one, am not fanatically opposed
when it makes things work).
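
    The two quick-and-dirty readings of an 8859-1 tag can be compared with a
small Python sketch. The function decode_tagged and its latin1_means
parameter are hypothetical; this is not how any particular browser is known
to behave.

```python
# Decode tagged bytes, overriding what an "8859-1" label is taken to mean.
# Assumption: labels are matched against a small illustrative alias list.
LATIN1_LABELS = ("iso-8859-1", "latin-1", "latin1")


def decode_tagged(data, label, latin1_means="iso8859_15"):
    """Decode data per its label, treating 8859-1 as latin1_means."""
    codec = label.lower()
    if codec in LATIN1_LABELS:
        codec = latin1_means
    # Undefined bytes become U+FFFD instead of raising an error.
    return data.decode(codec, errors="replace")


price_unix = b"5 \xa4"   # euro sign as written by an 8859-15 system
price_win = b"5 \x80"    # euro sign as written by a CP 1252 system

for policy in ("iso8859_15", "cp1252"):
    print(policy,
          repr(decode_tagged(price_unix, "ISO-8859-1", policy)),
          repr(decode_tagged(price_win, "ISO-8859-1", policy)))
```

    Either policy reads one of the two byte strings as a euro sign and
misreads the other, which is why the choice of assumption matters.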

Alain LaBonté
Québec


