RE: DEC multilingual code page, ISO 8859-1, etc.

From: Chris Pratley (chrispr@MICROSOFT.com)
Date: Wed Mar 29 2000 - 18:44:03 EST


See>>>

-----Original Message-----
From: A. Vine [mailto:avine@eng.sun.com]
Sent: March 29, 2000 3:10 PM
To: Unicode List
Subject: Re: DEC multilingual code page, ISO 8859-1, etc.

Jungshik Shin wrote:
>
> On Tue, 28 Mar 2000, Erik van der Poel wrote:
>
> > That depends on the particular private code page. In the case of
> > windows-1252, there are far more users with mail software that works
> > with windows-1252 than UTF-8 (or any other encoding of Unicode).
> >
> > There is an old saying "Be conservative in what you send, liberal in
> > what you accept". As far as windows-1252 vs UTF-8 is concerned, sending
> > UTF-8 is *not* conservative. We need to wait until more people have
> > UTF-8 capable software installed.
>
> I don't think sending out Windows-1252 on the wire is 'conservative'
> either. (I'm so tired of getting Windows-1252 encoded messages with
> NOT-SO-ESSENTIAL characters interspersed among valid ISO-8859-1 characters
> that I'm gonna write a procmail filter to remove/transliterate them.) If
> you really wanna stick to that saying, you should convert Windows-1252
> to ISO-8859-1 (with appropriate transliteration) on the way out, just
> like more standards-compliant programs on the MacOS side do with MacRoman.
> Better still is to give users the choice between UTF-8 and ISO-8859-1, with
> transliteration for chars only present in Windows-1252.
>
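One way to apply "be conservative in what you send" to mail is to label each outgoing message with the most widely supported charset that can represent it losslessly, and fall back to UTF-8 only when nothing narrower fits. Below is a minimal Python sketch. The helper `pick_charset` is hypothetical, and the preference order (US-ASCII, then ISO-8859-1, then UTF-8) is an assumption. It does not describe the behavior of any particular mail client.

```python
# Hypothetical helper: the charset preference order below is an assumption,
# chosen so that the most widely supported label wins whenever it is lossless.
PREFERRED_CHARSETS = ("us-ascii", "iso-8859-1")


def pick_charset(text: str) -> str:
    """Return the first preferred charset that encodes `text` without loss."""
    for charset in PREFERRED_CHARSETS:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character falls outside this charset
        return charset
    # UTF-8 covers every Unicode character, so it is the final fallback.
    return "utf-8"


print(pick_charset("plain text"))                # us-ascii
print(pick_charset("caf\u00e9"))                 # iso-8859-1
print(pick_charset("\u201csmart\u201d quotes"))  # utf-8
```

With this order, a message containing curly quotes is labelled UTF-8 rather than windows-1252. A client that takes the transliteration route instead would map those characters first and then end up with ISO-8859-1.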

Again I concur. I don't see any need for "smart" quotes in text, which is the
source of most of the question marks I get when I read Web pages created with
Microsoft products (regardless of the tag, Chris) and emails as well. I do not
have a font for 1252, and I don't think it's fair to make Unix folks a) find a
font and b) figure out how to install it so that CDE recognizes it, just because
MS decided to provide folks with "smart" quotes and fancier bullets for
otherwise plain text documents, all the while not informing them that people who
don't have a Windows box will see question marks or something equally silly.
Sure, I can read it. I can read quoted-printable too. The point is that
existing quotes work just "fine", but question marks make you slow down to
determine what they're ?doing? there.

Giving users the option to convert these $proprietary$ characters into an
ISO-8859-1 or UTF-8 equivalent would make sense. Explaining to people that
folks who don't have Windows will probably ^not^ be able to see these
characters would be decent.
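The conversion suggested here can be sketched in Python. This is a minimal sketch that assumes the input is raw windows-1252 bytes. The `CP1252_FALLBACKS` table and the helper names are hypothetical, and the replacement strings are illustrative choices rather than a standard transliteration. The UTF-8 path keeps every character. The ISO-8859-1 path has to transliterate the characters that windows-1252 places in the 0x80-0x9F range, which ISO-8859-1 reserves for control codes.

```python
# Hypothetical transliteration table for some windows-1252 characters that have
# no ISO-8859-1 equivalent. The replacements are illustrative, not standard.
CP1252_FALLBACKS = str.maketrans({
    "\u20ac": "EUR",   # euro sign (0x80)
    "\u2026": "...",   # horizontal ellipsis (0x85)
    "\u2018": "'",     # left single quotation mark (0x91)
    "\u2019": "'",     # right single quotation mark (0x92)
    "\u201c": '"',     # left double quotation mark (0x93)
    "\u201d": '"',     # right double quotation mark (0x94)
    "\u2022": "*",     # bullet (0x95)
    "\u2013": "-",     # en dash (0x96)
    "\u2014": "--",    # em dash (0x97)
    "\u2122": "(TM)",  # trade mark sign (0x99)
})


def cp1252_to_latin1(raw: bytes) -> bytes:
    """Transliterate windows-1252 bytes into ISO-8859-1 bytes."""
    # Bytes left undefined in windows-1252 (e.g. 0x81) decode to U+FFFD here.
    text = raw.decode("cp1252", errors="replace")
    text = text.translate(CP1252_FALLBACKS)
    # Anything still outside ISO-8859-1 (including U+FFFD) becomes '?'.
    return text.encode("iso-8859-1", errors="replace")


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1252 bytes as UTF-8; only undefined bytes are replaced."""
    return raw.decode("cp1252", errors="replace").encode("utf-8")


sample = b"\x93smart\x94 quotes \x97 and \x99"
print(cp1252_to_latin1(sample))  # b'"smart" quotes -- and (TM)'
```

The ISO-8859-1 output loses the distinction between curly and straight quotes, which is the trade-off being argued about in this thread. The UTF-8 output keeps that distinction but needs a UTF-8 capable reader.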
>>>The option exists. Users can choose the output encoding and set it to
pretty much anything.
>>>For web pages, I disagree that people do not want nice quotation marks,
trademark symbols, etc. As evidence, I give the large number of pages that
intentionally add these characters to their pages using the (incorrect)
&#147; and so on. People type these and expect them to appear in browsers.
The reason Microsoft products use windows-1252 (labelled as such) for web
pages is that that was the best way to deliver this content to the largest
number of users. If we use entities like &trade;, they are not supported by
enough browsers. If we use NCRs, they are not supported by enough browsers.
Windows-1252 (labelled correctly) gets the largest penetration of correct
display. I understand this is annoying to some, but the goal was broadest
WYSIWYG on the largest number of browsers.
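Some background on the NCR option helps here. HTML 4 defines numeric character references in terms of ISO 10646 (Unicode) code points, not windows-1252 byte values. Windows-1252 uses byte 0x93 (decimal 147) for the left double quotation mark. The correct reference for that character is &#8220;, while &#147; names the C1 control character U+0093. Below is a minimal Python sketch that turns windows-1252 text into pure-ASCII HTML with correct references. The helper `cp1252_to_ncr_html` is hypothetical, and the sketch assumes the input is already HTML markup. Whether browsers of the time rendered such references is the separate issue raised in the reply.

```python
def cp1252_to_ncr_html(raw: bytes) -> str:
    """Convert windows-1252 bytes to ASCII HTML text using decimal NCRs."""
    # Assumes the input is already HTML markup: '&' and '<' are not escaped.
    text = raw.decode("cp1252")
    # xmlcharrefreplace emits &#NNNN; using the Unicode code point, which is
    # what an HTML numeric character reference refers to.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(ord(b"\x93".decode("cp1252")))               # 8220, not 147
print(cp1252_to_ncr_html(b"\x93Hi\x94 Corp\x99"))  # &#8220;Hi&#8221; Corp&#8482;
```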
>>>Explaining to users anything about encoding is basically impossible, and
putting up such an alert would be extremely annoying to a huge number of
users, a much larger number than are annoyed by the output (which is
correctly labelled).

--
Andrea Vine, avine@eng.sun.com, iPlanet i18n architect
...even if it requires not really a dance with the Devil, but
call it a brief shimmy with his accountant's daughter.
-- Sean Burke http://www.netadventure.net/~sburke/



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT