RE: CP1252 under Unix

From: Chris Pratley (chrispr@MICROSOFT.com)
Date: Fri Mar 24 2000 - 20:33:12 EST


I may be heading where angels fear to tread, but...

Frank, please. It is of no benefit in this always-connected world for any
large corporation to push its own character encoding over Unicode. Let me
debunk that myth yet again. If there were some benefit to that, why is
Microsoft pushing Unicode standard compliance throughout all its products,
arguably faster than any other commercial or non-commercial developer? The
reasons are that it helps customers and helps the developer. The founding
and other long-time members of the Unicode Consortium are the very same
large corporations you imply are inventing new code pages to undermine
Unicode.

I do not see many new character encodings being defined these days, do you?
The reason is that Unicode solves the problems that used to require new
encodings (such as, how do you support "smart" typographically correct
quotes and provide useful symbols like TradeMark to end-users that are
requesting them, even though the proposed 8-bit standard does not provide
them?). cp-1252 was not designed to subvert iso-8859-1. If that were the
case, it would not have been designed as a superset. When standards are not
sufficient for users' needs, they tend to get ignored or modified by the
developers. Unicode seems up to the task for almost all users, and is
getting better all the time. So Unicode is not in much danger of being
flouted, unlike the ridiculously exclusionary micro-repertoire 8-bit
standards we used to deal with.
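
The superset point can be checked mechanically. The following is a minimal
sketch in Python, assuming the codec tables that ship with Python ("cp1252"
and "latin-1") are a fair stand-in for the two encodings; the function names
are made up for illustration. It compares every byte outside 0x80-0x9F and
then lists what cp-1252 puts in that range, which ISO-8859-1 leaves to
control codes rather than printable characters.

```python
def same_outside_0x80_0x9f():
    """True if cp1252 and latin-1 decode every byte outside 0x80-0x9F alike."""
    for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100)):
        raw = bytes([b])
        if raw.decode("cp1252") != raw.decode("latin-1"):
            return False
    return True


def cp1252_extras():
    """Map each byte in 0x80-0x9F to the character cp1252 assigns to it."""
    extras = {}
    for b in range(0x80, 0xA0):
        try:
            extras[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # A few positions are unassigned in Python's cp1252 table.
            pass
    return extras


if __name__ == "__main__":
    print("identical outside 0x80-0x9F:", same_outside_0x80_0x9f())
    for byte, char in sorted(cp1252_extras().items()):
        print(f"0x{byte:02X} -> U+{ord(char):04X}")
```

The second function is where the "smart" quotes (0x93, 0x94) and the
TradeMark sign (0x99) show up.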

There is no holy war against non-Unicode encodings that has to be fought
anymore from where I stand. For example, at Microsoft we have customers and
governments coming to us on a regular basis and asking for a new code page
for their language. The standing answer is: you have it already - Unicode.
As Peter Constable and a few others will note, that doesn’t quite cut it yet
for languages not covered by Unicode, but there is no way Microsoft is about
to create a new codepage for those languages - we'd much rather wait for
Unicode to add them. I am sure other OS vendors have the same policy.

Markus seems to have the practical approach to the current problems. I was
curious why, if cp-1252 on the net is such a problem for Unix (Mac?) users,
no one is rushing to fix the browsing experience. It seemed until Markus and
Erik mentioned otherwise that years were dragging on with no significant
improvement in the user experience for Unix browser users. The argument seems
quixotic to me: if Unix browsers just keep refusing to handle the various
encodings out on the net, those encodings will somehow dry up and stop being
used. Not very realistic, is it? More akin to self-flagellation. I'm glad to
see that some people are making an effort to fix the situation. Of course,
when they do that fix, the more important thing is to make the tools support
Unicode, and just map all those legacy encodings they care to support to
Unicode. There is no need to support every encoding known to humanity - just
pick off the top 10 or 20 that are used on the net and reduce the pain for
users.
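
As a rough illustration of "map the legacy encodings you care about to
Unicode", here is a minimal Python sketch. The list of supported charsets
is an arbitrary example, not a survey of what is actually common on the
net, and the names SUPPORTED and to_unicode are hypothetical. It relies
only on Python's standard codecs module.

```python
import codecs

# Hypothetical short list of charset labels a tool chooses to support.
SUPPORTED = (
    "utf-8", "iso-8859-1", "windows-1252", "iso-8859-2", "koi8-r",
    "shift_jis", "euc-jp", "big5", "gb2312", "euc-kr",
)

# Normalize labels to Python's canonical codec names so aliases compare equal.
_CANONICAL = {codecs.lookup(name).name for name in SUPPORTED}


def to_unicode(raw, declared_charset):
    """Decode raw bytes to a Unicode string if the charset is on the list."""
    try:
        codec = codecs.lookup(declared_charset)
    except LookupError:
        codec = None
    if codec is None or codec.name not in _CANONICAL:
        raise ValueError(f"unsupported charset: {declared_charset!r}")
    return raw.decode(codec.name)


if __name__ == "__main__":
    text = to_unicode(b"\x93smart\x94 quotes\x99", "windows-1252")
    # Everything downstream works on Unicode; UTF-8 is only the output form.
    print(text.encode("utf-8"))
```

Everything past to_unicode deals only with Unicode strings, so adding or
dropping a legacy encoding is a one-line change to the list.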

Regards,
Chris Pratley
Group Program Manager
Microsoft Word

-----Original Message-----
From: Frank da Cruz [mailto:fdc@columbia.edu]
Sent: Friday, 24 March 2000 11:40 AM
To: Unicode List
Cc: unicode@unicode.org
Subject: Re: CP1252 under Unix

Markus wrote:

> Frank da Cruz wrote on 2000-03-24 18:30 UTC:
> > That's fine for browsers (except Lynx!).
>
> Lynx can easily recode incoming CP1252 and Unicode-NCR HTML files into
> UTF-8 and then output them on xterm, which does now support the full
> MES-3 repertoire and more with UTF-8.

But why should Lynx and every other application have to know every
character set in the world?  If CP1252, then why not any other CP?  Why
not EVERY other one?  Do you know how many code pages are in the IBM
Registry?  As of about 1992, there were well over 700.  I'm sure the
number has increased since then.  And that's only IBM.

Hey, somebody has to stick up for standards.  When we don't follow them,
we make more work for ourselves, more grief for our users, and more
difficulties for the information archaeologists of the future.  And it's
all completely unnecessary.  The only gain from ignoring standards is to
the companies who do it.  Bending the rules for their benefit only puts
you to work for them as unpaid labor, thereby compounding the original
transgression and -- worse -- legitimizing it.

When our philosophy becomes "every software package must support every
encoding ever made up by anybody", then it becomes very difficult for small
companies to produce software -- only the big ones can afford to hire the
warehouses full of programmers needed to keep up with every crazy thing that
pops up in the world.  Why should anybody adopt Unicode when they can just
keep piling on the code pages?

In such an atmosphere, good ideas no longer have any value because they
can't be put into practice without also simultaneously supporting all the
bad ideas, which only the rich and powerful can afford to do.  This is not
a good direction.

- Frank



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT