Caring about European requirements sensitively!

From: Alain LaBonté (alb@riq.qc.ca)
Date: Tue Oct 21 1997 - 20:38:28 EDT


At 12:12 22/10/97 -0700, Kenneth Whistler wrote:
>
>Alain stated:
>
>> But I am happy that you are in favour of a practical and safe solution for
>> 8 bit character sets. The ISO/IEC 8859-15 (Latin 0) is the safest one, the
>> only one that is safe, in fact, and very practical, infinitely more practical
>> than any other solution. Any other one so far, even with tagging, is
>> absolutely not reliable.
>>
>> Don't forget EBCDIC to Windows to ISO-8-bits to UNICODE and back,
>> preserving data integrity at all steps. Only Latin 0 will be able to
>> achieve this neatly and cleanly, in a standard way.
>>

[Ken] :
>Unfortunately this is *not* the case.
>
>The most widely used EBCDIC code pages (CP037 and CP500) have already
>had their repertoires carefully matched to ISO 8859-1. Round-tripping
>to 8859-15 will result in EBCDIC characters that do not convert to
>8859-15 and 8859-15 characters that do not convert to EBCDIC. Pushing
>for widespread switching from 8859-1 to 8859-15 will *worsen* data
>integrity for EBCDIC conversions. Furthermore, these 8859-1 converged
>EBCDIC code pages (there are others besides CP037 and CP500) have
>no code values open to add the Euro. They are in the exact same bind
>that the 8859-x series are in--no provision for incremental expansions
>to deal with something like the Euro. The Euro has the potential to
>cause as much mischief in the EBCDIC world as ASCII bracket characters
>used to.
>
>And Windows 1252 has additional characters that are not in 8859-15.
>Yes, the EURO SIGN can be converted between the newest version of
>Windows 1252 and 8859-15, but there are other characters in 1252
>which will still not convert to either 8859-x or to common EBCDIC
>code pages.
>
>This is hardly a recipe for neat and clean preservation of data
>integrity.
>
>I see the 8-bit standards world brewing up a world of hurt.
>
>Unicode, anyone?
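Ken's round-trip claims can be checked mechanically. The following is a minimal sketch that uses Python's built-in codecs `cp037`, `cp500`, `cp1252` and `iso8859_15`. These are present-day Python tables and may differ in detail from the 1997 vendor tables. The helper name `unmappable` is hypothetical. For each source code page, the sketch lists the characters that the target code page cannot encode.

```python
def unmappable(src: str, dst: str) -> list[str]:
    """Characters defined in single-byte codec `src` that codec `dst` cannot encode."""
    missing = []
    for b in range(256):
        try:
            ch = bytes([b]).decode(src)
        except UnicodeDecodeError:
            continue  # byte value undefined in src (e.g. 0x81 in cp1252)
        try:
            ch.encode(dst)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

# EBCDIC pages matched to 8859-1: losses in both directions against 8859-15.
for page in ("cp037", "cp500"):
    print(page, "-> 8859-15 lost:", "".join(unmappable(page, "iso8859_15")))
    print("8859-15 ->", page, "lost:", "".join(unmappable("iso8859_15", page)))

# Windows 1252 characters with no ISO 8859-15 code position.
print("cp1252 -> 8859-15 lost:", "".join(unmappable("cp1252", "iso8859_15")))
```

With these tables, each EBCDIC page loses the eight Latin 1 characters that Latin 0 replaces. Going the other way, 8859-15 loses the eight characters Latin 0 adds. Windows 1252 additionally loses its C1-area punctuation (curly quotes, dashes, the trademark sign and so on).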

[Alain] :
In practice, IBM will create a new EBCDIC code page containing the EURO. It
is likely to use the same code position as the character that Latin 0
"replaces" out of Latin 1 (in the end, most likely the CURRENCY SIGN). If
this happens, mappings will still be valid. In the same way, once Latin 0 is
standardized, this new code table will also contain the other European
characters whose absence is a defect of Latin 1 (which was supposed to
support French and Finnish fully but did not). IBM is likely to choose the
same mapping positions that these characters will have in Latin 0.

So compatibility will be clean there.
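Alain's prediction can be compared with a table Python ships today. The `cp1140` codec is an IBM euro-enabled variant of CP037. The sketch below diffs the two tables byte by byte; the helper name `changed_positions` is hypothetical. Assuming Python's tables match IBM's, the only change is the EURO SIGN taking over the CURRENCY SIGN position. ISO 8859-15 replaces the same character, at 0xA4.

```python
def changed_positions(old: str, new: str) -> dict[int, tuple[str, str]]:
    """Byte values whose character differs between two single-byte codecs."""
    diff = {}
    for b in range(256):
        before = bytes([b]).decode(old)
        after = bytes([b]).decode(new)
        if before != after:
            diff[b] = (before, after)
    return diff

# Compare CP037 with its euro-enabled successor as Python defines them.
for b, (before, after) in changed_positions("cp037", "cp1140").items():
    print(f"0x{b:02X}: {before!r} -> {after!r}")
```

Because the euro lands where the CURRENCY SIGN was, a CP037-to-Latin 1 mapping table carries over to a CP1140-to-Latin 0 mapping for that character. The other seven Latin 0 additions are still missing from CP1140.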

Windows 8-bit systems will not require 1252 files to be changed. They will
only need to map the practical data that is guaranteed to be standard to the
reduced ISO 191-character set (i.e. the Latin 1 repertoire minus 8
characters, plus their "replacements") and to tag the data appropriately
(that was the requirement expressed last summer by *European* countries,
plus Canada). The change required for Windows is not internal; it affects
only interchange, so it is very smooth (it also re-establishes a standard
way to communicate with Microsoft platforms in 8-bit mode). And in practice,
the EURO will have to be interchanged with EBCDIC, not only with UNICODE
(with UNICODE too, of course). What is being proposed in Latin 0 handles
all of this cleanly.
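Alain's Windows scenario converts data only at the interchange boundary and tags the result. A minimal sketch follows. The function name `to_latin0_interchange` is hypothetical. The "tag" is simply the IANA charset name `ISO-8859-15` returned alongside the bytes; how the tag travels (MIME header, file metadata) is left open.

```python
def to_latin0_interchange(data: bytes) -> tuple[bytes, str]:
    """Re-encode Windows-1252 bytes as ISO 8859-15 and return them with a charset tag.

    Encoding is strict: a 1252 character with no ISO 8859-15 code position
    (e.g. a curly quote) raises UnicodeEncodeError instead of being lost.
    """
    text = data.decode("cp1252")
    return text.encode("iso8859_15"), "ISO-8859-15"

# The OE ligature (0x8C in 1252) and the euro (0x80 in 1252) move to 0xBC and 0xA4.
payload, charset = to_latin0_interchange("Œuvre: 20 €".encode("cp1252"))
print(charset, payload)
```

Strict encoding is a deliberate choice here. It turns a silent data loss into a visible error, which matches the data-integrity concern running through this thread.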

Now those UNIX 8-bit systems that want to implement Latin 0 will be happy
to be on the same bandwagon.

As for the few additional 1252 characters left over (not many!) in the C1
control space, until we move to system-wide implementations of UNICODE the
situation won't be worse than it is today. So far there is no requirement to
exchange these characters with EBCDIC data, whereas there is a European
requirement to exchange the EURO SIGN plus 3 French and 4 Finnish characters
that are not in Latin 1. When one wants to talk about practical things, one
has to talk practically. That is what the Latin 0 proposers have in mind:
only practical considerations for the real world of today and of at least
the next 5 years.
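Alain's count (the EURO SIGN plus 3 French and 4 Finnish characters) can be checked against the positions where ISO 8859-15 differs from ISO 8859-1. This short sketch uses Python's `iso8859_1` and `iso8859_15` codecs and `unicodedata.name` to list them. The variable name `replaced` is illustrative.

```python
import unicodedata

# Byte values where Latin 0 (ISO 8859-15) assigns a different character than Latin 1.
replaced = {}
for b in range(256):
    latin1 = bytes([b]).decode("iso8859_1")
    latin0 = bytes([b]).decode("iso8859_15")
    if latin1 != latin0:
        replaced[b] = (latin1, latin0)

for b, (latin1, latin0) in replaced.items():
    print(f"0x{b:02X}: {unicodedata.name(latin1)} -> {unicodedata.name(latin0)}")
```

The eight additions are the EURO SIGN, S and Z with caron in both cases (Finnish), and the OE ligature in both cases plus capital Y with diaeresis (French).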

All the destroyers of Latin 0 only have *un*solutions to propose for the
requirements. They want a quick fix that is not even a fix for the problems
raised; they do not even want to see the problems and really try to solve
them. I would be tempted to say that they do not really care much about
actual European problems, if I did not know that they also have good
intentions, of course.

They do not respond to European requirements, whatever their goal is. In
doing so they are also threatening IBM and its huge installed base of
mainframes in Europe; I don't know whether that is well understood.

Now, what they say should be done for EBCDIC will generate perpetual
conversion costs (data losses and round-trip integrity violations) for which
they will be blamed for decades, to put it simply. Fortunately, common sense
will prevail and Latin 0 will be standardized.

Alain LaBonté
Cornwall (Ontario)


