RE: CESU-8: to document or not

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Sep 17 2001 - 15:58:17 EDT


Addison,

>
> By providing a documented, standard way to refer to legacy
> versions of these products and their encodings, I can more
> readily rely on having a well-documented range of protocols and
> procedures for converting and validating data exchanged with
> these systems. The argument that these products "merely support
> an older version of the Unicode standard" is specious, because
> the older versions merely made the six-byte form permissible by
> way of omission (the six-byte form was *never* the preferred
> form). The older versions say nothing about mixing the two forms,
> for example. Whether we dignify this encoding with a name or not,
> someone needs to fully document the rules and provide a stable
> basis for supporting this usage.

If we limit CESU-8 to BMP characters, it does provide a clean migration path for legacy systems. It puts any system that you might wish to have communicate with this system on notice that it does not support these characters yet. This solves the problem in a way that pressures people to upgrade their systems in the future rather than introducing a design split.
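To make the BMP point concrete, here is a minimal Python sketch, not taken from the draft UTR. It assumes CESU-8 means encoding each UTF-16 code unit separately with the one- to three-byte UTF-8 patterns; the function names `cesu8_encode` and `is_bmp_only` are made up for illustration. Under that assumption the two encodings produce identical bytes for BMP text and only diverge for supplementary characters.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (assumed definition: every UTF-16 code unit,
    including each half of a surrogate pair, is encoded on its own)."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if unit < 0x80:
            # One byte: plain ASCII.
            out.append(unit)
        elif unit < 0x800:
            # Two bytes.
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three bytes; surrogate code units also land here, which is
            # where the six-byte form for supplementary characters comes from.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


def is_bmp_only(text: str) -> bool:
    """True if no character lies above U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


bmp_text = "A\u00e9\u20ac"      # ASCII, Latin-1 and a three-byte BMP character
supplementary = "\U00010400"    # a character outside the BMP

print(cesu8_encode(bmp_text) == bmp_text.encode("utf-8"))
print(cesu8_encode(supplementary).hex(), supplementary.encode("utf-8").hex())
```

For BMP-only text the comparison holds, so a system restricted to the BMP is emitting ordinary UTF-8; the two forms differ only once a supplementary character appears.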

You have been out in the real world for some time. As we all know, temporary workarounds tend to live forever.

There is absolutely no reason why PeopleSoft and Oracle cannot document the protocol in their own manuals. This will give implementers further warning that these are proprietary character sets unless they want to restrict their use to BMP characters only. It will also force implementers to check whether the Oracle and PeopleSoft implementations are the same, so that it will be absolutely clear that it is a non-standard protocol.
 
>
> For what it's worth, I thank Toby for braving the heat to produce
> this document. As a practical matter, I don't support the
> creation of new CESU-8 systems and will be grappling for a place
> on the walls to throw hot oil down on the barbarians who propose
> them, but for supporting our existing legacies (which cannot
> merely be extinguished "in the next release"), I think the effort
> is valuable. And the wording of the UTR seemed restrictive enough
> to me, at least, to be able to support the UTR (since it provides
> me the ammunition to oppose its adoption in practice).
>

I agree that there is a world of software out there that does not support Unicode 3.1 yet. Toby has a legitimate problem. It is the proposed solution that bothers me. For now I suspect that living with the BMP restriction should not pose a severe hardship on most systems today. Moving on to implement the full Unicode range should be the carrot to upgrade current code. PDUTR #26 is the wrong way to go because it puts a new demand on systems that have already converted. It also creates more work by doing things twice. Adding a proper library of CESU-8 support functions is probably more work than upgrading from UCS-2 to UTF-16.
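As a rough illustration of how cheaply the BMP restriction could be checked on the receiving side, here is a minimal Python sketch. The function name `find_non_bmp_sequence` is hypothetical, and the check assumes the input is otherwise well-formed: it only looks for the two byte patterns that signal a supplementary character, namely a UTF-8 four-byte lead byte or a surrogate encoded as three bytes (half of the six-byte form).

```python
def find_non_bmp_sequence(data: bytes) -> int:
    """Return the index of the first byte sequence that encodes a non-BMP
    character, or -1 if the data stays within the BMP.

    Assumes the input is otherwise well-formed; this is not a full validator.
    """
    for i, byte in enumerate(data):
        if byte >= 0xF0:
            # Lead byte of a UTF-8 four-byte sequence.
            return i
        if byte == 0xED and i + 1 < len(data) and 0xA0 <= data[i + 1] <= 0xBF:
            # ED A0..BF starts a surrogate encoded as three bytes,
            # i.e. one half of the six-byte form.
            return i
    return -1


print(find_non_bmp_sequence("A\u00e9\u20ac".encode("utf-8")))
print(find_non_bmp_sequence(b"A" + bytes.fromhex("f0909080")))
print(find_non_bmp_sequence(b"AB" + bytes.fromhex("eda081edb080")))
```

Data that passes this check reads the same whether the sender calls it UTF-8 or CESU-8, so a gateway could reject anything else until both ends support the full range.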

Carl


