From: John Cowan (jcowan@reutershealth.com)
Date: Tue Dec 07 2004 - 22:12:16 CST
Kenneth Whistler scripsit:
> Storage of UNIX filenames on Windows databases, for example,
> can be done with BINARY fields, which correctly capture the
> identity of them as what they are: an unconvertible array of
> byte values, not a convertible string in some particular
> code page.
This solution, however, is overkill, in the same way that it would
be overkill to encode all 8-bit strings in XML using Base-64
just because some of them may contain control characters that are
illegal in well-formed XML.
> In my opinion, trying to do that with a set of encoded characters
> (these 128 or something else) is *less* likely to solve the
> problem than using some visible markup convention instead.
The trouble with the visible markup, or even the PUA, is that
"well-formed filenames", those which are interpretable as
UTF-8 text, must also be encoded, so as to be sure that any
markup or PUA characters that naturally appear in the filename are
escaped properly. This is essentially the Quoted-Printable
encoding, which is quite rightly known to those stuck with
it as "Quoted-Unprintable".
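
To make the escaping problem concrete, here is a minimal Python sketch of
a visible "=XX" convention. The names escape_name, unescape_name and the
"visible-escape" handler are hypothetical, made up for this illustration;
nobody in this thread has specified such a scheme. The point is only that
a perfectly well-formed name containing "=" has to be rewritten too.

```python
import codecs
import re


def _visible(err):
    # Hypothetical error handler: turn each undecodable byte into "=XX".
    bad = err.object[err.start:err.end]
    return "".join(f"={b:02X}" for b in bad), err.end


codecs.register_error("visible-escape", _visible)


def escape_name(raw: bytes) -> str:
    # The marker "=" must be escaped first, even in names that are
    # already valid UTF-8, or decoding would be ambiguous.
    return raw.replace(b"=", b"=3D").decode("utf-8", "visible-escape")


def unescape_name(text: str) -> bytes:
    # Reverse the convention: every "=XX" becomes the byte 0xXX.
    data = text.encode("utf-8")
    return re.sub(rb"=([0-9A-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


print(escape_name(b"caf\xe9"))   # invalid UTF-8 byte becomes visible
print(escape_name(b"a=93"))      # well-formed name is altered as well
```

The second call is the annoying case: the name was valid text all along,
yet it no longer displays as itself.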
> Simply
> encoding 128 characters in the Unicode Standard ostensibly to
> serve this purpose is no guarantee whatsoever that anyone would
> actually implement and support them in the universal way you
> envision, any more than they might a "=93", "=94" convention.
Why not, when it's so easy to do so? And they'd be *there*,
reserved, unassignable for actual character encoding.
Plane E would be a plausible location.
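
As a rough sketch of how reserved code points would behave, Python's
"surrogateescape" error handler can serve as a stand-in. This is an
assumption for illustration only: it maps each undecodable byte
0x80..0xFF to U+DC80..U+DCFF rather than to anything in Plane E, and it
is not the proposal itself. The helper names name_to_text and
text_to_name are hypothetical.

```python
def name_to_text(raw: bytes) -> str:
    # Valid UTF-8 decodes normally; each stray byte 0x80..0xFF is mapped
    # to one reserved code point (U+DC80..U+DCFF in this stand-in).
    return raw.decode("utf-8", "surrogateescape")


def text_to_name(text: str) -> bytes:
    # The reserved code points turn back into the original bytes.
    return text.encode("utf-8", "surrogateescape")


well_formed = b"a=93"
broken = b"caf\xe9"

# Well-formed names pass through untouched, with no escaping needed.
assert name_to_text(well_formed) == "a=93"
# Broken names survive a round trip byte for byte.
assert text_to_name(name_to_text(broken)) == broken
```

Unlike the visible-markup approach, nothing has to be escaped in names
that were already interpretable as UTF-8.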
-- 
John Cowan  <jcowan@reutershealth.com>  http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR