Re: Is there Unicode mail out there?

From: Mark Davis (mark@macchiato.com)
Date: Wed Jul 18 2001 - 11:07:03 EDT


> I wouldn't want any control codes in a database. Having a control-G
> may be funny (the joke as I know it goes back to Don Knuth), but
> something like a control-S is too much of a risk.

*You* wouldn't want?

There are a lot of characters *I* wish were not in databases, or in use at
all. A lot of them may or may not make sense. Whether or not I want them,
someone can have a database where they are allowed. This (inconsistent)
restriction simply means that I can't be guaranteed full round-tripping
from databases to XML and back, no matter what their content.
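Here is a quick way to see the round-tripping failure, sketched in Python with the standard library's xml.etree.ElementTree. The <field> element name and the xml_round_trip helper are hypothetical and chosen only for illustration. ElementTree writes a control-G or control-S into element text unescaped, and its expat-based parser then rejects the result as not well-formed, because those characters fall outside the XML 1.0 Char production.

```python
import xml.etree.ElementTree as ET


def xml_round_trip(text: str) -> bool:
    """Return True if `text` survives serialization to XML and parsing back.

    Hypothetical helper: wraps a database value in a <field> element.
    """
    elem = ET.Element("field")
    elem.text = text
    # ElementTree escapes only &, < and > in text; control codes go out as-is.
    data = ET.tostring(elem, encoding="unicode")
    try:
        parsed = ET.fromstring(data)
    except ET.ParseError:
        # expat refuses characters that are not legal XML 1.0 Chars.
        return False
    # An empty element parses back with .text == None.
    return (parsed.text or "") == text


print(xml_round_trip("hello"))      # ordinary text
print(xml_round_trip("bell\x07"))   # control-G
print(xml_round_trip("xoff\x13"))   # control-S
```

A carriage return shows a quieter form of the same problem. It is legal, but written raw it comes back as a line feed after XML end-of-line normalization, so the comparison fails even though the parse succeeds.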

Of course, this is not a huge restriction -- it is simply a gratuitous
annoyance. One could even live with something much more onerous, say XML
disallowing all characters whose code points were divisible by 4321 -- just
have complicated DTDs and shift into base64 if you encounter any of those
codes.
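The "shift into base64" fallback might look roughly like the following Python sketch. It assumes a hypothetical per-field flag recording whether the payload is plain text or base64 of the UTF-8 bytes; the function names are illustrative, not from any existing schema. The is_xml10_char check follows the XML 1.0 Char production. C1 controls such as U+0085 pass it, while C0 controls other than tab, LF and CR do not.

```python
import base64


def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    C1 controls (U+0080-U+009F) fall inside [#x20-#xD7FF]."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def encode_field(text: str) -> tuple[str, str]:
    """Return (flag, payload) under a hypothetical scheme: plain text when
    every character is legal in XML 1.0, otherwise base64 of UTF-8 bytes."""
    if all(is_xml10_char(ord(ch)) for ch in text):
        return ("text", text)
    # surrogatepass lets even lone surrogates from a database survive.
    raw = text.encode("utf-8", "surrogatepass")
    return ("base64", base64.b64encode(raw).decode("ascii"))


def decode_field(flag: str, payload: str) -> str:
    """Invert encode_field."""
    if flag == "text":
        return payload
    return base64.b64decode(payload).decode("utf-8", "surrogatepass")


flag, payload = encode_field("ring\x07")
print(flag, payload)
print(decode_field(flag, payload) == "ring\x07")
```

Escaping & and <, and writing CR as &#13; so that end-of-line normalization does not alter it, is left to whatever XML writer emits the "text" payloads.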

Mark
—————

πάντων μέτρον ἄνθρωπος — Πρωταγόρας
[http://www.macchiato.com]

----- Original Message -----
From: "Martin Duerst" <duerst@w3.org>
To: "Mark Davis" <mark@macchiato.com>; "John Cowan"
<jcowan@reutershealth.com>
Cc: <unicode@unicode.org>; "Lars Marius Garshol" <larsga@garshol.priv.no>
Sent: Tuesday, July 17, 2001 18:36
Subject: Re: Is there Unicode mail out there?

> At 14:30 01/07/17 -0700, Mark Davis wrote:
> > > In that case the content of the field is not text but an octet string,
> > > and you need to do something different, like base64-ing it.
> >
> >The content in the database is not an octet string: it is a text field that
> >happens to have a control code -- a legitimate character code -- in it.
> >Practically every database allows control codes in text fields. (And why are
> >C1 controls allowed? After all, they are even less frequent than C0
> >controls.)
>
> Mark - I understand your dissatisfaction. But the C1 controls are not
> allowed in HTML4, and according to James Clark, the fact that they are
> allowed in XML was an oversight.
>
> Databases can (and should) take care of their data. There are very
> few cases where having control characters in there makes sense.
> In most cases, however, they are errors, and if XML gives an
> incentive to fix them, all the better.
>
> I wouldn't want any control codes in a database. Having a control-G
> may be funny (the joke as I know it goes back to Don Knuth), but
> something like a control-S is too much of a risk.
>
>
> Regards, Martin.
>
>



This archive was generated by hypermail 2.1.2 : Wed Jul 18 2001 - 12:10:16 EDT