From: Lars Kristan (lars.kristan@hermes.si)
Date: Wed Dec 15 2004 - 08:50:27 CST
Arcane Jill wrote:
> The obvious solution is for all Unix machines everywhere to be using
> the same locale - and it had better be UTF-8. But an instantaneous
> global switch-over is never going to happen, so we see this gradual
> switch-over ... and it is during this transition phase that Lars's
> problem manifests.
Yes, some may not experience it, some will experience it for a day, some for
a month, some for a year, some indefinitely.
And unless filesystems prevent invalid sequences from being added, it will
keep happening to everybody. And if it happens only rarely, it will be even
harder to find a person who can fix it.
> Of course, you are not /really/ suggesting that the Unix kernel be
> rewritten. But it's hard for me to see how else this could be
> achieved.
What one might pursue is to make the UNIX filesystem locale-invariant, that
is, Windows-like. In that scenario, the filesystem stores Unicode strings and
adjusts the representation of filenames according to the user's locale. But
there are two reasons against it:
A - If only the filesystem does it, then whenever you switch the locale, all
references to filenames stored inside other files break. Unless you convert
file contents in the same manner, which is what Windows does if an
application is not Unicode (with a number of associated problems on top). But
that is not what is supposed to be done on UNIX.
B - As we move to UTF-8, there will be less and less need to use different
locales. So why bother enabling the system to represent UTF-8 filenames in
any other locale if that locale will not even be used anymore? Concerns about
the transition period do apply, but then you end up with two transitions,
which is even less appealing.
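Point A can be illustrated with a small Python sketch. The filename is hypothetical and chosen only because it contains one non-ASCII letter; the byte-for-byte comparison in `resolves` is an assumption about how a plain lookup behaves, not a description of any particular filesystem.

```python
# Hypothetical filename ("\u017e" is LATIN SMALL LETTER Z WITH CARON).
name = "\u017eaba.txt"

# The same Unicode name as seen by users in two different locales.
as_latin2 = name.encode("iso-8859-2")  # one byte for the first letter
as_utf8 = name.encode("utf-8")         # two bytes for the first letter

# A reference written into some other file while the locale was Latin-2.
stored_reference = as_latin2


def resolves(reference: bytes, current_name: bytes) -> bool:
    """Byte-for-byte comparison, as a plain lookup would do (assumption)."""
    return reference == current_name


print(resolves(stored_reference, as_latin2))  # locale unchanged
print(resolves(stored_reference, as_utf8))    # after switching to UTF-8
```

The filesystem adjusted the name, but nothing adjusted the bytes stored inside the other file, so the old reference no longer matches.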
So, the only perceivable option is to start thinking about validation in the
filesystem, if and when one chooses to enable it. But keep in mind that it
will only reduce the problem. Not all programs will be able to rely on it
(like virus scanners, HSM, backup, ...).
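As a rough sketch of what such validation amounts to, the check below accepts a filename only if its bytes form valid UTF-8. It is written in Python for illustration; the function names are hypothetical, and a real filesystem would have to perform the equivalent check in the kernel when a name is created.

```python
def is_valid_utf8_name(raw: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_names(names):
    """Return the names (as bytes) that are not valid UTF-8."""
    return [n for n in names if not is_valid_utf8_name(n)]
```

A program that cannot rely on validation (a backup tool, say) could run the same check over existing names, for example `find_invalid_names(os.listdir(b"."))`, since `os.listdir` returns raw bytes when given a bytes path.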
Lars