From: Christopher JS Vance (unicode@nu.org)
Date: Sun Nov 27 2005 - 20:39:01 CST
On Sun, Nov 27, 2005 at 06:45:23PM +0100, Hans Aberg wrote:
>This problem has recently been discussed in the POSIX/UNIX
>standardization list (Austin Group List,
>http://www.opengroup.org/austin/). It should really be best resolved
>there, because one needs
>to find an efficient solution for a UTF-8 enabled UNIX OS, and in
>doing that, one has to take things into account such as how to
>implement efficient file systems. One possible approach might be to
>ensure any byte string can be represented on the filesystem level,
>with suitable UTF-8 encodings for use in text strings (and the
>property that they can be lifted back to the original byte strings),
>which may vary from context to context. This approach would be
>motivated by the fact that almost all filesystems already work this
>way, and that it would be inefficient to burden them with character
>interpretation schemes. But some filesystems, though rare it seems,
>use a different approach. And when fiddling around with this, one
>needs to assess its effect on the total UNIX OS, probably making some
>implementations first. In the meantime, I figure you can invent the
>encoding scheme that best fits your needs.
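The round-trip property described in the quoted message (any byte string can be lifted to a text string and back to the original bytes) can be sketched as follows. This is only an illustration, assuming Python's "surrogateescape" error handler (PEP 383) as the lifting scheme; it is not what the Austin Group discussed or adopted, and the helper names to_text and to_bytes are hypothetical. Note that the lifted string may contain lone surrogates, so it is not strictly well-formed Unicode text.

```python
def to_text(raw: bytes) -> str:
    """Lift an arbitrary filename byte string to a text string.

    Valid UTF-8 decodes normally. Each undecodable byte 0x80..0xFF is
    mapped to a lone surrogate U+DC80..U+DCFF (assumption: PEP 383
    "surrogateescape" is an acceptable lifting scheme).
    """
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Recover the original byte string from a lifted text string."""
    return text.encode("utf-8", errors="surrogateescape")


# A Latin-1 encoded name: 0xE9 on its own is not valid UTF-8.
raw_name = b"caf\xe9.txt"
text_name = to_text(raw_name)

# The lift is reversible, so the filesystem still sees the same bytes.
assert to_bytes(text_name) == raw_name
```

Python's own os.fsdecode and os.fsencode apply the same handler on most POSIX systems, which is why the filesystem layer there can stay a plain byte-string store.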
UTF-8, created as FSS-UTF, was invented specifically to enable its use
for Unix/POSIX and similar filenames.
The problem is people trying to create filenames which aren't UTF-8.
Provided you use the same character set for all filenames, the problem
was solved before the Unicode/10646 merger (see Plan 9 from Bell Labs).
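As a rough illustration of the point about filenames which aren't UTF-8, the sketch below checks whether a filename's raw bytes decode as strict UTF-8 and lists the offenders in a directory. The function names is_valid_utf8_name and non_utf8_names are hypothetical, and the check assumes that "is UTF-8" simply means "decodes without error"; it says nothing about normalization or other naming policies.

```python
import os


def is_valid_utf8_name(raw: bytes) -> bool:
    """Return True if the filename bytes are well-formed UTF-8."""
    try:
        raw.decode("utf-8")  # strict mode: any malformed sequence raises
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_names(directory: bytes = b".") -> list:
    """List entries whose names are not valid UTF-8.

    Passing a bytes path makes os.listdir return raw bytes names,
    so nothing is decoded (or silently altered) before the check.
    """
    return [name for name in os.listdir(directory)
            if not is_valid_utf8_name(name)]


# "café.txt" encoded as UTF-8 passes; the Latin-1 encoding does not.
assert is_valid_utf8_name("caf\u00e9.txt".encode("utf-8"))
assert not is_valid_utf8_name("caf\u00e9.txt".encode("latin-1"))
```

Running non_utf8_names(b"/some/dir") on a tree where every name was created as UTF-8 should return an empty list, which is the situation the message describes as already solved.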
-- Christopher Vance