From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Nov 27 2005 - 14:44:33 CST
From: "Hans Aberg" <haberg@math.su.se>
> This problem has recently been discussed in the POSIX/UNIX
> standardization list (Austin Group List,
> http://www.opengroup.org/austin/). It should really be best resolved
> there, because one needs to
> find an efficient solution for a UTF-8 enabled UNIX OS, and in doing
> that, one has to take things into account such as how to implement
> efficient files systems. One possible approach might be to ensure any
> byte string can be represented on the filesystems level, with suitable
> UTF-8 encodings for use in text strings (and the property that they can
> be lifted back to the original byte strings), which may vary from context
> to context. This approach would be motivated by the fact that almost all
> filesystems already work this way, and that it would be inefficient to
> burden it with character interpretation schemes. But some filesystems,
> though rare it seems, use a different approach. And when fiddling around
> with this, one needs to assess its effect on the total UNIX OS, probably
> making some implementations first. In the meantime, I figure you can
> invent the encoding schemes that best fits your needs.
Why does the Unix boot block, which describes the nature and content of the
filesystem, have no place to record the encoding it uses? At the very least
there should be a standard mark for UTF-8 or UTF-16. It would then be up to
the OS to present to applications the byte streams corresponding to the
encoded UTF. The OS could enforce this encoding because it would know
how to convert the byte streams encoded by the application under its locale
to the underlying encoding used in the filesystem (the OS could then return
an error for invalid new filenames).
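
To illustrate the conversion step described above, here is a minimal
user-space sketch in Python. It is not how any existing kernel behaves:
the function name to_fs_name, the exception InvalidFilenameError, and
the default of UTF-8 as the filesystem encoding are all assumptions made
for the example.

```python
class InvalidFilenameError(ValueError):
    """Hypothetical error raised when a name cannot be stored on the volume."""


def to_fs_name(name_bytes: bytes, locale_encoding: str,
               fs_encoding: str = "utf-8") -> bytes:
    """Convert a filename from the caller's locale encoding to the
    encoding declared by the filesystem (assumed here to be UTF-8)."""
    try:
        # Strict decoding rejects byte sequences that are not valid
        # in the application's locale encoding.
        text = name_bytes.decode(locale_encoding, errors="strict")
        # Strict encoding rejects characters the filesystem encoding
        # cannot represent.
        return text.encode(fs_encoding, errors="strict")
    except UnicodeError as exc:
        raise InvalidFilenameError(
            f"cannot convert filename from {locale_encoding} to {fs_encoding}"
        ) from exc
```

For example, the name "é" supplied as the single Latin-1 byte 0xE9 would
be stored as the two UTF-8 bytes 0xC3 0xA9, while the lone byte 0xFF
supplied under a UTF-8 locale would be refused.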
Note that network filesystems must also support an explicit specification of
their encoding. This means modifying NFS. All filesystem tools should
also be able to check that filesystems are properly encoded (this is already
possible with FAT32, which includes LFN support, and with NTFS).
CD-ROM filesystems also have this support (the Joliet extension to ISO 9660).
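
A checking tool of the kind mentioned above could look roughly like the
following Python sketch. It only assumes that the volume is supposed to
hold UTF-8 names; the function names is_valid_name and
find_invalid_names are made up for the example and do not correspond to
any existing fsck option.

```python
import os


def is_valid_name(name: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the raw filename bytes decode cleanly."""
    try:
        name.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def find_invalid_names(root: bytes, encoding: str = "utf-8") -> list:
    """Walk the tree under root (given as bytes, so that os.walk yields
    raw byte names) and collect every path whose last component is not
    valid in the expected encoding."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_name(name, encoding):
                bad.append(os.path.join(dirpath, name))
    return bad
```

Passing the root as bytes matters here: with a str root, Python would
already have decoded the names and hidden the raw bytes the tool is
meant to inspect.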
If nothing else, the presence of a special metadata file in the filesystem
(with a well-known and reserved filename at a fixed path location) would
provide these properties (similar to the Mac OS convention for storing
resource forks on FAT volumes).
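
As a rough sketch of the metadata-file idea, the following Python code
stores and reads back an encoding declaration at the root of a volume.
The filename ".fs-encoding" and the one-line ASCII format are purely
hypothetical; no standard defines them.

```python
import codecs
from pathlib import Path

# Hypothetical reserved filename at the root of the volume.
MARKER_NAME = ".fs-encoding"


def write_marker(root, encoding: str) -> str:
    """Record the filename encoding of the volume mounted at root.
    codecs.lookup raises LookupError for unknown encoding names."""
    canonical = codecs.lookup(encoding).name
    Path(root, MARKER_NAME).write_text(canonical + "\n", encoding="ascii")
    return canonical


def read_marker(root):
    """Return the declared encoding, or None if the volume has no marker."""
    try:
        declared = Path(root, MARKER_NAME).read_text(encoding="ascii").strip()
    except FileNotFoundError:
        return None
    return codecs.lookup(declared).name
```

A mount helper could call read_marker first and fall back to treating
names as opaque byte strings when it returns None, which matches the
current behaviour of most Unix filesystems.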