Re: Representing Unix filenames in Unicode

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Nov 27 2005 - 14:44:33 CST

    From: "Hans Aberg" <haberg@math.su.se>
    > This problem has recently been discussed in the POSIX/UNIX
    > standardization list (Austin Group List,
    > http://www.opengroup.org/austin/). It should really be best resolved
    > there, because one needs to
    > find an efficient solution for a UTF-8 enabled UNIX OS, and in doing
    > that, one has to take things into account such as how to implement
    > efficient file systems. One possible approach might be to ensure that any
    > byte string can be represented at the filesystem level, with suitable
    > UTF-8 encodings for use in text strings (and the property that they can
    > be lifted back to the original byte strings), which may vary from context
    > to context. This approach would be motivated by the fact that almost all
    > filesystems already work this way, and that it would be inefficient to
    > burden it with character interpretation schemes. But some filesystems,
    > though rare it seems, use a different approach. And when fiddling around
    > with this, one needs to assess its effect on the total UNIX OS, probably
    > making some implementations first. In the meantime, I figure you can
    > invent the encoding scheme that best fits your needs.
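
    As a rough sketch of the "lifted back to the original byte strings"
    property described above: Python's `surrogateescape` error handler is one
    existing mechanism of this kind. It is shown only as an illustration, not
    as something proposed in this thread, and the helper names `lift` and
    `lower` are hypothetical.

```python
def lift(raw: bytes) -> str:
    """Turn an arbitrary filename byte string into a text string.

    Valid UTF-8 decodes normally; each undecodable byte 0xNN is mapped to
    the lone surrogate U+DCNN so that no information is lost.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def lower(text: str) -> bytes:
    """Recover the original byte string from a string produced by lift()."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    legacy = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
    as_text = lift(legacy)
    print(ascii(as_text))
    print(lower(as_text) == legacy)
```

    The lifted string is not valid Unicode for interchange (it may contain
    lone surrogates), so it is only suitable as an in-process representation
    that has to be lowered again before it reaches the filesystem.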

    Why does the Unix boot block, which describes the nature and content of
    the filesystem, have no place to record the encoding it uses? At least
    there should exist a standard mark for UTF-8 or UTF-16. It would then be
    up to the OS to present to applications the byte streams corresponding to
    the encoded UTF. The OS could enforce this encoding because it would know
    exactly how to convert the byte streams encoded by the application under
    its locale to the underlying encoding used in the filesystem (the OS could
    then return an error for invalid new filenames).
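
    The conversion step described above might look roughly like the following
    Python sketch. It assumes the filesystem's declared encoding is already
    known. The function name `to_filesystem_name` and its parameters are
    hypothetical, and a real implementation would sit in the OS rather than
    in application code.

```python
def to_filesystem_name(raw: bytes, locale_encoding: str,
                       fs_encoding: str = "utf-8") -> bytes:
    """Transcode a new filename from the caller's locale encoding to the
    encoding declared by the filesystem, rejecting invalid names."""
    try:
        # Interpret the application's bytes under its locale encoding.
        text = raw.decode(locale_encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError("filename is not valid in the locale encoding") from exc

    # A single path component may not be empty or contain '/' or NUL.
    if text == "" or "/" in text or "\0" in text:
        raise ValueError("filename is empty or contains '/' or NUL")

    try:
        # Re-encode for storage in the filesystem's declared encoding.
        return text.encode(fs_encoding, errors="strict")
    except UnicodeEncodeError as exc:
        raise ValueError("filename cannot be represented on this filesystem") from exc


if __name__ == "__main__":
    # A Latin-1 application creating a file on a UTF-8 filesystem.
    print(to_filesystem_name(b"caf\xe9", "iso-8859-1"))
```

    Raising an error here corresponds to the OS refusing to create the file,
    so that an incorrectly encoded name is never stored.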

    Note that network filesystems must also support an explicit specification
    of their encoding. This means modifying NFS. All filesystem tools should
    also be able to check that filesystems are properly encoded (this is
    already possible with FAT32, which includes LFN support, and with NTFS).
    CD-ROM filesystems also have this support (the Joliet extension to
    ISO 9660). If nothing else, the presence of a special metadata file in the
    filesystem (with a well-known and reserved filename at a fixed path
    location) would provide these properties (similar to the Mac OS convention
    for storing resource forks on FAT volumes).
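
    A minimal sketch of the metadata-file fallback, in Python. The filename
    `.fs-encoding` and its one-line format are assumptions made purely for
    illustration; no such convention is defined anywhere.

```python
import codecs
import os
from typing import Optional

# Hypothetical reserved filename at the root of the mounted filesystem.
MARKER_NAME = ".fs-encoding"


def read_declared_encoding(mount_point: str) -> Optional[str]:
    """Return the normalized name of the encoding declared by the marker
    file, or None if the file is missing, unreadable, or names an
    encoding that is not known."""
    path = os.path.join(mount_point, MARKER_NAME)
    try:
        with open(path, "r", encoding="ascii") as handle:
            label = handle.readline().strip()
    except (OSError, UnicodeDecodeError):
        return None
    if not label:
        return None
    try:
        # codecs.lookup() normalizes aliases, e.g. "UTF8" -> "utf-8".
        return codecs.lookup(label).name
    except LookupError:
        return None
```

    A caller would fall back to treating filenames as opaque byte strings
    whenever this returns None.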


