Re: Representing Unix filenames in Unicode

From: Hans Aberg (haberg@math.su.se)
Date: Sun Nov 27 2005 - 15:19:41 CST


    On 27 Nov 2005, at 21:44, Philippe Verdy wrote:

    > Why does the Unix boot block that describes the nature and
    > content of the filesystem have no place to record the encoding it uses?

    To begin with, there is no unified UNIX filesystem; the details
    vary with the implementation.

    > At least there should exist a standard mark for UTF-8 or UTF-16.
    > Then it would be up to the OS to present to applications the byte
    > streams corresponding to the encoded UTF. The OS could enforce this
    > encoding because it would really know how to convert the byte
    > streams encoded by the application under its locale to the
    > underlying encoding used in the filesystem (the OS could then return
    > an error for invalid new filenames).

    One reason is that the hard disk level is very low-level and
    handles only byte strings anyway, and one does not want to slow it
    down. Another problem arises when mixing filesystems on the same
    computer. Files may also show up with byte strings in unknown
    encodings. If the filesystem enforces a model at this level, such
    files are usually lost; if it doesn't, one can fix the names up
    later from the OS. But I have only given some of the reasons that
    speak for one model.
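
    As a minimal sketch of the "fix it up later from the OS" model,
    assuming Python 3 and its standard "surrogateescape" error handler
    (the helper name roundtrip is hypothetical, chosen only for
    illustration): a filename kept as a raw byte string survives a
    decode/encode cycle unchanged, whereas a strict UTF-8 decoder
    rejects it.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode a raw filename to str and encode it back without loss."""
    # surrogateescape maps each undecodable byte to a lone surrogate
    # (U+DC80..U+DCFF), so the original bytes can be recovered exactly.
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


# A Latin-1 encoded name: the byte 0xE9 is not valid UTF-8 here.
latin1_name = b"caf\xe9.txt"

# The permissive model keeps the bytes intact.
assert roundtrip(latin1_name) == latin1_name

# The enforcing model refuses the name outright.
try:
    latin1_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

    Here strict_ok ends up False: under an enforced encoding the name
    would have been refused, while the byte-string model leaves it
    available for later repair.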

    > Note that network filesystems must also support an explicit
    > specification of their encoding. This means modifying NFS. All
    > filesystem tools should also be able to check that filesystems are
    > properly encoded (this is already possible with FAT32, which includes
    > LFN support, and with NTFS).
    > CD-ROM filesystems also have this support (the Joliet extension to
    > ISO 9660).

    It is simply complicated to make this work perfectly in a diverse
    world. You never want to lose any file quietly because its filename
    cannot be handled by the filesystem.
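
    One way to check encodings without losing anything can be sketched
    as follows, assuming Python 3 and UTF-8 as the expected encoding.
    The function names flag_non_utf8 and scan_directory are
    hypothetical. The sketch only reports names that fail to decode;
    it never renames or removes a file.

```python
import os
from typing import Iterable, List


def flag_non_utf8(names: Iterable[bytes]) -> List[bytes]:
    """Return the raw names that are not valid UTF-8, in sorted order."""
    flagged = []
    for name in names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            flagged.append(name)
    return sorted(flagged)


def scan_directory(directory: bytes) -> List[bytes]:
    """Report non-UTF-8 entries of a directory; nothing is modified."""
    # Passing a bytes path makes os.listdir return raw byte-string names.
    return flag_non_utf8(os.listdir(directory))
```

    For example, scan_directory(b".") lists the suspicious entries of
    the current directory, leaving the decision about what to do with
    them to the user.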

    > If nothing else, the presence of a special metadata file in the
    > filesystem (with a well-known and reserved filename at a fixed path
    > location) would provide these properties (similar to the Mac OS
    > convention for storing the resource forks on FAT volumes)

    The Mac OS resource forks have been killed off as of Mac OS 10.4;
    their role is now handled by the underlying UNIX, for example in
    the case of fonts. So development is going in another direction.

       Hans Aberg


