From: Peter Edberg (pedberg@apple.com)
Date: Mon Dec 14 2009 - 14:39:49 CST
On Dec 14, 2009, at 12:26 PM, Julian Bradfield wrote:
> On 2009-12-14, Peter Edberg <pedberg@apple.com> wrote:
>> On Dec 14, 2009, at 10:30 AM, Leo Broukhis wrote:
>>> This problem is with us already (on Apple systems, of all things).
>>> MacOS X decomposes Cyrillic Й and Ё in file names and treats файл and
>>> файл as the same file name
>> Which seems appropriate, since they are canonically equivalent.
>>> Windows and Linux don't.
>> So the question is, why not?
>
> For the very obvious reason that the system locale may not be utf-8.
> I'm sure someone can come up with an example of two utf-8 canonically
> equivalent strings that both make (different) sense in some other
> encoding.
On Dec 14, 2009, at 11:11 AM, Leo Broukhis wrote:
> A file system is a map of tuples of "short" strings of non-zero,
> non-solidus bytes to potentially long strings of arbitrary bytes. Why
> should there be any storage-level assumption about the text property
> of any of these strings?
The desirable behavior I am describing refers, of course, to behavior at a higher level - a level at which the file "name" is already explicitly specified to be text in Unicode using a specified encoding scheme (e.g. UTF-16), as is true in Apple's HFS Extended volume format, NTFS, and other volume formats. At that level I think it is reasonable to enforce canonical equivalence.
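
To make the canonical-equivalence point concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name canonically_equivalent and the two sample strings are only illustrative assumptions; this shows the comparison at the text level and is not meant to model what any particular file system actually does.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Hypothetical helper: two strings are canonically equivalent
    # when their NFD (canonical decomposition) forms are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The same name spelled two ways:
# precomposed uses U+0439 (CYRILLIC SMALL LETTER SHORT I);
# decomposed uses U+0438 followed by U+0306 (COMBINING BREVE).
precomposed = "\u0444\u0430\u0439\u043b"
decomposed = "\u0444\u0430\u0438\u0306\u043b"

# A code-point (or byte) comparison sees two different names.
print(precomposed == decomposed)

# A comparison under canonical equivalence sees one name.
print(canonically_equivalent(precomposed, decomposed))
```

A layer that treats names as opaque bytes gets the first result; a layer that treats names as Unicode text can get the second by normalizing before it compares.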
-Peter E