From: Edward H. Trager (ehtrager@umich.edu)
Date: Tue Dec 14 2004 - 10:32:08 CST
On Tuesday 2004.12.14 12:50:43 -0000, Arcane Jill wrote:
> If I have understood this correctly, filenames are not "in" a locale, they
> are absolute. Users, on the other hand, are "in" a locale, and users view
> filenames. The same filename can "look" different to two different users.
> To user A (whose locale is Latin-1), a filename might look valid; to user B
> (whose locale is UTF-8), the same filename might look invalid.
Correct. The problem will, however, be limited to characters beyond the
ASCII set, such as the accented Latin characters present in ISO-8859-1.
The basic Latin alphabet in the ASCII set is encoded identically in
ISO-8859-1 and UTF-8, so it will appear unchanged to both users (the UTF-8
user looking at the Latin-1 user's home directory, or the Latin-1 user
looking at the UTF-8 user's home directory). So both users could probably
guess the filename they were looking at. For example, here is a file on my
local machine, a Linux box with the locale set to LANG=en_US.UTF-8:
déclaration_des_droits.utf8
The accented "e" in "déclaration" appears correctly under the UTF-8 locale.
I then copied this file (using scp) over to an older Sun Solaris box which
I do not administer, so I have to live with the "C" POSIX locale that the
machine is set to. Now, when I view the file names in a terminal (where the
terminal emulator is set to the same locale), I see:
d??claration_des_droits.utf8
The terminal, being set to the legacy locale, does not know how to
interpret the two bytes that are used for the UTF-8 "é", so it shows a
"?" for each of them.
Still, I can guess that the first word should be "déclaration".
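
To make the byte-level picture concrete, here is a minimal Python sketch. It only approximates what a terminal does: the helper name view_as is made up for this illustration, and real terminals differ in how they display bytes they cannot interpret. It assumes that each uninterpretable byte is shown as a single "?".

```python
# Illustration only: the filename is the one from the example above.
name = "d\u00e9claration_des_droits.utf8"   # \u00e9 is the accented "e"

utf8_bytes = name.encode("utf-8")      # the accented "e" becomes two bytes: 0xC3 0xA9
latin1_bytes = name.encode("latin-1")  # the accented "e" becomes one byte: 0xE9


def view_as(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes roughly the way a terminal using
    `encoding` might, showing '?' for every byte it cannot interpret.
    (Hypothetical helper, not part of any library.)"""
    return raw.decode(encoding, errors="replace").replace("\ufffd", "?")


# UTF-8 filename viewed in the "C" (ASCII-only) locale
print(view_as(utf8_bytes, "ascii"))
# Latin-1 filename viewed in a UTF-8 locale
print(view_as(latin1_bytes, "utf-8"))
# UTF-8 filename viewed in a Latin-1 locale
print(view_as(utf8_bytes, "latin-1"))
```

In every case the ASCII letters come through untouched; only the bytes for the accented letter are shown differently, which is why the name stays guessable.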
The solution, as has been pointed out, is for everyone to move to
UTF-8 locales. In the Linux and Unix world, this is already happening
for the most part. Solaris 10 now defaults to a UTF-8 locale, at least
when set to English. Both SuSE and Red Hat default to UTF-8 locales
for most language and script environments. Open source tools also exist
for converting file names from one encoding to another on Linux and Unix
systems. A group of Japanese developers is working on an NLS
implementation for the BSDs, such as OpenBSD, which are currently "stuck"
with nothing but the "C" POSIX locale. I think the name of that project
is "Citrus".
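
As a rough idea of what such a filename conversion involves, here is a minimal Python sketch. It is not the implementation of any existing tool; the function names convert_name and rename_all are invented for this example, and it assumes the old names are in Latin-1 and the target is UTF-8. Note the heuristic: a name that already decodes as valid UTF-8 (which includes every pure-ASCII name) is left alone, so a Latin-1 name that happens to be valid UTF-8 would be missed.

```python
import os


def convert_name(raw: bytes, src: str = "latin-1", dst: str = "utf-8") -> bytes:
    """Re-encode one filename from `src` to `dst`.

    Heuristic (an assumption of this sketch): if the bytes already decode
    cleanly in `dst`, treat the name as converted and return it unchanged.
    """
    try:
        raw.decode(dst)
        return raw
    except UnicodeDecodeError:
        return raw.decode(src).encode(dst)


def rename_all(directory: bytes, src: str = "latin-1", dst: str = "utf-8",
               dry_run: bool = True):
    """Return the (old, new) byte names that need renaming in `directory`,
    and perform the renames unless dry_run is True."""
    planned = []
    # Passing a bytes path makes os.listdir return raw byte names,
    # so no locale-dependent decoding happens behind our back.
    for raw in os.listdir(directory):
        new = convert_name(raw, src, dst)
        if new == raw:
            continue
        target = os.path.join(directory, new)
        if os.path.exists(target):
            continue  # never overwrite an existing file
        planned.append((raw, new))
        if not dry_run:
            os.rename(os.path.join(directory, raw), target)
    return planned


# Dry run over the current directory: list what would be renamed.
for old, new in rename_all(b"."):
    print(old, "->", new)
```

Running with dry_run=True first and checking the list before renaming anything seems the safer way to use something like this.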
-- Ed Trager
>
> Is that right, Lars?
>
> If so, Marcin, what exactly is the error, and whose fault is it?
>
> Jill
>
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Marcin 'Qrczak' Kowalczyk
> Sent: 13 December 2004 14:59
> To: unicode@unicode.org
> Subject: Re: Roundtripping in Unicode
>
> Using non-UTF-8 filenames in a UTF-8 locale is IMHO an error.