From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 01 2003 - 13:39:56 EST
Michael (michka) Kaplan writes:
> I would not expect Windows (whose most recent shipping version shipped
> before Unicode 4.0 was released) to support 4.0 properties and such. But
> at the same time, if you have fonts and build a keyboard you can support
> any number of 4.0-only scripts.
Wasn't case folding standardized long before Unicode 4.0?
Well, the Windows case mappings for its NTFS filesystem predate Unicode,
and I think that Microsoft wants to avoid the nightmare of filesystem
migration. But I think that an NTFS filesystem should track the Unicode
version it was created with, so that the filesystem driver can adapt to the
set of folding rules supported on this system.
The other option would be to add an option to CHKDSK to find files in the
same directory whose names would collide if new case folding rules were
applied. CHKDSK could then list them and let the user choose which name to
keep and which file must be renamed. If there's no conflict in a given
directory, it could be marked as supporting the newer Unicode rules.
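The collision check suggested above can be sketched in a few lines of
Python. This is only an illustration, not how CHKDSK works: the function
name find_fold_collisions is hypothetical, str.lower stands in for an older
one-to-one case mapping (it is not the actual NTFS upcase table), and
str.casefold stands in for the newer full Unicode case folding.

```python
from collections import defaultdict

def find_fold_collisions(names, fold=str.casefold):
    """Group the names of one directory by their folded form and
    return only the groups that contain more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[fold(name)].append(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

# Hypothetical directory listing used only for illustration.
names = ["Straße.txt", "STRASSE.TXT", "readme.txt", "Notes.txt"]

# Assumed "old" rules: one-to-one lowercasing, sharp s stays distinct.
print(find_fold_collisions(names, fold=str.lower))

# Assumed "new" rules: full case folding, sharp s folds to "ss".
print(find_fold_collisions(names))
```

A directory for which the second call returns an empty dictionary is one
that could safely be marked as using the newer rules.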
There's an interesting question with FAT32: it was designed after NTFS, to
add Unicode and LFN support on top of FAT16, at a time when Unicode was
already publishing standard case folding rules. I can't believe that
Microsoft chose for its LFN directory extensions to use the same folding
rules as those used in NTFS. Maybe the intent was to maximize the
compatibility of FAT32 with NTFS, even if NTFS has some defects.
For now we have to live with the past! I'm quite sure that lowercase sharp s
(eszett) and double lowercase s are both used on German filesystems. This is
even the case on FAT filesystems, with which both FAT32 and NTFS must keep
some compatibility (for short file names), as they use the OEM codepage
(CP437 or CP850 in Germany), where sharp s has long been allowed and kept
distinct from double s.
If Windows were changed to fold sharp s to double s, then it would have
problems reading filesystems (including floppies, which use FAT12 with the
same naming constraints as FAT16) containing short filenames. However, this
is mitigated by the fact that FAT12 and FAT16 have always been ambiguous
about the effective OEM charset they were encoded with.
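To make the sharp s problem concrete, here is a small Python sketch. It
assumes Python's cp437 and cp850 codecs as stand-ins for the OEM codepages,
uses str.lower as a hypothetical one-to-one folding and str.casefold as the
full Unicode folding, and the two file names are invented for the example.

```python
def simple_fold(name):
    # Stand-in for a one-to-one folding: lower() leaves U+00DF unchanged.
    return name.lower()

def full_fold(name):
    # Full Unicode case folding: U+00DF folds to "ss".
    return name.casefold()

# Two invented short names that differ only in sharp s versus double s.
a, b = "GROß.TXT", "GROSS.TXT"

# Both names are representable in the OEM codepages, with different bytes.
for cp in ("cp437", "cp850"):
    print(cp, a.encode(cp), b.encode(cp))

print(simple_fold(a) == simple_fold(b))  # distinct under one-to-one folding
print(full_fold(a) == full_fold(b))      # identical under full folding
```

Under the full folding the two names become indistinguishable, so two
files that legitimately coexist on an old volume could no longer both be
addressed by name.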
Remember the issues when migrating from Windows 3.x to Windows 95, because
of legacy filesystems created with ambiguous OEMCP-only short names.
SCANDISK also had to be used for some time because there were applications
expecting OEMCP-encoded names that sometimes conflicted between CP437 and
CP850. Even after the upgrade, the current codepage of the running
application still creates encoding conflicts, detected later by CHKDSK or
SCANDISK when OEMCP-encoded short names do not match their Unicode-encoded
LFN names. SCANDISK proposes to trust the Unicode LFN name and to alter the
short name so that it reflects the effective Unicode name in the current
OEM codepage.
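The kind of mismatch described above can be sketched in Python. This is a
deliberate simplification and not what SCANDISK actually does: it ignores
8.3 truncation and numeric tails, the helper short_name_matches is
hypothetical, the file name is invented, and Python's cp437 and cp850
codecs stand in for the OEM codepages.

```python
def short_name_matches(short_bytes, long_name, oem_codepage):
    """Return True if the stored short-name bytes, decoded with the given
    OEM codepage, equal the uppercased Unicode long name.
    Simplified: assumes the long name already fits the 8.3 pattern."""
    try:
        decoded = short_bytes.decode(oem_codepage)
    except UnicodeDecodeError:
        return False
    return decoded == long_name.upper()

long_name = "Søren.txt"                       # Unicode LFN entry
short_bytes = long_name.upper().encode("cp850")  # short name written under CP850

# Same bytes, two candidate OEM codepages.
print(short_bytes.decode("cp850"), short_name_matches(short_bytes, long_name, "cp850"))
print(short_bytes.decode("cp437"), short_name_matches(short_bytes, long_name, "cp437"))
```

When the check fails for the current codepage, regenerating the short name
from the Unicode long name corresponds to the repair that SCANDISK proposes.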
Even today such errors occur when, for some reason such as a virus
infection, AUTOEXEC.BAT is not run at startup to set the codepage, so that
Windows starts using short names in FAT filesystems with an OEM codepage
different from the one with which the filesystem was previously used.
Thankfully, going to Unicode has fixed all this: short names are retained
only for compatibility. However, FAT32 filesystems still try first to open
the file by its short name, converted to the current OEMCP, before trying
the LFN name in Unicode. As FAT32 is definitely not dead or deprecated in
favor of NTFS (for some performance reasons, forgetting the stronger
security and stability of NTFS in the face of system crashes), we still
have an issue in Windows 2000/XP/2003...
This archive was generated by hypermail 2.1.5 : Mon Dec 01 2003 - 14:41:18 EST