Re: Unicode in VFAT file system

From: John Cowan (jcowan@reutershealth.com)
Date: Thu Jul 20 2000 - 15:40:36 EDT


Ken Krugler wrote:

> I thought that UCS-2 was by definition big endian

It's big-endian by *default*. If you have a BOM, you can determine the
byte order directly, but putting a BOM in every file name would be silly.
Windows file systems will only be used on little-endian (LE) machines, so
storing everything as LE is sensible (the file system format then acts as
what Unicode calls a "higher-level protocol" that fixes the byte order).
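To make the BOM-versus-default distinction concrete, here is a minimal Python sketch of a UCS-2 decoder. A leading BOM (U+FEFF serialized as FE FF or FF FE) wins; otherwise it falls back to a default order, which is big-endian per Unicode unless a higher-level protocol says otherwise. The function name `decode_ucs2` and its `default_order` parameter are hypothetical, chosen for illustration only.

```python
import struct


def decode_ucs2(data: bytes, default_order: str = "big") -> str:
    """Decode UCS-2 bytes; a leading BOM overrides default_order (hypothetical helper)."""
    if default_order not in ("big", "little"):
        raise ValueError("default_order must be 'big' or 'little'")
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    order = default_order
    if data[:2] == b"\xfe\xff":      # BOM U+FEFF serialized big-endian
        order, data = "big", data[2:]
    elif data[:2] == b"\xff\xfe":    # BOM U+FEFF serialized little-endian
        order, data = "little", data[2:]
    prefix = ">" if order == "big" else "<"
    units = struct.unpack(f"{prefix}{len(data) // 2}H", data)
    # UCS-2: each 16-bit code unit is one character; no surrogate pairing.
    return "".join(chr(u) for u in units)
```

For BOM-less on-disk names, a reader would pass `default_order="little"`, on the assumption that the file system format itself fixes little-endian order as described above.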

> 1. Could it be using UTF-16LE? I tried creating an entry with a
> surrogate pair, but the name was displayed with two black boxes on a
> Windows 2000-based computer, so I assumed that surrogates were not
> supported.

Probably not. So technically it *is* UCS-2 (LE) rather than UTF-16LE.
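The difference shows up with any character above U+FFFF. The Python sketch below (all function names hypothetical) encodes such a character in UTF-16LE, where it becomes a surrogate pair, and then reads the same bytes the way a UCS-2 reader would: one character per 16-bit code unit. That yields two unpaired surrogates, which is consistent with (though not proof of) the "two black boxes" Ken saw.

```python
def utf16le_code_units(s: str) -> list[int]:
    """Return the 16-bit code units of s as serialized in UTF-16LE."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def fits_in_ucs2(s: str) -> bool:
    """True if every character is a BMP code point that is not a surrogate."""
    return all(ord(c) <= 0xFFFF and not 0xD800 <= ord(c) <= 0xDFFF for c in s)


def read_as_ucs2_le(data: bytes) -> list[str]:
    """Treat each little-endian 16-bit code unit as its own character, as UCS-2 does."""
    return [chr(int.from_bytes(data[i:i + 2], "little")) for i in range(0, len(data), 2)]


name = "\U00010400"                       # DESERET CAPITAL LETTER LONG I
units = utf16le_code_units(name)          # surrogate pair D801 DC00
chars = read_as_ucs2_le(name.encode("utf-16-le"))  # two lone surrogates
```

A file system that only checked `fits_in_ucs2` would reject such a name outright instead of storing a pair it cannot interpret.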

> 3. And finally, why are file names case-insensitive for characters in
> the U+0000 to U+00FF range, but not for any other characters? OK,
> maybe I can guess at the answer to that one...

Case insensitivity is a backwards-compatibility hack, basically.
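As a rough model of the behavior Ken describes, the Python sketch below compares names with case folding applied only inside U+0000 to U+00FF. It is an assumption for illustration, not Windows' actual upcase table: it maps ASCII a-z and Latin-1 U+00E0 to U+00FE (minus the division sign U+00F7) down by 0x20, and leaves everything else, including ß, ÿ, and all characters above U+00FF, unchanged. `fold_latin1` and `names_equal` are hypothetical names.

```python
def fold_latin1(ch: str) -> str:
    """Uppercase ch only if it is a lowercase letter in U+0000..U+00FF (assumed model)."""
    cp = ord(ch)
    if 0x61 <= cp <= 0x7A:                    # ASCII a-z -> A-Z
        return chr(cp - 0x20)
    if 0xE0 <= cp <= 0xFE and cp != 0xF7:     # Latin-1 a-grave..thorn, skipping the division sign
        return chr(cp - 0x20)
    return ch                                 # everything else compares exactly


def names_equal(a: str, b: str) -> bool:
    """Case-insensitive only within Latin-1; case-sensitive above U+00FF."""
    return "".join(map(fold_latin1, a)) == "".join(map(fold_latin1, b))
```

Under this model `readme.txt` matches `README.TXT` and `café` matches `CAFÉ`, while Greek `αβ` and `ΑΒ` remain distinct names.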

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.         -- Coleridge (tr. Politzer)