From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 24 2004 - 17:52:29 CDT
From: "John H. Jenkins" <jenkins@apple.com>
> On Aug 23, 2004, at 3:34 PM, Doug Ewell wrote:
>
> > Deborah Goldsmith <goldsmit at apple dot com> wrote:
> >
> >> FYI, by far the largest source of text in NFD (decomposed) form in
> >> Mac OS X is the file system. File names are stored this way (for
> >> historical reasons), so anything copied from a file name is in (a
> >> slightly altered form of) NFD.
> >
> > "Slightly altered"?
> >
>
> Yes, the specification for the Mac file system was frozen before NFD
> had been developed by the UTC, so it isn't exactly the same. But it's
> close.
Yes, it is very close to NFD. The actual decompositions performed are fully
listed in the documentation of the Mac OS filesystems. Note that there are
differences between the various Mac filesystems, which were also localized in
their drivers (much like the legacy MS-DOS filesystems with their
unpredictable codepage, notably when reading removable media where the
codepage of the system that created the media is not stored on the medium
itself).
Actually, it was based on the decompositions in Unicode 2.01. But the list of
decompositions is now frozen at a specific Unicode version in the filesystem
driver, for compatibility reasons. This is needed because some media may be
created later with characters from a later version of Unicode that is not
supported by the driver of a legacy system on which the media would be used.
It is even more important for networked filesystems, for security reasons.
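
To illustrate what a frozen decomposition list means in practice, here is a minimal Python sketch. It is not the Mac filesystem driver: it uses the Unicode 3.2 tables that Python ships as `unicodedata.ucd_3_2_0` as a stand-in for "a driver frozen at one Unicode version", and the helper names `frozen_nfd` and `current_nfd` are hypothetical.

```python
import unicodedata

# Assumption: Python's bundled Unicode 3.2 database plays the role of a
# decomposition table frozen inside a filesystem driver.
frozen = unicodedata.ucd_3_2_0


def frozen_nfd(name):
    """Decompose using only the frozen (Unicode 3.2) tables."""
    return frozen.normalize("NFD", name)


def current_nfd(name):
    """Decompose using the tables of the running Python version."""
    return unicodedata.normalize("NFD", name)


old_char = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, assigned before 3.2
new_char = "\u1b06"  # BALINESE LETTER AKARA TEDUNG, added in Unicode 5.0

# Both tables agree on the old character.
print(frozen_nfd(old_char) == current_nfd(old_char))
# The frozen table leaves the newer character untouched, so its result
# stays stable whatever newer Unicode versions say about that character.
print(frozen_nfd(new_char) == new_char)
```

The point of the sketch is only that a frozen table gives the same answer on every system that shares it, even for characters it does not know.
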
For the same security reasons, Windows filesystems will NOT normalize
Unicode filenames, which are stored as a binary vector of UTF-16 code units
(with some of them restricted for special usage, or forbidden, notably
code units/code points in the ASCII range that have predefined functions
or are excluded, such as most controls), and optionally mapped to a
secondary "short" 8.3 name using a local OS codepage.
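
As a rough illustration of the "binary vector of UTF-16 code units" point, the following Python sketch compares two canonically equivalent names code unit by code unit. It only models an exact, non-normalizing comparison (it ignores case-insensitive matching and the 8.3 short names), and the helper names `utf16_units` and `same_binary_name` are hypothetical.

```python
import unicodedata


def utf16_units(name):
    """Return the name as a list of UTF-16 code units."""
    data = name.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def same_binary_name(a, b):
    """Compare two names the way a non-normalizing filesystem would."""
    return utf16_units(a) == utf16_units(b)


nfc_name = unicodedata.normalize("NFC", "caf\u00e9.txt")  # precomposed é
nfd_name = unicodedata.normalize("NFD", "caf\u00e9.txt")  # e + U+0301

print([hex(u) for u in utf16_units(nfc_name)])
print([hex(u) for u in utf16_units(nfd_name)])
print(same_binary_name(nfc_name, nfd_name))
```

The two names look identical when displayed, but they are different code unit sequences, so without normalization they are two different filenames.
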
However, it is highly recommended to use the NFC form when creating Unicode
filenames on Windows (notably because it offers round-trip compatibility
with filenames created in a Windows codepage, where characters are
precomposed). If you create a filename with decomposed characters in NFD
form, you may not be able to open that file using the filename encoded in
the Windows or OEM codepage (the filesystem will not find it, as it uses a
simple one-to-one mapping from the codepage codes to Unicode code points in
NFC form).
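
The round-trip problem can be sketched in Python as follows. This is an assumption-laden model, not the Windows implementation: it takes codepage 1252 as the example Windows codepage and uses Python's own `cp1252` codec for the one-to-one mapping; `name_from_codepage` and `representable` are hypothetical helper names.

```python
import unicodedata

CODEPAGE = "cp1252"  # assumption: Western European Windows codepage


def name_from_codepage(raw):
    """Map codepage bytes one-to-one to Unicode code points."""
    return raw.decode(CODEPAGE)


def representable(name):
    """True if every character of the name exists in the codepage."""
    try:
        name.encode(CODEPAGE)
        return True
    except UnicodeEncodeError:
        return False


raw = b"caf\xe9.txt"               # name as typed in the codepage
decoded = name_from_codepage(raw)  # precomposed, already in NFC form
nfd = unicodedata.normalize("NFD", decoded)

print(decoded == unicodedata.normalize("NFC", decoded))
print(representable(decoded))
# The decomposed name needs U+0301, which the codepage does not contain,
# so no codepage-encoded name can ever map back to it.
print(representable(nfd))
```

A file created under the NFD spelling is therefore unreachable through the codepage route in this model, while the NFC spelling round-trips.
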
This archive was generated by hypermail 2.1.5 : Wed Aug 25 2004 - 09:54:50 CDT