Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Hans Åberg via Unicode
unicode at unicode.org
Tue May 16 11:23:51 CDT 2017
> On 16 May 2017, at 18:13, Alastair Houghton <alastair at alastairs-place.net> wrote:
> On 16 May 2017, at 17:07, Hans Åberg <haberg-1 at telia.com> wrote:
>>>>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ...
>>>> The filesystem directory is using octet sequences and does not bother passing over an encoding, I am told. Someone could remember one that used to use UTF-16 directly, but I think it may not be current.
>>> No, that’s not true. All three of those systems store UTF-16 on the disk (give or take).
>> I am not speaking about what they store, but how the filesystem identifies files.
> Well, quite clearly none of those systems treat the UTF-16 strings as binary either - they’re case insensitive, so how could they? HFS+ even normalises strings using a variant of a frozen version of the normalisation spec.
HFS implements case insensitivity in a layer above the raw filesystem functions. So it is perfectly possible to have files in the same directory that differ only by case, by using low-level function calls. Tenon MachTen already did that on Mac OS 9.
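
To illustrate the layering, here is a minimal sketch in Python. It is a toy model, not how HFS is actually implemented: the classes RawDirectory and CaseInsensitiveLayer are hypothetical names, and Python's str.casefold() merely stands in for whatever case-folding rules the real upper layer applies.

```python
class RawDirectory:
    """Hypothetical low-level directory: names are compared exactly."""

    def __init__(self):
        self.entries = {}  # exact name -> file contents

    def create(self, name, data=b""):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = data

    def names(self):
        return sorted(self.entries)


class CaseInsensitiveLayer:
    """Hypothetical upper layer: rejects names that differ only by case."""

    def __init__(self, raw):
        self.raw = raw

    def _find(self, name):
        # casefold() is an assumption, standing in for the real folding rules.
        key = name.casefold()
        for existing in self.raw.entries:
            if existing.casefold() == key:
                return existing
        return None

    def create(self, name, data=b""):
        if self._find(name) is not None:
            raise FileExistsError(name)
        self.raw.create(name, data)


raw = RawDirectory()
upper = CaseInsensitiveLayer(raw)

upper.create("Readme")
try:
    upper.create("README")  # the upper layer sees a case-only clash
    upper_rejected = False
except FileExistsError:
    upper_rejected = True

raw.create("README")  # the raw call bypasses the case-insensitive check
```

In this model the directory ends up holding both "Readme" and "README", because the second name was created through the raw interface rather than through the case-insensitive layer.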