Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Alastair Houghton via Unicode <unicode_at_unicode.org>
Date: Tue, 16 May 2017 16:52:00 +0100

On 16 May 2017, at 16:44, Hans Åberg <haberg-1_at_telia.com> wrote:
>
> On 16 May 2017, at 17:30, Alastair Houghton via Unicode <unicode_at_unicode.org> wrote:
>>
>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ...
>
> The filesystem directory is using octet sequences and does not bother passing over an encoding, I am told. Someone may remember one that used UTF-16 directly, but I think it may not be current.

No, that’s not true. All three of those systems store UTF-16 on the disk (give or take). On Windows, the “ANSI” APIs convert the filenames to or from the appropriate Windows code page, while the “Wide” API works in UTF-16, which is the native encoding for VFAT long filenames and NTFS filenames. And, as I said, on Mac OS X and iOS, the kernel expects filenames to be encoded as UTF-8 at the BSD API, regardless of what encoding you might be using in your Terminal (this is different to traditional UNIX behaviour, where how you interpret your filenames is entirely up to you - usually you’d use the same encoding you were using on your tty).
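
To make the distinction concrete, here is a rough sketch (in Python, purely for illustration) of how one filename looks in each of the encodings mentioned above. The sample name, the choice of code page 1252 and the helper `roundtrip_through_code_page` are my own assumptions for the example. It only models the encode/decode step and is not how any of these filesystems or APIs is actually implemented.

```python
# Illustrative only: one filename seen through the three encodings discussed.
# The name and the code page are assumptions chosen for the example.
name = "\u00c5berg.txt"  # "Åberg.txt", with a precomposed U+00C5

# Roughly what a UTF-16-based store holds: one 16-bit unit per BMP character.
utf16_bytes = name.encode("utf-16-le")

# What a UTF-8 byte-oriented API (such as the BSD layer on Mac OS X) would see.
utf8_bytes = name.encode("utf-8")

# What an "ANSI" caller would see if the active code page were 1252.
cp1252_bytes = name.encode("cp1252")


def roundtrip_through_code_page(filename, code_page="cp1252"):
    """Hypothetical helper: model the lossy name -> code page -> name trip.

    Characters the code page cannot represent become '?', which is why a
    code-page view of a UTF-16 name can lose information.
    """
    return filename.encode(code_page, errors="replace").decode(code_page)


print(len(utf16_bytes), len(utf8_bytes), len(cp1252_bytes))
print(roundtrip_through_code_page(name))
print(roundtrip_through_code_page("\u65e5\u672c.txt"))
```

The first name survives the code-page round trip because every character exists in code page 1252; the second does not, which is the practical difference between going through a code-page conversion and working with the UTF-16 name directly.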

Kind regards,

Alastair.

--
http://alastairs-place.net
Received on Tue May 16 2017 - 10:52:14 CDT