From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 12:16:30 CST
On 2005/01/20 14:14, Christopher Fynn at cfynn@gmx.net wrote:
> Hans Aberg wrote:
>
>
>> It is much better if the BOM is illegal in UTF-8. That does not prevent MS
>> from using it; they can instead label it a file format marker for MS text
>> files. A program that deals with MS text files must then know about the BOM and
>> remove it when and if appropriate. At the same time, it does not cause any
>> problems for programs that normally do not handle MS text files but only
>> plain text: They are fine as they are. Everyone should be able to be happy.
>
> Since BOM is a valid Unicode & ISO 10646 character and UTF-8 is a
> transformation format of Unicode & 10646, if BOM were illegal in UTF-8
> it couldn't be used for *all* Unicode characters.
The BOM in UTF-8 is not the 0xFEFF UTF-8 encoded number, but 0xFEFF
appearing as though in UTF-16. 0xFEFF is a Unicode number, and could still
be translated into UTF-8. So the BOM in UTF-8 is a really strange animal.
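
To make the byte sequences in this exchange concrete, here is a minimal Python
sketch. It prints what the code point U+FEFF serializes to in UTF-8 and in the
two UTF-16 byte orders, and shows one way a program could drop a leading UTF-8
signature before treating the rest as plain text. The helper names bom_bytes
and strip_utf8_bom are made up for this illustration; codecs.BOM_UTF8 is the
standard library constant for the three bytes EF BB BF.

```python
import codecs

# U+FEFF, the code point used as the byte order mark.
BOM = "\ufeff"


def bom_bytes(encoding: str) -> bytes:
    """Return the bytes that U+FEFF serializes to in the given encoding."""
    return BOM.encode(encoding)


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading EF BB BF signature if present; otherwise return data as is.

    Hypothetical helper: whether stripping is appropriate depends on the
    application, as discussed in the thread.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16-be", "utf-16-le"):
        print(enc, bom_bytes(enc).hex())
    print(strip_utf8_bom(codecs.BOM_UTF8 + b"plain text"))
```

A program that only handles plain text would simply never call
strip_utf8_bom; one that reads files carrying the signature would call it once
on the first bytes it reads.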