From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 18:52:35 CST
On 2005/01/20 19:38, Andrew C. West at andrewcwest@alumni.princeton.edu
wrote:
>> The BOM in UTF-8 is not the UTF-8 encoding of the number 0xFEFF, but 0xFEFF
>> appearing as though in UTF-16. 0xFEFF is a Unicode number, and could still be
>> translated into UTF-8. So the BOM in UTF-8 is a really strange animal.
> The BOM generated by Notepad and other Windows applications at the start of
> UTF-8 files is 0xEF 0xBB 0xBF, which is the UTF-8 transformation of the
> valid Unicode character U+FEFF, and so no process that claims to process UTF-8
> files should have any problem. If you do get 0xFEFF at the start of (or
> anywhere in) a UTF-8 file, then that IS very wrong ... but I've never seen
> such an animal.
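The byte values quoted above can be checked with a few lines of Python. This is only a small sketch using the standard codecs; the variable names are made up for illustration.

```python
import codecs

# U+FEFF is the code point used as the byte order mark.
bom = "\ufeff"

# Encoding it as UTF-8 yields the three bytes EF BB BF.
utf8_bytes = bom.encode("utf-8")
assert utf8_bytes == b"\xef\xbb\xbf"
assert utf8_bytes == codecs.BOM_UTF8

# The two raw bytes FE FF are what the same code point looks like in
# big-endian UTF-16; they are not valid UTF-8 on their own.
utf16_be_bytes = bom.encode("utf-16-be")
assert utf16_be_bytes == b"\xfe\xff"

try:
    utf16_be_bytes.decode("utf-8")
    raw_feff_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_feff_is_valid_utf8 = False

# A plain UTF-8 decoder keeps the BOM as a character; Python's
# "utf-8-sig" codec strips it if present.
data = codecs.BOM_UTF8 + b"hi"
kept = data.decode("utf-8")
stripped = data.decode("utf-8-sig")
```

Here `kept` begins with U+FEFF, while `stripped` is just "hi".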
Sorry, then I misunderstood that. Then it is even more meaningless, because
the point of the UTF-16 BOM is that it can detect byte swapping. Unicode has
decided that text files should be prepended with an ad hoc character of no
particular use.
Hans Aberg
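The point about byte swapping can be illustrated with a rough sketch. The helper name utf16_byte_order is hypothetical, and the check deliberately ignores UTF-32, whose little-endian BOM also starts with FF FE.

```python
def utf16_byte_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM (illustrative only)."""
    head = data[:2]
    if head == b"\xfe\xff":
        return "big-endian"
    if head == b"\xff\xfe":
        return "little-endian"
    return "unknown"


text = "\ufeffabc"

# The same text serialized in both UTF-16 byte orders.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

# Swapping the two bytes of every 16-bit unit turns one form into the other,
# and the BOM is what lets a reader notice that this happened.
swapped = b"".join(be[i + 1:i + 2] + be[i:i + 1] for i in range(0, len(be), 2))
assert swapped == le

# UTF-8 is a sequence of single bytes, so there is no byte order to swap:
# the encoded BOM is always EF BB BF and tells the reader nothing about
# endianness.
utf8 = text.encode("utf-8")
assert utf8[:3] == b"\xef\xbb\xbf"
```

With these inputs, utf16_byte_order(be) returns "big-endian" and utf16_byte_order(le) returns "little-endian". In UTF-8 the leading bytes can at most serve as an encoding signature, which is the use the message above questions.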