From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 04:40:54 CST
Didn't Unicode have a principle of merely providing the characters, without
imposing requirements on their use? If so, the requirement that programs should
ignore the BOM contradicts that principle, and it is this break with the
principle that causes problems on UNIX platforms.
It would be much better if the BOM were illegal in UTF-8. That would not
prevent MS from using it; MS could instead label it as a file format marker
for MS text files. A program that deals with MS text files must then know
about the BOM and remove it when and if appropriate. At the same time, this
causes no problems for programs that normally do not handle MS text files but
only plain text: they are fine as they are. Everyone should be able to be happy.
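To make the division of labour concrete: a program that chooses to handle such files could drop the marker at its input boundary, roughly as below. This is only a sketch. The function name strip_utf8_bom is made up for illustration, and the only thing assumed about the marker is that it is the three bytes EF BB BF at the very start of the data.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper: only a marker at the very start is treated as a
    file format marker; the same bytes later in the stream are left alone.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A program that only deals with plain text never calls anything like this and passes the bytes through unchanged.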
In fact, one idea might be to add \xFFFE and \xFFFF as delimiters for file
format markers. Then programs that do not need such markers need not deal
with them. Other programs can make use of them, or simply remove them at
will. Such markers could also be used to alter the format within the same
stream.
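A rough sketch of how a program might separate such markers from the text, under assumptions that are mine and not part of any standard: \xFFFE and \xFFFF are read as the code points U+FFFE (opening delimiter) and U+FFFF (closing delimiter), markers do not nest, and the payload between the delimiters is an arbitrary string whose syntax is left open. The names split_markers, MARKER_OPEN and MARKER_CLOSE are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed delimiters: U+FFFE opens a marker, U+FFFF closes it.
MARKER_OPEN = "\uFFFE"
MARKER_CLOSE = "\uFFFF"

# Non-greedy match so that several markers in one stream stay separate.
_MARKER = re.compile(MARKER_OPEN + "(.*?)" + MARKER_CLOSE, re.DOTALL)


def split_markers(text: str) -> Tuple[str, List[str]]:
    """Return (text with all markers removed, list of marker payloads)."""
    payloads = _MARKER.findall(text)
    plain = _MARKER.sub("", text)
    return plain, payloads
```

A program with no interest in the markers would keep only the first element of the result; one that understands them could inspect the payloads, including those that appear in the middle of the stream.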
Hans Aberg