Asmus Freytag wrote:
> ... As a consequence, Notepad does not read
> Unicode files w/o a BOM (even little endian ones)...
To my surprise a week or two ago I discovered that it does
recognise Unicode files without the BOM. I think it looks at
the line endings. This does not detract from the argument.
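
To illustrate the kind of check I have in mind, here is a minimal sketch in Python. It is only a guess at the approach, not Notepad's actual algorithm; the function name guess_utf16_byte_order is made up for this example. It trusts a BOM if one is present, and otherwise counts CR and LF code units in each byte order.

```python
def guess_utf16_byte_order(data: bytes):
    """Guess the byte order of 16-bit Unicode text; return 'little', 'big' or None.

    Hypothetical heuristic for illustration only.
    """
    # A BOM, when present, settles the question.
    if data.startswith(b"\xff\xfe"):
        return "little"
    if data.startswith(b"\xfe\xff"):
        return "big"

    little = big = 0
    # Walk the data one 16-bit code unit at a time.
    for i in range(0, len(data) - 1, 2):
        unit = data[i:i + 2]
        if unit in (b"\r\x00", b"\n\x00"):
            little += 1   # CR or LF stored low byte first
        elif unit in (b"\x00\r", b"\x00\n"):
            big += 1      # CR or LF stored high byte first

    if little > big:
        return "little"
    if big > little:
        return "big"
    return None           # no evidence either way
```

A file with no line endings at all gives the sketch nothing to go on, which is one reason a heuristic like this does not replace the BOM.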
And, while I am writing, here is another point:
> On the web a higher level of precision is needed. Specifying protocols
> in terms of serialized byte streams and therefore requiring MSB
> canonical byte ordering increases data security and in these situations,
> the overhead of transposition (in the worst case twice) is fully acceptable.
> It's hard to find any arguments there.
It is a pity that this complicates the procedure for web publishing,
which is usually:
- Edit the file on your own computer
- Put it in the right directory on the server (usually via a remotely
mounted disc or FTP)
If you want to send big-endian UCS-2 in the HTTP stream, you need either
a text editor that can save text as big-endian UCS-2 bytes, or a conversion
step, performed at publishing time or by the web server.
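
A conversion step at publishing time could look roughly like the following Python sketch. The function name and its parameters are invented for this example, and it assumes that a file without a BOM is little-endian, which is only a guess about what a typical PC editor writes. Python's codec is UTF-16, which coincides with UCS-2 for characters in the Basic Multilingual Plane.

```python
import codecs

def to_big_endian_utf16(data: bytes, assume: str = "utf-16-le",
                        add_bom: bool = False) -> bytes:
    """Re-encode 16-bit Unicode text as big-endian bytes.

    Hypothetical helper: 'assume' names the encoding used when the
    input carries no BOM (an assumption, not something we can know).
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        text = data.decode(assume)

    out = text.encode("utf-16-be")
    # Optionally put a big-endian BOM back in front of the result.
    return codecs.BOM_UTF16_BE + out if add_bom else out
```

The same function could be called from an upload script or from the server; either way the author keeps editing in whatever byte order the local editor produces.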
We need a balance between efficiency (the number of bytes needed for East
Asian characters) and users' perception of complication: why are there four
formats for Unicode web pages (big-endian, little-endian, UTF-8, and &#nnnn;
references)?
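
To put numbers on the efficiency side, this short Python sketch encodes a single character in each of the four forms. The function name four_forms is made up for this example, and U+65E5 is just a sample East Asian character chosen for illustration.

```python
def four_forms(ch: str) -> dict:
    """Return one character encoded in each of the four web forms."""
    return {
        "big-endian": ch.encode("utf-16-be"),
        "little-endian": ch.encode("utf-16-le"),
        "UTF-8": ch.encode("utf-8"),
        # Characters outside ASCII become &#nnnn; references.
        "&#nnnn;": ch.encode("ascii", "xmlcharrefreplace"),
    }

forms = four_forms("\u65e5")
for name, raw in forms.items():
    print(f"{name:14} {len(raw)} bytes  {raw!r}")
```

For this character the two 16-bit forms take 2 bytes each, UTF-8 takes 3, and the numeric character reference takes 8.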
Charles