Re: Translated IUC10 Web pages: Experimental Results

From: Charles Wicksteed (charles.wicksteed@reuters.com)
Date: Mon Feb 10 1997 - 05:19:30 EST


Asmus Freytag wrote:

> ... As a consequence, Notepad does not read
> Unicode files w/o a BOM (even little endian ones)...

A week or two ago I discovered, to my surprise, that it does
recognise Unicode files without a BOM. I think it looks at
the line endings. This does not detract from the argument.
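The line-endings idea is only my guess at what Notepad does; its real detection logic is not described anywhere here. To show how such a guess could work, here is a rough Python sketch. The function name `guess_utf16_byte_order` is hypothetical. It honours a BOM if one is present. Otherwise it counts CR LF pairs encoded as 16-bit units at even offsets and picks the byte order with more hits.

```python
from typing import Optional


def guess_utf16_byte_order(data: bytes) -> Optional[str]:
    """Guess whether UCS-2/UTF-16 bytes are big- or little-endian.

    Hypothetical heuristic, not Notepad's actual algorithm.
    Returns "be", "le", or None when there is no evidence either way.
    """
    # A byte order mark settles the question immediately.
    if data.startswith(b"\xfe\xff"):
        return "be"
    if data.startswith(b"\xff\xfe"):
        return "le"

    # No BOM: look for CR LF encoded as two 16-bit units. Only even
    # offsets are checked, because little-endian "\r\n" bytes also
    # contain the big-endian pattern one byte further along.
    le_hits = be_hits = 0
    for i in range(0, len(data) - 3, 2):
        window = data[i:i + 4]
        if window == b"\r\x00\n\x00":
            le_hits += 1
        elif window == b"\x00\r\x00\n":
            be_hits += 1

    if le_hits > be_hits:
        return "le"
    if be_hits > le_hits:
        return "be"
    return None
```

A file with no line breaks at all gives no evidence, which is one reason a heuristic like this cannot replace the BOM.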

And, while I am writing, here is another point:

> On the web a higher level of precision is needed. Specifying protocols
> in terms of serialized byte streams and therefore requiring MSB
> canonical byte ordering increases data security and in these situations,
> the overhead of transposition (in the worst case twice) is fully acceptable.
> It's hard to find any arguments there.
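The "transposition" mentioned above is simply swapping the two bytes of every 16-bit code unit. Here is a minimal Python sketch. It assumes the input is BOM-less UCS-2, and `swap_ucs2_bytes` is an illustrative name.

```python
def swap_ucs2_bytes(data: bytes) -> bytes:
    """Swap the bytes of each 16-bit unit (little-endian <-> big-endian)."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # each odd-offset byte moves one place left
    swapped[1::2] = data[0::2]  # each even-offset byte moves one place right
    return bytes(swapped)
```

The swap is its own inverse. Presumably the "worst case twice" is a little-endian sender and a little-endian receiver each making one pass: once to produce big-endian bytes for the wire, and once to get back to native order.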

It is a pity that this complicates the procedure for web publishing,
which is usually:
- Edit the file on your own computer
- Put it in the right directory on the server (usually via a remotely
   mounted disc or FTP)
If you want to send big-endian UCS-2 in the HTTP stream, you need
a text editor that can save text as big-endian UCS-2 bytes, or you must
arrange for a conversion at publishing time or by the web server.
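Conversion at publishing time could be a small filter run before upload. This Python sketch makes two assumptions. First, the editor saves UTF-16 with or without a BOM. Second, a BOM-less file is little-endian; that default is my own choice, not something the editors guarantee. The names `publish_as_big_endian` and `publish_file` are hypothetical. The output carries no BOM, on the further assumption that the protocol fixes the byte order as big-endian.

```python
import codecs
from pathlib import Path


def publish_as_big_endian(raw: bytes, default: str = "utf-16-le") -> bytes:
    """Re-encode an editor's UTF-16 file as big-endian bytes for serving."""
    # A leading BOM decides the input byte order; strip it before decoding.
    if raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    elif raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        # No BOM: fall back to the assumed default byte order.
        text = raw.decode(default)
    # Big-endian output with no BOM (the protocol is assumed to fix the order).
    return text.encode("utf-16-be")


def publish_file(src: Path, dst: Path) -> None:
    """Convert one file from the local edit area into the server directory."""
    dst.write_bytes(publish_as_big_endian(src.read_bytes()))
```

Running `publish_file` on each page before copying it to the server keeps the editor unchanged, at the cost of one extra step in the publishing routine.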
We need a balance between efficiency (the number of bytes needed for East
Asian characters) and users' perception of complexity: why are there four
formats for Unicode web pages (big-endian, little-endian, UTF-8 and &#nnnn;)?!
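To put numbers on the efficiency side of this trade-off, the sketch below counts the bytes each of the four formats needs for a given string. It assumes the &#nnnn; form escapes only non-ASCII characters, using decimal references. `encoded_sizes` is an illustrative name.

```python
def encoded_sizes(text: str) -> dict:
    """Bytes needed to store `text` in each of the four formats."""
    # Numeric character references: ASCII stays literal, everything else
    # becomes a decimal reference such as &#26085;.
    ncr = "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)
    return {
        "big-endian UCS-2": len(text.encode("utf-16-be")),
        "little-endian UCS-2": len(text.encode("utf-16-le")),
        "UTF-8": len(text.encode("utf-8")),
        "&#nnnn; references": len(ncr.encode("ascii")),
    }
```

For "日本語" this gives 6 bytes in either UCS-2 byte order, 9 in UTF-8 and 24 as references. For plain ASCII text, UTF-8 needs half the bytes of UCS-2.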

Charles



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:34 EDT