Re: Translated IUC10 Web pages: Experimental Results

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Wed Feb 05 1997 - 11:14:38 EST


On Wed, 5 Feb 1997, Misha Wolf wrote:

> I think it very unlikely that plain 16-bit Unicode will be adopted by
> browsers in the next year or two.

Why not? It is more compact for East Asian text (apart from the fact
that compression can be used anyway). I could understand if you said
that it might not be adopted by content providers. But for browsers,
supporting UCS-2/UTF-16 in addition to UTF-8 is an extremely small
addition, so I don't even see why there is discussion about it.
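
To make the compactness point concrete, here is a minimal sketch that
compares byte counts for the two encodings. The helper name
encoded_sizes and the sample strings are only illustrative assumptions,
and UTF-16 is counted without a byte order mark.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-be" is used so that no byte order mark is counted.
        "utf-16": len(text.encode("utf-16-be")),
    }


# Hypothetical sample strings, chosen only for illustration.
japanese = "日本語のテキスト"   # 8 characters, each 3 bytes in UTF-8
english = "Hello"              # 5 ASCII characters

print(encoded_sizes(japanese))  # {'utf-8': 24, 'utf-16': 16}
print(encoded_sizes(english))   # {'utf-8': 5, 'utf-16': 10}
```

The advantage reverses for mostly-ASCII text, which is why the gain is
specific to East Asian (and similar) content.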

> The two encoding schemes which will
> be widely used to encode Unicode Web pages are:
>
> 1. UTF-8 (see <http://www.reuters.com/unicode/iuc10/x-utf8.html>).
> 2. Numeric Character References (see <http://www.reuters.com/unicode/iuc10/x-ncr.html>).
>
> The second scheme is intriguing as it does not require the use of any
> octets over 127 decimal (7F hex). Accordingly, it is legal to label
> such a file as, eg, US-ASCII, ISO-8859-1, X-SJIS, or any other "charset"
> which has ASCII as a subset.

It is not very harmful to label such pages ISO-8859-1 or whatever.
But strictly speaking, it is not legal! If there are alternatives
for labeling, the most restrictive label should be used. If it's
labeled us-ascii, you know that it's going to pass through 7-bit
mail. Otherwise, you don't.
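
As a rough sketch of why us-ascii is the most restrictive label that
still fits a purely NCR-coded page: Python's "xmlcharrefreplace" error
handler turns every non-ASCII character into a decimal numeric
character reference, and the result can be checked for octets above
127. The function names to_ncr_ascii and is_seven_bit are hypothetical,
picked for this example only.

```python
def to_ncr_ascii(text):
    """Encode `text` as ASCII, writing non-ASCII characters as &#N; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def is_seven_bit(data):
    """Return True if no octet in `data` is above 127 (7F hex)."""
    return all(octet < 0x80 for octet in data)


page = to_ncr_ascii("Grüße")
print(page)                # b'Gr&#252;&#223;e'
print(is_seven_bit(page))  # True
```

A page produced this way is safe to label us-ascii; labeling it
ISO-8859-1 would still decode correctly but promises less than the
data actually guarantees.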

I don't see much future popularity for purely NCR-coded documents.
NCRs are more valuable for cases where you want to add a character
or two from a script not supported by the local encoding, e.g. a
Kanji or two in a German document.
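
The mixed case can be sketched as follows, assuming the document is
stored as ISO-8859-1: characters that the local encoding covers stay
as single octets, and only the stray Kanji becomes an NCR. The helper
name encode_with_ncr_fallback and the sample sentence are made up for
this illustration.

```python
def encode_with_ncr_fallback(text, charset="iso-8859-1"):
    """Encode `text` in `charset`, using &#N; only for characters it lacks."""
    return text.encode(charset, errors="xmlcharrefreplace")


# Hypothetical German sentence containing one Kanji (U+6F22).
sentence = "Das Zeichen 漢 heißt Kanji."
encoded = encode_with_ncr_fallback(sentence)

# The sharp s stays a single ISO-8859-1 octet (0xDF);
# only the Kanji is written as a numeric character reference.
print(encoded)  # b'Das Zeichen &#28450; hei\xdft Kanji.'
```

Such a document has to be labeled ISO-8859-1, since it does contain an
octet over 127.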

Regards, Martin.


