At 09:43 97-02-05 -0500, Misha Wolf wrote:
>Chris Pratley wrote:
>
>>[snip]
>
>>Our assumption was that UTF-8 was the only Web-safe encoding that was
>>reasonably likely to be adopted by browsers in the near future. Is that
>>the consensus, or are raw UCS2 encodings being considered actively by
>>people on this alias?
>
>I think it very unlikely that plain 16-bit Unicode will be adopted by
>browsers in the next year or two. The two encoding schemes which will
>be widely used to encode Unicode Web pages are:
>
> 1. UTF-8 (see <http://www.reuters.com/unicode/iuc10/x-utf8.html>).
> 2. Numeric Character References (see
>    <http://www.reuters.com/unicode/iuc10/x-ncr.html>).
>
>The second scheme is intriguing as it does not require the use of any
>octets over 127 decimal (7F hex). Accordingly, it is legal to label
>such a file as, eg, US-ASCII, ISO-8859-1, X-SJIS, or any other "charset"
>which has ASCII as a subset. Browser vendors: Please check your products
>against the pages referenced above.
>
>>[snip]
>
>Regards,
>Misha
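Misha's second scheme is easy to demonstrate. A minimal sketch in Python
(the function name is my own invention): every character above 7-bit ASCII
becomes a decimal character reference, so no octet in the output ever
exceeds 7F hex.

    def to_ncr(text):
        """Encode a string as ASCII plus &#...; numeric character references."""
        out = []
        for ch in text:
            if ord(ch) < 0x80:
                out.append(ch)                 # plain ASCII passes through
            else:
                out.append("&#%d;" % ord(ch))  # e.g. e-acute -> &#233;
        return "".join(out)

    print(to_ncr(u"caf\u00e9"))  # prints: caf&#233;
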
I do not understand why it is more complicated to use UCS-2 than any other
scheme (apart from the little-endian problem, which should be deprecated
given the state of the art of the 21st century; it is a patch!). The web
requires 8-bits-per-octet encoding (thank God! otherwise even UTF-8 would
not work), as its default character set is ISO/IEC 8859-1. A wise
implementer should implement at least:
- Latin 1
- UTF-8
- entity names
- UCS-2 (big-endian at least, little-endian as a patch if indicated
  clearly! See the sketch below.)
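For the byte-order question, a minimal sketch, again in Python, assumptions
mine: big-endian (network order) is the default, and little-endian is
honoured only when the byte-order mark announces it explicitly. (UCS-2 is
the BMP subset of UTF-16, so Python's UTF-16 codecs will do here.)

    def decode_ucs2(octets):
        """Decode UCS-2 octets; big-endian unless a BOM says otherwise."""
        if octets[:2] == b"\xff\xfe":              # little-endian BOM: the "patch"
            return octets[2:].decode("utf-16-le")
        if octets[:2] == b"\xfe\xff":              # explicit big-endian BOM
            return octets[2:].decode("utf-16-be")
        return octets.decode("utf-16-be")          # no BOM: assume big-endian

    print(decode_ucs2(b"\xfe\xff\x00A\x00B"))  # prints: AB
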
Anyway, the logic, once the source data has been normalized, should be the
same after all. I am pretty sure nobody uses UTF-8 or even entity names as
their canonical processing encoding... That would be nonsense. But who
knows, masochism exists, I know (:
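To make the normalization point concrete, one more sketch (helper names
invented, still Python): decode each external form once into the same
internal string type, and everything downstream stays identical.

    import re

    def from_ncr(text):
        """Expand decimal &#...; references back into characters."""
        return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

    def normalize(octets, charset):
        """Map any supported external encoding onto one internal form."""
        if charset == "iso-8859-1":
            return from_ncr(octets.decode("latin-1"))  # Latin 1 + references
        if charset == "utf-8":
            return from_ncr(octets.decode("utf-8"))
        if charset == "ucs-2":
            return octets.decode("utf-16-be")          # big-endian, as above
        raise ValueError("unsupported charset: %s" % charset)
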
Alain LaBonté (version : 8 bits --- (-: )
Alain LaBont/e'/ (version : 7 bits --- )<:= !@#$%?&*()_+-=^~',."!!!)