Communicator and Unicode revisited

From: Adrian Havill (havill@threeweb.ad.jp)
Date: Tue Sep 16 1997 - 01:14:51 EDT


Thanks to everyone who was patient and set me straight about the
Accept-Charset in Navigator 4.02. I realize now that the Accept-Charset
is different from the regular "Accept" header in that a wildcard "*"
matches all character sets and sets them to a q value of 1.0.

I was confusing this with the "Accept" handling of the wildcards, which
states that "the most specific reference has precedence."
(<URL:http://www.w3.org/Protocols/HTTP/1.1/draft-ietf-http-v11-spec-08.txt>).
In other words, "text/html" has precedence over "text/*" if they're at
the same "q". Apparently the design for "Accept-Charset" is not
orthogonal. :(
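
To make the wildcard reading above concrete, here is a minimal sketch in
Python. It is an illustration, not a quote of the draft: the function
names are made up, the sample header value is only an example, and
falling back to q=0.0 for an unlisted charset when no "*" is present is
an assumption of this sketch (any special-casing of iso-8859-1 is left
out).

```python
def parse_accept_charset(header):
    """Return a dict mapping lowercase charset names (or "*") to q-values."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # an entry without a q parameter counts as q=1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: an unparsable q means "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def charset_q(prefs, charset):
    """q-value for one charset: an explicit entry wins, else the "*" entry."""
    charset = charset.lower()
    if charset in prefs:
        return prefs[charset]
    return prefs.get("*", 0.0)  # assumption: unlisted and no wildcard -> 0.0


# Example header value (illustrative only).
prefs = parse_accept_charset("iso-8859-1,*,utf-8")
print(charset_q(prefs, "utf-8"), charset_q(prefs, "shift_jis"))
```

With a header like this every charset ends up at the same q, which is
the situation question 1 below asks about.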

Enough with the HTTP protocol diversion. My remaining questions
pertaining to Unicode are as follows:

1) If they're all of equal precedence, why not just send
"Accept-Charset: *" or not send the header at all? And why does
Navigator 4 bother to send "iso-8859-1" when Draft-08 says that this
"character set can be assumed to be acceptable to all user agents"? The
request header is for "... clients capable of understanding more
comprehensive or special-purpose character sets to signal that
capability to a server ..."

Do the other major browsers out there (IE and HotJava, for example) plan
to announce "UTF-8" support in the HTTP header as well? If so, this is
very significant in that a very large portion of the personal computer
browser market will finally be announcing "yes, I speak Unicode." This,
IMO, is the biggest boost for Unicode since Java.

As Unicode support is up-and-coming, we are very hesitant about serving
a Unicode file to a client that can't handle it. However, I hope that
those clients that DO understand Unicode announce it, so the new and
improved file can be served. Ideally, in the future, everything will
migrate, but until then, I hope that the future browsers out there that
support Unicode announce themselves, spreading Unicode's acceptance by
the end user.
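
The cautious policy described above could be sketched roughly like this
in Python: serve the Unicode file only when the client names "utf-8"
explicitly, and treat a bare "*" or a missing header as "don't know".
The function names and the two file names are hypothetical, chosen only
for illustration.

```python
def explicitly_accepts(header, charset):
    """True if `charset` is named in the header with a non-zero q-value."""
    if not header:
        return False
    for item in header.split(","):
        name, _, params = item.partition(";")
        if name.strip().lower() != charset.lower():
            continue
        q = 1.0  # an entry without a q parameter counts as q=1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unparsable q means "not acceptable"
        return q > 0.0
    return False


def choose_variant(header,
                   legacy="index.legacy.html",      # hypothetical file name
                   unicode_variant="index.utf8.html"):  # hypothetical file name
    """Pick the UTF-8 file only for clients that announce utf-8 by name."""
    if explicitly_accepts(header, "utf-8"):
        return unicode_variant
    return legacy


print(choose_variant("iso-8859-1,*,utf-8"))
print(choose_variant("*"))
```

Ignoring the wildcard is deliberately conservative: a client that sends
only "*" claims to accept everything, which is exactly the claim we
hesitate to trust.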

2) Can Navigator read a UCS-2 file if it doesn't have a byte-order mark
at the front? I've tried both big- and little-endian formats, as well
as setting the header to return "UNICODE-1-1" (the real HTTP header,
not the META tag). (Is there a "UCS-2" charset type?) Even though the
header is there, if the byte-order mark isn't, Navigator garbles both
big- and little-endian files. With U+FEFF there, no problem.
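
For reference, the byte-order check in question looks roughly like the
Python sketch below. This is not a description of what Navigator does
internally; the function name and the default_order parameter are
hypothetical. Python's UTF-16 codecs stand in for UCS-2 here, which is
equivalent only for text without surrogate pairs.

```python
import codecs


def decode_ucs2(data, default_order=None):
    """Decode 16-bit Unicode bytes, using a leading U+FEFF to pick byte order.

    default_order ("big" or "little") is a hypothetical fallback for files
    that carry no byte-order mark; without it such input is rejected.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")
    if default_order == "big":
        return data.decode("utf-16-be")
    if default_order == "little":
        return data.decode("utf-16-le")
    raise ValueError("no byte-order mark and no default byte order given")


print(decode_ucs2(b"\xfe\xff\x00A"))      # big-endian with BOM
print(decode_ucs2(b"\xff\xfeA\x00"))      # little-endian with BOM
print(decode_ucs2(b"\x00A", "big"))       # no BOM, caller supplies the order
```

Without the mark the bytes alone do not say which order they are in, so
a reader either needs an out-of-band hint or has to guess.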

-- 
Adrian Havill <URL:http://www.threeweb.ad.jp/>
Engineering Division, Third Systems Section


