Re: Communicator Unicode

From: Yung-Fong Tang (ftang@netscape.com)
Date: Fri Sep 12 1997 - 14:17:25 EDT


Adrian Havill wrote:

> I have a few questions about Netscape Communicator's implementation of
> Unicode which I was hoping someone could clarify:
>
> 1) The Navigator client (well, the Solaris SPARC 4.02 version is the one
> I'm most familiar with) includes the following header in its doc
> request to an HTTP server:
>
> Accept-Charset: iso-8859-1, *, utf-8
>
> Question: Why's the UTF-8 -after- the asterisk? (In the regular "Accept"
> field, the "*/*" is always last) Is there any way to change the order of
> these entries? Is this line consistent on all versions of Navigator 4?
> This is more of a HTTP than a Unicode question, but I was wondering if
> it being last in the list (and after the *) implied that it was the
> least preferred character encoding.

I don't believe you will find anything in HTTP 1.1
(http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt) saying that
the ORDER of the list implies a "least preferred charset".

The "Accept-Charset: iso-8859-1, *, utf-8" header simply tells the server that
the client can receive any charset, including iso-8859-1 and utf-8.
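
To illustrate: in RFC 2068 a preference is expressed with an explicit q value
on each entry, and an entry without one defaults to q=1. Here is a minimal
Python sketch of reading the header that way. The name parse_accept_charset
is made up for this example; it is not Navigator or server code.

```python
def parse_accept_charset(header_value):
    """Return {charset: q} for an Accept-Charset field value.

    Hypothetical helper for illustration only.
    """
    prefs = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value.strip())
        prefs[name.strip().lower()] = q
    return prefs


navigator_prefs = parse_accept_charset("iso-8859-1, *, utf-8")
weighted_prefs = parse_accept_charset("iso-8859-5, unicode-1-1;q=0.8")
```

For the Navigator header, all three entries end up with the same q, so
nothing in it ranks utf-8 below the others. Only an explicit q, as in the
second call, lowers an entry.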

>
>
> 2) Navigator seems to accept both UNICODE-1-1-UTF-8 and UTF-8 for the
> "charset" parameter in the Content-Type. It doesn't like
> "UNICODE-2-0-UTF-8" though. Does this mean that 2.0 is not supported?

Every MIME charset name Netscape supports falls into one of two cases:

  1. It can be found in
     ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets, or
  2. It has no name in the above document, but we need to support the
     charset, so we create an "x-" prefixed name for it.

In the case of UNICODE-1-1-UTF-8 (you CURRENTLY cannot find it in the
document above), it DID appear in that document at one time. For
compatibility, we keep recognizing it. Although we do not send that charset
name out, we recognize it when we read it.
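
The read/send asymmetry can be sketched like this in Python. The table
contents and the name canonical_charset are assumptions made up for the
example; this is not the actual Navigator charset list.

```python
# Hypothetical tables for illustration; not Netscape's real data.
READ_ONLY_ALIASES = {
    "unicode-1-1-utf-8": "utf-8",  # old name: accepted on input, never sent
}
KNOWN_CHARSETS = {"iso-8859-1", "utf-8", "unicode-1-1-utf-7"}


def canonical_charset(label):
    """Map a charset label read from a document to the name we would send.

    Returns None when the label is not recognized at all.
    """
    name = label.strip().lower()
    name = READ_ONLY_ALIASES.get(name, name)
    return name if name in KNOWN_CHARSETS else None
```

With these tables, "UNICODE-1-1-UTF-8" is read as utf-8, while a label that
appears in neither table, such as "UNICODE-2-0-UTF-8", is simply not
recognized.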

There is no such name as UNICODE-2-0-UTF-8 specified by any document these
days. (Maybe some draft RFC once did.) So we see no value in adding that name
to our list. (Does some specification document a need for
UNICODE-2-0-UTF-8?)

> 4) The UTF-8 RFC says "UTF-8" is the "charset" for the MIME, but I see
> "UNICODE-1-1-UTF-7" and "UNICODE-1-1-UTF-8" all over the net for the
> MIME type.

Interesting... "all over the net"? I would appreciate it if you could point me
to such places. We have a hard time finding any Unicode documents on the net.

> Are they both ok? Which is preferred? Which is deprecated?
> Does it follow that I can use "UTF-7" as the MIME type instead of
> UNICODE-1-1-UTF-7? Which is preferred? I see "UNICODE-1-1-UTF-7" often
> in <URL:news:alt.chinese.text>, so it seems to be becoming an
> unofficial standard.

The answer is easy. You can find UNICODE-1-1-UTF-7 in the document
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
You cannot find UTF-7 in the document
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

So, use UNICODE-1-1-UTF-7, not UTF-7. Why? Ask IANA.

> While I know the answers to 2 and 3 in the case of Solaris/Win/Mac, I
> was wondering if this was the case for ALL versions of Navigator, and
> whether this was to change in the future.
> --
> Adrian Havill <URL:http://www.threeweb.ad.jp/>
> Engineering Division, System Planning & Production Section




