Re: UTF-8 and browsers

From: Tague Griffith (tague@netscape.com)
Date: Fri Oct 30 1998 - 16:37:14 EST


Perhaps you might benefit from some tutorials about Unicode and character sets
in general. Bill Hall and Peter Edberg have some good survey material about
character sets. Both papers are included in the Unicode Conference proceedings
which can be ordered from the Unicode Consortium (http://www.unicode.org). I
don't know if either has an online copy that they could point you at.

The language and concepts that you are using to describe character sets don't
make sense to me or, I would imagine, to most of the people on the Netscape i18n
support group, which might be why you aren't getting satisfactory responses.

UTF-8 is a transformation format of Unicode. It is an algorithm for converting
a sequence of 16-bit character codes into a sequence of 8-bit codes (bytes).
UTF-7, for instance, is a similar transformation, only it transforms into a
7-bit result. One of the reasons for doing this is the endianness of different
computer hardware. Different processors will read the two bytes 0xAB 0xFF as
different 16-bit numbers because of hardware design decisions. Some processors
will interpret them as 0xFFAB (where the second byte is the most
significant); others will interpret them as 0xABFF (where the first byte is
the most significant). For this reason, a web page that is designed to be used
with different computers will use UTF-8 instead of UCS-2, because this problem
doesn't exist in a stream of 8-bit characters.
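
To make the byte-order point concrete, here is a minimal sketch in Python (my
choice of language for illustration; nothing in this thread depends on it). It
takes the 16-bit code 0xABFF from the paragraph above, writes it out in both
byte orders, and then as UTF-8. The helper name utf8_from_16bit is made up for
this example, and it deliberately ignores surrogate pairs.

```python
def utf8_from_16bit(code: int) -> bytes:
    """Hand-rolled UTF-8 for one 16-bit code (hypothetical helper;
    surrogate pairs are not handled)."""
    if code < 0x80:                      # 7 bits fit in a single byte
        return bytes([code])
    if code < 0x800:                     # 11 bits fit in two bytes
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    return bytes([                       # the rest of the 16-bit range: three bytes
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


code = 0xABFF
ch = chr(code)

# The same 16-bit code serialized in the two possible byte orders.
big = ch.encode("utf-16-be")       # most significant byte first
little = ch.encode("utf-16-le")    # least significant byte first

# Reading big-endian bytes as if they were little-endian yields another code.
misread = ord(big.decode("utf-16-le"))

# UTF-8 is defined byte by byte, so there is only one serialized form.
utf8 = utf8_from_16bit(code)

print(big.hex(), little.hex(), utf8.hex(), hex(misread))
```

The two 16-bit forms come out as ab ff and ff ab, and the misread value is
0xffab, while the UTF-8 form is the same three bytes (ea af bf) on any machine.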

Now, part of the problem is that you are trying to take data and convert it,
particularly by hand. The font/encoding preferences that you are looking at in
Mac Navigator associate a particular font with a particular encoding. What you
are "saying" to Navigator when you set this preference is: when I view a page
that is encoded in UTF-8, please display it in this font. I think one of the
things that you might be missing is that a page uses a particular character
set, and that is set by the author; you are telling Navigator what font you
would like to use for pages in that particular character set. There is nothing
you "do" with pencil and paper to make UTF-8 work; you use that setting to
look at pages which have been encoded in UTF-8.

If you have particular documents that are in Unicode and you want to view them
in Navigator or convert them to UTF-8, there are tools that will do this
conversion for you.
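
As a rough idea of what such a tool does internally, here is a minimal Python
sketch. It is not any particular product's implementation: the function name
ucs2_to_utf8 and its byte_order parameter are invented for illustration, and it
assumes the input really is 16-bit Unicode text whose byte order you already
know.

```python
def ucs2_to_utf8(data: bytes, byte_order: str = "big") -> bytes:
    """Re-encode 16-bit Unicode text as UTF-8 (hypothetical helper).

    Assumption: Python's UTF-16 codecs are used to read the input; for text
    without surrogate pairs this is the same as reading UCS-2.
    """
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    text = data.decode(codec)       # 16-bit codes -> abstract characters
    return text.encode("utf-8")     # abstract characters -> UTF-8 bytes


# Arbitrary sample text: the letters "bl" followed by U+00E5.
sample_big = "bl\u00e5".encode("utf-16-be")
sample_little = "bl\u00e5".encode("utf-16-le")

print(ucs2_to_utf8(sample_big, "big"))
print(ucs2_to_utf8(sample_little, "little"))
```

Both calls produce the same UTF-8 bytes, because the byte-order question is
settled once, when the 16-bit input is read.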

If you can read the two papers mentioned above, then ask your questions again -
I think that you might find more responses from the Unicode mailing list.

hope this helps.
/t

Trond Trosterud wrote:

> I have got two answers to my posting, and hunted the NS support team for
> answers, but no one even sees the problem of how to cope with 16 bits for an
> OS (like mac 7.x and 8.x) that only offers its character sets 8 bits at a
> time. The series of 8-bit blocks make sense to me when seen as a means of
> packing (and unpacking) for transportation, if there is a 32-bit universe
> waiting in the other end. But now there isn't.

>

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tague griffith
mailto:tague@netscape.com
minion, internationalization
netscape communications
"more and more i'm begining to realize the world would be a better
 place if i was its supreme dictator." - me
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



