Re: UTF-8 and browsers

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Fri Oct 30 1998 - 07:39:57 EST


On Fri, 30 Oct 1998, Trond Trosterud wrote:

> >> The new Netscape 4.06 has UTF-8 and UTF-7 encodings as two of their
> >> encoding options. [...] So, what is this supposed to be?
> >> - A fanfare for the future, when the OSes turn 32-bit?
> >> - A way of evoking sets of 8-bit code tables, each representing a small
> >> part of the UCS (such sets exist)
> >

> I have got two answers to my posting, and hunted the NS support team for
> answers, but no one even sees the problem of how to cope with 16 bits for an
> OS (like mac 7.x and 8.x) that only offers its character sets 8 bits at a
> time. The series of 8-bit blocks make sense to me when seen as a means of
> packing (and unpacking) for transportation, if there is a 32-bit universe
> waiting in the other end. But now there isn't.

> The answer may be so obvious that no one cares to give it ("yes, it is
> impossible to use utf-8 for 8-bit-systems, it was just put there for future
> use" or vice versa: "yes, there is a way of defining a set of 8-bit-tables
> corresponding to UCS encoding to the browser"). Please give it anyway.
> Netscape does not know the answer, and so far, this list has not (shown
> that it has) understood the question.

  I'm puzzled by your repeated references to '8-bit systems'. I'm not sure
what exactly you mean by that. Nor am I sure I should answer your
question on this list (as opposed to replying to you privately), because
you seem to be asking a question too obvious for anybody on the list to
care to answer. MacOS has been able to handle multibyte encodings (i.e.
8-bit AND 16-bit) for CJK since 7.0. With the WS I and WS II extensions
bundled by default in Mac OS 7.5 or later, all you need to view CJK web
pages is some free fonts (i.e. the Language Kit is not necessary just for
viewing web pages). As far as I know, there's absolutely no reason it
can't handle UTF-8 as well, although I don't know whether the current
version of NS 4.x for MacOS supports it.
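To make the "8 bits at a time" point concrete, here is a minimal sketch of how UTF-8 packs a 16-bit UCS-2 code point into one to three 8-bit bytes and unpacks it again. The function names `ucs2_to_utf8` and `utf8_to_ucs2` are hypothetical, written only for illustration; the bit layout is the standard UTF-8 one, restricted here to the BMP (UCS-2). The sketch does not reject overlong forms or surrogate values, so for real work Python's built-in `str.encode("utf-8")` and `bytes.decode("utf-8")` are the right tools.

```python
def ucs2_to_utf8(code_point: int) -> bytes:
    """Encode one UCS-2 (BMP) code point as 1 to 3 UTF-8 bytes."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("UCS-2 covers only U+0000..U+FFFF")
    if code_point < 0x80:
        # 7 bits: one byte, identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def utf8_to_ucs2(data: bytes) -> list[int]:
    """Decode UTF-8 bytes (1- to 3-byte sequences only) back to code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells how many bytes the sequence occupies.
        if lead < 0x80:
            length, value = 1, lead
        elif lead >> 5 == 0b110:
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, value = 3, lead & 0x0F
        else:
            raise ValueError(f"byte 0x{lead:02X} at offset {i} is not a UCS-2 lead byte")
        if i + length > len(data):
            raise ValueError("truncated sequence at end of input")
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for cont in data[i + 1:i + length]:
            if cont >> 6 != 0b10:
                raise ValueError(f"bad continuation byte 0x{cont:02X}")
            value = (value << 6) | (cont & 0x3F)
        code_points.append(value)
        i += length
    return code_points
```

For example, `ucs2_to_utf8(0xAC00)` (the Hangul syllable 가) yields the three bytes `b'\xea\xb0\x80'`, none of which a byte-oriented text API needs to treat specially; only the software at the far end has to reassemble them.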

> p.s.
> I know already that the OSes themselves are 32 bits. But I also know that
> they only give me 8 bits at a time when it comes to character-glyph sets
> (e.g. my mac 7.x and 8.x, where I use the NS 4.06, the exception so far
> being W-NT, but this too is all-too-well kept as a secret, glyphs are not
> provided, etc.) . Thus, only 256 at a time, today I use the UCS in my
> linguistic writing by changing between the different "fonts" (8-bit code
> tables of Everson Mono, each corresponding to a block in the BMP). Can NS
> 4.06 match that, do they have other means, or what?

   Hmm, I can't help getting the impression that you've been away from
the planet for a long time while all these things have been happening
(have you been off this list for a while?) :-). Under both Unix/X11 and
MS-Windows, the UTF-8/UTF-7 support in NS 4.0x and later works pretty
well (not just 256 characters at a time but the full UCS-2 range) as long
as glyphs are available. In MS-Windows, you need Unicode fonts (one of
which is the freely available Cyberbit from Bitstream, with over 20k
glyphs including Chinese ideographs and a large number of Korean Hangul
syllables) to view UTF-8/UTF-7 encoded pages, while in Unix/X11 NS
collects glyphs from the fonts available on the system to form a
'pseudo-Unicode' font. Since you wrote about how to convert from UCS-2 to
UTF-8, why don't you write a script/program to generate a UTF-8 encoded
page (with all the characters in UCS-2) and see how it's rendered by NS
4.x for MS-Windows and Unix/X11? Also, there are quite a few pages around
the net encoded in UTF-8 these days.
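As a sketch of the suggested experiment, the script below builds an HTML page containing every UCS-2 character, 16 per table row, labelled with its code point and declared as UTF-8. The names `make_test_page` and `write_test_page` and the output file name `ucs2-test.html` are hypothetical. It assumes the surrogate range U+D800..U+DFFF should be skipped (those values are not characters and cannot be encoded as UTF-8 on their own) and that C0/C1 control characters are shown as blank cells.

```python
import html


def make_test_page() -> str:
    """Return an HTML page listing every UCS-2 character, 16 per row."""
    rows = []
    for row_start in range(0x0000, 0x10000, 0x10):
        if 0xD800 <= row_start <= 0xDFFF:
            continue  # surrogate code points are not characters
        cells = []
        for cp in range(row_start, row_start + 0x10):
            if cp < 0x20 or 0x7F <= cp < 0xA0:
                cells.append("&nbsp;")  # blank cell for C0/C1 controls
            else:
                cells.append(html.escape(chr(cp)))  # escape &, <, > and quotes
        rows.append(f"<tr><th>U+{row_start:04X}</th><td>"
                    + "</td><td>".join(cells) + "</td></tr>")
    return ("<html><head>\n"
            '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
            "<title>UCS-2 characters in UTF-8</title></head><body>\n<table>\n"
            + "\n".join(rows)
            + "\n</table></body></html>\n")


def write_test_page(path: str = "ucs2-test.html") -> None:
    # Write with an explicit UTF-8 encoding so the bytes match the meta charset.
    with open(path, "w", encoding="utf-8") as f:
        f.write(make_test_page())
```

Calling `write_test_page()` and opening the resulting file in each browser shows at a glance which blocks have glyphs available; rows that come out as boxes or question marks point to missing fonts rather than to a problem with UTF-8 itself.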

     Jungshik Shin


