Re: UTF-8 and browsers

From: David Goldsmith (goldsmith@apple.com)
Date: Mon Nov 02 1998 - 15:33:01 EST


Jungshik Shin (jshin@pantheon.yale.edu) wrote:

> That's what I guessed when I replied to his query. However, to make it
>clear, I'd like to ask you some more questions. (especially, I heard
>from a Korean user that he can't view a UTF-8 encoded web page (mostly
>of Korean precomposed Hangul syllables) at
>
> http://pantheon.yale.edu/~jshin/faq/utf8_kr.html
>
>with his Netscape 4.0x under English MacOS 8.x plus Korean Lang. Kit.)
>By supporting UTF-8, you meant both Netscape and MS IE can display web
>pages encoded in UTF-8 with wide variety of characters drawn from UCS-2
>(i.e. NOT just characters belonging to a single ISO-8859-x or one of CJK
>BUT text made up of characters from multiple ISO-8859-x, CJK, and other
>character sets) as long as (a) font(s) to render those characters are
>available. Unix/X11 version of Netscape "dynamically" makes a
>'Pseudo-Unicode' font collecting glyphs from all the fonts available on
>the system. On the other hand, MS-Windows version of Netscape requires
>a single UCS-2 encoded font (such as Cyberbit from Bitstream). It'd
>be nice to know which is the case under MacOS. Thank you,

I looked at this page using MS Internet Explorer 4.01 and Mac OS 8.5,
with Multilingual Internet Access installed.

When I first brought the page up, the introductory Korean was displayed
using a Roman font, which appears to be a bug in Internet Explorer. The
list of precomposed syllables was displayed as about 4/5 question marks,
which I expected since the Mac Korean fonts only contain the syllables
used in our MacKorean character sets, not the full precomposed set in
Unicode. This is because Internet Explorer (and Netscape, as well)
converts Unicode (UTF-8 included) to runs of text in the Mac OS' native
character sets for display (using WorldScript).
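
To illustrate the conversion described above, here is a minimal Python
sketch of splitting Unicode text into runs that each fit one legacy
character set, with anything unmappable shown as a question mark. It is
not how Internet Explorer or Netscape is actually implemented: the
function names are hypothetical, and Python's mac_roman and mac_cyrillic
codecs are only stand-ins for the WorldScript script systems.

```python
# Hypothetical stand-ins for WorldScript script systems; real browsers
# go through Mac OS text services, not Python codecs.
LEGACY_CODECS = ["mac_roman", "mac_cyrillic"]


def can_encode(ch, codec):
    """Return True if the single character ch exists in the legacy codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def split_into_runs(text, codecs=LEGACY_CODECS):
    """Split text into (codec, substring) runs; codec is None if unmappable."""
    runs = []
    for ch in text:
        current = runs[-1][0] if runs else None
        if current is not None and can_encode(ch, current):
            codec = current  # stay in the current run when possible
        else:
            # first codec in the list that has this character, else None
            codec = next((c for c in codecs if can_encode(ch, c)), None)
        if runs and runs[-1][0] == codec:
            runs[-1][1] += ch
        else:
            runs.append([codec, ch])
    return [(codec, chunk) for codec, chunk in runs]


def render(text, codecs=LEGACY_CODECS):
    """Show what survives: unmappable characters become question marks."""
    return "".join(
        chunk if codec is not None else "?" * len(chunk)
        for codec, chunk in split_into_runs(text, codecs)
    )


if __name__ == "__main__":
    # Latin text, Cyrillic text, then two precomposed Hangul syllables
    sample = "Hi \u041f\u0440\u0438\u0432\u0435\u0442 \uac00\uac01"
    print(split_into_runs(sample))
    print(render(sample))
```

With no Korean codec in the list, the two Hangul syllables fall into a
run with no character set and come out as question marks, which is the
same kind of failure seen on the test page for syllables the fonts lack.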

I am pretty sure that Internet Explorer is capable of displaying UTF-8
containing multiple scripts, as I've seen such pages displayed before.

So, there are three issues for Unicode display in Mac browsers:

1. WorldScript (I and II) languages do not cover all of Unicode. Any
browser (currently all of them) which renders its text through
WorldScript will not be able to access all of Unicode. However, a large
portion of the BMP is covered.

2. Some browsers may not even support all of the languages covered by
WorldScript. Neither Internet Explorer nor Netscape Navigator supports
WorldScript I writing systems: Arabic, Hebrew, Devanagari, Gurmukhi,
Gujarati. Apple can't do anything about this; contact your browser vendor.

3. The only path to displaying all of Unicode is for browsers to adopt
ATSUI, Apple's new Unicode imaging API (new with Mac OS 8.5). Again, it
is up to the browser vendors to support this; contact them if you would
like them to do that.
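
Issue 1 can be made concrete for Korean with some back-of-the-envelope
arithmetic. The sketch below assumes, as an approximation not stated in
this message, that the Mac Korean fonts cover roughly the 2,350
precomposed syllables of KS X 1001 (KS C 5601), while Unicode encodes
11,172. The constant and function names are illustrative only.

```python
# Unicode builds precomposed Hangul syllables from 19 initial consonants,
# 21 vowels, and 28 finals (27 consonants plus "no final").
UNICODE_SYLLABLES = 19 * 21 * 28

# Assumption: the Mac Korean fonts cover roughly the KS X 1001 repertoire.
ASSUMED_FONT_SYLLABLES = 2350


def missing_fraction(total, covered):
    """Fraction of the repertoire that would show as question marks."""
    return (total - covered) / total


if __name__ == "__main__":
    frac = missing_fraction(UNICODE_SYLLABLES, ASSUMED_FONT_SYLLABLES)
    print(UNICODE_SYLLABLES, round(frac, 2))
```

Under that assumption about 79% of the syllables have no glyph, which is
in line with the "about 4/5 question marks" seen on the test page.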

I hope this clears things up...

David Goldsmith
Architect, International & Text Group
Apple Computer, Inc.
goldsmith@apple.com


