Re: UTF-8: Michael takes the plunge

From: Deborah Goldsmith (goldsmith@apple.com)
Date: Mon Apr 05 1999 - 19:35:01 EDT


Hi,

> CyberStudio is not in the Info-Mac archives. And I fear that the Text
> Encoding Converter, because it is not configurable by even the expert user
> like myself, will only support WorldScript code sets released by Apple. As
> it happens, however, I have developed some of these myself. Some time ago I
> talked with some folks at Apple about this and I think it was Peter Edberg
> who said that unless you were him, you couldn't do much with the TECs....

If it has to be freeware or shareware, then the choice of applications may be
even more limited.

TEC does deal with much more than Apple's character sets, but even if it
handled MacIrishGaelic, Netscape and IE might not support it. TEC supports
Arabic and Hebrew, for example, but neither Netscape nor IE handles them on
the Mac, whether from 8859 encodings or UTF-8. The way that IE handles
drawing UTF-8 is by using TEC to convert to WorldScript and draw using
QuickDraw; Netscape 4.5 does not use TEC and is limited to its own built-in
encoding tables (it, too, uses QuickDraw to draw text and is thus limited to
the WorldScript repertoire). I don't know if IE will consider every
installed character set when converting.
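
To make the conversion step concrete, here is a minimal sketch in Python of
what "convert UTF-8 to a legacy code set before drawing" amounts to. It is
not the TEC API: Python's built-in mac_roman codec stands in for a
WorldScript Roman table, and the function name to_legacy is made up for
illustration.

```python
def to_legacy(utf8_bytes, legacy_encoding="mac_roman"):
    """Convert UTF-8 bytes to a legacy code set, or return None.

    Assumption: mac_roman is only a stand-in for a WorldScript table;
    this does not call or imitate the real Text Encoding Converter.
    """
    text = utf8_bytes.decode("utf-8")
    try:
        return text.encode(legacy_encoding)
    except UnicodeEncodeError:
        # At least one character is outside the legacy repertoire,
        # so it cannot be drawn through this path.
        return None


# "cafe" with an acute accent fits the Mac Roman repertoire.
in_repertoire = to_legacy("caf\u00e9".encode("utf-8"))

# Hebrew and the Irish dotted b (U+1E03) are not in Mac Roman.
hebrew = to_legacy("\u05e9\u05dc\u05d5\u05dd".encode("utf-8"))
dotted_b = to_legacy("\u1e03".encode("utf-8"))
```

Text converts only if every character has a slot in the target table, which
is why a missing MacIrishGaelic table matters whichever browser does the
drawing.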

We've had a full Unicode rendering engine in Mac OS since Mac OS 8.5 (with
full Unicode input support as well), but applications haven't been revised
to use that yet. That will come in time.

There is documentation on how to extend TEC at:
http://developer.apple.com/techpubs/mac/TextEncodingCMgr/TECRefBook-139.html#HEADING139-0

but this only applies to the "high-level" converter, not the low-level
Unicode converter. There is currently no documentation on how to create
tables for the low-level converter; I think it would be a Good Thing to
have, but I don't know when we will be able to produce it.

When I was desperate for a Unicode text editor a few years ago, I wrote a
little Java application that used TextArea to put up an editing window. It
will read and write UTF-16 files. I'd be happy to send it to you if you
like. Apple's Macintosh Runtime for Java (MRJ) does use every installed
encoding when converting, so if you had a MacIrishGaelic table in TEC it
would work. However, that puts us right back in the same situation of
needing to create one of those. And it still might not display in IE
(definitely not in Netscape).
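
For anyone who wants to do the same file handling without that application,
the read/write part is small. This is a sketch in Python rather than the
original Java, and it is not the application's code: the function names are
invented for illustration, and it assumes the file starts with a byte-order
mark, which is what the writer below produces.

```python
import os
import tempfile


def write_utf16(path, text):
    # The "utf-16" codec writes a byte-order mark followed by the text.
    # newline="" keeps line endings exactly as given.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


def read_utf16(path):
    # The byte-order mark written above tells the decoder the byte order.
    with open(path, "r", encoding="utf-16", newline="") as f:
        return f.read()


def utf16_file_to_utf8(path):
    # Re-encode the contents of a UTF-16 file as UTF-8 bytes.
    return read_utf16(path).encode("utf-8")


def round_trip(text):
    # Write text to a temporary UTF-16 file and read it back.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_utf16(path, text)
        return read_utf16(path)
    finally:
        os.remove(path)
```

Calling round_trip on a string containing Irish dotted consonants should
return the same string, since UTF-16 can represent them directly; the
repertoire problem only appears when converting to a legacy code set.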

Another trick I tried is using Outlook Express for Mac to create UTF-8. I
was able to create a UTF-8 e-mail, but when I saved it as a text file it
reverted to WorldScript. You could try editing it, mailing it to yourself,
choosing "View Source" in OE, then saving *that* as a text file. That might
preserve the binary form of the text. It still doesn't solve the Irish
Gaelic problem as OE uses TEC.
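
One way to check whether a saved file really kept its UTF-8 bytes is to test
the raw data. The sketch below is in Python and is only a heuristic:
classify_bytes is a made-up name, and a legacy-encoded file can occasionally
pass as valid UTF-8 by coincidence.

```python
def classify_bytes(data):
    """Roughly classify raw file bytes as 'ascii', 'utf-8', or 'other'."""
    if all(b < 0x80 for b in data):
        # Pure ASCII looks the same in UTF-8 and in the Mac code sets,
        # so it cannot tell us which one the application wrote.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that are not well-formed UTF-8 sequences:
        # probably a legacy (for example WorldScript) encoding.
        return "other"
    return "utf-8"


saved_as_utf8 = classify_bytes("caf\u00e9".encode("utf-8"))
saved_as_legacy = classify_bytes("caf\u00e9".encode("mac_roman"))
```

The test text needs at least one non-ASCII character, or the result will
always be "ascii".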

In summary, while applications are starting to have support for UTF-8, I'm
not aware of any downloadable freeware or shareware applications that
support UTF-8 text files. And having to support Irish Gaelic is a problem
due to lack of support in TEC. All of this would work great if applications
supported our Unicode rendering engine (ATSUI), but such applications aren't
available yet.

I wish I had a better answer for you. I'm sorry...

--
Deborah Goldsmith
Manager, International Toolbox Group
Apple Computer, Inc.
goldsmith@apple.com


