Re: More rambling about Han

From: Joel Rees (rees@server.mediafusion.co.jp)
Date: Thu Feb 22 2001 - 00:01:23 EST


Hi Thomas,

I am just a newbie making noise and otherwise being obnoxious. I had
forgotten to cc the intermediate message to the mailing list, and didn't
realize it until after I posted my reply with most of Ken Whistler's reply
clipped. I'll waste even more bandwidth and paste the intermediates below.

Of course the missing third person is probably not really Ken. Ever since I
first heard of UNICODE I've been critical of it, and I have been arguing
with _myself_ about UNICODE for at least fifteen years. Dropping out of
college to try to write a FORTH native compiler with some special OO
features made it kind of difficult for me to properly join the discussion
then.

The thing that surprises me (although it shouldn't) is that UNICODE has
evolved to address quite a few of the problems that I originally found with
it. I think I have three primary complaints remaining --

The common character encoding set is way too fat. It tries to deal with too
much on a global level. My feeling is that the unification should have been
kept to the barest minimum, only the basic elements of each non-ideographic
set, with just the radicals and just barely enough Han to write about the
weather in Chinese and Japanese. Not poetry about the weather, of course.

Looking at the fatness from the reverse side, there is no provision for
local weirdness. The fatness is too late to mess with, I suppose, but I
really want to encourage the use of the (currently disallowed) code space
beyond plane 17 to register global versions of local codes. For instance,
Mojikyo could get four planes maybe starting at 0x290000. JIS could get
three planes somewhere out there, each of the Chinese standards groups could
get however many planes they need out there. NEC, IBM, anyone with a variant
they want to register could get a semi-private plane even further out. All
the eight bit standards could get planes or partial planes out there.

(Surrogate quads?)
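
As a rough illustration of the arithmetic behind this proposal, the sketch
below assumes planes of 65,536 code points and the current Unicode ceiling
of U+10FFFF (planes 0 through 16). The helper names are hypothetical, and
the four-plane block at 0x290000 is only the example floated above, not a
registered allocation.

```python
PLANE_SIZE = 0x10000      # 65,536 code points per plane
UNICODE_MAX = 0x10FFFF    # last code point Unicode allows (end of plane 16)

# One UTF-16 surrogate pair combines 1024 high and 1024 low surrogates,
# so it can address 0x400 * 0x400 code points above the BMP.
SURROGATE_PAIR_MAX = 0x10000 + 0x400 * 0x400 - 1


def plane_of(cp):
    """Return the plane number that contains code point cp."""
    return cp >> 16


def plane_range(first_plane, count):
    """Return (first, last) code points of `count` consecutive planes."""
    first = first_plane * PLANE_SIZE
    last = (first_plane + count) * PLANE_SIZE - 1
    return first, last


def in_unicode_range(cp):
    """True if cp falls inside the code space Unicode currently allows."""
    return 0 <= cp <= UNICODE_MAX


if __name__ == "__main__":
    start = 0x290000                      # hypothetical start of the block
    first, last = plane_range(plane_of(start), 4)
    print("plane:", plane_of(start))
    print("block: %#x .. %#x" % (first, last))
    print("inside Unicode code space:", in_unicode_range(start))
    print("reach of one surrogate pair: %#x" % SURROGATE_PAIR_MAX)
```

Under these assumptions the block sits in plane 41, well past what a single
UTF-16 surrogate pair can address, which is why some longer surrogate
sequence would be needed to reach it.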

Everyone could do what they want out there, but they would be responsible
for publishing whatever needed to be published: representative fonts,
mapping tables for transforming to other sets (including, of course, the
common set), special rendering rules, compositing rules (which the IDC are
not), collation rules, searching rules (which the IDC are part of),
whatever.

This implies the need for a standard to express representative fonts and
rendering rules. It doesn't need to be fast, just needs to allow each
standards committee to communicate their standards and exception rules with
the others. Such a standard was not possible fifteen years ago, but I think
it could be done now.

Hopefully, we could use such a standard language for communicating about the
local standards as a vehicle for transmitting default character shapes for
exceptional characters that are not found in the UNICODE proper.

Joel Rees, Media Fusion, KK
Amagasaki, Japan

----- Original Message -----
From: "Thomas Chan" <thomas@atlas.datexx.com>
To: "Joel Rees" <rees@server.mediafusion.co.jp>
Sent: Wednesday, February 21, 2001 5:22 PM
Subject: Re: More rambling about Han

> On Tue, 20 Feb 2001, Joel Rees wrote:
>
> > The reason I fudge to March/April is that it doesn't do us any good if
> > we can't see what's in the Han section of extension B right now. (I just
> > checked this morning and the charts for the Han do not yet seem to be
> > available on the site.)
> > > I dug it out of my own email archives, and append it below. I added
> > > Thomas Chan's second email address. You can take it up with him.
>
> Hi Joel,
>
> Less-than-final information about CJK Extension B has been available for a
> while, e.g., the chart at http://www.cse.cuhk.edu.hk/~irg/CJK_B.pdf (Nov
> 1999) from the IRG website (http://www.cse.cuhk.edu.hk/~irg/)--a good
> source of information on upcoming things (e.g., CJK Extension C), although
> one has to sift through the disorganized arrangement, multiple versions
> and revisions, and poor English.
>
> The most recent publicly available information I know of is at
> http://anubis.dkuug.dk/JTC1/SC2/open/02n3442list.htm (May 2000), which
> includes charts and a mapping table. The latter should be rather similar
> to the final product at the end of March--I've seen a final draft version
> of an ISO document dated Jan 2001.
>
>
> I'm curious who you were conversing with and what it was about, as it
> seems a private discussion continued onto the Unicode list, and someone
> mentioned me by name. (I don't have any claims or involvement, other than
> being yet another individual interested in Unicode.)
>
>
> Your discussion of "radicals" is interesting, and although there are the
> IDC characters, I don't find them sufficient for all tasks either, e.g.,
> there is no means for describing transformations (reflection, rotation,
> etc) or deleting strokes; and the rules for how to describe characters are
> rather vague--how would one, say, describe a character composed of one
> character repeated four times in a 2x2 matrix, two rows of a pair or two
> columns of a pair? Of course, the IDCs are taken from GB 13000.1 ,
> presumably directly (I haven't seen the GB 13000.1 standard).
>
>
> Thomas Chan
> tc31@cornell.edu
>
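
Thomas's 2x2 question can be made concrete with a small sketch. It assumes
only the two binary IDCs U+2FF0 (left to right) and U+2FF1 (above to below),
prefix notation, and components of equal size; the parser and its names are
hypothetical, not anything defined by Unicode or GB 13000.1.

```python
LEFT_TO_RIGHT = "\u2ff0"   # IDC: first component left of the second
ABOVE_TO_BELOW = "\u2ff1"  # IDC: first component above the second


def parse_ids(ids):
    """Expand a prefix-notation description sequence into a grid of rows.

    Simplifying assumption: both operands of an IDC have the same shape.
    """
    def parse(pos):
        ch = ids[pos]
        if ch in (LEFT_TO_RIGHT, ABOVE_TO_BELOW):
            first, pos = parse(pos + 1)
            second, pos = parse(pos)
            if ch == LEFT_TO_RIGHT:
                # Join the two operands row by row, side by side.
                grid = [a + b for a, b in zip(first, second)]
            else:
                # Stack the rows of the first operand above the second.
                grid = first + second
            return grid, pos
        # Any other character is treated as a single component.
        return [[ch]], pos + 1

    grid, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete sequence")
    return grid


MOUTH = "\u53e3"  # the component repeated four times in this example

# Two rows of a pair, and two columns of a pair.
two_rows = ABOVE_TO_BELOW + (LEFT_TO_RIGHT + MOUTH + MOUTH) * 2
two_cols = LEFT_TO_RIGHT + (ABOVE_TO_BELOW + MOUTH + MOUTH) * 2

if __name__ == "__main__":
    print(two_rows, parse_ids(two_rows))
    print(two_cols, parse_ids(two_cols))
    print("same layout:", parse_ids(two_rows) == parse_ids(two_cols))
    print("same sequence:", two_rows == two_cols)
```

Both sequences expand to the same 2x2 layout but are different strings, so
nothing in the sequences themselves says which description is preferred.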
On 2001.02.21 11:58, I wrote:

Thanks for the information, Ken, especially the rollout deadline date of
March 31st. I can now tell my cohorts that UNICODE will have officially
added 40K characters by sometime in March or April, and I think they will be
(mostly) happy to hear that.

I look forward to seeing tables of the forms on line.

You mention a critique of Mojikyo, I searched the archive at

http://www.egroups.com/messages/unicode/

for it, but did not find it. (Checked under moji and mozi.) Can you give me
a link or something?

Incidentally, I would encourage an attitude of trying to understand what the
Japanese are trying to do with the Mojikyo and Tron character
specifications. There are some hidden issues with Kanji that they themselves
aren't properly able to explain, even between themselves.

I truly wish I had been able to get involved in this during the birth of
UNICODE. My personal feelings are that the uniform, comprehensive standard
as it presently stands was far too ambitious a goal for an international
standard. There is a lot of excellent research being done because UNICODE
exists, but ultimately the people who know most about how to visually
represent a language on a computer are the people that use that language on
a daily basis. Rather than a universal international set, I would have
preferred a smaller common international set. My fingers are moving randomly
again.

If you could point me to the critique you mentioned, I would like to see it.

Joel Rees, Media Fusion KK
Amagasaki, Japan

On 2001.02.21 13:30 Ken wrote:

Joel,

>
> Thanks for the information, Ken, especially the rollout deadline date of
> March 31st. I can now tell my cohorts that UNICODE will have officially
> added 40K characters by sometime in March or April, and I think they will
> be (mostly) happy to hear that.

In March. Technically, they are all already approved. And we will not
slip the March 31 absolute deadline for the formal rollout of
Unicode 3.1.

> You mention a critique of Mojikyo, I searched the archive at
>
> http://www.egroups.com/messages/unicode/
>
> for it, but did not find it. (Checked under moji and mozi.) Can you give
> me a link or something?

It was reasonably recent. The note I had in mind was something from
Thomas Chan on January 29 in a thread entitled "Benefits of Unicode"
that wandered way off topic. It might not have made it into mirrored
message archives yet.

I dug it out of my own email archives, and append it below. I added
Thomas Chan's second email address. You can take it up with him.

>
> Incidentally, I would encourage an attitude of trying to understand what
> the Japanese are trying to do with the Mojikyo and Tron character
> specifications. There are some hidden issues with Kanji that they
> themselves aren't properly able to explain, even between themselves.

This is not too surprising. Everyone gets tangled up in their own
writing systems when they start to wander into traditionally freighted
topics like linguistic politics, orthography reforms, historical
correctness, educational prescription, and a host of other cultural
issues.

Not reacting to Mojikyo or Tron specifically, I have a sense that
the Japanese in general are very caught up in the concept of Japanese-ness,
and not surprisingly, it gets all tied up with their writing system
as a visible symbol of their difference and cohesiveness in the context
of interacting with the dominantly Latin "West". There is also a
tendency to carry a kind of a chip on their collective shoulders about
the origins of their culture -- what part is truly autochthonous (Amaterasu
and all that) and what part is just the wave after wave of Chinese
cultural influence, but reinvented and Japanicized locally. Of course,
the writing system is a particular conundrum, because everybody knows
it came from China, but they really, really want to believe it is
authentically Japanese.

"Hidden issues with Kanji" tend, in my opinion, to verge on mystical
essences of characters -- cultural freighting of connotations that
are supposedly nonconveyable. But I view this as the same kind of
silliness that we can find in our culture when people get all tied up
with the mystery of the word -- like Biblical literalists tied to
text that was badly translated in the 17th century from bad Latin
translations of Greek and Aramaic.

In this context I find the Chinese point of view refreshingly unconvoluted.
Unlike the literally insular Japanese, the Chinese have been a cosmopolitan
culture for millennia, and they have the easygoing self-confidence that
comes from being the cultural center -- the inventors of the writing system,
along with most of the rest. Sure, the Chinese have their own character
mystics, but none of that seems to bubble up to the level of interfering
with what to them is the rather straightforward process of cataloging,
comparing, and encoding all the characters. I don't get the sense with
them that there is some cultural "ISSUE" here with them, and they don't
have much of a problem with the interested honkies pitching in and
helping with the work.

>
> I truly wish I had been able to get involved in this during the birth of
> UNICODE. My personal feelings are that the uniform, comprehensive standard
> as it presently stands was far too ambitious a goal for an international
> standard.

I disagree with you here.

The Unicode Standard is *not* intended to put historians of Han characters
out of business. It is not the ultimate, final catalog. It does not attempt
to resolve all the scholastic questions that will continue to be of
interest. Heck, Richard S. Cook recently wrote a 250 page monograph on
The Etymology of Chinese Chen2 (the scorpion character). He lists 208 Oracle
bone exempla and 35 bronze exempla, and tracks the whole set of related
forms through Shuowen and other documents.

But for global information interchange on computers, *somebody* had
to put a stake in the ground for Han characters. The alternative was
a dozen different stakes being moved by different committees from
different points of view and in different directions. It already was
chaotic, and the needs of the Internet are slowly pushing that kind
of chaos aside, in favor of (relatively) simple, interoperable standards.

> There is a lot of excellent research being done because UNICODE
> exists, but ultimately the people who know most about how to visually
> represent a language on a computer are the people that use that language
> on a daily basis.

I have no issue with that, except with the implication that that is
a character *encoding* issue. It is not. It is a *font* issue --
particularly for East Asia, where the writing systems are actually
rather simple from a rendering point of view (though complex on a
glyph-per-glyph basis).

In Japan, you just set the best typographers to the task of creating
authentic Japanese looks and styles for the characters (the same thing
they have been doing in lead type for ages), and that works just fine
with Unicode.

> Rather than a universal international set, I would have
> preferred a smaller common international set.

And then what? If you use some local coding extension and want to put it
up on the Internet because I or someone else wants to use it somewhere
else in the world, what happens? How do you interoperate? That's just
another way of carrying forward with hundreds of code pages, as we did
in the 20th century. *hehe* It doesn't scale for global information
systems.
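
A minimal sketch of the code-page problem described here, using two codecs
that ship with Python. The byte value is an arbitrary example chosen for
illustration; the point is only that the receiver has to guess the code
page before the byte means anything.

```python
import unicodedata

# One byte, as it might arrive with no code page label attached.
raw = bytes([0xB1])

# The same byte decoded under two different local code pages.
as_latin1 = raw.decode("latin-1")     # a Western European reading
as_sjis = raw.decode("shift_jis")     # a Japanese reading

if __name__ == "__main__":
    for label, text in (("latin-1", as_latin1), ("shift_jis", as_sjis)):
        # Show the resulting code point and its Unicode character name.
        print("%-10s U+%04X %s" % (label, ord(text), unicodedata.name(text)))
    print("same character:", as_latin1 == as_sjis)
```

The two readings disagree, so text tagged with one local code page cannot
be exchanged safely without a shared reference set to map through.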

> My fingers are moving randomly
> again.
>
> If you could point me to the critique you mentioned, I would like to see
> it.

Here it is...

--Ken

> From unicode@unicode.org Mon Jan 29 17:44 PST 2001
X-UML-Sequence: 17891 (2001-01-30 01:07:30 GMT)
To: "Unicode List" <unicode@unicode.org>
Date: Mon, 29 Jan 2001 17:07:28 -0800 (GMT-0800)
Subject: Re: Benefits of Unicode

On Mon, 29 Jan 2001, David Starner wrote:

> On Mon, Jan 29, 2001 at 01:06:44PM -0800, Alistair Vining wrote:
> > Somebody mentioned TRON, which I'd not heard of before, but
> > <http://tronweb.super-nova.co.jp/characcodehist.html> says:
>
> The link on the Tron webpage (www.tron.org) to the English information
> about the Tron character set is broken. But as other links indicate
> and I remember, TRON imported Unicode 2.0 for all of TRON's non-CJK
> characters. So they imported many of Unicode's minor problems, and
> fall increasingly short of Unicode's support for the world's languages.
> (I wonder if there's even been an attempt to rewrite all the information
> needed to properly implement those characters. . .)
> Does anyone else have any information about the Tron character set?

Go to the http://tronweb.super-nova.co.jp/ website, and there's a lot
of information scattered among various press releases and galleries. For
example, http://tronweb.super-nova.co.jp/b-right-vr2-5gallery.html has
pretty pictures of an editor with vertical text capability and a
utility program to help find characters among the big pool of z-variants
or font-level variants.

Or look at http://tronweb.super-nova.co.jp/tronnews99-9.html , which shows
how TRON basically swallowed up (without unification) various character
sets when they don't have the expertise in-house, and this is not limited
to non-CJK writing systems. For example, Traditional Chinese needs are
addressed by swallowing up Big5 or CNS 11643-1986 (they are not very
clear which), which is pathetic since CNS 11643-1992 was available at the
time.

Meanwhile, Mojikyo, a collection of nice TrueType fonts
(http://www.mojikyo.gr.jp/) that is essentially the repertoire of the
decades-old _Dai Kanwa Jiten_ (Morohashi) plus additions, is swallowed up
in an attempt to up the number of characters, but the expertise in the
Mojikyo project is primarily Japanese-only, and they miss many characters
that one would find from Chinese sources such as the mid-1980's _Hanyu Da
Zidian_ (going into Unicode 3.1), or even various national standards in
East Asia newer than the 1980's ones they are familiar with. Other
inclusions inherited from Mojikyo like the Jiagu (shell and bone)
proto-Chinese script or the Shuiwen (mistranslated as "water writing") are
probably premature, while inclusion of scripts like Siddham (listed as
"bonji" or "Brahma" characters)--used for religious purposes in Japan--and
the lack of coverage of others is rather interesting for what it says about
its universality and the resources and expertise at their disposal.

Thomas Chan
tc31@cornell.edu

thomas@atlas.datexx.com


