Re: Unicode under fire again

From: DougEwell2@cs.com
Date: Wed Jun 06 2001 - 02:27:07 EDT


> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

I decided to be courteous this time and let others burn this article to a
crisp before stepping in to blow away the ashes.

There's something rewarding about reading an anti-Unicode article that
starts, in the first paragraph, by saying that Unicode is "a 16-bit character
definition allowing a theoretical total of over 65,000 characters." That
tells me right away how much accuracy to expect in the rest of the article.
(I first read about surrogates around 1993.)
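
For the record, surrogate pairs are precisely how a 16-bit encoding form
reaches past 65,536 characters. A quick sketch in Python (my own
illustration, not anything from the article; U+20000 is just a convenient
example from the new CJK Extension B block):

    # U+20000 lies beyond the Basic Multilingual Plane.
    ch = "\U00020000"
    # In UTF-16 it becomes the surrogate pair D840 DC00 -- two 16-bit
    # code units, but still one character.
    print(ch.encode("utf-16-be").hex())          # d840dc00
    # The arithmetic: subtract 0x10000, split into two 10-bit halves.
    v = 0x20000 - 0x10000
    print(hex(0xD800 + (v >> 10)),
          hex(0xDC00 + (v & 0x3FF)))             # 0xd840 0xdc00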

The paper promises to discuss "the political turmoil and technical
incompatibilities that are beginning to manifest themselves on the Internet"
because of the supposed inadequacy of Unicode, but no evidence is ever shown
that such turmoil and incompatibilities actually exist. We are simply asked
to assume that they exist because of the supposed inadequacy. This is a
circular argument.

We learn that "Unicode's stated purpose is to allow a formalized font system
to be generated." Font system? Did you know that?

John Cowan has already discussed the scurrilous claim that no Chinese,
Taiwanese, Koreans, or Japanese were consulted on the design of Unicode 1.0.

The claim of 170,000 total characters is based on disunification of all Han
characters. This is simply a design disagreement between the Unicode
approach and Goundry's. The decision to unify a traditional Han character
written with different typographical conventions, instead of coding three
separate versions for Chinese, Japanese, and Korean, does not mean that two
characters are somehow missing.

Furthermore, even if 170,000 discrete characters are necessary -- they may
be, for all I know -- the premise that the Han repertoire is not completely
specified is presented as evidence not that Unicode is a work in progress,
but that it willfully ignores the needs of speakers of East Asian languages.
This is not merely inaccurate, it is irresponsible.

Unicode 3.1 is dismissed with a handwave, on the basis that "two separate
16-bit blocks do not solve the problem [of inadequate repertoire] at all."
No technical (or other) justification is attempted to explain why all Han
characters must appear in a single contiguous block.
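
Nor is there any reason the repertoire must be contiguous. Another small
sketch of my own (U+4E00 from the original unified block, U+20000 from
Extension B up on Plane 2; anything that processes code points treats the
two identically):

    # One BMP ideograph and one supplementary-plane ideograph together.
    s = "\u4e00\U00020000"
    print(len(s))                      # 2 code points
    print(s.encode("utf-8").hex())     # e4b880f0a08080 (3 bytes + 4 bytes)
    # UTF-8, UTF-16, and UTF-32 all round-trip both characters; the plane
    # boundary is invisible to the application.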

The "analogy" of deleting 25 percent of the Latin alphabet or 75 percent of
the English vocabulary is completely pointless. By Goundry's own admission,
the Chinese writing system is in constant flux, whereas the Latin alphabet
(and others) has been fixed for centuries. And the remarks about the French
being forced to use "the German alphabet" or the English using "a French
alphabet" left me laughing. The Latin alphabet, designed for the Latin
language, was intentionally borrowed by the English, the French, the Germans,
and speakers of dozens of other languages.

The passage about Verisign is completely irrelevant to the rest of the
article, to the point where I wondered if it had been pasted in by accident.

After all this, the question left for me to ponder was this: If Unicode does
not solve the problem of adequately encoding Han characters, then what
character set does? EUC-JP? Big 5? GB2312? Finally another list member
mentioned the (grammatically questionable) reference, "Hastings been experimenting
with workarounds." Somehow I am not left shuddering with fear at the
impending demise of Unicode.

-Doug Ewell
 Fullerton, California


