From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 11 2004 - 14:20:59 CDT
> On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote:
>
> > Despite the oft-mentioned cutesy Hong Kong race horse names, idiosyncratic
> > invented Han ideographs are a negligible component of the encoded CJK
> > repertoire. In my opinion there are thousands, possibly tens of thousands,
> > of ideographs that should not really have been encoded individually as
> > they are simply minor glyph variants, frequently only attested in a single
> > source because the author simply wrote the character wrongly in the first
> > place. This is the real issue with the over-encoding of CJKV, not the
> > occasional race horse name.
>
> In particular, the decision to import en masse the repertoire of the
> Hanyu Da Zidian was not a wise one, as a substantial number of the
> entries are of the form "same as X".
Andrew and John have correctly identified the bulk of the problem
for CJKV overencoding.
Unfortunately, given the nature of the Han script and the
historical practice of Chinese lexicography, the result we
have ended up with is almost inevitable.
These historic mistakes, minor glyph variants, and such got
carried into scholastic compendia *as characters*, where they
became lexical headwords, repeated ad infinitum in each
further edition and each new compendium. The fact that they
got carried into the Hanyu Da Zidian, the Chinese moral
equivalent of the Oxford English Dictionary, means that
inevitably they end up in the character encoding, as a digital
representation of the Hanyu Da Zidian is absolutely required.
Leaving some out, no matter how mistaken or obsolete, would,
from the Chinese point of view, be like deciding to leave
some obsolete word out of the OED simply because there
wasn't a "character" encoded for it.
It would have been nice if a better mechanism for expressing
Han glyphic (and other types of) variants had been feasible
and in place before CJK Extension B went in, but that is
water under the bridge now. One can only hope that some
restraint and use of alternative mechanisms will be shown
in the current effort to define and encode additional CJK
extensions, which for the most part involve even *less* useful
characters, ones missed even by the major dictionary compendia.
--Ken