Words, Characters and Etymology

From: Steven Brent (sbrent@ozemail.com.au)
Date: Fri Apr 28 2000 - 10:10:10 EDT


I am confounded by the assertions that Unicode does not handle the glyphic
variants used in the Han
cultural areas ... or is the assertion that Unicode should handle these
variants in a better way than it does?

These are the code points and glyphs (visible if your mail program supports
UTF-8) for the character for 'sell' in its various embodiments:
   
This is the Japanese form 売 (U+58F2)
This is the traditional Chinese and Korean form 賣 (U+8CE3)
This is the simplified Chinese form 卖 (U+5356)

The Japanese glyph has its own code point; TC and Korean share a code point;
SC also has its own.
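
For anyone who wants to check the three code points above, here is a minimal
Python sketch. The names `forms` and `describe` are made up for illustration
and are not part of any Unicode tooling; the script only reports each
character's code point and its UTF-8 byte sequence.

```python
# Illustrative sketch: the three forms of 'sell' listed above.
# The labels and helper names are assumptions made for this example.
forms = {
    "Japanese": "\u58f2",                      # 売
    "Traditional Chinese / Korean": "\u8ce3",  # 賣
    "Simplified Chinese": "\u5356",            # 卖
}


def describe(ch):
    """Return (code point as 'U+XXXX', UTF-8 bytes as spaced hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return code_point, utf8_hex


if __name__ == "__main__":
    for label, ch in forms.items():
        code_point, utf8_hex = describe(ch)
        print(f"{label}: {ch} {code_point} (UTF-8: {utf8_hex})")
```

All three are separate code points in the CJK Unified Ideographs block, so
each one takes three bytes in UTF-8.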

Are some people arguing that the inclusion of 3 codes does not achieve Han
unification and that a single code point should be used for all 3 and that
the appropriate national rendering should be effected by means of tags or
whatever?

My feeling is that Chinese characters represent semantic morphemes (bound and
free) and are therefore fundamentally different from letters of an alphabet,
which (singly or in combination) represent (more or less) phonemes. Han
characters are thus equivalent to words.

The independent evolution of the forms of the original Chinese characters -
which have retained their original meaning - is akin to the development of
Latin words (e.g. canis) into the later French (chien) or Italian (cane)
reflexes, or even their learned borrowing into languages like English (e.g.
in 'canine'). Or indeed any cognates in related languages, e.g. German (Hund)
and English (hound).

These are all demonstrably related forms, but if we were encoding European
codesets as whole words (like Han characters) rather than as sets of
letters, I would not for a second propose using a single code point (e.g. 犬)
for 'hound' and 'Hund' or even the related 'chien', saying these represent
the same word which is rendered somewhat differently in each country! We
would certainly treat them as distinct codes, so why treat Han glyphs any
differently? Are we trying to be etymologists here?



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT