Re: Chinese variants

From: Martin Heijdra (mheijdra@princeton.edu)
Date: Tue May 28 2002 - 09:21:00 EDT

Previous message: Marco Cimarosti: "[OT] Agreement and i18n (was RE: Language name questions)"
In reply to: John H. Jenkins: "Re: N2476 a hoax?"

Re the following, just FYI:

>One of the reasons why the whole problem of Han variants is so nasty is
>that there are so many different kinds of variant out there. In order to
>try to bring order to this chaos, we need a model and we need data, and
>the IRG is the best organization to provide that model and those data.
>
>I should point out that at the last IRG, not only did Unicode have a paper
>on variants, but the rapporteur also made a presentation on why this is a
>problem, and much of the work at the meeting was done using a variant
>database provided by Taiwan. The HKSAR also has a similar database. And,
>of course, almost any Han dictionary has variant data in it, including in
>many Chinese dictionaries TC/SC equivalence.
>
>That Han variants exist is not an issue.

The 800-million-character electronic version of the Siku Quanshu, a huge
database used by scholars, lets users choose among the following
non-exclusive classes of character equivalents:

yiti (traditional variants), tongjia (different characters sometimes used
interchangeably), jianfan (simplified/traditional), zhengwu
(correct/mistaken), Zhong-Ri (Chinese/Japanese), xinjiu (new/old), gujin
(ancient/recent), xingjin (close in shape).

And this even though the database itself, being all in traditional
characters of one form or another, is simpler than e.g. a library database
would be (where no one could predict in what kind of characters a particular
title on Ming history would be written, and one would always expect a search
to return titles in both simplified and traditional characters).

Unfortunately, no documentation is provided as to which particular pairs of
characters these equivalence classes cover, although when doing a specific
search, a user is alerted to which characters are added as equivalents
under a particular choice.
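
To make the behaviour described above concrete, here is a minimal sketch of
user-selectable equivalence classes applied to a search term. It is only an
illustration under stated assumptions: the names VARIANT_CLASSES,
equivalents and expand_query are hypothetical, the few character pairs are
well-known examples chosen by hand and are not taken from the Siku Quanshu
database (whose pairs are undocumented), and the one-step, non-transitive
lookup is a guess at one plausible design, not a description of how that
product works.

```python
from itertools import product

# Hypothetical sample data, NOT taken from the Siku Quanshu database.
# Each class maps to groups of characters treated as equivalent under it.
VARIANT_CLASSES = {
    "jianfan": [{"廣", "广"}, {"國", "国"}],   # simplified/traditional
    "Zhong-Ri": [{"廣", "広"}],               # Chinese/Japanese
    "yiti": [{"峰", "峯"}],                   # traditional variants
}


def equivalents(char, enabled):
    """Return char plus every character grouped with it in the enabled classes.

    The lookup is one step only: equivalents of equivalents are not followed.
    """
    found = {char}
    for name in enabled:
        for group in VARIANT_CLASSES.get(name, []):
            if char in group:
                found |= group
    return found


def expand_query(query, enabled):
    """Return (set of all spellings to search for, characters added per input character)."""
    options = [sorted(equivalents(c, enabled)) for c in query]
    # Report to the user which characters each choice has added.
    added = {
        c: [o for o in opts if o != c]
        for c, opts in zip(query, options)
        if len(opts) > 1
    }
    spellings = {"".join(p) for p in product(*options)}
    return spellings, added


if __name__ == "__main__":
    spellings, added = expand_query("廣", ["jianfan", "Zhong-Ri"])
    print(sorted(spellings), added)
```

Because the classes are non-exclusive, enabling several at once simply
unions their groups; a character that belongs to no enabled class is
searched for as typed.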

Martin Heijdra

This archive was generated by hypermail 2.1.2 : Tue May 28 2002 - 07:39:59 EDT