Re: Unihan

From: Benjamin C. Kite (dharbigt@pobox.com)
Date: Tue Apr 12 2005 - 10:43:04 CST

Next message: John H. Jenkins: "Re: Unihan"

Previous message: Eric Muller: "Re: String name and Character Name"
In reply to: John H. Jenkins: "Re: Unihan"
Next in thread: John H. Jenkins: "Re: Unihan"
Reply: John H. Jenkins: "Re: Unihan"

>>
>> Is this the appropriate forum to discuss the Unihan database, or is
>> there another list for that?
>>
>
> General questions regarding the database are appropriate to raise
> here. There is another list for people who are interested in actively
> working on improving it. As a general rule, you can ask your question
> here and it can be shunted to the other list if it seems more
> appropriate there.

I run across a few errors or omissions per week. If there is interest
in my input, I'd be happy to offer it to the appropriate parties.

Aside from this, I have a few questions:

First, I am curious whether Unihan makes its own private modifications
to the definitions, separate from CEDICT, or relies solely on input
from CEDICT for its definitions database.

Secondly, I notice that the definitions assigned to traditional
characters aren't always appended to the definitions of the simplified
characters, most especially when the simplified version has its own
meaning in the traditional set. It seems trivial to append that
information with one more database query. However, I'm curious if
there was an extended discussion about whether semantic variants should
hold the same definitions as their standard counterparts. There are
certainly numerous cases where a semantic variant has no definition data
while its standard counterpart does. Should duplicate definitions be
propagated here?
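
As a rough sketch of the kind of extra query meant here, assuming the
database is available as tab-separated records of the form
"U+XXXX<TAB>field<TAB>value" with kDefinition and kTraditionalVariant
fields. The function names and the sample rows are hypothetical and
purely illustrative; the sample definitions are not copied from Unihan.

```python
def parse_unihan(lines):
    """Parse tab-separated records into {codepoint: {field: value}}."""
    records = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        records.setdefault(codepoint, {})[field] = value
    return records


def append_traditional_definitions(records):
    """For each character that lists traditional variants, build a merged
    definition: its own definition (if any) followed by the variants'."""
    merged = {}
    for codepoint, fields in records.items():
        variants = fields.get("kTraditionalVariant", "").split()
        parts = [fields["kDefinition"]] if "kDefinition" in fields else []
        for variant in variants:
            if variant == codepoint:
                continue  # a character may list itself as a variant
            definition = records.get(variant, {}).get("kDefinition")
            if definition and definition not in parts:
                parts.append(definition)
        if variants and parts:
            merged[codepoint] = "; ".join(parts)
    return merged


# Illustrative rows only (assumed layout, made-up definitions).
SAMPLE = [
    "U+53D1\tkDefinition\tsend out",
    "U+53D1\tkTraditionalVariant\tU+767C U+9AEE",
    "U+767C\tkDefinition\tissue, dispatch",
    "U+9AEE\tkDefinition\thair",
]

merged = append_traditional_definitions(parse_unihan(SAMPLE))
print(merged)
```

Whether the merged text should be written back into the definition
field or kept as a derived view is exactly the policy question above;
the sketch only shows that the lookup itself is cheap.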

I also notice that there are notations in the definition fields that
refer to other characters in three different ways: U+FFFF, VEAFFFF, and
also by including the character itself. Does this fall into the
demesne of the Unihan group, or is this also CEDICT?
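
To illustrate why the mixed notation matters for anyone consuming the
data, here is a minimal sketch that normalizes two of the three forms
(the U+XXXX form and a literal character) to a single list of code
points. The function name is hypothetical, the literal-character match
is limited to the basic CJK Unified Ideographs block and Extension A as
a simplifying assumption, and the third notation is deliberately left
out because its exact syntax is not spelled out here.

```python
import re

# "U+" followed by 4 to 6 uppercase hex digits.
CODEPOINT_REF = re.compile(r"U\+([0-9A-F]{4,6})")
# Literal ideographs: Extension A and the basic CJK block only (assumption).
LITERAL_CJK = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF]")


def find_references(definition):
    """Return the characters a definition refers to, as sorted U+XXXX strings."""
    refs = set()
    for match in CODEPOINT_REF.finditer(definition):
        refs.add(int(match.group(1), 16))
    for match in LITERAL_CJK.finditer(definition):
        refs.add(ord(match.group()))
    return [f"U+{cp:04X}" for cp in sorted(refs)]


print(find_references("same as U+767C; see also \u9aee"))
```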

Lastly, for the moment, I'm curious whether there is any future plan to
include Wubi Hua or ITABC stroke input data in this database. It would
seem to be a fairly simple set of data to include, and would make the
database more useful, even if only a limited number of characters were
covered.

This archive was generated by hypermail 2.1.5 : Tue Apr 12 2005 - 10:43:59 CST