Re: Multiple script Handling (kanji - kana)

From: Berthold Frommann (charanchan2001@yahoo.co.jp)
Date: Fri Jan 25 2002 - 10:52:17 EST


Dear Prof. Genenz,

> From my database with roughly 50.000 lexical
> entries (compounds) I get a
> number of 1431 compounds with
> at least two readings and 71 with at least three
> readings.
(These figures take into account only compounds with multiple readings.)

But imagine this: if a program had access merely to a database containing
the readings of single characters, it still could not reliably determine the
reading of a compound. How could it "know" that 人間 is "ningen" and not,
for instance, *jinkan?
This means that it would be vital to have access to a database mapping each
kanji compound to its reading(s).
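To make the point concrete, here is a minimal sketch contrasting the two kinds of lookup. Both tables (CHAR_READINGS, COMPOUND_READINGS) and both helper functions are hypothetical toy data for illustration, not taken from any real dictionary. The per-character table can only generate every combination of single-character readings, with nothing to prefer one over another. The compound table returns the attested reading directly.

```python
from itertools import product

# Hypothetical, heavily simplified per-character reading table.
# "gen" is listed for 間 only so that the correct reading of 人間
# shows up among the generated candidates.
CHAR_READINGS = {
    "人": ["jin", "nin"],
    "間": ["kan", "gen"],
}

# Hypothetical compound dictionary: compound -> attested reading(s).
COMPOUND_READINGS = {
    "人間": ["ningen"],
}


def candidates_from_chars(compound):
    """Return every concatenation of per-character readings.

    The program has no basis for choosing among these candidates."""
    per_char = [CHAR_READINGS[ch] for ch in compound]
    return ["".join(parts) for parts in product(*per_char)]


def reading_from_dictionary(compound):
    """Return the attested reading(s), or an empty list if unknown."""
    return COMPOUND_READINGS.get(compound, [])


print(candidates_from_chars("人間"))    # ['jinkan', 'jingen', 'ninkan', 'ningen']
print(reading_from_dictionary("人間"))  # ['ningen']
```

Even with this tiny table, the character-based approach yields four candidates, and *jinkan is as plausible to the program as "ningen".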

Without semantic analysis of the sentences concerned, it is not possible to
determine the correct contextual reading of every Japanese compound.
It is only possible to check whether the reading given in the second string
is _one_ of the correct readings.
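A minimal sketch of that weaker check, assuming a hypothetical compound-to-readings dictionary (the names COMPOUND_READINGS and is_listed_reading are illustrative only). 生物 is included as a compound with two readings: the check can accept either one but cannot say which is right in a given sentence.

```python
from typing import Optional

# Hypothetical compound dictionary: compound -> set of listed readings.
COMPOUND_READINGS = {
    "人間": {"ningen"},
    "生物": {"seibutsu", "namamono"},
}


def is_listed_reading(compound: str, reading: str) -> Optional[bool]:
    """Check whether `reading` is one of the listed readings of `compound`.

    Returns True or False for known compounds, and None when the compound
    is not in the dictionary at all. This is only a membership test; it
    does not determine the contextually correct reading."""
    readings = COMPOUND_READINGS.get(compound)
    if readings is None:
        return None
    return reading in readings


print(is_listed_reading("生物", "namamono"))  # True
print(is_listed_reading("人間", "jinkan"))    # False
```

Returning None for unknown compounds keeps "not in the dictionary" distinct from "not a valid reading", which matters when the dictionary is incomplete.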

Greetings from Edo,
   Berthold Frommann




This archive was generated by hypermail 2.1.2 : Fri Jan 25 2002 - 10:43:31 EST