Re: Multiple script Handling (kanji - kana)

From: Berthold Frommann (charanchan2001@yahoo.co.jp)
Date: Fri Jan 25 2002 - 04:03:05 EST


Hi Rajat,
 
> Any solutions to handle the same ( or in other words to compare 2 Japanese
> strings written in different scripts or by mixture of two scripts) ??
It is definitely a non-trivial task.
If you just want to transform a katakana string into hiragana (or vice
versa), it is very easy. But as soon as you start dealing with kanji, it
gets really, really tricky.
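
To illustrate the easy part, here is a minimal sketch in Python. It assumes the input is already a Unicode string and relies on the fact that the hiragana and katakana blocks sit 0x60 code points apart. The function names are my own, and half-width katakana are not handled here.

```python
# Distance between a katakana code point and its hiragana counterpart.
KANA_OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Map full-width katakana U+30A1..U+30F6 to hiragana; keep the rest."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x30A1 <= cp <= 0x30F6:
            out.append(chr(cp - KANA_OFFSET))
        else:
            # Kanji, the long-vowel mark, Latin letters etc. pass through.
            out.append(ch)
    return "".join(out)


def hiragana_to_katakana(text: str) -> str:
    """Map hiragana U+3041..U+3096 to katakana; keep the rest."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x3041 <= cp <= 0x3096:
            out.append(chr(cp + KANA_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

Two kana strings can then be compared after converting both to the same script. Kanji are left untouched, which is exactly where the trouble starts.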

Whereas Chinese - mostly - has only one reading per character, kanji very
often have multiple readings. If you want to compare a kanji string to a
hiragana string, you have to find out the reading of the kanji - and the
mapping for this is not 1-to-1 but 1-to-n.
You would need a dictionary of Japanese to determine the reading of a
compound. But some compounds have various readings, depending on the
context. So you would also need a semantic analysis of the sentence!

生物 = 1. seibutsu, 2. namamono
今日 = 1. konnichi, 2. kyou
上手 = 1. jouzu, 2. uwate, 3. kamite
下 = 1. ka, 2. ge, 3. shita, 4. shimo, 5. moto, 6. sa(...), 7. kuda(...), 8. o(riru)
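
A rough sketch of what the 1-to-n lookup could look like in Python, using three of the compounds above written out in hiragana. The tiny READINGS table and the function names are purely illustrative; a real system would need a full dictionary, and this does nothing about choosing the right reading from context.

```python
from typing import Optional

# Illustrative 1-to-n table: each compound maps to a set of readings.
READINGS = {
    "生物": {"せいぶつ", "なまもの"},
    "今日": {"こんにち", "きょう"},
    "上手": {"じょうず", "うわて", "かみて"},
}


def to_hiragana(text: str) -> str:
    """Fold full-width katakana to hiragana so both scripts compare equal."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )


def reading_matches(kanji: str, kana: str) -> Optional[bool]:
    """True/False if the compound is known, None if it is not in the table."""
    candidates = READINGS.get(kanji)
    if candidates is None:
        return None  # no dictionary entry, so nothing can be decided
    return to_hiragana(kana) in candidates
```

Note that a True result only says the kana string is one possible reading of the compound, not that it is the reading intended in a given sentence.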

Readings of place names and personal names are especially difficult to
figure out.
However, it really depends on what kind of data you are processing. If, for
example, you have two fields for a Japanese person's name, one in kanji and
one with the kana transcription, you could at least check whether the kana
field is among the known readings of the kanji name ... (sigh!)

Regards,
   Berthold



