Re: Multiple script Handling (kanji - kana)

From: Berthold Frommann (charanchan2001@yahoo.co.jp)
Date: Fri Jan 25 2002 - 04:03:05 EST

Previous message: David Hopwood: "Re: Unicode 3.2: BETA files updated"
In reply to: Rajat Bawa: "Multiple script Handling"
Next in thread: Marco Cimarosti: "RE: Multiple script Handling (kanji - kana)"
Reply: Marco Cimarosti: "RE: Multiple script Handling (kanji - kana)"
Reply: ろ〇〇〇〇 ろ〇〇〇: "RE: Multiple script Handling (kanji - kana)"
Reply: Anil Joshi: "RE: Multiple script Handling (kanji - kana)"
Reply: Marco Cimarosti: "RE: Multiple script Handling (kanji - kana)"

Hi Rajat,

> Any solutions to handle the same ( or in other words to compare 2 Japanese
> strings written in different scripts or by mixture of two scripts) ??
It is definitely a non-trivial task.
If you just want to transform a katakana string into hiragana (or vice
versa), it is very easy. But as soon as you start dealing with kanji, it
gets really, really tricky.
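
To illustrate the easy part: in Unicode the katakana block mirrors the hiragana block at a fixed distance of 0x60, so the conversion is plain code point arithmetic. This is only a minimal sketch. The function names are made up for this example, and it leaves the prolonged sound mark (ー) and half-width katakana untouched.

```python
# Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana U+3041..U+3096.
KATAKANA_FIRST = 0x30A1
KATAKANA_LAST = 0x30F6
HIRAGANA_FIRST = 0x3041
HIRAGANA_LAST = 0x3096
OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Shift every katakana letter down to its hiragana counterpart."""
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


def hiragana_to_katakana(text: str) -> str:
    """Shift every hiragana letter up to its katakana counterpart."""
    return "".join(
        chr(ord(ch) + OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


def same_kana(a: str, b: str) -> bool:
    """Compare two kana strings while ignoring the hiragana/katakana difference."""
    return katakana_to_hiragana(a) == katakana_to_hiragana(b)
```

With this, same_kana("きょう", "キョウ") is true, while any kanji in the input simply pass through unchanged.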

Whereas in Chinese a character mostly has only one reading, kanji very
often have several. If you want to compare a kanji string to a hiragana
string, you have to find out the reading of the kanji - and the mapping
from kanji to readings is not 1-to-1 but 1-to-n.
You would need a dictionary of Japanese to determine the reading of a
compound. But some compounds have various readings, depending on the
context. So you would also need a semantic analysis of the sentence!

生物 = 1. seibutsu, 2. namamono
今日 = 1. konnichi, 2. kyou
上手 = 1. jouzu, 2. uwate, 3. kamite
下 = 1. ka, 2. ge, 3. shita, 4. shimo, 5. moto, 6. sa(...), 7. kuda(...), 8. o(riru)
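
The 1-to-n lookup could be sketched roughly like this. The table holds only the three compounds listed above, written in hiragana, and the names READINGS and could_be_read_as are invented for this example. A real system would need a full dictionary, and a positive answer only means the reading is possible, not that it is the right one in context.

```python
from typing import Optional

# Tiny 1-to-n table: each compound maps to a set of possible readings.
# Only the compounds from the examples above; a real dictionary is far larger.
READINGS = {
    "生物": {"せいぶつ", "なまもの"},
    "今日": {"こんにち", "きょう"},
    "上手": {"じょうず", "うわて", "かみて"},
}


def could_be_read_as(kanji: str, kana: str) -> Optional[bool]:
    """Return True if kana is a known reading of kanji, False if it is not,
    and None if the compound is missing from the table."""
    readings = READINGS.get(kanji)
    if readings is None:
        return None
    return kana in readings
```

For example, could_be_read_as("今日", "きょう") and could_be_read_as("今日", "こんにち") are both true, which is exactly the problem: the table alone cannot say which reading a given sentence intends.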

Readings of place names and personal names are especially difficult to
figure out.
However, it really depends on what kind of data you are about to process.
If, for example, you have two fields for a Japanese person's name, one in
kanji and one with its transcription in kana, you could at least check
whether the kana is among the known readings of the kanji name ... (sigh!)
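
Such a check on a name record might look like the sketch below. NAME_READINGS is a hypothetical two-entry table used purely for illustration, and check_name is an invented helper. Real name data would have to come from a name dictionary, and even then unusual readings will be missing.

```python
# Map katakana U+30A1..U+30F6 to hiragana so the kana field may use either script.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}

# Hypothetical sample table: family names with more than one common reading.
NAME_READINGS = {
    "東": {"ひがし", "あずま"},
    "中島": {"なかじま", "なかしま"},
}


def check_name(kanji: str, kana: str) -> str:
    """Classify a (kanji, kana) name pair as 'plausible', 'mismatch' or 'unknown'."""
    readings = NAME_READINGS.get(kanji)
    if readings is None:
        return "unknown"  # name not in the table: nothing can be said
    folded = kana.translate(KATAKANA_TO_HIRAGANA)
    return "plausible" if folded in readings else "mismatch"
```

Note that "plausible" is the strongest answer this can give: check_name("東", "アズマ") and check_name("東", "ひがし") both pass, and only the person concerned knows which one is right.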

Regards,
Berthold

This archive was generated by hypermail 2.1.2 : Fri Jan 25 2002 - 04:13:36 EST