From: Benjamin Peterson (ben@jbrowse.com)
Date: Tue May 11 2004 - 08:14:14 CDT
On Mon, 10 May 2004 16:29:25 -0700 (PDT), "Kenneth Whistler"
<kenw@sybase.com> said:
> Stefan Persson wrote:
>
> > Mike Ayers wrote:
> > > I have not seen
> > > katakana joined to kanji (or romaji), and suspect that such does not occur.
> >
> > There are a few cases, e.g. ソ連 (So-Ren: Soviet Union), but that could
> > also be written as two kanji as 蘇連 (which is however very rare in
> > modern Japanese).
>
> It's actually quite common, depending on how you choose
> to construe "joined".
Indeed, and in addition to the examples you give there are at least three
more cases:
1 -- increasingly, people use katakana in _half_ of a jukugo (multi-kanji
word) because one of the kanji is too hard to remember. The result needs
to be treated as a single word but it is part kanji and part katakana.
2 -- in the past, the roles of hiragana and katakana were less well
defined, and many things (inflections, particles) that would be in
hiragana now were in katakana. This results in words that are kanji +
katakana suffixes.
3 -- some expressions, such as the placename 'kasumigaseki' (霞ヶ関), have a
katakana in the middle just for the heck of it.
There are also a number of situations in which romaji are commonly
mixed with katakana or kanji. I think it is impossible to make any rigid
rules about which combinations of the four scripts can occur.
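To make the point concrete, here is a minimal Python sketch that splits a string into runs of the same script. The names script_of and script_runs are made up for illustration, and the classification is an assumption that only looks at the basic Hiragana, Katakana and CJK Unified Ideographs blocks plus ASCII letters and digits; it is not a full Unicode script lookup.

```python
def script_of(ch):
    """Rough script guess from the code point (illustrative assumption only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "romaji"
    return "other"


def script_runs(text):
    """Group consecutive characters of the same script into (script, run) pairs."""
    runs = []
    for ch in text:
        s = script_of(ch)
        if runs and runs[-1][0] == s:
            # Same script as the previous character: extend the current run.
            runs[-1] = (s, runs[-1][1] + ch)
        else:
            runs.append((s, ch))
    return runs


print(script_runs("ソ連"))    # [('katakana', 'ソ'), ('kanji', '連')]
print(script_runs("霞ヶ関"))  # [('kanji', '霞'), ('katakana', 'ヶ'), ('kanji', '関')]
```

Both strings are single words, yet each contains a script boundary, so a script change on its own cannot be taken as a word boundary.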
Luckily, as Japanese typography allows line breaks anywhere except in the
middle of a romaji word and next to some punctuation marks, life is still
bearable. Morphological analyzers should ignore whitespace completely
and accept that a 'word' can span any combination of scripts.
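The line-break rule described above could be sketched roughly as follows in Python. This is only an approximation under stated assumptions: break_points, is_romaji and the two punctuation sets are hypothetical names, the sets hold just a small sample of marks rather than a complete prohibition list, and only ASCII letters and digits are treated as romaji.

```python
# Small sample of marks that should not start or end a line (assumed, incomplete).
NO_BREAK_BEFORE = set("、。，．」』）")
NO_BREAK_AFTER = set("「『（")


def is_romaji(ch):
    """Treat ASCII letters and digits as romaji (simplifying assumption)."""
    return ch.isascii() and ch.isalnum()


def break_points(text):
    """Return the indices i at which a line break may fall before text[i]."""
    points = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if is_romaji(prev) and is_romaji(nxt):
            continue  # never break in the middle of a romaji word
        if nxt in NO_BREAK_BEFORE or prev in NO_BREAK_AFTER:
            continue  # never break next to these punctuation marks
        points.append(i)
    return points


print(break_points("ソ連のKGB。"))  # [1, 2, 3]
```

Note that the break after ソ falls inside the word ソ連, which is fine for line breaking but is exactly why a morphological analyzer cannot reuse break opportunities as word boundaries.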
Benjamin
-- Benjamin Peterson bjsp123@imap.cc