Re: How to tell Japanese from Chinese.

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Fri Jun 08 2001 - 09:51:34 EDT


On Fri, 8 Jun 2001, [ISO-2022-JP] てんどうりゅうじ wrote:

> My very simple rule of thumb for telling Japanese from Chinese is to
> look for kana. If I see even one kana, I am looking at Japanese,
> right? (Warning: A few kanji resemble katakana.) So if I see so much
> as a hiragana "to", it's Japanese, right? But sometimes there are
> stretches of many kanji.

Yes, that rule of thumb works for most everyday cases that one will run
into.
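
For a machine reader, the kana check can be sketched roughly as follows.
This is only an illustration in Python: the function name contains_kana is
made up here, and it assumes that only the hiragana and katakana letters
proper count (U+3041..U+3096 and U+30A1..U+30FA), ignoring halfwidth
katakana and marks such as the middle dot.

```python
def contains_kana(text: str) -> bool:
    """Return True if text holds at least one hiragana or katakana letter.

    Assumption: only the letter ranges of the Hiragana and Katakana blocks
    are checked; halfwidth katakana and punctuation are deliberately ignored.
    """
    for ch in text:
        cp = ord(ch)
        if 0x3041 <= cp <= 0x3096:  # hiragana letters
            return True
        if 0x30A1 <= cp <= 0x30FA:  # katakana letters
            return True
    return False


# Kanji with a single hiragana "to" (U+3068) in the middle.
print(contains_kana("\u65e5\u672c\u3068\u4e2d\u56fd"))
# Kanji only: the rule says nothing either way.
print(contains_kana("\u4e2d\u56fd"))
```

A False result does not mean "Chinese"; it only means the rule of thumb
found no evidence for Japanese.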

However, manyougana would be classified as "Chinese" under that rule, as
well as kanbun. I'm not sure that one would want to classify the more
"deviant" (from a classical Chinese POV) and more Japanized forms of
kanbun as "Chinese".

Have you seen hentaigana before? It straddles the boundary between
being kanji used for transliteration/transcription and being kana. (How
would such text be encoded in Unicode, if at all?)

 
> Doesn't this kanji
> <bad-ascii-art>
[snip]
> </bad ascii art> NOT to be confused with hiragana "e" (oy vey),
> usually only appear in Chinese?

In other words, \u4e4b vs. \u3048.

I presume you're asking for purposes of a human reader, as a machine could
easily detect the difference between U+4E4B and U+3048. But then, a
sufficiently literate human could probably read the text and eliminate one
of the two choices.
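
To illustrate the machine side, here is a small Python sketch that uses the
standard unicodedata module to print each character's code point and
character name. The helper name describe is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME' using its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


for ch in ("\u4e4b", "\u3048"):
    print(describe(ch))
# prints:
# U+4E4B CJK UNIFIED IDEOGRAPH-4E4B
# U+3048 HIRAGANA LETTER E
```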

Marco has already explained U+4E4B.

But if you want a simple system based only on presence (or lack) of
certain characters, then I'd look for common Chinese ones such as:

  \u9019 (\u8fd9) zhe 'this'
  \u5011 (\u4eec) men (plural suffix)
  \u9ebc (\u4e48) me (as in \u751a\u9ebc, \u751a\u4e48, \u4ec0\u4e48
     shenme 'what')
  \u55ce (\u5417) ma (question particle)
  \u4f60 ni 'you'
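
A presence test over the characters listed above might look like this in
Python. It is a sketch only: the names CHINESE_MARKERS and
chinese_markers_in are invented for the example, the set holds just the
characters from the list, and a hit is a hint rather than proof.

```python
# Marker characters from the list above, traditional form then simplified.
CHINESE_MARKERS = frozenset(
    "\u9019\u8fd9"  # zhe 'this'
    "\u5011\u4eec"  # men (plural suffix)
    "\u9ebc\u4e48"  # me (as in shenme 'what')
    "\u55ce\u5417"  # ma (question particle)
    "\u4f60"        # ni 'you'
)


def chinese_markers_in(text: str) -> set:
    """Return the set of marker characters that occur in text."""
    return {ch for ch in text if ch in CHINESE_MARKERS}


# U+4F60 U+597D U+55CE: two of the three characters are markers.
found = chinese_markers_in("\u4f60\u597d\u55ce")
print(sorted(f"U+{ord(ch):04X}" for ch in found))
```

An empty result settles nothing; short text may simply not contain any of
these characters.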

 
> Pardon my incoherence. I haven't had enough sake.

\u9152 on a sign or label--is that Chinese or Japanese? Hard to tell.

If you're familiar with some differences in simplification, you can also
draw corroborating conclusions, e.g., if I see \u6226 embroidered
on someone's baseball cap, rather than \u6230 or \u6218, then that
strikes me as "Japanese". (Not that the person who made it or is wearing
it probably knows or cares.)
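
The same idea as a lookup table, sketched in Python for the one character
family mentioned here. The names VARIANT_HINTS and variant_hints and the
label strings are my own glosses for the example; a real table would need
many more entries.

```python
# Three forms of the same character ('war'), keyed by code point.
VARIANT_HINTS = {
    "\u6226": "Japanese simplified form",
    "\u6230": "traditional form",
    "\u6218": "Chinese simplified form",
}


def variant_hints(text: str) -> list:
    """Return (character, hint) pairs for each known variant, in text order."""
    return [(ch, VARIANT_HINTS[ch]) for ch in text if ch in VARIANT_HINTS]


for ch, hint in variant_hints("\u6226"):
    print(f"U+{ord(ch):04X}: {hint}")
# prints: U+6226: Japanese simplified form
```

Like the other checks, this only corroborates; the traditional form in
particular does not point to one language.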

Thomas Chan
tc31@cornell.edu


