Manyoushuu and Unicode, (was Re: Synthetic scripts, etc)

From: Rick McGowan (rick@unicode.org)
Date: Fri Mar 15 2002 - 15:52:07 EST


As long as we're on the subject of Han... dankogai@dan.co.jp wrote:

> Well, I can even give up on classical writings (after all my knowledge
> on classical writings, East or West, is too limited to discuss in
> depth). But it strikes me to face the fact that some of you can't even
> spell your name in Unicode.
> ...

Well, we know about a lot of scripts that aren't encoded yet... so that's
a no-brainer.

> For instance, there are at least 31 (official) way to spell 'Wata' of
> Watanabe,

... but presuming that you are talking only about Han characters, who on
this list, in a Han-using country, cannot spell their names with Unicode? I
keep hearing such assertions, but they have never been proven. I have in
the past tried to get people to come forward with proof of any such claim.
So far, no luck. Until I see at least one example, it's hard to believe the
assertion.

Moving right along, I find it really hard to believe that the Manyoushuu
can't be encoded with Unicode 3.2, if not an earlier version of Unicode. Is
there anyone on this list who can prove it one way or another?

Here is a web site on the Manyoushuu:

        http://etext.lib.virginia.edu/japanese/manyoshu/

It includes the full text, here:

        http://etext.lib.virginia.edu/japanese/manyoshu/AnoMany.html

It also has a really nice page on the "Unavailable Kanji in the E-Text",
so most of the detective work is done already!

Can anyone with a bit of Kanji knowledge check this page:

        http://etext.lib.virginia.edu/japanese/manyoshu/AnoMany.unavailable.html

and tell us whether all of the "missing" Kanji are in Unicode 3.2? A bunch
of them are shown as not available in the Dai Kanwa, but I'm under the
impression that the Dai Kanwa is covered by Unicode these days. If so,
someone would only have to check the items marked "nashi" in the Dai Kanwa
column of that table.
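Once a candidate code point has been identified for a character in that table, the "is it in 3.2?" part can be checked mechanically. Here is a minimal sketch, assuming Python 3, whose standard `unicodedata` module exposes the Unicode 3.2.0 database as `unicodedata.ucd_3_2_0` (the module itself reflects a newer version). The names `in_unicode_3_2` and `missing_from_3_2` are hypothetical helpers, not part of any library:

```python
import unicodedata

# The ucd_3_2_0 object has the same methods as the unicodedata module,
# but answers from the Unicode 3.2.0 character database.
UCD_3_2 = unicodedata.ucd_3_2_0


def in_unicode_3_2(ch):
    """Return True if the single character ch was assigned in Unicode 3.2."""
    # Code points not yet assigned in 3.2 report general category "Cn".
    return UCD_3_2.category(ch) != "Cn"


def missing_from_3_2(chars):
    """Return, in order, the characters that Unicode 3.2 did not encode."""
    return [ch for ch in chars if not in_unicode_3_2(ch)]
```

For example, U+9FA5 (the last ideograph of the original CJK block) passes, while U+9FA6, added in a later version, is reported as missing.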

In any case, the Manyoushuu can certainly be expressed in Unicode 3.2 with
ideographic description sequences (IDS), and the above-mentioned page does
essentially that: it describes all of the Kanji that are "missing" from the
encoding (listed as "x-euc-jp" on the web page, presumably meaning the JIS X
0208 repertoire). In my casual perusal I don't see anything that looks
inexpressible with IDS.
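To make the IDS idea concrete, here is a minimal sketch in Python 3 that parses a description sequence, written in prefix order, into a tree. It assumes the twelve Ideographic Description Characters U+2FF0..U+2FFB available in Unicode 3.2, where U+2FF2 and U+2FF3 take three components and the others take two. `IDC_ARITY` and `parse_ids` are hypothetical names, and the sketch only checks that every operator gets the right number of operands; it does not check which characters are permitted as components.

```python
# Ideographic Description Characters U+2FF0..U+2FFB (the set in Unicode 3.2).
# U+2FF2 and U+2FF3 combine three components; the other ten combine two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(seq):
    """Parse an IDS in prefix notation into nested tuples.

    Leaves are single component characters; inner nodes are
    (operator, operand, operand[, operand]). Raises ValueError
    if the sequence is not well formed.
    """
    def parse(pos):
        if pos >= len(seq):
            raise ValueError("sequence ends before all operands are supplied")
        ch = seq[pos]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            # Not an operator: treat it as a single component (leaf).
            return ch, pos + 1
        operands = []
        pos += 1
        for _ in range(arity):
            node, pos = parse(pos)
            operands.append(node)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(seq):
        raise ValueError("extra characters after a complete description")
    return tree
```

For example, 信 can be described as ⿰亻言 (U+2FF0 U+4EBB U+8A00), and 森 as ⿱木⿰木木, which parses to ('⿱', '木', ('⿰', '木', '木')).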

        Rick


