Re: Synthetic scripts (was: Re: Private Use Agreements and Unapproved Characters)

From: Dan Kogai (dankogai@dan.co.jp)
Date: Fri Mar 15 2002 - 18:03:35 EST


On Saturday, March 16, 2002, at 07:27, Kenneth Whistler wrote:
> *What* still holds true? These are just well-worn issues of itaiji
> (variant forms). The characters from the little anime exhibit of
> variants are, in Unicode:
>
> U+9AD8 / U+9AD9
> U+5516 / U+555E
> U+9593 / U+9592
>
> all variants of the same character that got cloned into Unicode
> because of the source separation rule.
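Each pair above is a pair of separate code points, and none of the Unicode normalization forms merges one into the other. Software therefore compares them as different characters, whatever glyphs a font happens to draw. A minimal Python sketch using only the standard `unicodedata` module (the helper names `describe` and `folds_together` are illustrative, not from any library):

```python
import unicodedata

# The three variant pairs quoted above: (first code point, second code point).
PAIRS = [(0x9AD8, 0x9AD9), (0x5516, 0x555E), (0x9593, 0x9592)]


def describe(cp: int) -> str:
    """Return the character, its code point, and its Unicode character name."""
    ch = chr(cp)
    return f"{ch} U+{cp:04X} {unicodedata.name(ch)}"


def folds_together(a: int, b: int) -> bool:
    """True if any of the four normalization forms maps both to the same string."""
    return any(
        unicodedata.normalize(form, chr(a)) == unicodedata.normalize(form, chr(b))
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


if __name__ == "__main__":
    for a, b in PAIRS:
        print(describe(a), "|", describe(b),
              "| folded by normalization:", folds_together(a, b))
```

For all three pairs the last column comes out `False`: unifying or separating such variants is decided at encoding time, not by normalization afterwards.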

   That is the linguists' view, but the problem is that the government
takes a different one. Itaiji or not, once a character is registered by
the government, it becomes canonical and must be used in any legal
document.
   When I started a company, I had to file a registration with the
Hou-mu-kyoku, or Legal Registry Office. Naturally I prepared the
documents with a text editor, but one board member's name contained an
itaiji, so I had to leave that part blank and write it in by hand after
the documents were printed out.

> And the last one is U+5409 "kichi". For this one, I believe the
> variant is simply a zokuji ("vulgar variant") not recognized as
> standard in the dictionaries. But it is just one of thousands
> of similar variant forms which could be attested for itaiji.

   The variant of U+5409 kichi is a zokuji AND legal.

> The whole issue of Han variant forms, by the way, is not something
> that the Unicode Standard created, nor did Han encoding unification
> principles in Unicode and 10646 somehow exacerbate the problem for
> IT processing.

   Right. But it is also true that Unicode's way of unifying characters
gets in the way in many cases when you try to put Unicode into practice.
As Kato pointed out, Unicode is more pro-programmer than pro-user.

> But of course that begs the question of what presentation variation
> detail he or other users perceive to be spelling differences. Correct
> presentation of all details of Han characters may not *be* the
> business of the character encoding per se. There is an architectural
> decision to be made regarding the tradeoff between the identity of
> characters for processing purposes and the appearance of characters
> for rendering purposes, and Kato-san and the IRG appear to disagree
> about where that line should be drawn.

   Right. I don't know where the line should be drawn either. But the
bottom line is that the characters in names should be treated as
different characters, not as variations of the same character, because
this directory is bound to legal documents. I want, OK, hope, OK, wish
Unicode to be able to encode legal documents in plain text.

>> favorite appears to be ISO-2022 but as Yet Another Perl Encoding
>> Hacker, ISO-2022 is pain in the arse....
>
> You got that right!

   But when it comes to allocating a new character set, ISO-2022 wins,
because the authority only has to register an escape sequence for the
new character set and can leave the rest up to the user.
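The designation mechanism can be sketched in a few lines of Python. This assumes a deliberately simplified model: only the G0 designations used by ISO-2022-JP (RFC 1468), with no G1 to G3, no shift functions, and no 8-bit forms. The `DESIGNATIONS` table and `split_runs` helper are hypothetical; Python's built-in `iso2022_jp` codec is used only to produce sample input.

```python
# Escape sequences that designate a character set into G0 (the subset
# used by ISO-2022-JP). In this model, "adding" a character set means
# registering one more escape sequence here.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split an ISO-2022-JP byte stream into (charset name, raw bytes) runs.

    Scanning for ESC is safe because ISO-2022-JP double-byte data uses only
    0x21-0x7E. Unknown escape sequences are passed through as data.
    """
    runs: list[tuple[str, bytes]] = []
    charset = "ASCII"  # an ISO-2022-JP stream starts in ASCII
    start = i = 0
    while i < len(data):
        seq = next((s for s in DESIGNATIONS if data.startswith(s, i)), None)
        if seq is None:
            i += 1
            continue
        if i > start:
            runs.append((charset, data[start:i]))
        charset = DESIGNATIONS[seq]  # switch G0 to the designated set
        i += len(seq)
        start = i
    if start < len(data):
        runs.append((charset, data[start:]))
    return runs


if __name__ == "__main__":
    encoded = "Dan \u9ad8".encode("iso2022_jp")
    for name, raw in split_runs(encoded):
        print(name, raw)
```

Interpreting the raw bytes of each run is left to whoever implements that character set, which matches the division of labor described above.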

Dan the Lucky Man Whose Name is Encodable by Unicode


