Re: InLanguage properties? [Was Re: Encode-InCharset-0.01 Released]

From: David Starner (starner@okstate.edu)
Date: Fri May 03 2002 - 07:17:55 EDT


On Fri, May 03, 2002 at 05:52:37PM +0900, Dan Kogai wrote:
> To overcome this shortage Unicode does have character properties and
> you can get which I<script> it belongs to using that. But unfortunately
> that was not the case for the origins of character repertoire (so I made
> one (Encode-InCharset) because I needed it). Neither is the case for
> Languages.

This seems to be one of those ideographic/alphabetic splits. The
identity of alphabetic characters even across languages is more or less
clear; even without Unicode, I would perceive Latin-* as merely being
subsets of some larger character set. There's no reason why German users
use Latin-1 rather than Latin-2, or -3, or -4; it's just a matter of who
they trade with most. Since you can write many languages in several of
the ISO 8859 series, and write several languages in each 8859 charset,
language is something users of the Latin script never strongly
associated with charset. ISO-2022-like things just don't express this
well. OTOH, from listening to Japanese users, I get the impression that
ISO-2022 fits their view of characters - GB2312 is totally separate from
JIS X 0208 and two characters in different charsets are inherently
different. Whence comes a lot of the Unicode flaming.
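
To make the many-to-many point above concrete, here is a rough sketch in
Python (not the Perl Encode module under discussion). It simply tries to
encode a sample string in Latin-1 through Latin-4 and reports which ones
succeed. The sample strings, the CHARSETS list and the charsets_for()
helper are all made up for illustration; only the codec names come from
Python's standard library.

```python
# Illustrative sketch: which ISO 8859 parts can represent a given text?
# SAMPLES, CHARSETS and charsets_for() are hypothetical names for this example.

SAMPLES = {
    "German": "Größe, Äpfel, über, Öl",
    "Polish": "żółć, gęś, łódź",
}

# Python's standard codec names for Latin-1 through Latin-4.
CHARSETS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]


def charsets_for(text, charsets=CHARSETS):
    """Return the charsets from `charsets` that can encode every character of `text`."""
    usable = []
    for name in charsets:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no code point in this charset.
            continue
        usable.append(name)
    return usable


if __name__ == "__main__":
    for language, text in SAMPLES.items():
        print(language, charsets_for(text))
```

The German sample fits in all four charsets, while the Polish one needs
Latin-2, so the charset alone tells you very little about the language.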

-- 
David Starner - starner@okstate.edu
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)


