Re: What constitutes "character"?

From: James Kass (jameskass@worldnet.att.net)
Date: Thu Nov 08 2001 - 08:49:41 EST


Gaspar Sinai wrote,

> 1. Not Simple to Use
> ==================
> There is no way to do the simplest operations without a
> huge library behind it. Just think of a simple character search;
> if you want to search for the
> Á character (U+00C1) you cannot properly search without
> decomposing it. Otherwise you won't match it with Á
> (U+0041, U+0301).

A case-insensitive search may also need to match both forms
of the lower case "á" (U+00E1, or U+0061 U+0301) at the
same time.
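
For illustration only, here is a minimal sketch in Python of the
kind of search discussed above: both strings are brought to one
canonical form (NFD) and case-folded before comparing. The helper
names canonical_key and contains are made up for this example, and
the standard library's unicodedata module stands in for the "huge
library". Note that a plain substring test on decomposed text will
also find a bare "a" inside "á"; a real search would have to respect
combining-sequence boundaries.

```python
import unicodedata


def canonical_key(text: str) -> str:
    """Return a key that ignores composed/decomposed form and case."""
    # NFD splits U+00C1 into U+0041 U+0301; casefold() then removes the
    # upper/lower case distinction. Normalizing once more keeps the key
    # in NFD even where case folding changes the code points.
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring search on the canonical keys."""
    return canonical_key(needle) in canonical_key(haystack)


if __name__ == "__main__":
    precomposed = "\u00C1rbol"   # starts with U+00C1
    combining = "a\u0301rbol"    # starts with U+0061 U+0301
    print(contains(precomposed, "A\u0301"))  # decomposed needle
    print(contains(combining, "\u00C1"))     # precomposed, upper case needle
```

Normalizing to NFC instead would serve equally well for the
comparison; what matters is that both sides use the same form.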

Is there any resolution to this that wouldn't involve adding
rules? Suppose there were a rule excluding non-shortest
form character encodings? This would preclude decomposition.
Or, would user communities select certain combinations as
standard for their language's writing system? In other
words, one language group may state that only U+00C1 is
to be used in their language to represent the desired
character, while another language group may prefer the
base letter with combining diacritic combination. In this
case, there would be another set of language-based rules
added.

> 2. Unification Problems
> ====================
> So we unify characters. In this case why do we have a
> wide A U+FF21 in Unicode and why don't we have
> a wide version of Д U+0414? Is there any reason? I
> think both should be included because some local standards
> are supporting it, and this would create compatibility,

The U+FF21 is there because it was included in an older
standard character set. Earlier today, coincidentally,
I came across this page:

http://www.xfree86.org/pipermail/i18n/2001-May/001836.html
[I18n]Re: Do Japanese users really need doublewidth Cyrillic?

The gist of the thread seems to be that fonts supporting
CJK and geared for the CJK user community would have
the wide version of Д U+0414 at 0414, which is arguably
where it belongs.
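
As a rough illustration of the difference between the two cases,
the sketch below asks Python's unicodedata module for the East
Asian Width property and the compatibility (NFKC) mapping of each
character. The function name describe is invented for this example.
U+FF21 is reported as Fullwidth ("F") and folds to plain U+0041
under NFKC, while U+0414 is reported as Ambiguous ("A") and maps
to itself; as far as I understand it, the ambiguous class is the
one where width is left to the context and the font, which fits
the gist of that thread.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, East Asian Width class, NFKC mapping as code points)."""
    code_point = f"U+{ord(ch):04X}"
    width = unicodedata.east_asian_width(ch)
    nfkc = " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", ch))
    return code_point, width, nfkc


if __name__ == "__main__":
    for ch in ("\uFF21", "A", "\u0414"):
        print(describe(ch), unicodedata.name(ch))
```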

> <snip>

> Sorry if I am too critical, I really would like to have a
> standard that everybody is using. And they are using it
> because:
> 1. It is simple to use.
> 2. Consistent

and 3. Universal ?

Best regards,

James Kass.


