Re: What constitutes "character"?

From: Gaspar Sinai (gsinai@yudit.org)
Date: Thu Nov 08 2001 - 04:54:27 EST


Hope you all take this as criticism and not a troll ☺

On Thu, 8 Nov 2001, Arjun Aggarwal wrote:

> Hello everybody
> On Wed, 7 Nov 2001, Philipp Reichmuth wrote
> > I've been wondering a little bit recently about the definition of
> > "character" vs. "glyph variant" that is applied during decision
> > whether or not a given proposed character should go into Unicode.
>
> If anybody on the list really thinks that they can submit
> any characters into Unicode then they should positively
> respond to my query.

I really understand your point. My opinion is that the Unicode
standard is broken in many ways. I think the Unicode Consortium
had the opportunity to create a simple and easily adaptable
standard that would bring all the scripts together, but they
blew it. Instead its rules are based upon contradictory
statements, and in the end there is no rule at all.

I think that the Indian scripts deserve better character
assignment - the scripts cannot be input
without a complicated application that does character
composition. That being said, I would like to divert this
thread.

A lot of applications, even today, are not using Unicode for
the internal representation of text. Let me mention a
few facts that I think made Unicode 'not the preferred
way' to internally represent text in these multi-lingual apps.

1. Not Simple to Use
   ==================
There is no way to do the simplest operations without a
huge library behind them. Just think of a simple character search:
  if you want to search for the
     Á character (U+00C1) you cannot search properly without
  decomposing it. Otherwise you won't match it with Á
  (U+0041, U+0301).
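
The search problem above can be sketched roughly as follows. This is
only an illustration, assuming Python and its standard unicodedata
module; the helper name contains() and the sample word are made up
for the example and are not part of any standard API.

```python
import unicodedata

precomposed = "\u00C1"   # Á as one code point (U+00C1)
decomposed = "A\u0301"   # A (U+0041) + COMBINING ACUTE ACCENT (U+0301)


def nfd(s):
    """Return s in Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", s)


def contains(haystack, needle):
    """Hypothetical helper: substring test after normalizing both sides."""
    return nfd(needle) in nfd(haystack)


word = "\u00C1rv\u00EDz"                  # sample text using precomposed letters

naive_hit = decomposed in word            # code-point comparison misses it
normalized_hit = contains(word, decomposed)
```

Note that this only handles canonical equivalence; a plain substring
test on normalized text can still match a base letter inside an
accented one, so a real search needs more care than this.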

2. Unification Problems
   ====================
So we unify characters. In that case, why do we have a
wide A (U+FF21) in Unicode but no
wide version of Д (U+0414)? Is there any reason? I
think both should be included because some local standards
support them, and this would provide compatibility.
3. Unfair Use of Code-space
  =========================
Guys in the first plane are lucky. It was very clear
right from the beginning that 16 bits are not enough.
Those code points are very precious, since some OSes use 16-bit
Unicode internally. On the one hand, Tamil cannot have its full
character set and has to be composed; on the other hand, over
10 thousand Hangul characters that really could be composed
(한 U+D55C could be 3 characters) are thrown in.
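
To illustrate the Hangul point, here is a small sketch, again assuming
Python's standard unicodedata module; the variable names are chosen
just for this example. It decomposes the precomposed syllable into its
conjoining jamo and composes it back.

```python
import unicodedata

syllable = "\uD55C"  # 한 as one precomposed code point

# Canonical decomposition splits the syllable into conjoining jamo.
jamo = unicodedata.normalize("NFD", syllable)
codepoints = [f"U+{ord(c):04X}" for c in jamo]

# Canonical composition puts the jamo back together.
recomposed = unicodedata.normalize("NFC", jamo)

print(codepoints)              # ['U+1112', 'U+1161', 'U+11AB']
print(recomposed == syllable)  # True
```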

4. Binary Incompatibility
  =======================
If you read a text into memory, there are a number of
ways to write the data back to a file, creating a totally
different document even though only a few characters were supposed
to change. I think BiDi is like this.

The evolution of the Unicode Standard seems to me like this:
1. Simplicity. Solution is just a step away.
2. Create problems.
3. Solve those problems by introducing new problems.
4. Loop to 3.

Sorry if I am too critical. I really would like to have a
standard that everybody is using, and that they are using
because:
1. It is simple to use.
2. It is consistent.

Gaspar



This archive was generated by hypermail 2.1.2 : Thu Nov 08 2001 - 06:03:47 EST