Unicode in the balance (was Re: What constitutes "character"?)

From: Peter_Constable@sil.org
Date: Thu Nov 08 2001 - 10:03:50 EST


Gaspar:

>1. Not Simple to Use

Providing comprehensive multilingual text capabilities is simply not easy,
and any other approach that has been implemented is far more complicated.
(Have you ever looked at ISO 2022?)

>2. Unification Problems
> ====================
>So we unify characters. In that case, why do we have a
>wide A U+FF21 in Unicode, and why don't we have
>a wide version of Д U+0414? Is there any reason?

The former was included to provide backward compatibility with major
industry standards -- which is one of the compromises in Unicode that has
made it successful (and it *is* successful, in spite of your opinions to
the contrary).
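For readers wondering how such compatibility characters are kept from fragmenting text: U+FF21 is defined with a `<wide>` compatibility decomposition to plain U+0041, so software can fold it away whenever the width distinction is irrelevant. The Halfwidth and Fullwidth Forms block provides wide variants of the ASCII range (U+FF01..U+FF5E) and has no Cyrillic counterparts. The sketch below uses only Python's standard `unicodedata` module; the variable names are illustrative, and the exact output assumes the Unicode Character Database bundled with the interpreter (these particular mappings have been stable for a long time).

```python
import unicodedata

fullwidth_a = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A
cyrillic_de = "\u0414"  # CYRILLIC CAPITAL LETTER DE

# U+FF21 carries a <wide> compatibility decomposition to plain U+0041.
print(unicodedata.name(fullwidth_a))           # FULLWIDTH LATIN CAPITAL LETTER A
print(unicodedata.decomposition(fullwidth_a))  # <wide> 0041

# Compatibility normalization (NFKC) folds the legacy variant into "A";
# canonical normalization (NFC) leaves it untouched.
print(unicodedata.normalize("NFKC", fullwidth_a))                # A
print(unicodedata.normalize("NFC", fullwidth_a) == fullwidth_a)  # True

# U+0414 is an ordinary character with no decomposition of its own.
print(repr(unicodedata.decomposition(cyrillic_de)))  # ''
```

In other words, the wide A is a round-trip aid for legacy East Asian data, and NFKC is the mechanism for discarding it when it is not wanted.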

>3. Unfair Use of Code-space
> =========================
>Guys in the first plane are lucky. It was very clear
>right from the beginning that 16 bits are not enough.
>They are very precious; some OSes are using 16-bit Unicode
>internally. On one hand, Tamil cannot have the full character
>set

Except for some details that are in the process of being cleared up, Tamil
(like Devanagari) is adequately dealt with in Unicode, and there is a
working implementation in Windows 2000.

> and has to be composed, while 10 thousand Hangul characters
>that really could be composed (한 U+D55C could be 3 characters)
>are thrown in.

I think many of the architects of Unicode would freely admit that the
precomposed Hangul syllables were an unfortunate but politically necessary
addition. (BTW, it's over 11,000: 11,172, to be exact.)
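The precomposed syllables are at least arranged so that they never have to be looked up in a table: each one is computed arithmetically from a leading consonant, a vowel, and an optional trailing consonant (conjoining jamo). The sketch below follows the Hangul syllable arithmetic described in the Unicode Standard's section on conjoining jamo behavior. `decompose_syllable` and `compose_syllable` are hypothetical helper names; the composer assumes a well-formed L+V or L+V+T sequence and does no validation. The result is cross-checked against Python's `unicodedata.normalize`.

```python
import unicodedata

# Constants of the Hangul syllable arithmetic in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo (L, V, optional T)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable: return it unchanged
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""  # T index 0 means no final consonant
    return lead + vowel + trail


def compose_syllable(jamo: str) -> str:
    """Rebuild a syllable from a well-formed L+V or L+V+T jamo sequence (no validation)."""
    l_index = ord(jamo[0]) - L_BASE
    v_index = ord(jamo[1]) - V_BASE
    t_index = ord(jamo[2]) - T_BASE if len(jamo) == 3 else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


han = "\uD55C"  # 한
parts = decompose_syllable(han)
print([f"U+{ord(c):04X}" for c in parts])          # ['U+1112', 'U+1161', 'U+11AB']
print(compose_syllable(parts) == han)              # True
print(unicodedata.normalize("NFD", han) == parts)  # True
```

So 한 U+D55C is canonically equivalent to the three jamo U+1112 U+1161 U+11AB, and an implementation can move between the two forms with a few integer operations.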

>4. Binary Incompatibility
> =======================
>If you read a text into memory, there are a number of
>ways to write the data back to a file, creating a totally
>different document even though only a few characters were supposed
>to be changed. I think BiDi is like this.

Bidi is not like this. It is true that it is possible for a given piece of
text to be represented as byte sequences in more than one way. This has
been clearly documented in the standard, and the standard provides all of
the necessary mechanisms for dealing with these potential differences. In
practice, people have been implementing support for Unicode and have not
been hindered from success as a result of these issues.
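One common case is canonical equivalence: "é" can be stored as the single code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT. The two look identical but differ in code points and in bytes; normalization (for example to NFC) is one of the mechanisms the standard defines for comparing them reliably. A minimal sketch with Python's standard `unicodedata` module; `canonically_equal` is a hypothetical helper name, and normalizing to NFC (rather than NFD) is an arbitrary choice here, since either form works for equality testing.

```python
import unicodedata

precomposed = "caf\u00E9"  # é as one code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT (U+0301)

# Same text to a reader, but different code points and different UTF-8 bytes.
print(precomposed == decomposed)    # False
print(precomposed.encode("utf-8"))  # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'cafe\xcc\x81'


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(canonically_equal(precomposed, decomposed))  # True
```

An application that normalizes text on input (or before comparing) never sees the two spellings as different documents.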

>Evolution of Unicode Standard to me seems like this:
>1. Simplicity. Solution is just a step away.
>2. Create problems.
>3. Solve those problems by introducing new problems.
>4. Loop to 3.

I think this lacks any appreciation of the history and the complexity of
what Unicode had to build on top of, and it is simply an invalid
assessment of the great, albeit imperfect, work that has been accomplished
by the architects of the standard.

>Sorry if I am too critical, I really would like to have a
>standard that everybody is using.

People who don't start using it will eventually be a relatively small
minority. Maybe not tomorrow or next week, but eventually.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>



This archive was generated by hypermail 2.1.2 : Thu Nov 08 2001 - 11:10:38 EST