Re: Brahmic harmonization

From: D.V. Henkel-Wallace (gumby@cygnus.com)
Date: Wed Jun 12 1996 - 18:27:47 EDT


At 19:58 06/08/96, Michael Everson wrote:
>At 18:43 1996-06-07, Eng C. Born wrote:
>> It is a good idea to support Sanscrit and Pali. But, what do you mean
>>by Brahmin harmonization? ...In my opinion,
>>the code assignment should be arranged as close as possible to the way
>>each script are used and taught in school.

Example of current practice: the Swedish alphabet has three more letters
than the basic Latin (Roman) alphabet, and these letters (å, ä and ö)
follow the letter "z". But in the ISO 8859-1 layout those letters don't
follow the "Z" and "z" code points; rather, they lie in places that don't
otherwise conflict with ASCII.

>The
>corpus of Sanskrit and Pali data is of such an enormous size -- and of such
>enormous cultural importance to Asia and to all the world -- that the
>notion of making data transfer for common texts more difficult should shock
>us....In the Latin script, just
>transferring data from the PC to the Macintosh can be a problem or an
>irritation for people, who know nothing about _why_ their vowels come out
>wrong.

This is the other important point: the better analogy here is Latin and
Greek, where the code points are roughly parallel. There is no value in
scrambling the alphabets arbitrarily to overlay similar points, but where
there are congruences there's no reason to avoid them.
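To make "roughly parallel" concrete, here is a small Python sketch that assumes the Greek layout in question is ISO 8859-7, where the Greek capitals start at 0xC1, i.e. "A" (0x41) with the high bit set. It encodes Greek capitals, clears bit 7 of each byte (a hypothetical stand-in for a channel or font that can only show ASCII), and decodes the result. The `strip_high_bit` helper is illustrative only, not part of any standard.

```python
# Sketch: how closely Greek in ISO 8859-7 parallels ASCII Latin.
# Assumption: the Latin/Greek comparison refers to ISO 8859-7.


def strip_high_bit(text: str, codec: str = "iso8859_7") -> str:
    """Encode text in an 8-bit codec, clear bit 7 of each byte, decode as ASCII.

    This imitates a lossy fallback (e.g. a 7-bit channel); it is not a
    transliteration scheme.
    """
    raw = text.encode(codec)
    return bytes(b & 0x7F for b in raw).decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("ΑΒΕΙΟΤ"))  # ABEIOT: these line up with Latin look-alikes
    print(strip_high_bit("ΓΗ"))      # CG: gamma and eta do not
```

Some letters survive as their Latin look-alikes and others do not, which is the kind of partial congruence described above rather than a scrambled overlay.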

Both of these examples follow the principle of least surprise _for the
people who would most likely encounter the surprise_. In the first
example, a programmer is more likely to find that an existing piece of code
will behave sensibly if a widely used standard is extended rather than
scrambled. In the second case, mis-mapping characters when the correct
mapping (e.g. a font) is unavailable may still have some chance of
yielding comprehensible output for the user. Representing the code
points in the order taught in school is unlikely to benefit more than a
very small number of programmers, and no one else.

This principle is well represented in Unicode, not to mention many other
systems.

D. Vinayak Henkel-Wallace



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT