From: Hans Aberg (haberg@math.su.se)
Date: Sun Apr 24 2005 - 04:52:10 CST
At 21:24 +0200 2005/04/23, Marcin 'Qrczak' Kowalczyk wrote:
>My orthography for Polish can consume an infinite number of characters
>if I treat it like Hangul was treated and encode precomposed characters
>individually. Ok, I'm taking all odd numbers, so even ones are left
>for other scripts :-)
Here is an attempt to extract an underlying
principle: one can always group symbols together
to form a new semantic unit. The number of such
semantic units can then be very large, or even
potentially infinite.
In order for a semantic unit to be called a
character, it should probably be atomic in some
sense. Let's examine this idea:
The Swedish language symbol ä (a with two dots
above) is a separate letter, not to be viewed as
an alteration of the letter a. So it is atomic.
It is reasonable to enter it as a separate
character. In German, however, it is an umlaut,
an alteration of the letter a. So there one might
enter it as a combination of two characters. In the
program TeX, originally, ä would be constructed
in the latter way. It then turns out that if one
changes fonts, the dots do not end up exactly
right typographically. So, because of this font
limitation, it is suitable to have ä as a
separate character. But now smart fonts are
arriving, and with them one can always enter it
as a combination of two characters. It would be
easy for a computer program, in a Swedish
context, to recognize that combination as the
single Swedish letter ä. So when examining what
is to be viewed as atomic, a number of principles
can be used, and the choice depends in part on
such things as what computer software one wants
to use.
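
To make the two views of ä concrete, here is a minimal sketch,
assuming Python and its standard unicodedata module. The helper
names same_letter and count_letters are made up for illustration,
and counting every non-combining code point as one letter is a
simplifying assumption that only suits simple alphabetic text.

```python
import unicodedata

# The same visible letter, encoded in two ways.
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS, one code point
decomposed = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS


def same_letter(x: str, y: str) -> bool:
    """Hypothetical helper: compare two strings after composing them (NFC)."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


def count_letters(text: str) -> int:
    """Hypothetical helper: count base characters, ignoring combining marks.

    Assumption: each non-combining code point is one letter, which is
    a simplification and not a full text segmentation algorithm.
    """
    decomposed_text = unicodedata.normalize("NFD", text)
    return sum(1 for ch in decomposed_text if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    print(precomposed == decomposed)              # raw code points differ
    print(same_letter(precomposed, decomposed))   # equal once normalized
    print(count_letters("fa\u0308rg"))            # letters in the Swedish word "färg"
```

With either encoding, a program written this way sees the same
number of letters in a word such as "färg", so the choice between
one code point and two need not be visible to the user.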
-- Hans Aberg