From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Nov 24 2008 - 12:19:27 CST
Hans Aberg wrote:
> Perhaps one only needs to list the combinations that belong to
> the proper language alphabets. In Swedish that would be
> "ijåäöÅÄÖ". Other combinations, like é, would not be as
> important to get right in Swedish, though it is imported from the
> French where it would appear. But it illustrates the idea.
Technically, in the Unicode sense, “i” and “j” do not contain a diacritic
mark but are atomic (completely non-decomposable) characters, even though a
discussion of diacritic marks must address the issue of what happens to the
dot in them.
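The point about decomposability can be checked against the Unicode
Character Database. The following is only a minimal sketch using Python's
standard unicodedata module; the helper name is_atomic is made up for this
illustration, and the result reflects whatever Unicode version the Python
installation ships with.

```python
import unicodedata

def is_atomic(ch: str) -> bool:
    """True if the character has no decomposition mapping in the UCD.

    (Hypothetical helper name, used only for this illustration.)
    """
    return unicodedata.decomposition(ch) == ""

# i, j, then a-ring, a-diaeresis, o-diaeresis and e-acute, written as
# escapes so the sample does not depend on how this text was normalized.
for ch in "ij\u00e5\u00e4\u00f6\u00e9":
    mapping = unicodedata.decomposition(ch) or "no decomposition"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```

For "i" and "j" the mapping is empty, whereas a letter like "å" maps to a
base letter followed by a combining mark.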
The description of characters used in a language or in a locale is addressed
in the CLDR, see
http://www.unicode.org/reports/tr35/#Character_Elements
though very unsatisfactorily, if you ask me. It only addresses letters, and
it defines rather arbitrarily just two character sets for a language.
Surely, for example, “e” is more basically a letter in English than “é” is,
but “é” in turn is more of an English letter than “ē” is. Moreover, the
pragmatic reasons given for defining the character repertoires include quite
irrelevant points like “choosing among character encodings.”
Anyway, describing the characters commonly used in a language is useful for
the purposes of font design. It is a difficult task, though, and
controversial. In practice, such descriptions are probably more useful to
people choosing between fonts than to font designers. For example, when
choosing a font for Swedish text, you should check that å, ä, ö, é, Å, Ä, Ö,
É all look good. This should be self-evident, but it often isn’t. Moreover,
less common characters are even more easily ignored. Thus, lists of
characters used in a language (at various levels of usage) are directly
useful for constructing test documents for font testing.
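As a rough sketch of how such a list could be turned into a test document,
the Python snippet below places each character between neutral neighbours
so that shape and spacing can be judged in context. The two usage levels
and their contents are assumptions made for this example (based only on the
Swedish characters mentioned above), not an authoritative description of
Swedish, and the names SWEDISH_LEVELS and build_test_lines are hypothetical.

```python
import unicodedata

# Assumed usage levels for Swedish, for illustration only.
SWEDISH_LEVELS = {
    "core": "åäöÅÄÖ",
    "secondary": "éÉ",
}

def build_test_lines(levels):
    """Return one line per usage level, each character shown in context."""
    lines = []
    for level, chars in levels.items():
        # Normalize so that each letter is a single precomposed code point.
        chars = unicodedata.normalize("NFC", chars)
        samples = []
        for ch in chars:
            # Flank capitals with "H" and small letters with "n".
            flank = "H" if ch.isupper() else "n"
            samples.append(f"{flank}{ch}{flank}")
        lines.append(f"{level}: {' '.join(samples)}")
    return lines

if __name__ == "__main__":
    print("\n".join(build_test_lines(SWEDISH_LEVELS)))
```

The output can be pasted into any application and set in each candidate
font; further levels (rarer characters) can be added as more dictionary
entries.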
-- Yucca, http://www.cs.tut.fi/~jkorpela/