From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Nov 24 2008 - 12:19:27 CST
Hans Aberg wrote:
> Perhaps one only needs to list the combinations that belong to
> the proper language alphabets. In Swedish that would be
> "ijåäöÅÄÖ". Other combinations, like é, would not be as
> important to get right in Swedish, though it is imported from the
> French where it would appear. But it illustrates the idea.
Technically, in the Unicode sense, “i” and “j” do not contain a diacritic
mark but are atomic (completely non-decomposable) characters, even though a
discussion of diacritic marks must address the issue of what happens to the
dot in them.
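
To illustrate the point about decomposability, here is a minimal Python sketch that uses the standard unicodedata module to look up canonical decompositions. The helper name canonical_decomposition is made up for this example, and the results depend on the Unicode data shipped with your Python version.

```python
import unicodedata

def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of characters.

    Returns an empty list when the character has no canonical
    decomposition (hypothetical helper, for illustration only).
    """
    raw = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>"; ignore those.
    if not raw or raw.startswith("<"):
        return []
    return [chr(int(cp, 16)) for cp in raw.split()]

# The letters from the quoted list, plus e with acute, written as escapes
# so that the precomposed forms are used regardless of editor settings.
SAMPLE = "ij\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6\u00e9"

for ch in SAMPLE:
    parts = canonical_decomposition(ch)
    if parts:
        description = " + ".join(unicodedata.name(p) for p in parts)
    else:
        description = "(no decomposition)"
    print(f"U+{ord(ch):04X} {ch}: {description}")
```

With this, “i” and “j” report no decomposition, whereas “å” decomposes into “a” followed by the combining ring above.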
The description of characters used in a language or in a locale is addressed 
in the CLDR, see
http://www.unicode.org/reports/tr35/#Character_Elements
though very unsatisfactorily, if you ask me. It only addresses letters, and 
it defines rather arbitrarily just two character sets for a language. 
Surely, for example, “e” is more basically a letter in English than “é” is, 
but “é” in turn is more of an English letter than “ē” is. Moreover, the
pragmatic reasons given for defining the character repertoires include quite
irrelevant points like “choosing among character encodings.”
Anyway, describing the characters commonly used in a language is useful for 
the purposes of font design. It is a difficult task, though, and 
controversial. In practice, such descriptions are probably more useful to 
people choosing between fonts than to font designers. For example, when
choosing a font for Swedish text, you should check that å, ä, ö, é, Å, Ä, Ö, 
É all look good. This should be self-evident, but it often isn’t. Moreover, 
less common characters are even more easily ignored. Thus, lists of 
characters used in a language (at various levels of usage) are directly 
useful for constructing test documents for font testing.
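
As a rough sketch of how such a list could be turned into a font test document, consider the following Python snippet. The two usage levels and their names are only an assumption made for this example, based on the Swedish characters mentioned above; they are not taken from the CLDR or any other source, and the function name font_test_lines is hypothetical.

```python
import unicodedata

# Hypothetical usage levels for Swedish, for illustration only.
SWEDISH_LEVELS = {
    "core": "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6",   # å ä ö Å Ä Ö
    "common loans": "\u00e9\u00c9",                   # é É
}

def font_test_lines(levels, context="n{0}n H{0}H"):
    """Build plain-text lines showing each character alone and in context.

    The context pattern places the character between neighbouring
    letters so that spacing and mark placement can be judged by eye.
    """
    lines = []
    for level, chars in levels.items():
        lines.append(f"[{level}]")
        for ch in chars:
            sample = context.format(ch)
            lines.append(
                f"{ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}  {sample}"
            )
    return lines

if __name__ == "__main__":
    print("\n".join(font_test_lines(SWEDISH_LEVELS)))
```

The output can be pasted into a document and set in each candidate font; less common characters can be added as further levels.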
-- Yucca, http://www.cs.tut.fi/~jkorpela/