From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Sep 10 2005 - 09:20:01 CDT
On Fri, 9 Sep 2005, Doug Ewell wrote:
> I'm afraid the list is at risk of falling into a hole debating this "how
> many languages on the head of a pin" question, when the real underlying
> question may be completely different.
Indeed, especially since the question was probably based on at least one 
misconception: it asked about encoding forms rather than about Unicode 
itself.
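To illustrate why the choice of encoding form should not matter here: the original question is not quoted in this message, so the following is only a sketch of one possible reading of the misconception. It assumes Python's standard codecs; the sample string and the helper name `encoded_lengths` are hypothetical. Every encoding form can carry every Unicode character, so only the byte counts differ.

```python
# Sample text (an arbitrary choice for illustration): Latin "a",
# U+00E4 (a with diaeresis), U+65E5 (a CJK ideograph), and
# U+10348 (a Gothic letter outside the Basic Multilingual Plane).
SAMPLE = "a\u00e4\u65e5\U00010348"

def encoded_lengths(text: str) -> dict:
    """Encode text in each encoding form, check the round trip,
    and return the byte length per form."""
    lengths = {}
    for codec in ("utf-8", "utf-16-le", "utf-32-le"):
        data = text.encode(codec)
        # Decoding must give back exactly the same characters.
        if data.decode(codec) != text:
            raise ValueError(f"round trip failed for {codec}")
        lengths[codec] = len(data)
    return lengths

lengths = encoded_lengths(SAMPLE)
for codec, size in lengths.items():
    print(f"{codec}: {size} bytes")
```

The same four characters survive all three round trips; the encoding forms differ in storage size, not in which languages they can represent.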
While waiting for a clarification to the question, we can still discuss 
_another_ question, namely that of language support by Unicode. There 
seems to be confusion around it, too, and the question itself is somewhat 
obscure. For example, does "Unicode" mean the Unicode repertoire of 
characters, or the Unicode Standard, or the Unicode Consortium?
I'd say that the short answer to the question "what languages are 
supported by the Unicode Standard?" would be as follows (without trying to 
clarify the question much - can't do that in a _short_ answer):
All living languages, and many dead languages, can be written in 
their normal writing system(s) using Unicode characters. However, some
of their characters cannot be represented as single Unicode characters
but only as sequences of characters, such as a base character followed
by combining marks. Some orthographic and typographic constructs, which 
could in principle be expressed in plain text, cannot be expressed
in Unicode. Some of the properties of characters as defined by the Unicode 
Standard do not correspond to their behavior in different languages.
Moreover, Unicode is meant to describe plain text only, so it generally 
lacks any support that might be needed for display and processing of text 
by language-specific rules.
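The point about characters that exist only as combinations can be checked with a small sketch. It assumes Python's standard `unicodedata` module; the helper name `nfc_length` and the two sample sequences are illustrative choices, not taken from any particular language's orthography. Canonical composition (NFC) merges a sequence into one character only when a precomposed character has been encoded.

```python
import unicodedata

def nfc_length(text: str) -> int:
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))

# "e" + U+0301 COMBINING ACUTE ACCENT: a precomposed form exists (U+00E9).
e_acute = "e\u0301"
# "q" + U+0303 COMBINING TILDE: no precomposed form has been encoded.
q_tilde = "q\u0303"

print(nfc_length(e_acute))  # composes into a single code point
print(nfc_length(q_tilde))  # remains a two-code-point sequence
```

Both sequences are valid Unicode text; the second one simply has no single-character representation, which is the situation described above.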
Well, that's not very short, really. Neither is it very understandable, 
since it lacks examples. The point, anyway, is that "support for a 
language" can mean much more than just the presence of all characters 
used in a language. It's also debatable, since people may disagree on what really 
belongs to a language, even at the character level. Moreover, it's 
debatable what can be regarded as "support". For example, if the rules of 
a language require a thin nonbreakable space before or after some 
punctuation marks, can we claim that Unicode "supports" it, since you can 
use a thin space character with a zero width no-break space on both sides 
of it?
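The construction mentioned above can be spelled out as a sketch. It assumes Python's standard `unicodedata` module; the helper name `glue_thin_space` and the sample word are hypothetical. It uses U+FEFF, the character named in the text, although the Unicode Standard has recommended U+2060 WORD JOINER for this joining function since version 3.2.

```python
import unicodedata

THIN_SPACE = "\u2009"  # THIN SPACE, which by itself allows a line break
ZWNBSP = "\ufeff"      # ZERO WIDTH NO-BREAK SPACE, as named in the text
                       # (U+2060 WORD JOINER is the recommended successor)

def glue_thin_space(before: str, after: str) -> str:
    """Join two pieces of text with a thin space that is wrapped in
    zero-width no-break characters on both sides."""
    return before + ZWNBSP + THIN_SPACE + ZWNBSP + after

# Hypothetical example: a thin space before a question mark.
sample = glue_thin_space("Quoi", "?")
glue_names = [unicodedata.name(ch) for ch in sample[4:7]]

for ch in sample[4:7]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The result is three characters of plain text standing in for one typographic space, and whether it actually prevents a break and renders as intended depends on the software displaying it. That is part of why calling this "support" is debatable.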
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/