From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Oct 17 2005 - 07:24:11 CST
On Mon, 17 Oct 2005, Stephane Bortzmeyer wrote:
> It is quite common in France (where the composed characters are much
> more common) to have *some* official bodies unable to deal with
> anything else than US-ASCII.
I'm afraid such restrictions, and variation in them, are rather common,
even in countries where people use an essentially richer character
repertoire in everyday E-mail, text processing, etc. What's worse,
the restrictions are often undocumented or poorly documented, and
what happens when data exceeds the limitations might be unpredictable.
I don't know what could be done about this in general, but the
"exemplar characters" definitions in CLDR come to mind.
They are currently limited to letters, unfortunately, and they
are meant to describe the use of letters in a language, rather
than the character repertoire commonly supported in a country or
other territory.
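As a rough illustration of the idea, here is a minimal Python sketch that
reports which letters of a string fall outside a given repertoire. The set
below is a hand-written stand-in for an exemplar set, not copied from CLDR
data, and the names FRENCH_LETTERS, letters_outside and fits_us_ascii are
hypothetical.

```python
import unicodedata

# Hypothetical repertoire: an illustrative stand-in for a language's
# exemplar letters. It is NOT taken from CLDR data.
FRENCH_LETTERS = set("abcdefghijklmnopqrstuvwxyzàâæçéèêëîïôœùûüÿ")


def letters_outside(text, repertoire):
    """Return the letters of text that are not in repertoire,
    each listed once, in order of first appearance."""
    # Compose to NFC so that "e" + combining acute is compared as "é".
    normalized = unicodedata.normalize("NFC", text)
    missing = []
    for ch in normalized:
        if ch.isalpha() and ch.lower() not in repertoire and ch not in missing:
            missing.append(ch)
    return missing


def fits_us_ascii(text):
    """True if every character of text is in the US-ASCII range."""
    return all(ord(ch) < 128 for ch in text)
```

With these definitions, letters_outside("Jyväskylä", FRENCH_LETTERS) lists
the letter "ä", while fits_us_ascii("Stéphane") is false. A real check
would of course need a repertoire that also covers digits, punctuation,
and other non-letters, which is exactly what the exemplar sets lack.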
It would be nice if we had a definition of "commonly available characters"
for each country, describing the _typical_ repertoire. But I'm afraid the
situation varies too much, as seen from the examples presented. In
addition to technical limitations, which may still exist, there are
restrictions on the repertoire due to political decisions and due to
assumptions that (e.g.) non-ASCII characters cannot be typed in or
might simply break something.
So it's perhaps more constructive to look forward and try to specify the
character _requirements_ for writing different languages correctly,
perhaps at several levels.
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/