Michael Everson:
ME> Unicode exists to encode the world's writing systems
[...]
ME> We can't prevent people from typing with it what they wish.
I think Gaspar has a point, though, even if he may have overstated
it.
We, ASCII-age programmers, are used to thinking of plain-text
rendering as injective up to binary identity.  We carefully
choose fonts that distinguish between O and 0, 1 and l.  We use
editors that warn us about non-native line ending conventions, about
whitespace at the end of lines, about blank lines at the end of files.
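The sort of check I mean is easy to write at the byte level; here is a
rough, purely illustrative sketch in Python -- not any particular
editor's implementation, just the kind of test we rely on:

    # ascii_lint.py -- illustrative only: warn about the byte-level
    # anomalies an ASCII-age programmer expects to be told about.
    import sys

    def lint(path):
        data = open(path, "rb").read()
        if b"\r" in data:
            print("%s: non-native (CR or CRLF) line endings" % path)
        for i, line in enumerate(data.split(b"\n"), 1):
            if line != line.rstrip():
                print("%s:%d: trailing whitespace" % (path, i))
        if data.endswith(b"\n\n"):
            print("%s: blank line(s) at end of file" % path)

    for p in sys.argv[1:]:
        lint(p)

Every property such a check tests is visible in the raw bytes.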
With Unicode, doing the same becomes impossible, which some of us
(including myself) find disorienting. We will have to change our work
habits, and we'll have to work out new tricks for making our software
reliable when confronted with a non-technical user.
We're desperately looking for the data needed to make our software
user-friendly, only to learn that no such data is available. Now, I'm
not blaming anyone for the lack of such data -- I'm quite aware of the
amount of work that the current state of Unicode represents, and I can
understand that not all needs have been catered for. Still, I don't
think that denying the existence of the issue is a productive approach.
As far as I know, there is no data that provides:
- a cross-reference of characters whose associated glyphs are
identical, whatever the font (applies to symbols and ``modifier
letters'');
- a cross-reference of characters whose associated glyphs could be
confused by a non-technical user;
- a cross-reference of characters that may, in the absence of
suitable fonts, be used as fallbacks for each other;
- a map from characters to scripts;
- a map from characters to languages.
While much of this data may be deduced from the character names,
you'll doubtless agree that many programmers would rather do something
other than work out exactly which characters can appear in a Coptic
context; the sketch below shows the kind of lookup I have in mind.
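To make the last two items concrete: if a character-to-script map
existed, the check I want would be only a few lines.  The table below
is entirely hypothetical -- a couple of hand-picked ranges standing in
for the data I cannot find:

    # script_check.py -- illustrative only.  SCRIPT_RANGES is made up;
    # a real table would have to cover every assigned character, and
    # that is exactly the data that is missing.
    SCRIPT_RANGES = [
        (0x0041, 0x005A, "Latin"),      # A-Z
        (0x0061, 0x007A, "Latin"),      # a-z
        (0x0391, 0x03C9, "Greek"),      # a slice of the Greek block
        (0x0400, 0x04FF, "Cyrillic"),   # the Cyrillic block
    ]

    def script_of(ch):
        cp = ord(ch)
        for lo, hi, name in SCRIPT_RANGES:
            if lo <= cp <= hi:
                return name
        return "Unknown"

    def mixes_scripts(word):
        # Flag a word that mixes scripts -- say, a Cyrillic 'o' pasted
        # into an otherwise Latin identifier, which no non-technical
        # user will ever notice.
        scripts = set(script_of(c) for c in word) - set(["Unknown"])
        return len(scripts) > 1

Writing the table itself, for the whole of Unicode, is the part none
of us wants to do by hand.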
Juliusz