On Wed, 2 Jul 1997, Markus G. Kuhn wrote:
>
> A full ISO 10646 Level 1 implementation is NOT possible on any fixed cell
> width system like VT100 emulators, xterm, kermit, or the system I use here
> to write this email. Level 1 contains right-to-left characters and
> characters that cannot be displayed appropriately in the typical 9x14
> pixel cells.
I have been wanting to comment on this thread for a while now, but
others were adequately addressing most of it. Despite popular belief,
fixed cell applications like terminal emulators can cope with all sorts
of unusual characters with an appropriate amount of work. I just left a
company that worked on wireless terminal emulators and so am quite familiar
with a) Unicode in embedded environments and b) how bidi, composition and
full/halfwidth characters are used in existing fixed cell terminal streams.
I found trying to deal with the hacks that were applied to the data stream
more of a hassle than redesigning the rendering engine.
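As a rough illustration of the fixed-cell bookkeeping involved, here is a
minimal Python sketch that assigns each character zero, one, or two cells.
It relies on Python's standard unicodedata module, which is an assumption of
this example and not what any of the emulators mentioned above used. The
names cell_width and string_width are hypothetical, and bidi reordering is
not attempted.

```python
import unicodedata

def cell_width(ch):
    """Return how many fixed-width cells one character occupies."""
    # Nonspacing and enclosing marks are drawn in the cell of the
    # preceding base character, so they take no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # Wide and Fullwidth East Asian characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including halfwidth forms, occupies one cell.
    return 1

def string_width(text):
    """Total number of cells needed to display text on one line."""
    return sum(cell_width(ch) for ch in text)
```

For example, string_width("e\u0301") counts the base letter plus its
combining accent as a single cell, while each fullwidth character counts
as two.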
> If every developer of an application standard has to identify these characters
> herself, then she won't do it, because it is too much work, and because
> the subset she gets will most likely be slightly different from what someone
> who independently comes up with such a subset would get. Many different
> subsets created independently for the same purpose are not what
> standardization is about.
The problem that you outline in detail is that people don't seem to be
reading the Unicode standard before dismissing it. The key is to not have
your system care if it is missing a glyph in a font. I had the entire
Unicode properties database squeezed into about 30K of ROM and 40K of RAM
(it could be tighter, but I needed to initialize quickly) and then let the
appropriate bitmap fonts be installed. Almost nobody wants 40,000 characters
displayed, especially if their host system is using a non-Unicode encoding.
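A minimal sketch of "not caring" about a missing glyph, in Python. The font
here is a hypothetical dictionary from code point to bitmap data (plain
strings stand in for bitmaps), and FALLBACK and render are names invented
for this example. The point is only that rendering substitutes a
placeholder while the text itself is left untouched.

```python
# Placeholder drawn when the installed font has no glyph for a code point.
FALLBACK = "[?]"

def render(text, font):
    """Return one glyph per character, using FALLBACK for missing glyphs.

    The input text is never modified; only the display output differs.
    """
    return [font.get(ord(ch), FALLBACK) for ch in text]

# A tiny hypothetical font that only covers three Latin letters.
latin_font = {ord(c): "<" + c + ">" for c in "abc"}
```

Calling render("aZ", latin_font) yields the glyph for "a" followed by the
placeholder, and installing a larger font later needs no change to the code.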
> In addition, the sheer size (~40000 versus ~1000, factor 40) characters of
> Level 1 is just mind boggling for any non-Asian developer who is not an
> i18n expert and who is not specifically developing applications for the
> Asian market.
I would have thought that it would be very easy for someone to cut out the
CJK section if they don't want to support East Asia.
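Cutting out a section can be as simple as a range check. The Python sketch
below assumes the CJK Unified Ideographs block at U+4E00 through U+9FFF;
a real subset would likely exclude other East Asian ranges too, and the
names EXCLUDED_RANGES and in_subset are hypothetical.

```python
# Code point ranges this implementation chooses not to display.
# Assumption: only the CJK Unified Ideographs block is listed here.
EXCLUDED_RANGES = [(0x4E00, 0x9FFF)]

def in_subset(ch):
    """Return True if the character falls outside every excluded range."""
    cp = ord(ch)
    return not any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)
```

Characters outside the subset would still be stored and passed along
unchanged; the check only decides what the display attempts to draw.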
[A number of examples showing that real-world developers are shunning Unicode
because they don't understand that it is normally implemented incrementally.
Particularly sad are those cases citing hardware restrictions dictated by
current economics.]
Alas, I have to agree that this is probably the state of affairs. I've
started looking for new employment and nobody seems to know what i18n is,
much less why they might want someone to help them implement it. I suspect
I'll have to get a position as a developer somewhere and convince people to
> I have suggested to define Unicode subsets in the examples that I quoted
> above. The answer was each time: "We do not have the linguistic background
> to come up with a reasonable subset and this is much too much work for
> our project. If there were a nice simple subset of Unicode available,
> we would have a look at it and quote it, but at the moment ISO 8859
> looks so much simpler for us."
You should be telling them, "Don't mess up the data that is passed to you;
support what you can; your spec is now extensible to new languages as
you gain expertise."
Geoffrey Waigh
gpw@van.cybersurf.net