RE: Benefits of Unicode

From: Peter_Constable@sil.org
Date: Mon Jan 29 2001 - 18:11:17 EST


>Somebody mentioned TRON, which I'd not heard of before, but
><http://tronweb.super-nova.co.jp/characcodehist.html> says:

Thanks for tracking down a URL. This is slightly interesting.

<quote>
There are several features that make the TRON approach to multilingual
processing unique. One is that the TRON character set is "limitlessly
extensible," and thus it is capable of including all scripts that have ever
been used, and even new scripts that have yet to be invented.
</quote>

Does anybody think we're about to run out of the 1,000,000+ codepoints
available in Unicode any time in the next century?
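
For a sense of scale, the arithmetic behind that figure can be checked in a few lines of Python. This is only a sketch: the constant names are my own, and the counts are those of the Unicode code space, U+0000 through U+10FFFF.

```python
import sys

PLANES = 17                      # plane 0 (the BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane
SURROGATES = 0x800               # U+D800..U+DFFF, reserved for UTF-16 surrogates

total_code_points = PLANES * CODE_POINTS_PER_PLANE
scalar_values = total_code_points - SURROGATES

print(total_code_points)  # 1114112
print(scalar_values)      # 1112064

# Python's own upper limit is the last code point, U+10FFFF.
print(sys.maxunicode + 1 == total_code_points)  # True
```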

<quote>
This is done through escape sequences, which are used to switch between
8-bit and 16-bit character sets on very large character planes.
</quote>

This is supposed to be a benefit?
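
To illustrate the concern: any encoding that switches character sets with escape sequences is stateful, so what a byte means depends on what preceded it. I don't have a TRON encoder to hand, so the sketch below uses ISO-2022-JP from Python's standard codecs purely as a stand-in for the general technique. It says nothing about TRON's actual byte sequences.

```python
text = "日本"
encoded = text.encode("iso2022_jp")

# The two-byte payload is bracketed by escape sequences that switch sets.
ESC_TO_JIS = b"\x1b$B"    # switch to the two-byte JIS X 0208 set
ESC_TO_ASCII = b"\x1b(B"  # switch back to ASCII
assert encoded.startswith(ESC_TO_JIS)
assert encoded.endswith(ESC_TO_ASCII)

# Cut the payload out of its context, as a truncated or spliced stream would.
payload = encoded[len(ESC_TO_JIS):-len(ESC_TO_ASCII)]

# Without the escape state, the same bytes read as unrelated ASCII text.
misread = payload.decode("ascii")
print(misread != text)  # True

# With the escape sequences intact, the text round-trips.
print(encoded.decode("iso2022_jp") == text)  # True
```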

<quote>
Another is that it employs "language specifier codes," which are necessary
so that the correct sorting algorithms, for example, can be applied to data
in a multilingual environment.
</quote>

I don't debate that language identification is needed for language-specific
processing, but I don't see any reason why attributing langids over runs of
text needs to be specified as part of the same mechanism and standard as
character encoding. This author presumably thinks that XML is also mistaken
in that character encoding and language identification are handled using
distinct mechanisms.
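
As a minimal sketch of that separation in XML: the character encoding is declared once for the whole document, while language is marked up on runs of text with the xml:lang attribute. The element names doc and span are invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The encoding declaration covers the whole document;
# xml:lang is attached separately to individual runs of text.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<doc xml:lang="en">Goodbye <span xml:lang="sv">hej då</span></doc>'
).encode("utf-8")

root = ET.fromstring(document)
langs = [(el.tag, el.get(XML_LANG)) for el in root.iter()]
print(langs)  # [('doc', 'en'), ('span', 'sv')]
```

Changing the encoding declaration would not touch the language tags, and vice versa.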

<quote>
 Still others are that it does not provide for "user defined characters,"
which can cause problems during data transmission (graphic elements
embedded in text strings are used instead);
</quote>

How can user-defined characters create problems that embedded graphic
elements cannot? Receiving software may not know how to interpret a PUA
character other than to present it using a glyph, if you happen to have a
font with a glyph mapped from that PUA character. So, what's the problem?
Granted, an embedded graphic can always be displayed, and so the receiving
user is always assured of a way to see what the sender intended them to
see. But anybody choosing to use PUA characters is aware of the risks they
are taking. I don't see what the problem is. Later in the document, the
author suggests that having access to systems that support TRON is good
because "for the first time in a long time, personal computer users are
really going to have a choice!" By eliminating the possibility of having
PUA characters, TRON takes significant flexibility away from users: they
cannot encode something unless and until the authors of TRON extend it to
support those characters.
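
For what it's worth, receiving software can at least recognize a PUA character for what it is, even though it cannot know what the sender meant by it. A small Python sketch, assuming only the standard unicodedata module (the helper name is_private_use is my own):

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True if ch has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

pua_char = "\ue000"  # first code point of the BMP Private Use Area

print(is_private_use(pua_char))  # True
print(is_private_use("A"))       # False

# The standard assigns no name or meaning to a PUA code point;
# interpretation is left to private agreement between sender and receiver.
print(unicodedata.name(pua_char, None))  # None

# Size of the BMP Private Use Area, U+E000..U+F8FF.
bmp_pua_size = 0xF8FF - 0xE000 + 1
print(bmp_pua_size)  # 6400
```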

<quote>
and it separates text data for storage from text data for display, since
two letters at the storage level, for example, can be merged into one
character at the display level in the form of a "ligature."
</quote>

Unicode very definitely makes the distinction between characters --
meaningful units in the stored representation of text -- and glyphs --
shapes used to present characters. By talking about "text data for
display", it sounds like TRON supports two forks of data: the character
content, and the glyph content. (Somebody please correct me if I'm
mistaken.) I suppose the merits of that can be debated.
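
The ligature case from the quote makes a convenient illustration of the character/glyph split in Unicode. In the sketch below, the stored text is two characters; whether they are drawn as a single "fi" glyph is left to the font and rendering software. Unicode does include U+FB01 as a compatibility character for older encodings, and compatibility normalization folds it back to the two letters. The variable names are my own.

```python
import unicodedata

stored = "fi"               # two characters in the stored text
compat_ligature = "\ufb01"  # compatibility character for the fi ligature

print(len(stored))                    # 2
print(unicodedata.name(compat_ligature))  # LATIN SMALL LIGATURE FI

# NFKC normalization maps the compatibility ligature back to its letters,
# treating the ligature as a presentation matter rather than as content.
folded = unicodedata.normalize("NFKC", compat_ligature)
print(folded == stored)  # True
```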

<quote>
 Other interesting features are: a character database, which is
particularly necessary for properly employing the huge GT Mincho character
set; and a multilingual writing system, which allows multilingual data to
be sent across the World Wide Web and displayed with reasonable quality
without complex pagemaking algorithms at the other end.
</quote>

I'm not sure what the benefits over Unicode are supposed to be here. (Mind
you, the paragraph I've quoted wasn't presented as points that specifically
represent benefits over Unicode.)

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
