From: Gregg Reynolds (unicode@arabink.com)
Date: Thu Oct 27 2005 - 11:34:23 CST
Kenneth Whistler wrote:
> Jukka said:
> 
> practice long predates the Unicode Standard, and was inherited
> into Unicode from ASCII itself), then proposing to add *another*
> A-F, using characters that look just like the existing A-F,
> but which are posited to be only hexadecimal digits (and *not*
> letters -- even though they look just like the letters they
> are cloned from), then all hell breaks loose in *future* processing
> of hexadecimal numeric expressions. 
> 
> The problem isn't that existing software would break, but rather
> that it would be then gradually forced (and inconsistently and
...etc
Well, you've convinced me.  I would add a simpler, non-technical
justification for Unicode conservatism (or even ultra-conservatism):
adding such edge-case characters (including my beloved RTL
digits) would amount to trying to legislate market dynamics.  Vendors
don't support Unicode because it's a lovely standard, they do so because
the market demands it.  And "Unicode support" doesn't mean support for
everything in Unicode, it only means that what you do support, you
support correctly.
It all goes back to cost/benefit analysis.  The cost of supporting
speculative characters like hex A-F and RTL digits is demonstrable.  The
benefit is purely speculative.  They fall into the "if you build it,
they will come" category.
But suppose there really is undiscovered demand for such characters.  It
isn't Unicode's job to discover such demand; the onus of proof is
clearly on the marketplace.  Build it; if they come, then you have a
case based on real behaviour for adding de jure support for such
characters.  I don't interpret Unicode's conservative policy as shutting
the door forever on such characters; it just means that you have to show
that people actually use them, i.e. that there is some market benefit to
offset the cost.  Thanks to open source, any community that really wants
support for speculative characters can have it relatively cheaply.  If
it proves popular, the marketplace (other vendors) will notice.
If I'm not mistaken, the IETF requires at least two independent
implementations for any proposed protocol before it will be considered
seriously; Unicode should take a similar approach regarding speculative
characters.
Consider what would happen if Unicode took a liberal approach - show us
some reasonably logical reasons for encoding a new character, and we'll
do it.  First of all, the vast majority of vendors would simply ignore
the edge cases, since there is no economic incentive to support them.
So you don't gain much by having de jure support for them.  Down the
road the result would likely be fragmentation of the standard and the
marketplace.  Vendors already only support the parts of Unicode they
(more accurately: their customers) are interested in; the more
speculative characters you add, the more incompatibility you have.
Eventually somebody would say "Unicode is too big and sprawling; let's
make a new, narrower standard."
To me the interesting question is, how much real-world support for
speculative characters would be sufficient to merit serious
consideration by Unicode?  Suppose one added support for RTL digits to
Apache, Firefox, a few editors, etc., and could point to a few thousand
webpages using the characters.  At what point would Unicode say, hmm,
the population actually using these characters is sufficient for us to
seriously consider adding them to Unicode?  Care to speculate?  ;)
-gregg