From: William J Poser (wjposer@ldc.upenn.edu)
Date: Thu Feb 07 2008 - 00:30:29 CST
>In a world where the "next million users" are making less than $2 a day and
>are unlikely to be buying a computer anytime soon, and the majority of
>cellular phones available will not support anything needing more than one
>byte for most letters, I'd say that the "obsession" with size is not an
>entirely outdated obsession....
The existence of devices that only support single-byte encodings has
no bearing on whether a character should be placed in a range that
requires two bytes, three bytes, or four bytes in UTF-8. The UTF-8
representation is unusable on such devices no matter what. The only
way to deal with such devices is to use a single-byte encoding for
the relevant characters, that is, not Unicode.
The existence of such devices does bear on which characters should be
in the single-byte UTF-8 range, but since nearly everything necessarily
lies outside of that range, there isn't much to be done about it.
Positioning in the two-byte range vs. the three-byte range is
something about which a little bit could be done, but remember that
there are only 1,920 codepoints in the two-byte range (U+0080 through
U+07FF), so there isn't all that much room. And how many devices
support one- and two-byte codes but not one through three? Does it
really make a difference whether a writing system is in the two-byte
range or the three-byte range?
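For concreteness, here is a rough Python sketch of how UTF-8 length
depends on codepoint value, and how many codepoints each length can hold.
The two-byte form carries 11 payload bits (2,048 values), but the first
128 of those are already covered by the one-byte form, so 1,920
codepoints actually need two bytes. The names utf8_len, RANGES and
range_size are made up for illustration; the boundaries are the standard
UTF-8 ones.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes UTF-8 uses to encode codepoint cp."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4

# First and last codepoint for each encoded length.
RANGES = {
    1: (0x0000, 0x007F),
    2: (0x0080, 0x07FF),
    3: (0x0800, 0xFFFF),
    4: (0x10000, 0x10FFFF),
}

def range_size(nbytes: int) -> int:
    """How many encodable codepoints take exactly nbytes bytes."""
    lo, hi = RANGES[nbytes]
    size = hi - lo + 1
    if nbytes == 3:
        size -= 0x800  # surrogates U+D800..U+DFFF are not encodable
    return size

# A few sample characters, cross-checked against Python's own encoder.
for ch in ["A", "\u00e9", "\u03a9", "\u0905", "\u4e2d", "\U00010348"]:
    assert utf8_len(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} takes {utf8_len(ord(ch))} byte(s)")

for n in (1, 2, 3, 4):
    print(f"{n}-byte range holds {range_size(n)} codepoints")
```

Under these definitions Latin-1 letters and Greek land in the two-byte
range, while Devanagari and Han land in the three-byte range.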
I also note that my question is not only about placement within Unicode
and how many bytes a character requires in UTF-8. It is about attempts
to save a bit of storage more generally, such as the 24-bit encoding
recently discussed.
Incidentally, what is the nature of the limitation of cell phones to
single-byte encodings? Is there a technical reason for this, or is it
merely that the manufacturers have thus far not felt much demand for
multibyte encodings?
>Also, when one looks at scripts side by side placed a decade ago for
>arbitrary reasons that lead to any inconvenience on the part of those who
>might want to use the script, it is preferable to have a better argument
>than "just cuz" because if that were so the companies selling primarily in
>countries that DO consider this to be an outdated notion could have
>allocated according to putting the more emerging markets in the smaller
>spaces and the more advanced ones in the three-byte area....
Actually, "just cuz" is a very good argument. There are a variety of things
that could have been done better, in hindsight. Some of them probably
couldn't have been foreseen; some perhaps could have been. Nobody's
perfect. But decisions had to be made, and for excellent reasons of
stability, it isn't wise to change them too readily, so until we're
ready to go to an incompatible ++Unicode, we're stuck with some
arbitrary decisions, some of which may not have been optimal.
Bill