Dan Oscarsson replied:
> >
> >What this means is that for the foreseeable future (50 - 100 years?),
> >10646 characters will be constrained to 20.1 bits (round up to 21).
> >
>
> So the Unicode hackers have finally succeeded to destroy ISO 10646.
> Just because Unicode made the bad decision to limit their character
> coding to 16 bits does not mean that we have to limit the available
> space for characters to less than 31 bits.
>
Dan, Dan, take a deep breath, please.

There is absolutely no intention here to "destroy ISO 10646." You
apparently didn't read my note carefully enough. I said:

>WG2 has just
>approved the proposal to remove the user planes and user groups from
>10646 past Plane 16.

I did *not* say that WG2 has decided to remove any planes and groups
from the *architecture* of 10646. 10646 will still formally be
a 31-bit standard, with all those empty planes and groups containing
(theoretically) entries that say "(This position shall not be used.)"

But both of the committees (UTC and WG2) have agreed to keep filling
in the standard compactly at the low end. And WG2 has a very detailed
roadmap of candidate scripts for inclusion in Plane 1 and in the
remaining space of the BMP. Many people have reviewed that roadmap,
and there is very little contention about the general framework it
presents for the next decade's (or more) work for encoding new
scripts into 10646.

To echo John Cowan, nobody even has any candidate characters for
Planes 4 through 13, let alone the 2 billion plus code points past
U-0010FFFF. And once 10646-2 is published next year, with the
major known repertoire of unencoded Han characters added, the
rate of new additions to the standard is guaranteed to slow down
tremendously. Even if WG2 managed to encode 3000 characters a year
after that (which would be a stupendous pace, given the difficulty
of the remaining scripts to deal with), that would mean more than 20
years just to fill up one more plane. And the European national
bodies are tired enough of hearing about Etruscan, Gothic, Deseret,
and, soon to come, a dozen more Southeast Asian scripts, etc., etc.
They may slow the process down even further.

So why the panic factor?
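
For anyone who wants to check the back-of-the-envelope figure above, here
is a minimal sketch in Python. The 3000-characters-a-year rate is the
hypothetical pace from the text; the constant names are made up for
illustration only.

```python
# One plane of 10646 holds 256 rows x 256 cells = 65,536 code positions.
PLANE_SIZE = 0x10000

# Hypothetical encoding pace taken from the text above (an assumption,
# not a measured rate).
CHARS_PER_YEAR = 3000

# Years needed to fill a single plane at that pace.
years_per_plane = PLANE_SIZE / CHARS_PER_YEAR

print(f"{years_per_plane:.1f} years to fill one plane")
```

At that rate a single plane takes a bit over two decades, so the ten
planes with no candidates at all (4 through 13) are a very long way off.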
Please remember that 10646 itself normatively defines UTF-16,
as well as UTF-8. Even within the context of 10646 alone, and
presuming that you ignore the Unicode Standard entirely (a silly
premise, I admit), 10646 had an interoperability problem between
the normative "transformation formats" that it defined. By
eliminating the user planes and groups above Plane 16, 10646
fixes its own problem, since all assigned characters in 10646
can henceforth interoperate correctly in all forms of encoding
and transformation formats normatively defined in the standard
itself.
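
To make the interoperability point concrete, here is a rough sketch in
Python of why UTF-16 stops at Plane 16, while UTF-8 as originally defined
in 10646 can reach the full 31-bit space. The function and constant names
are illustrative assumptions, not anything taken from the standard's text.

```python
import math

BMP_SIZE = 0x10000                 # Plane 0: 16-bit values that stand for themselves
HIGH_SURROGATES = 0xDC00 - 0xD800  # 1024 leading surrogate values
LOW_SURROGATES = 0xE000 - 0xDC00   # 1024 trailing surrogate values

# Every (high, low) pair addresses exactly one position beyond the BMP,
# so the surrogate mechanism adds 1024 * 1024 positions (16 planes).
MAX_UTF16 = BMP_SIZE + HIGH_SURROGATES * LOW_SURROGATES - 1

# Number of bits needed to distinguish all positions UTF-16 can reach.
BITS_NEEDED = math.log2(MAX_UTF16 + 1)


def to_utf16(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code position (illustrative helper)."""
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if cp < 0:
        raise ValueError("code positions are non-negative")
    if cp < BMP_SIZE:
        return [cp]
    if cp > MAX_UTF16:
        # Anything past Plane 16 has no representation in UTF-16 at all.
        raise ValueError("not representable in UTF-16")
    offset = cp - BMP_SIZE
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print(hex(MAX_UTF16), round(BITS_NEEDED, 1))  # 0x10ffff 20.1
```

The last position UTF-16 can address is U-0010FFFF, which is where the
"20.1 bits" figure comes from. A character assigned anywhere above that
could be carried in UCS-4 or UTF-8 but not in UTF-16.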
--Ken