From: Michael D'Errico (mike-list@pobox.com)
Date: Fri Jan 09 2009 - 16:30:44 CST
>> I've thought about this. But since you would want to intermix text
>> and non-text, it makes sense to retain Unicode as a subset and use
>> the same UTF encoding schemes. The problem, though, is that Unicode
>> claims all the code points, so a new standard would have to violate
>> the rules, either by using planes that Unicode will probably never
>> use(*), or by going beyond plane 16 (which is impossible with UTF-16
>> and specifically disallowed for UTF-8 and UTF-32 conformance).
>
> So you got back to the original problem, and just realized that
> Unicode cannot save the world, and you just can't use one single
> encoding to represent any kind of data, since different data requires
> different binary representation based on its characteristics, at least
> if our goal is efficiency.

No, I didn't realize that. What I realized is that Unicode is in
effect hoarding every code point that UTF-16 can reach, even though
it will never need or use planes 4, 5, 6, 7, 8, 9, A, B, C, or D.
Unicode also slams the door on an extension standard that uses
planes 17 and above, since a conformant UTF-8 or UTF-32
implementation may not encode code points beyond plane 16. In
addition, if you decide to be non-conformant for UTF-8, you will
run into the limitation of the many programming languages that use
UTF-16 internally and cannot represent plane 17 or higher at all.
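
To make the plane 16 ceiling concrete, here is a minimal sketch of the UTF-16 surrogate-pair arithmetic. The function and constant names are made up for illustration and come from no library. A surrogate pair carries 10 + 10 payload bits on top of an offset of 0x10000, so the highest reachable code point is 0x10FFFF, the last one in plane 16.

```python
# Illustrative sketch only; these names are hypothetical, not a library API.

# A surrogate pair carries 20 payload bits (10 high + 10 low),
# added to a base of 0x10000.
MAX_UTF16_CODE_POINT = 0x10000 + ((0x3FF << 10) | 0x3FF)


def plane_of(code_point):
    """Return the plane number: everything above the low 16 bits."""
    return code_point >> 16


def utf16_code_units(code_point):
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point < 0x10000:
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogate values are not encodable on their own")
        return [code_point]
    offset = code_point - 0x10000
    if offset > 0xFFFFF:
        # Plane 17 and above: the offset no longer fits in 20 bits.
        raise ValueError("plane %d is out of reach for UTF-16" % plane_of(code_point))
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    print(hex(MAX_UTF16_CODE_POINT), plane_of(MAX_UTF16_CODE_POINT))
    print([hex(u) for u in utf16_code_units(0x40000)])  # first code point of plane 4
    try:
        utf16_code_units(0x110000)  # first code point of plane 17
    except ValueError as exc:
        print("rejected:", exc)
```

A code point in plane 4, such as 0x40000, encodes without trouble, while 0x110000 (the start of plane 17) has no surrogate-pair representation at all.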

So, really, the answer is that this has to be done in the unused
Unicode planes, at least until programming languages migrate to
UTF-8 internally. Again, I'm not going to hold my breath.

Mike