From: Pim Blokland (pblokland@planet.nl)
Date: Thu Apr 03 2003 - 14:05:23 EST
All this talk about these higher-plane characters - you know, plane
1 and above; let's call them MathText characters for short - has got
me wondering.
Why is there no UTF-24?
See, these MathText characters take up a lot of space. No matter how
you encode them (UTF-8, UTF-16 or UTF-32), they are always 4 bytes
long. Now if we had UTF-24, they would only take up 3 bytes.
And since the Unicode character range is formally defined to run no
higher than U+10FFFF, which needs only 21 bits and so fits in 3 bytes,
I see no reason why no one has ever gone to the trouble of defining a
3-byte storage method.
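
To make the numbers concrete, here is a small Python sketch that checks
the byte counts for one plane-1 character and the bit width of the top
of the range. The sample character (U+1D400) and the variable names are
just my picks for illustration.

```python
# Byte lengths of one supplementary-plane character in the standard encodings.
ch = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A, chosen only as an example

sizes = {
    "UTF-8": len(ch.encode("utf-8")),
    "UTF-16": len(ch.encode("utf-16-be")),   # one surrogate pair, no BOM
    "UTF-32": len(ch.encode("utf-32-be")),   # one 32-bit unit, no BOM
}
print(sizes)

# The highest code point in the Unicode codespace.
max_code_point = 0x10FFFF
print(max_code_point.bit_length())  # number of bits needed

# Three bytes hold 24 bits, so every code point fits.
fits_in_3_bytes = max_code_point < 2**24
print(fits_in_3_bytes)
```

All three encodings should report 4 bytes for this character, while the
highest code point needs fewer than 24 bits.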
Implementation would be easy; there would be only two variants,
UTF-24LE and UTF-24BE, and that's it. No juggling with bits like in
UTF-8 and UTF-16 or anything complicated like that. Just the plain
character values, just like in UTF-32, but needing only 75% of the
storage.
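
A rough sketch of what I have in mind, in Python. Note that UTF-24 is
not a defined encoding; the function names utf24_encode and utf24_decode
and the byteorder parameter are made up here purely for illustration,
and nothing is done about surrogates or byte order marks.

```python
def utf24_encode(text: str, byteorder: str = "big") -> bytes:
    """Pack each code point into exactly 3 bytes (hypothetical UTF-24).

    byteorder="big" stands in for UTF-24BE, "little" for UTF-24LE.
    """
    return b"".join(ord(c).to_bytes(3, byteorder) for c in text)


def utf24_decode(data: bytes, byteorder: str = "big") -> str:
    """Inverse of utf24_encode: read the data back 3 bytes at a time."""
    if len(data) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    # chr() raises ValueError for values above 0x10FFFF.
    return "".join(
        chr(int.from_bytes(data[i:i + 3], byteorder))
        for i in range(0, len(data), 3)
    )


sample = "A\U0001D400"  # one BMP character and one plane-1 character
encoded_be = utf24_encode(sample, "big")
encoded_le = utf24_encode(sample, "little")
print(encoded_be.hex(), encoded_le.hex())

# Compare against UTF-32 for the same text.
ratio = len(encoded_be) / len(sample.encode("utf-32-be"))
print(ratio)
```

For any text the 3-byte form comes out at exactly three quarters of the
UTF-32 size, since every character shrinks from 4 bytes to 3.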
Comments anyone?
Pim Blokland