Re: UTF-24

From: Doug Ewell
Date: Fri Apr 04 2003 - 01:36:54 EST

Next message: John Cowan: "Re: Exciting new software release!"

Previous message: Doug Ewell: "Re: Exciting new software release!"
In reply to: Pim Blokland: "UTF-24"
Next in thread: Carl W. Brown: "RE: UTF-24"
Reply: Carl W. Brown: "RE: UTF-24"

Pim Blokland <pblokland at planet dot nl> wrote:

> Why is there no UTF-24?
>
> See, these MathText characters take up a lot of space. No matter how
> you encode them; UTF-8, UTF-16 or UTF-32; they always are 4 bytes
> long. Now if we had UTF-24, they would only take up 3 bytes.

Yes, but supplementary characters will normally appear in one of two
circumstances:

(1) as part of a small alphabet (e.g. Deseret, Shavian, Osmanya),
interspersed with spaces and punctuation in the U+00xx range, in which
case an existing storage format (SCSU) can encode them in only 1 byte
each plus an initial 3-byte overhead.

(2) as part of a larger set (e.g. math symbols, CJK Extension B),
interspersed with even more BMP characters, in which case the bytes
saved on each supplementary character are overwhelmed by the bytes
squandered on each BMP character.
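To make cases (1) and (2) concrete, here is a rough Python sketch that counts bytes for a sample string in each encoding form. UTF-24 is not a defined encoding form, so its size is simply assumed to be 3 bytes per code point. The SCSU figure is not produced by a real SCSU encoder; it only applies the estimate given above (1 byte per character plus 3 bytes of overhead) and assumes the text is ASCII plus supplementary characters from a single 128-character window. The function names and sample strings are illustrative only.

```python
def encoded_sizes(text):
    """Return the byte count of `text` in several encoding forms."""
    code_points = len(text)  # a Python 3 str is a sequence of code points
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-be")),  # -be: no BOM is added
        "UTF-32": len(text.encode("utf-32-be")),
        "UTF-24 (hypothetical)": 3 * code_points,
    }


def scsu_estimate(text):
    """Estimate only: 1 byte per character plus a 3-byte window setup.

    Assumes ASCII plus supplementary characters from one small alphabet.
    """
    return 3 + len(text)


# Case (1): four Deseret letters and one space.
small_alphabet = "\U00010414\U00010432\U0001042E\U00010428 "

# Case (2): one supplementary math symbol among ten BMP characters.
mostly_bmp = "a" * 10 + "\U0001D400"

for label, sample in (("case 1", small_alphabet), ("case 2", mostly_bmp)):
    print(label, encoded_sizes(sample), "SCSU estimate:", scsu_estimate(sample))
```

In case (1) the hypothetical UTF-24 saves a few bytes over UTF-16 but is still roughly twice the SCSU estimate; in case (2) it is larger than both UTF-8 and UTF-16.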

> And since the Unicode character range is formally defined to run no
> higher than U+10FFFD, which fits in 3 bytes, I see no reason why
> no-one has ever gone to the trouble of defining a 3-byte storage
> method.

Most likely because no modern computer uses a 3-byte (24-bit) internal
processing unit, and because it would be false economy for real-world
Unicode text (see (1) and (2) above).

> Implementation would be easy; there would be only two variants,
> UTF-24LE and UTF-24BE, and that's it.

I agree, it certainly was easy to implement. (Oops, did I say that out
loud?)
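For readers wondering what such a format might look like, here is a minimal Python sketch of a hypothetical UTF-24 in both byte orders. It is not the implementation alluded to above and not any defined Unicode encoding form; it simply assumes each code point is stored as a fixed 3-byte integer. The names utf24_encode and utf24_decode are made up for this example.

```python
def utf24_encode(text, byteorder="big"):
    """Encode each code point as 3 bytes ("big" = UTF-24BE, "little" = UTF-24LE)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate code points are not valid scalar values.
            raise ValueError("cannot encode surrogate U+%04X" % cp)
        out += cp.to_bytes(3, byteorder)
    return bytes(out)


def utf24_decode(data, byteorder="big"):
    """Decode 3-byte units back into a string, rejecting invalid values."""
    if len(data) % 3 != 0:
        raise ValueError("input length is not a multiple of 3 bytes")
    chars = []
    for i in range(0, len(data), 3):
        cp = int.from_bytes(data[i:i + 3], byteorder)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("invalid code point 0x%06X" % cp)
        chars.append(chr(cp))
    return "".join(chars)


sample = "A\U0001D400"
print(utf24_encode(sample, "big").hex())
print(utf24_encode(sample, "little").hex())
```

The sketch leaves out everything a real proposal would have to settle, such as a byte order mark and alignment on machines with no 24-bit unit.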

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
