From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Apr 03 2003 - 15:01:50 EST
Pim Blokland wrote:
> Why is there no UTF-24?
Well, I once proposed UTF-20...
> See, these MathText characters take up a lot of space. No matter how
> you encode them; UTF-8, UTF-16 or UTF-32; they always are 4 bytes
> long.
True for them alone, in those UTFs. Short of defining another Unicode encoding, there are two
answers that I can offer you:
1. Such characters are expected to be a minority of the text, I suppose even in Math text, because
there are lots of other characters in such documents - punctuation, spaces, digits, regular text -
that are mostly in the BMP and thus shorter. So a whole Math document with some MathText
supplementary characters will use, on average, fewer than 3 bytes per code point in UTF-8 and UTF-16.
2. If you want compression, use the existing SCSU (UTR #6) and BOCU-1 (UTN #6), or a general-purpose
compressor like bzip2.
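
To make the two answers above concrete, here is a minimal Python sketch (not part of the original
message). It assumes a made-up sample sentence containing a few supplementary-plane math letters,
measures bytes per code point in the three UTFs, and tries bzip2 from the standard library as the
general-purpose option. The sample text and the helper names are hypothetical, chosen only for
illustration; SCSU and BOCU-1 are left out because the Python standard library has no codec for them.

```python
import bz2

# Hypothetical sample text (not from the original message): ordinary BMP
# characters plus six supplementary-plane math letters, U+1D465 and U+1D466.
X = "\U0001D465"  # MATHEMATICAL ITALIC SMALL X
Y = "\U0001D466"  # MATHEMATICAL ITALIC SMALL Y
SAMPLE = f"Let {X} and {Y} be real with {X} < {Y}; then 2{X} + 3 < 2{Y} + 3."

# The "-le" codec variants are used so that no byte order mark is counted.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text):
    """Return the encoded size in bytes of `text` for each UTF."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


def bytes_per_code_point(text):
    """Return the average number of bytes per code point for each UTF."""
    n = len(text)  # len() of a Python str counts code points
    return {enc: size / n for enc, size in encoded_sizes(text).items()}


def bzip2_size(text, encoding="utf-8"):
    """Return the size in bytes of `text` after encoding and bzip2 compression."""
    return len(bz2.compress(text.encode(encoding)))


if __name__ == "__main__":
    # A supplementary character alone: 4 bytes in every UTF.
    print("single character:", encoded_sizes(X))
    # Mixed text: the average drops well below 4 in UTF-8 and UTF-16.
    for enc, avg in bytes_per_code_point(SAMPLE).items():
        print(f"{enc}: {avg:.2f} bytes per code point")
    # bzip2 has fixed overhead, so it only pays off on longer input.
    document = SAMPLE * 200
    print("raw UTF-8 bytes:", len(document.encode("utf-8")))
    print("bzip2 bytes:    ", bzip2_size(document))
```

Note that bzip2 adds a fixed header, so a very short string can come out larger than the input; the
repeated document here is only a stand-in for a longer real text, and its repetition flatters the
compressor.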
Note that this concerns only text interchange - the majority of Unicode-aware software uses
UTF-16 internally.
Best regards,
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.