RE: The perfect solution for the UTF-8/16 discussion

From: B (11@onna.com)
Date: Tue Jun 26 2001 - 22:30:03 EDT


Only one problem....

What of 1FFFFFFFF? Mozibake city.

I have not had my ramen. I ought to.

Oh yeah, another problem.

How many bytes in "hello world"?

But you know what I love??

YOU WILL USE HAN ZI AND KANA AND THE LIKE BECAUSE OTHERWISE YOU WILL TAKE UP TOO MUCH MEMORY らららららららららららららららりゃぁぁぁぁぁぁぁっ！
Hello world is 44 bytes in your system. How many bytes would it be in Han zi?
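
For anyone checking the arithmetic: 11 characters at 4 bytes each gives the 44. A quick sketch in Python, assuming the big-endian codecs so that no byte order mark is counted (the name `sizes` is just for illustration):

```python
# Byte counts for the same 11-character string in three encoding forms.
# The "-be" codecs are used so that no byte order mark is prepended.
text = "hello world"

sizes = {
    "UTF-8": len(text.encode("utf-8")),
    "UTF-16": len(text.encode("utf-16-be")),
    "UTF-32": len(text.encode("utf-32-be")),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The Han zi count depends on which translation you pick, so it is left out here.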

Will there be character codes for CENTESIMAL DIGITs, or at least SEXAGESIMAL DIGITs (date / time)?

I want UTF-24 if we need more than 16. Or UTF-21.
END THE TYRANNY OF THE FIXED-LENGTH BYTE

　　らんま ★じゅういっちゃん★
　×あかね
ーーーーー PTKA IZGT F SFNNGYGB ZRMSFTB WM
　あまんけ NFEGT FM MGYWPRMKA FM F SFNNGYGB IWOG
ねけあず　 IWKK QGT FT IPQGT ZFXG GHRFK YWJZNM.
らんま　　
ーーーーー
いいなずけ

--- Original Message ---
From: "Carl W. Brown" <cbrown@xnetinc.com>;
To: Markus Scherer <markus.scherer@jtcsv.com>;unicore <unicore@unicode.org>;unicode <unicode@unicode.org>;
Cc:
Date: 01/06/26 20:57
Subject: RE: The perfect solution for the UTF-8/16 discussion

>Markus,
>
>I think that big-endian UTF-32 is the only way to go. The solution to ASCII
>vs. EBCDIC would go away if we got all of the hardware to support Unicode
>natively. We could forget about bytes and make the 32bit word the least
>addressable amount of memory.
>
>utf-64 would only be used for vanity characters. Sort of like the star
>registry. utf-64 would consist of 0xFFFFFFFF followed by the 64-bit number
>as 2 32 bit numbers. Implementing utf-64 would be only for the vain and not
>part of any known or imaginable OS.
>
>Carl
>
>> -----Original Message-----
>> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
>> Behalf Of Markus Scherer
>> Sent: Thursday, June 21, 2001 11:16 AM
>> To: unicore; unicode
>> Subject: The perfect solution for the UTF-8/16 discussion
>>
>>
>> Abolish all in-process Unicode encodings except UTF-16.
>> If everyone uses the same encoding form then there is no problem
>> with different string lengths, results of binary comparisons, etc.
>>
>> Once we are here, abolish all little-endian UTF-16
>> implementations. This will save a lot of byte swapping, and
>> binary comparisons can always be performed with memcmp().
>>
>> Heck, abolish all little-endian platforms and all platforms with
>> integer widths other than 8, 16, 32, etc.
>>
>> :-)
>>
>> markus
>>
>
>
>
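
For what it's worth, the utf-64 escape described in the quoted message could be sketched roughly as below. This is only an illustration: the function names are made up, and both the big-endian layout and the high-word-first order of the two 32-bit numbers are assumptions, since the message does not say which half comes first.

```python
import struct

ESCAPE = 0xFFFFFFFF  # marker word named in the quoted proposal


def encode_vanity(value: int) -> bytes:
    """Pack a 64-bit number as the escape word plus two 32-bit words.

    Assumption: big-endian words, high half first.
    """
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 bits")
    high, low = value >> 32, value & 0xFFFFFFFF
    return struct.pack(">III", ESCAPE, high, low)


def decode_vanity(data: bytes) -> int:
    """Reverse encode_vanity; expects exactly three 32-bit words."""
    escape, high, low = struct.unpack(">III", data)
    if escape != ESCAPE:
        raise ValueError("missing 0xFFFFFFFF escape word")
    return (high << 32) | low


print(encode_vanity(0x1FFFFFFFF).hex())
```

Under these assumptions 1FFFFFFFF costs three 32-bit words, 12 bytes in all.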



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:19 EDT