Re: CESU-8 vs UTF-8

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Sun Sep 16 2001 - 06:06:19 EDT


Sun, 16 Sep 2001 01:14:06 -0700, Carl W. Brown <cbrown@xnetinc.com> writes:

> If it can be demonstrated that there is a real need for an encoding
> like CESU-8 then it should be very different from UTF-8. How does
> SCSU for example sort?

SCSU encoding is non-deterministic: the same string can have several
valid encoded forms, so its representations can't be compared
lexicographically at all (logically equal strings might compare
unequal).

Ehh, we wouldn't have the problem with CESU-8 now if Unicode hadn't
been described as a 16-bit encoding in the past. I still think that
UTF-16 was a big mistake. Too bad that it still affects people who
avoid it.

We can't change the past, but I hope that at least UTF-8 processing can
be done without treating surrogates in any special way. Surrogates are
relevant only for UTF-16; by not using UTF-16 you should be free of
surrogate issues, except for a silly unused area in the character
numbers and a silly highest character number. Please don't spread
UTF-16 madness where it doesn't belong.
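
To make the ordering point concrete, here is a rough sketch in Python
(my choice of language, nothing in this thread depends on it). It
compares one BMP character above the surrogate range with the first
supplementary character. The helper names utf16_units and cesu8 are
made up for illustration, and cesu8 is only my reading of the proposal:
each UTF-16 code unit is encoded as if it were a code point.

```python
# U+FF5E lies in the BMP above the surrogate range; U+10000 is the
# first supplementary character.
bmp_high = "\uff5e"
supplementary = "\U00010000"


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def cesu8(s):
    """Hypothetical helper: encode every UTF-16 code unit of s with the
    UTF-8 bit pattern, so a supplementary character becomes two 3-byte
    sequences (one per surrogate) instead of one 4-byte sequence."""
    out = bytearray()
    for unit in utf16_units(s):
        # "surrogatepass" lets Python emit the bytes for a lone surrogate,
        # which strict UTF-8 encoding refuses to do.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


# Code point order: U+FF5E sorts before U+10000.
assert ord(bmp_high) < ord(supplementary)

# Plain UTF-8 bytes keep that order without looking at surrogates.
utf8_order = bmp_high.encode("utf-8") < supplementary.encode("utf-8")

# UTF-16 code units reverse it, because 0xD800 < 0xFF5E.
utf16_order = utf16_units(bmp_high) < utf16_units(supplementary)

# The CESU-8 style bytes inherit the UTF-16 order.
cesu8_order = cesu8(bmp_high) < cesu8(supplementary)

print(utf8_order, utf16_order, cesu8_order)
```

The UTF-8 comparison never has to know that surrogates exist; only the
UTF-16 and CESU-8 style comparisons are affected by them.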

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK


