RE: Encoding for Fun (was Line Separator)

From: jon@hackcraft.net
Date: Wed Oct 22 2003 - 11:11:43 CST


> I can't argue with that ... but my strings were always in (32-bit wide)
> Unicode at "sort-time". I'm not sure exactly how much value there is in a
> lexicographical sort anyway. I mean, even in Latin-1, surely 'é' should
> not come after 'z'?

Not always. In particular, there are times when a dependable sort order is
required, but just what that sort order is isn't important. In those cases it
can be useful that a binary sort of UTF-8 and a binary sort of UTF-32 give
equivalent results (both come out in code point order).
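
To make that concrete, here is a rough sketch in Python (my choice of
language for illustration; the sample strings are made up). It sorts the same
strings by their UTF-8 bytes and by their big-endian UTF-32 bytes and checks
that both agree with code point order. I'm assuming big-endian UTF-32 so that
byte order and numeric order line up.

```python
# Hypothetical sample data: ASCII, Latin-1, a high BMP character (U+FF5E)
# and a supplementary-plane character (U+10000).
words = ["z", "\u00e9", "\uff5e", "\U00010000", "a"]

# Binary sort: compare the encoded bytes, nothing smarter than that.
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))
by_utf32 = sorted(words, key=lambda s: s.encode("utf-32-be"))

# Python compares str values code point by code point.
by_codepoint = sorted(words)

print(by_utf8 == by_utf32 == by_codepoint)
```

The order is not one a human would want, but it is dependable, and it is the
same whichever of the two encodings the data happens to be stored in.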

>
> Of course, UTF-16 doesn't have the binary sort property either.

Nope, though an efficient mechanism for sorting UTF-16 in code point order is
available.
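
One commonly described way to do it, sketched below in Python, is to fix up
each 16-bit code unit before comparing: surrogates are moved above the
U+E000..U+FFFF range, and that range is moved down to fill the gap. This is
only an illustration of the idea; the function names and sample strings are
mine, and a real implementation would apply the fix-up inside the comparison
loop rather than building keys.

```python
import struct

def utf16_units(s):
    # Encode to big-endian UTF-16 and unpack into 16-bit code units.
    data = s.encode("utf-16-be")
    return struct.unpack(">%dH" % (len(data) // 2), data)

def fixup(unit):
    # U+E000..U+FFFF drops down into the slot the surrogates occupied.
    if unit >= 0xE000:
        return unit - 0x800
    # Surrogates (0xD800..0xDFFF) move to the very top of the 16-bit range.
    if unit >= 0xD800:
        return unit + 0x2000
    return unit

def codepoint_order_key(s):
    # Sort key: the adjusted code units, compared left to right.
    return tuple(fixup(u) for u in utf16_units(s))

# Hypothetical sample data, including a BMP character above the surrogate
# range (U+FF5E) and a supplementary-plane character (U+10000).
words = ["z", "\u00e9", "\uff5e", "\U00010000", "a"]

raw_utf16 = sorted(words, key=utf16_units)
fixed_utf16 = sorted(words, key=codepoint_order_key)

print(raw_utf16 == sorted(words))    # plain code unit order disagrees
print(fixed_utf16 == sorted(words))  # fixed-up order matches code points
```

The fix-up is a couple of comparisons and an add per code unit, so it costs
very little on top of a plain binary comparison.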


