From: Doug Ewell (doug@ewellic.org)
Date: Mon Dec 28 2009 - 22:19:57 CST
"verdy_p" <verdy underscore p at wanadoo dot fr> wrote:
> The [BOCU-1] reset byte can be used for something more useful: it can
> be used as a key separator when sorting for example lists of
> multicolumn output with priority between columns, even if each column
> is sorted in binary codepoint order. The separator is actually not a
> character, but represents a metacharacter that will be higher than
> everything else, so it can effectively terminate all binary encoded
> strings (when they are different), and maintain their relative
> ordering; the following sort keys (further data columns) appended
> after it will not break the sort order of distinct level-1 keys, but
> you'll be able to binary sort on the second column when two rows have
> binary identical first columns...
Unicode, and even ASCII, contains plenty of seldom-used control
characters, with defined semantics if that is desirable, which an
internal process can safely insert, use, and remove for purposes like
this. There's no need to overload an internal characteristic of an
encoding to accomplish this, especially since it ties your data to a
particular encoding.
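The alternative described above can be sketched like this, assuming Python 3.9+ and column data that never contains a character at or below U+001F. The choice of U+001F (INFORMATION SEPARATOR ONE, the ASCII "unit separator") and the helper names `composite_key` and `strip_key` are illustrative and do not come from the original post. The separator is an ordinary character rather than a byte peculiar to one encoding. It sorts below every data character, so a shorter first column that is a prefix of a longer one still sorts first, and sorting by the composite key gives the same result as sorting by the columns in code point order.

```python
# Hypothetical sketch: join columns with U+001F (INFORMATION SEPARATOR ONE),
# a seldom-used control character with defined semantics, then remove it
# again once the internal sort is finished.
US = "\u001f"


def composite_key(row: tuple[str, ...]) -> bytes:
    """Build one binary sort key from several text columns.

    Assumes no column contains a character <= U+001F, so the separator
    sorts below every data character.
    """
    for col in row:
        if any(ord(ch) <= 0x1F for ch in col):
            raise ValueError(f"column contains a control character: {col!r}")
    # UTF-8 byte order matches code point order, so a plain byte sort works.
    return US.join(row).encode("utf-8")


def strip_key(key: bytes) -> tuple[str, ...]:
    """Split a composite key back into its columns."""
    return tuple(key.decode("utf-8").split(US))


rows = [("abc", "2"), ("ab", "9"), ("ab", "10"), ("Zoë", "1")]
by_key = sorted(rows, key=composite_key)
by_tuple = sorted(rows)  # Python compares str by code point
assert by_key == by_tuple
print(by_key)
# [('Zoë', '1'), ('ab', '10'), ('ab', '9'), ('abc', '2')]
```

UTF-8 is used for the byte form only because its byte order matches code point order. UTF-16 code units lack that property for supplementary characters, but the separator character itself works the same way in any encoding.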
Someone tried to do something like this with UTF-8 a lot of years ago,
and to make a long story short, that's how we got the tag characters.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s
This archive was generated by hypermail 2.1.5 : Mon Dec 28 2009 - 22:21:53 CST