From: John Cowan (jcowan@reutershealth.com)
Date: Tue Oct 08 2002 - 12:35:08 EDT
Mark Davis scripsit:
> We use this structure in ICU (in UnicodeSet). For a high-level explanation,
> see my site (www.macchiato.com), "Bits of Unicode".
Inversion lists, yes. (Different terminology, same thing).
> As to the binary search, we have used in various contexts before a
> "completely unrolled" binary loop, following John Bentley's formulation.
Ah, that's a different kind. I was talking about recursive unrolling, with
the content of the data structure appearing as constants in the code.
Naturally, this is barbarian-style programming, should really be done with a
program generator, and if not, is suitable only for circumstances where the
values never change. But consider the inversion list for Latin letters:
0041 005B 0222 0234
0061 007B 1E00 1E9C
00C0 00D7 1EA0 1EFA
00D8 00F7 FF21 FF3B
00F8 0220 FF41 FF5B
you can unwrap the binary search as:
if (u < 0x0220) {
if (u < 0x00C0) {
if (u < 0x0061) {
if (u < 0x005B) return !(u < 0x0041);
else return false;
}
else return u < 0x007B;
}
else {
if (u < 0x00F7) {
if (u < 0x00D8) return u < 0x00D7;
else return false;
}
else return !(u < 0x00F8);
}
else { ... # and so on for the other half
If the list is too long, this code also suffers because it overflows the
instruction cache.
-- Said Agatha Christie / To E. Philips Oppenheim John Cowan "Who is this Hemingway? / Who is this Proust? jcowan@reutershealth.com Who is this Vladimir / Whatchamacallum, http://www.reutershealth.com This neopostrealist / Rabble?" she groused. http://www.ccil.org/cowan --author unknown to me; any suggestions?
This archive was generated by hypermail 2.1.5 : Tue Oct 08 2002 - 13:32:20 EDT