From: John Cowan (jcowan@reutershealth.com)
Date: Tue Oct 08 2002 - 08:44:59 EDT
Elliotte Rusty Harold scripsit:
> The Verifier class has a similar issue, though there it's a case of
> determining whether or not any given character is a legal XML
> character/name character/name-start character/ etc. This is done with
> a trick introduced in JDOM where the code looks like this:
The underlying data structure here is called a "range table", and is
a list of ranges in codepoint order, expressed thus:
start of first range
end of first range + 1
start of second range
end of second range + 1
etc. etc. What you are doing is equivalent to a linear search over
this range followed by loop unrolling. However, you can do better,
especially in complex cases, with a *binary* search over the range
followed by loop unrolling. The trick here is that if the binary
search returns an even value, it succeeds; an odd value fails.
But I agree that testing ASCII first is wise. Mozilla bypasses
its general algorithm in the ASCII/Latin-1 case, providing a fast
256-element lookup table and then an extremely compact (but somewhat
slower than the above) data structure for the rest of Unicode.
-- John Cowan jcowan@reutershealth.com At times of peril or dubitation, http://www.ccil.org/~cowan Perform swift circular ambulation, http://www.reutershealth.com With loud and high-pitched ululation.
This archive was generated by hypermail 2.1.5 : Tue Oct 08 2002 - 09:35:43 EDT