Re: ISO 8859-11 (Thai) cross-mapping table

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Oct 08 2002 - 08:44:59 EDT

  • Next message: Elliotte Rusty Harold: "Re: ISO 8859-11 (Thai) cross-mapping table"

    Elliotte Rusty Harold scripsit:

    > The Verifier class has a similar issue, though there it's a case of
    > determining whether or not any given character is a legal XML
    > character/name character/name-start character/ etc. This is done with
    > a trick introduced in JDOM where the code looks like this:

    The underlying data structure here is called a "range table", and is
    a list of ranges in codepoint order, expressed thus:

            start of first range
            end of first range + 1
            start of second range
            end of second range + 1

    etc. etc. What you are doing is equivalent to a linear search over
    this range followed by loop unrolling. However, you can do better,
    especially in complex cases, with a *binary* search over the range
    followed by loop unrolling. The trick here is that if the binary
    search returns an even value, it succeeds; an odd value fails.

    But I agree that testing ASCII first is wise. Mozilla bypasses
    its general algorithm in the ASCII/Latin-1 case, providing a fast
    256-element lookup table and then an extremely compact (but somewhat
    slower than the above) data structure for the rest of Unicode.

    -- 
    John Cowan                                jcowan@reutershealth.com
    At times of peril or dubitation,          http://www.ccil.org/~cowan
    Perform swift circular ambulation,        http://www.reutershealth.com
    With loud and high-pitched ululation.
    


    This archive was generated by hypermail 2.1.5 : Tue Oct 08 2002 - 09:35:43 EDT