Re: ISO 8859-11 (Thai) cross-mapping table

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Oct 08 2002 - 12:35:08 EDT

Next message: Patrick Andries: "Re: InDesign"

Previous message: Mark Davis: "Re: ISO 8859-11 (Thai) cross-mapping table"
In reply to: Mark Davis: "Re: ISO 8859-11 (Thai) cross-mapping table"
Next in thread: Marco Cimarosti: "RE: ISO 8859-11 (Thai) cross-mapping table"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Mark Davis scripsit:

> We use this structure in ICU (in UnicodeSet). For a high-level explanation,
> see my site (www.macchiato.com), "Bits of Unicode".

Inversion lists, yes. (Different terminology, same thing).

> As to the binary search, we have used in various contexts before a
> "completely unrolled" binary loop, following John Bentley's formulation.

Ah, that's a different kind. I was talking about recursive unrolling, with
the content of the data structure appearing as constants in the code.
Naturally, this is barbarian-style programming, should really be done with a
program generator, and if not, is suitable only for circumstances where the
values never change. But consider the inversion list for Latin letters:

        0041 005B 0222 0234
        0061 007B 1E00 1E9C
        00C0 00D7 1EA0 1EFA
        00D8 00F7 FF21 FF3B
        00F8 0220 FF41 FF5B

you can unwrap the binary search as:

        if (u < 0x0220) {
            if (u < 0x00C0) {
                if (u < 0x0061) {
                    if (u < 0x005B) return !(u < 0x0041);
                    else return false;
                    }
                else return u < 0x007B;
                }
            else {
                if (u < 0x00F7) {
                    if (u < 0x00D8) return u < 0x00D7;
                    else return false;
                    }
                else return !(u < 0x00F8);
                }
        else { ... # and so on for the other half

If the list is too long, this code also suffers because it overflows the
instruction cache.

-- 
Said Agatha Christie / To E. Philips Oppenheim	John Cowan
"Who is this Hemingway? / Who is this Proust?	jcowan@reutershealth.com
Who is this Vladimir / Whatchamacallum,		http://www.reutershealth.com
This neopostrealist / Rabble?" she groused.	http://www.ccil.org/cowan
	--author unknown to me; any suggestions?

Next message: Patrick Andries: "Re: InDesign"
Previous message: Mark Davis: "Re: ISO 8859-11 (Thai) cross-mapping table"
In reply to: Mark Davis: "Re: ISO 8859-11 (Thai) cross-mapping table"
Next in thread: Marco Cimarosti: "RE: ISO 8859-11 (Thai) cross-mapping table"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

This archive was generated by hypermail 2.1.5 : Tue Oct 08 2002 - 13:32:20 EDT