From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue May 06 2003 - 12:13:14 EDT
A minor correction: a trie may be more than two stages.
We also have an enhanced version (developed by Markus Scherer) called
a folded trie, that we use in ICU. See
http://oss.software.ibm.com/icu/docs/papers/foldedtrie_iuc21.ppt.
These data structures (and others that we use in ICU) are flattened;
that is, they are stored as a continuous block of memory, with offsets
instead of pointers. That way, they can all be used in read-only
memory-mapped files for fast initialization.
Märk Davis
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799
----- Original Message -----
From: "Addison Phillips [wM]" <aphillips@webmethods.com>
To: "Theodore H. Smith" <delete@elfdata.com>
Cc: <unicode@unicode.org>
Sent: Tuesday, May 06, 2003 08:00
Subject: Re: Finite state machines? UTF8: toFold(), normalisation, etc
> Hi,
>
> A "trie" is a two stage table.
>
> PPT is PowerPoint. I was under the impression that the presentation
was
> available in PDF or HTML format also, but I guess it isn't if you
didn't
> see it in that format.
>
> Here's a hint, though: if you search "Bits of Unicode" on Google,
you
> can view much of the paper in HTML format using the option Google
> provides there.
>
> Best Regards,
>
> Addison
>
> Theodore H. Smith wrote:
> > Hi Addison,
> >
> > Thanks a lot for the answers that may help me get a clean
solution.
> >
> > I'm unfamiliar with "trie". What does it mean? If it's less
complex than
> > a finite state machine I'm sure that'll be a benefit for me.
> >
> > "Bits of Unicode" is in .ppt format. Is that "Power point"? I
don't have
> > powerpoint or an app to read .ppt.
> >
> > Thanks a lot for your kind help.
> >
> >> Hi Mr. Smith,
> >>
> >> I wrote about "compiling" the Unicode character data tables in
my
> >> response. That reply was somewhat sketchy: my three-year old son
was
> >> sitting in my lap waiting for his machine to boot while I wrote
it...
> >>
> >> Mark Davis wrote more-or-less the canonical presentation on this
> >> subject for an IUC conference a few years ago. The title was
"Bits of
> >> Unicode". It may be elsewhere, but I've always found it on his
> >> personal page http://www.macchiato.com
> >>
> >> I have personally had reason to compile my own tables (NOT using
a
> >> finite state language, just tries and similar structures) for
purposes
> >> beyond those of ICU. But I must admit that in recent years I
have
> >> tended to extend ICU or the very similar code in the Java JDK
instead
> >> of implementing my own tables, but it isn't that hard to do.
Getting
> >> the edge cases and esoteric details right, though, make it not
worth
> >> my while (in my estimation).
> >>
> >> A finite state machine could certainly do "the job" (although
what you
> >> really have is a number of similar "jobs" to do), but trie
tables and
> >> similar structures are a lot easier to build and maintain and do
the
> >> job marvelously well.
> >>
> >> Good luck with your implementation.
> >
> >
> >
> > --
> > Theodore H. Smith - Macintosh Consultant / Contractor.
> > My website: <www.elfdata.com/>
> >
>
>
> --
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
>
> +1 408.962.5487 mailto:aphillips@webmethods.com
> -------------------------------------------
> Internationalization is an architecture. It is not a feature.
>
> Chair, W3C I18N WG Web Services Task Force
> http://www.w3.org/International/ws
>
>
>
>
>
This archive was generated by hypermail 2.1.5 : Tue May 06 2003 - 13:14:59 EDT