From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Nov 18 2003 - 05:45:25 EST
From: "Theodore H. Smith" <delete@elfdata.com>
> I've looked into the TST thing.
>
> I'm not sure that it is optimal, despite how popular they are!
>
> Look at this, if I add "1", "2","3", "4", "5", "6", "7", "8", "9" to a
> TST, they will all be in a line, in the tree. All will be reference via
> the "high" node.
>
> So, to find "9", I have to read through 9 items!
Nothing forbids to code each n-ary node as a binary tree for faster
searches.
Mapping the n-ary node to a binary tree is only the first step to compress
the data.
Looking for some open-source RDBMS engine will reveal a lot of interesting
storage structures to map dictionnaries like an index.
The trie is simple to build, but other structures like B-trees will work as
well in a even more optimal search speed. The way you encode each indexed
string in the trie can de indicated in the data part of each node, which can
indicate the length of a common substring from the previous indexed string
in the tree, to reduce its size, and an indicator whever the node in each
non-leaf B-tree page is a "ghost" substring holder or effectively maps to a
final string.
This archive was generated by hypermail 2.1.5 : Tue Nov 18 2003 - 07:02:53 EST