From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Dec 07 2004 - 07:02:19 CST
From: "D. Starner" <shalesller@writeme.com>
> If you're talking about a language that hides the structure of strings
> and has no problem with variable length data, then it wouldn't matter
> what the internal processing of the string looks like. You'd need to
> use iterators and discourage the use of arbitrary indexing, but arbitrary
> indexing is rarely important.
I fully concur with this point of view. Almost all (if not all) string
processing can be performed with sequential enumerators instead of
random indexing (which also has the big disadvantage of not allowing
rich context-dependent processing, something you can't ignore
when handling international text).
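To make the point concrete, here is a minimal sketch in Python, not taken from any real library. The class name `OpaqueText` and the function `count_words` are hypothetical, and the choice of UTF-8 as the hidden storage is only an assumption for illustration. The caller sees nothing but an iterator over code points, so the storage could be changed without touching `count_words`.

```python
class OpaqueText:
    """Hypothetical string type that hides its storage (UTF-8 bytes here)."""

    def __init__(self, text: str):
        # Internal, variable-length form; callers never see it.
        self._data = text.encode("utf-8")

    def __iter__(self):
        """Yield one code point at a time by walking the bytes in order."""
        data = self._data
        i = 0
        while i < len(data):
            lead = data[i]
            # The lead byte of a UTF-8 sequence gives the sequence length.
            if lead < 0x80:
                length = 1
            elif lead < 0xE0:
                length = 2
            elif lead < 0xF0:
                length = 3
            else:
                length = 4
            yield data[i:i + length].decode("utf-8")
            i += length


def count_words(text):
    """Context-dependent processing done sequentially, with no indexing."""
    words = 0
    in_word = False  # context carried from the previous code point
    for ch in text:
        if ch.isalnum():
            if not in_word:
                words += 1
            in_word = True
        else:
            in_word = False
    return words
```

For example, `count_words(OpaqueText("naïve café 東京 ok"))` walks the text once and never asks for "the n-th character".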
So the internal storage of strings does not matter for the programming interface
of parsable string objects. In terms of efficiency and overall application
performance, using compressed encoding schemes is highly recommended for
large databases of text, because the decompression overhead is extremely
small compared with the huge benefits you get from reducing the load on
system resources: better data locality and memory cache use, less pressure
on the system memory allocator, less memory fragmentation, fewer VM
swaps, and less file or database I/O (which will be the only effective
limitation for large databases).
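As a rough illustration of storing text compressed and decompressing only on access, here is a small Python sketch. The class `CompressedTextStore` and its methods are hypothetical names. The message does not say which compressed encoding scheme is meant; general-purpose `zlib` is swapped in here only because it ships with Python, so this is a stand-in and not necessarily the kind of scheme intended above.

```python
import zlib


class CompressedTextStore:
    """Hypothetical text table that keeps every entry compressed in memory."""

    def __init__(self):
        self._rows = []  # each row is a zlib-compressed UTF-8 byte string

    def add(self, text: str) -> int:
        """Compress and store one text; return its row number."""
        self._rows.append(zlib.compress(text.encode("utf-8")))
        return len(self._rows) - 1

    def get(self, row: int) -> str:
        """Decompress on access; the caller only ever sees a normal string."""
        return zlib.decompress(self._rows[row]).decode("utf-8")

    def stored_bytes(self) -> int:
        """Total bytes actually held, for comparison with the raw size."""
        return sum(len(r) for r in self._rows)
```

The interface exposes ordinary strings, so the compression stays an internal detail. Whether the decompression cost is outweighed by the savings depends on the data and the workload, and should be measured rather than assumed.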