Re: Nicest UTF

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Dec 07 2004 - 07:02:19 CST


    From: "D. Starner" <shalesller@writeme.com>
    > If you're talking about a language that hides the structure of strings
    > and has no problem with variable length data, then it wouldn't matter
    > what the internal processing of the string looks like. You'd need to
    > use iterators and discourage the use of arbitrary indexing, but arbitrary
    > indexing is rarely important.

    I fully concur with this point of view. Almost all (if not all) string
    processing can be performed with sequential enumerators instead of
    random indexing. Random indexing also has the big disadvantage of not
    allowing rich, context-dependent processing behaviors, which you cannot
    ignore when handling international text.
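    To illustrate the point about context-dependent processing, here is a
    minimal Python sketch (the function names are hypothetical, chosen for
    illustration). It walks a string strictly sequentially and attaches each
    combining mark to the base character before it. This is a simplified
    approximation that only uses the canonical combining class. It is NOT
    the full UAX #29 grapheme cluster algorithm. Reversing text by these
    clusters keeps an accent on its own letter, whereas naive reversal by
    code point moves it onto a neighbouring letter.

```python
import unicodedata
from typing import Iterable, Iterator


def clusters(chars: Iterable[str]) -> Iterator[str]:
    """Group each base character with the combining marks that follow it.

    Simplified approximation (assumption): only marks with a nonzero
    canonical combining class are attached; this is not full UAX #29.
    """
    current = ""
    for ch in chars:  # purely sequential: no indexing into the string
        if current and unicodedata.combining(ch):
            current += ch  # the mark belongs to the preceding base
        else:
            if current:
                yield current
            current = ch  # start a new cluster
    if current:
        yield current


def reverse_text(text: str) -> str:
    """Reverse text cluster by cluster rather than code point by code point."""
    return "".join(reversed(list(clusters(text))))


word = "e\u0301te"  # 'é' written as e + COMBINING ACUTE ACCENT, then "te"
print(repr(word[::-1]))          # naive reversal: the accent lands on 't'
print(repr(reverse_text(word)))  # cluster-aware: the accent stays on its 'e'
```

    The only operation the algorithm needs is "give me the next character",
    which any enumerator can provide regardless of how the string is stored.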

    So the internal storage of strings does not matter for the programming
    interface of parsable string objects. In terms of efficiency and global
    application performance, compressed encoding schemes are highly
    recommended for large databases of text: the decompression overhead is
    extremely small compared to the huge benefits of a smaller footprint. It
    reduces the load on system resources, improves data locality and memory
    cache usage, eases pressure on the system memory allocator, lowers
    memory fragmentation, reduces VM swapping, and cuts file or database
    I/O (which will be the only effective limitation for large databases).
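    As a rough Python sketch of both points, assume a hypothetical
    CompressedText class that stores text as compressed UTF-8 and exposes it
    only through iteration. Callers see a plain sequence of code points, and
    the compression stays an internal detail. zlib is used here only as a
    general-purpose stand-in. The post does not name a particular scheme,
    and Unicode-specific compressors such as SCSU would behave differently.
    Decompression and UTF-8 decoding both happen incrementally in small
    chunks, so the full text is never expanded at once.

```python
import codecs
import zlib
from typing import Iterator


class CompressedText:
    """Hypothetical text container: zlib-compressed UTF-8, iteration only.

    zlib is an illustrative stand-in for a real compression scheme.
    """

    CHUNK = 64  # bytes of compressed data fed to the inflater per step

    def __init__(self, text: str) -> None:
        self._blob = zlib.compress(text.encode("utf-8"))

    def stored_size(self) -> int:
        """Number of bytes actually held in memory."""
        return len(self._blob)

    def __iter__(self) -> Iterator[str]:
        """Yield code points while decompressing and decoding on the fly."""
        inflater = zlib.decompressobj()
        # The incremental decoder copes with multi-byte UTF-8 sequences
        # that are split across decompressed chunks.
        decoder = codecs.getincrementaldecoder("utf-8")()
        for start in range(0, len(self._blob), self.CHUNK):
            raw = inflater.decompress(self._blob[start:start + self.CHUNK])
            yield from decoder.decode(raw)
        yield from decoder.decode(inflater.flush(), final=True)


sample = "Le cœur a ses raisons que la raison ne connaît point. " * 200
text = CompressedText(sample)
print("UTF-8 bytes:     ", len(sample.encode("utf-8")))
print("UTF-16 bytes:    ", len(sample.encode("utf-16-le")))
print("compressed bytes:", text.stored_size())
```

    Any sequential algorithm (for example the cluster grouping discussed
    above) can consume `iter(text)` directly without knowing that the
    storage is compressed. The actual size savings depend heavily on the
    data; highly repetitive sample text like this compresses far better than
    typical prose.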



    This archive was generated by hypermail 2.1.5 : Tue Dec 07 2004 - 07:10:28 CST