Re: UTF-8 can be used for more than it is given credit ( Re: UTF-7 - is it dead? )

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 02 2006 - 19:38:17 CDT


    From: "Mike Ayers" <mayers@celequest.com>
    > Theodore H. Smith wrote:
    >
    >> Moores law doesn't mean we should be more wasteful.
    >
    > Especially since it's a marketing myth.
    >
    >> If you get a computer 4x as fast, then instead of using UTF-32 over UTF-8,
    >> you could maybe make 4x the money by having 4x the throughput.
    >
    > That was my point. However, it's not 4x, since UTF-8 has some overhead
    > involved in encode/decode.

    That's also a myth. The encode/decode overhead is minimal compared to the much bigger impact that larger data such as UTF-32 has on data locality (internal and external processor caches) and on memory overhead (page swapping to disk and additional I/O, not counting the even bigger impact on the network).
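
    As a rough illustration (a minimal Python sketch with a hypothetical sample string) of the size gap that drives the locality argument: for mostly-ASCII text, the UTF-32 form occupies about four times the memory of the UTF-8 form, so far fewer characters fit in each cache line or memory page:

        # Compare the encoded size of the same text in UTF-8 and UTF-32.
        text = "Sample text that is mostly ASCII, as much real-world data is."
        utf8 = text.encode("utf-8")        # 1 byte per ASCII character
        utf32 = text.encode("utf-32-le")   # fixed 4 bytes per code point
        print(len(utf8), len(utf32))       # UTF-32 is ~4x larger here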

    When handling large volumes of data, the trade-off always favors compression, whatever the performance of the processor, memory, disks or networks.
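
    A minimal sketch of that trade-off (Python's standard zlib, with a hypothetical repetitive payload): even counting the CPU cost of compressing and decompressing, the compressed form means far fewer bytes through disks and networks:

        import zlib

        # A large, repetitive payload stands in for bulk text data.
        data = ("some repeated record of text\n" * 100_000).encode("utf-8")
        packed = zlib.compress(data)
        print(len(data), len(packed))           # far fewer bytes to store or move
        assert zlib.decompress(packed) == data  # lossless round trip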

    And although processor performance is constantly improving, performance needs are still not met, simply because the volume of data to process is also exploding (even faster than processor performance). If this were not the case, we would not need to build new supercomputers and clusters to handle that exploding volume. When processor performance increases, it mainly reduces the power and the physical space needed where computers are installed, which allows building clusters with a larger number of processors; but the price per processor is still an issue: it does not decrease that much. So clusters need to be larger, and their price continues to be a limiting factor as the volume of data explodes.

    For this reason, data structures and internal representations still need to be optimized. Data compression is still wanted, because users always demand more from their computing resources. But today the most limiting factor is networking: it is still what costs the most (despite price reductions, simply because we depend more and more on networks for accessing larger volumes of data and exchanging services). So networking is used more and more, but high-speed access is not always available in areas that depend on third-party network providers that don't want to invest there.

    The other factor is the exploding population to serve with the same service, and increased competition between services, which makes users more demanding: they do not have, or cannot always have, the fastest network, so services still need to be tuned with compression techniques for data transfers and smart data exchange protocols, for more interactive results.


