Re: Devanagari

From: Aman Chawla (creativezeal@hotmail.com)
Date: Sun Jan 20 2002 - 19:39:57 EST


> The fact that UTF-8 economizes on the storage for ASCII characters, is a
> benefit for *all* HTML users, as the HTML syntax is entirely in ASCII and
> claims a significant fraction of the data.

> A UTF-8 encoded HTML file, will therefore have (percentage-wise) less
> overhead for Devanagari as claimed. Add to that James' observation on
> graphics files, many of which accompany even the simplest HTML documents
> and you get a percentage difference between the sizes of an English and
> Devanagari website (i.e. in its entirety) that's well within the
> fluctuation of the typical length in characters, for expressing the same
> concept in different languages.

The point was that a UTF-8 encoded HTML file for an English web page
carrying, say, 10 GIFs would have a file size one-third that for a
Devanagari web page with the same number of GIFs, even if you take into
account the fluctuation in the typical length in characters needed to
express the same concept in different languages. In some cases one language
expresses a concept more compactly and in other cases it does not, so on
the whole this effect balances out and can be neglected. Transmitting a
Devanagari web page over a network would therefore take three times as long
as transmitting an English web page that uses the same images and presents
the same information.
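
The factor of three rests on the per-character cost in UTF-8: ASCII
characters take one byte each, while code points in the Devanagari block
(U+0900 to U+097F) take three. A minimal sketch that measures this is given
here; the two sample strings and the helper name utf8_bytes_per_char are
chosen purely for illustration, and the measurement covers text characters
only, not markup or image data.

```python
# Illustrative sample strings (assumptions, not taken from any real page).
english = "India"
devanagari = "भारत"  # four code points, all in the Devanagari block


def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes used per Unicode code point."""
    return len(text.encode("utf-8")) / len(text)


print("English:   ", utf8_bytes_per_char(english), "bytes per character")
print("Devanagari:", utf8_bytes_per_char(devanagari), "bytes per character")
```

Running it on longer passages of running text gives the same per-character
figures, as long as the text stays within ASCII or within the Devanagari
block respectively.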


