From: Mark Davis ☕ (mark@macchiato.com)
Date: Mon Feb 14 2011 - 12:51:38 CST
*There are many caveats with any such data gathering, so don't put too much
reliance on the following figures.*
The relative proportions of languages on the web have English declining but
still at well over ⅓, Chinese (S) growing to about ⅐, then come Japanese,
German, Russian, Spanish, Korean, French, Polish, Chinese (T), Arabic,
Portuguese, Italian, Turkish, Dutch, with others less than 1% each.
In the last few years there has also been a bit more (relative) growth in
smaller languages. The distribution looks like the following:
1σ - top 6 languages
2σ - next 24 languages
3σ - next 37 languages
Mark
*— Il meglio è l’inimico del bene —*
On Mon, Feb 14, 2011 at 03:17, Marion Gunn <mgunn@egt.ie> wrote:
> The most common letter in English text is "e". Does its high frequency on
> the web just confirm that most web content is still in English?
> mg
>
> On 14/02/2011 10:57, Charlie Ruland wrote:
>
>> * Mark Davis ☕ [2011-02-14 03:26]:
>>
>>> As it turns out, when looking at HTML pages on the web (with a good-sized
>>> sample from work here at Google), SPACE is the most frequent character (by a
>>> huge margin). That is even true on Chinese pages, just because of the
>>> proportion of markup on pages.
>>>
>>> For those interested, the most frequent alphabetic character is 'e'.
>>>
>> I would be interested if I wanted to compress the entire contents of the
>> Web, but I don’t.
>>
>>> /— Il meglio è l’inimico del bene —/
>>
>> /— La ragione è l’inimico del sognatore —/
>>
>> Charlie
>> --
>> ERROR·COMMVNIS·FACIT·IVS
>>
>
> --
> Marion Gunn * eGteo (Estab.1991)
> 27 Páirc an Fhéithlinn, Baile an
> Bhóthair, An Charraig Dhubh,
> Co. Átha Cliath, Éire/Ireland
> * mgunn@egt.ie * eamonn@egt.ie *