length of text by different languages

From: Yung-Fong Tang (ftang@netscape.com)
Date: Wed Mar 05 2003 - 18:55:18 EST

Next message: jameskass@att.net: "RE: Ya-phalaa"

Previous message: Andy White: "RE: Ya-phalaa"
Next in thread: Francois Yergeau: "RE: length of text by different languages"
Maybe reply: Francois Yergeau: "RE: length of text by different languages"
Reply: Doug Ewell: "Re: length of text by different languages"
Reply: Jon Babcock: "Re: length of text by different languages"

I remember there was a study showing that, although UTF-8 encodes most
Japanese/Chinese characters in 3 bytes, Japanese and Chinese text usually
uses FEWER characters to communicate the same information than text in
alphabet-based languages.
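To make the per-character cost concrete, here is a small sketch that prints how many bytes UTF-8 needs for one sample character from each script. The helper name utf8_len and the chosen sample characters are illustrative, not taken from any study. The characters are written as escapes so that the source file's own normalization cannot alter them.

```python
def utf8_len(s: str) -> int:
    """Number of bytes UTF-8 needs to encode the string s."""
    return len(s.encode("utf-8"))


# Illustrative sample characters (one per script), written as escapes.
# Labels are ASCII so printing works on any console encoding.
samples = {
    "English a (U+0061)": "a",
    "German u-umlaut (U+00FC)": "\u00fc",
    "French e-acute (U+00E9)": "\u00e9",
    "Japanese hiragana a (U+3042)": "\u3042",
    "Chinese zhong (U+4E2D)": "\u4e2d",
    "Korean han (U+D55C)": "\ud55c",
    "CJK Extension B (U+20000)": "\U00020000",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")
```

ASCII letters take 1 byte, accented Latin letters 2, kana, common Han characters and Hangul syllables 3, and characters outside the Basic Multilingual Plane (such as CJK Extension B) 4, which is why "most" rather than "all" CJK characters cost 3 bytes.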

Can anyone point me to such research? Martin, do you have a paper
about that?

I would like to find out the average ratio between
English,
German,
French,
Japanese,
Chinese,
Korean

in terms of the number of characters, and in terms of the number of bytes
needed to encode the text in UTF-8.
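Given parallel texts (the same content in each language), both ratios could be computed along these lines. This is a minimal sketch: the function names and the choice of English as the baseline are assumptions for illustration. Note that len() counts Unicode code points, not user-perceived characters, so the text is NFC-normalized first; otherwise a decomposed accent or conjoining Hangul jamo would inflate the character count.

```python
import unicodedata


def measure(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for text after NFC normalization."""
    # NFC composes e.g. "e" + U+0301 into U+00E9 and conjoining Hangul jamo
    # into precomposed syllables, so equivalent spellings count the same.
    text = unicodedata.normalize("NFC", text)
    return len(text), len(text.encode("utf-8"))


def ratios_vs_baseline(
    texts: dict[str, str], baseline: str = "English"
) -> dict[str, tuple[float, float]]:
    """For each language, return (character ratio, byte ratio) relative to the baseline text."""
    base_chars, base_bytes = measure(texts[baseline])
    ratios = {}
    for lang, text in texts.items():
        chars, nbytes = measure(text)
        ratios[lang] = (chars / base_chars, nbytes / base_bytes)
    return ratios
```

It would be called as ratios_vs_baseline({"English": english_text, "Japanese": japanese_text, ...}), where each value is the full plain text of one translation. A language can come out below 1.0 in characters and above 1.0 in bytes at the same time, which is exactly the trade-off in question.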

If such research has not been done, one way to find out would be to take
translations of the Bible in these languages from the SWORD Project, strip
out the XML tags to leave the plain text, and measure the size. Since all
the Bible translations communicate the same information and the volume is
large enough, that could be a good way to get the result. Of course, the
markup needs to be removed to reduce the noise.
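The stripping and measuring step might look like the sketch below. It assumes each translation has already been exported to a single well-formed XML file (for example OSIS); how to obtain that file from a SWORD module is not covered here. The optional skip set is an assumption too: it lets you drop elements such as footnotes that are not part of the running text, and the element name note is a hypothetical example. Namespaces are ignored when matching, and whitespace runs are collapsed so that markup indentation does not count as text.

```python
import re
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}note' -> 'note'."""
    return tag.rsplit("}", 1)[-1]


def _collect(elem: ET.Element, skip: frozenset[str], parts: list[str]) -> None:
    """Append the character data of elem to parts, ignoring elements named in skip."""
    if _local_name(elem.tag) in skip:
        return
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        _collect(child, skip, parts)
        # Text after a child's end tag belongs to the parent, so keep it
        # even when the child itself was skipped.
        if child.tail:
            parts.append(child.tail)


def plain_text_from_xml(xml_source: str, skip: frozenset[str] = frozenset()) -> str:
    """Remove all markup and return the text, with whitespace runs collapsed to one space."""
    parts: list[str] = []
    _collect(ET.fromstring(xml_source), skip, parts)
    return re.sub(r"\s+", " ", "".join(parts)).strip()


def text_size(xml_source: str, skip: frozenset[str] = frozenset()) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) of the plain text inside xml_source."""
    text = plain_text_from_xml(xml_source, skip)
    return len(text), len(text.encode("utf-8"))
```

For example, text_size(open("genesis.xml", encoding="utf-8").read(), frozenset({"note"})) would give the size of one book without its notes (the file name is hypothetical). Whether to collapse or remove whitespace entirely is a judgment call: English separates words with spaces while Chinese and Japanese do not, so the choice shifts the ratios somewhat.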

This archive was generated by hypermail 2.1.5 : Wed Mar 05 2003 - 19:31:00 EST