Re: length of text by different languages

From: Jon Babcock (jon@kanji.com)
Date: Sat Mar 08 2003 - 09:03:35 EST


    Yung-Fong Tang wrote:
    >
    >
    > Ram Viswanadha wrote:
    >
    >> There is also some information at
    >> http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results
    >>
    >> Not sure if this is what you are looking for.
    >
    > thanks. not really. I am not look into the ratio caused by encoding. But
    > rather the ratio caused by language itself. For example, in order to
    > communicate the idea "I want to eat chicken for dinner tonight", French,
    > German using the same encoding may use different number of characters to
    > communicate the same "IDEA".

    "Efficiency" here is dependent on the translation and varies
    widely. (See example below.) That's why the practical experience
    of professional translators will probably provide the best
    answer. I have already mentioned what is, in my experience, the
    range for contemporary Japanese-English and Chinese-English.

    These ratios are important to JE and CE translators because we
    usually get paid by the English word. But it usually takes more
    work to use fewer words. So, if we don't want to be penalized for
    using concise English, we try to charge by the character count
    in the Chinese or Japanese source text. To quote a rate to our
    clients, we must calculate what the "efficiency ratio" -- to
    coin a term here -- is for our translations in this particular
    field.

    If you want to calculate this ratio yourself, I agree with your
    idea of using Bible translations, although the number of proper
    names may skew the results compared, for example, to technical
    translations. But it would be a good place to start.

    One example, from thousands, found on yesterday's honyaku ML:

    イメージ合成写真です --> 'simulated photograph' or 'the
    photograph shown is for illustration only', i.e., from 20 to 45
    characters in English, the target language. Decide how many
    bytes you're going to use to encode the Japanese and the English
    strings here, and you'll get the "efficiency ratio" in this case.
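
    As a rough sketch of that calculation, the snippet below counts
    characters and bytes for the example above. The function names
    and the choice of encodings (UTF-8, UTF-16, Shift_JIS) are my own
    assumptions for illustration, not anything a client or a standard
    prescribes, and "target divided by source" is just one possible
    way to define the ratio.

```python
# Illustrative only: the encodings and the ratio definition are assumptions.
source = "イメージ合成写真です"  # 10 characters of Japanese source text
targets = [
    "simulated photograph",
    "the photograph shown is for illustration only",
]


def char_ratio(src: str, tgt: str) -> float:
    """Target length divided by source length, counted in code points."""
    return len(tgt) / len(src)


def byte_ratio(src: str, tgt: str, encoding: str = "utf-8") -> float:
    """Target size divided by source size, counted in bytes of one encoding."""
    return len(tgt.encode(encoding)) / len(src.encode(encoding))


for tgt in targets:
    print(repr(tgt))
    print("  characters:", len(tgt), "vs", len(source),
          "ratio", round(char_ratio(source, tgt), 2))
    # utf-16-le is used so that no byte order mark is added to the count.
    for enc in ("utf-8", "utf-16-le", "shift_jis"):
        print("  " + enc + ":",
              len(tgt.encode(enc)), "vs", len(source.encode(enc)),
              "bytes, ratio", round(byte_ratio(source, tgt, enc), 2))
```

    The same pair of strings gives a different ratio under each
    encoding, which is why the encoding has to be fixed before any
    two language pairs can be compared.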

    Jon

    -- 
    Jon Babcock <jon@kanji.com>
    

