Re: Detecting encoding in Plain text

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Jan 13 2004 - 07:41:50 EST

    On 13/01/2004 04:10, Marco Cimarosti wrote:

    > ...
    >
    >In this case (as in most other similar cases), you should rather blame the
    >people who send you e-mail without encoding declaration.
    >
    I get plenty of them. But then I assume that they default to ASCII or
    Windows-1252. Is there in fact a formal default for e-mail, HTML, etc.
    without an encoding declaration?
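
The working assumption above (treat undeclared text as ASCII, otherwise as Windows-1252) can be sketched in Python roughly as follows. This is only an illustration of that personal fallback, not a statement of any formal default; the function name `decode_undeclared` is hypothetical.

```python
from typing import Tuple


def decode_undeclared(raw: bytes) -> Tuple[str, str]:
    """Decode bytes that arrived without an encoding declaration.

    Returns (text, name_of_encoding_assumed).
    """
    try:
        # Pure 7-bit data reads the same in ASCII and in Windows-1252.
        return raw.decode("ascii"), "ascii"
    except UnicodeDecodeError:
        pass
    # Assumption: anything with the high bit set is Windows-1252.
    # Python's cp1252 codec leaves five byte values undefined
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D); errors="replace" turns those
    # into U+FFFD instead of raising.
    return raw.decode("cp1252", errors="replace"), "cp1252"
```

Note that this fallback never fails, which is exactly why it cannot tell Windows-1252 apart from any other 8-bit encoding: almost every byte sequence is "valid" under it.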

    > ...
    >
    >I don't think that Thai would be such a case. Thai normally uses European
    >digits (the usage scope of Thai digits is probably similar to that of Roman
    >numerals in Western languages), some European punctuation (parentheses,
    >exclamation marks, hyphens, quotes), and spaces (although a Thai space has
    >the strength -- and hence the frequency -- of a Western semicolon).
    >
    In some English texts the combined frequency of digits, parentheses,
    exclamation marks, quotes and semicolons is minimal, so perhaps the same
    holds for their Thai counterparts. Does Thai use the basic Latin
    hyphen as part of the spelling of common words? Apart from these there is
    no guarantee that any basic Latin characters will be used.
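
One way to check how much basic Latin a given text really contains is to measure it directly. The sketch below counts printable Basic Latin characters (U+0021 to U+007E, so spaces and control characters are excluded) as a share of all characters; the function name `basic_latin_share` and the choice of range are assumptions made for illustration.

```python
def basic_latin_share(text: str) -> float:
    """Fraction of characters in the printable Basic Latin range.

    The range U+0021..U+007E covers ASCII letters, digits and
    punctuation, but not the space or any control character.
    """
    if not text:
        return 0.0
    hits = sum(1 for ch in text if "\u0021" <= ch <= "\u007e")
    return hits / len(text)
```

Running this over a corpus of real Thai (or other) text would show whether a detector can count on finding any such characters at all.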

    >As a minimum, all languages should use line feed and/or new line as line
    >terminators, as Unicode's line and paragraph separators never caught on.
    >
    Yes, but have they caught on in some countries/languages/applications/OSs?
    And will they catch on in future? Anyway, some texts use very long
    paragraphs and so contain very few explicit line feeds etc.
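
Whether a particular text uses the ASCII line terminators or the Unicode separators is easy to tally. The following is a rough sketch, assuming the text has already been decoded to a Python string; the name `count_terminators` and the labels in the table are hypothetical.

```python
from collections import Counter

# Characters that can end a line or a paragraph.
TERMINATORS = {
    "\n": "LF",
    "\r": "CR",
    "\u0085": "NEL",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}


def count_terminators(text: str) -> Counter:
    """Tally line and paragraph terminators, counting CR LF as one unit."""
    counts = Counter()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r" and text[i + 1:i + 2] == "\n":
            # A CR immediately followed by LF is a single CRLF terminator.
            counts["CRLF"] += 1
            i += 2
            continue
        if ch in TERMINATORS:
            counts[TERMINATORS[ch]] += 1
        i += 1
    return counts
```

A text made of a few very long paragraphs will return very small counts here, which is the case in which a detector gets almost no help from line terminators.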

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

