Re: Detecting encoding in Plain text

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Jan 13 2004 - 07:41:50 EST

Next message: Christopher Cullen: "Re: Chinese rod numerals"

Previous message: Marco Cimarosti: "RE: Detecting encoding in Plain text"
In reply to: Marco Cimarosti: "RE: Detecting encoding in Plain text"
Next in thread: D. Starner: "Re: Detecting encoding in Plain text"

On 13/01/2004 04:10, Marco Cimarosti wrote:

> ...
>
>In this case (as in most other similar cases), you should rather blame the
>people who send you e-mail without encoding declaration.
I get plenty of them, but I assume that they default to ASCII or
Windows-1252. Is there in fact a formal default for e-mail, HTML, etc.
without an encoding declaration?
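For what it's worth, MIME (RFC 2046) does name US-ASCII as the default charset for text/plain. The practical assumption above ("ASCII, else Windows-1252") can be sketched roughly as follows. This is only an illustration: `decode_undeclared` is a hypothetical helper, not part of any mail or HTML library, and it trusts a declared charset whenever one is given.

```python
from typing import Optional


def decode_undeclared(data: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical helper: decode bytes, guessing only if no charset is declared."""
    if declared:
        # An explicit declaration wins; no guessing needed.
        return data.decode(declared)
    try:
        # Pure 7-bit data decodes unambiguously as ASCII.
        return data.decode("ascii")
    except UnicodeDecodeError:
        # Otherwise assume Windows-1252. A few bytes (e.g. 0x81) are unassigned
        # in Python's cp1252 codec, so replace them with U+FFFD instead of failing.
        return data.decode("cp1252", errors="replace")
```

So `decode_undeclared(b"\x93hi\x94")` yields curly quotes around "hi", which is exactly the kind of guess that goes wrong when the sender actually used some other 8-bit encoding.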

> ...
>
>I don't think that Thai would be such a case. Thai normally uses European
>digits (the usage scope of Thai digits is probably similar to that of Roman
>numerals in Western languages), some European punctuation (parentheses,
>exclamation marks, hyphens, quotes), and spaces (although a Thai space has
>the strength -- and hence the frequency -- of a Western semicolon).
In some English texts the combined frequency of digits, parentheses,
exclamation marks, quotes and semicolons is minimal, so perhaps the
same is true of their Thai counterparts. Does Thai use the basic Latin
hyphen as part of the spelling of common words? Apart from these
characters, there is no guarantee that any basic Latin characters will
be used.
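To make the concern concrete, here is a sketch of one heuristic of the kind under discussion, assumed here for illustration: guessing the byte order of BOM-less UTF-16 from zero bytes. A basic Latin character U+00xx encodes as `00 xx` in UTF-16BE (zero at an even offset) and `xx 00` in UTF-16LE (zero at an odd offset). The function names are hypothetical. Thai letters (U+0E01..U+0E5B) never produce a zero byte, so a Thai text with no digits, spaces or punctuation gives this test nothing to work with.

```python
from typing import Optional, Tuple


def zero_byte_positions(data: bytes) -> Tuple[int, int]:
    """Count 0x00 bytes at even and at odd offsets."""
    even = sum(1 for i in range(0, len(data), 2) if data[i] == 0)
    odd = sum(1 for i in range(1, len(data), 2) if data[i] == 0)
    return even, odd


def guess_utf16_order(data: bytes) -> Optional[str]:
    """Hypothetical guess: BE if zeros sit at even offsets, LE if at odd ones."""
    even, odd = zero_byte_positions(data)
    if even > odd:
        return "utf-16-be"
    if odd > even:
        return "utf-16-le"
    return None  # no evidence either way, e.g. Thai text with no basic Latin
```

For example, six Thai letters encoded as UTF-16BE give `(0, 0)` zero bytes and therefore no guess at all.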

>As a minimum, all languages should use line feed and/or new line as line
>terminators, as Unicode's line and paragraph separators never caught on.
Yes, but have they caught on in some countries, languages, applications
or operating systems? And will they catch on in future? Anyway, some
texts use very long paragraphs and so contain very few explicit line
feeds.
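A rough sketch of using line terminators as evidence, again assumed for illustration only: `terminator_votes` is a hypothetical helper that reads each aligned 16-bit unit in both byte orders and counts how often it is LF, CR, LINE SEPARATOR (U+2028) or PARAGRAPH SEPARATOR (U+2029). A text made of a few very long paragraphs yields only a handful of votes, and one with no terminators at all yields none.

```python
from typing import Dict

# Code points used as evidence: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
TERMINATORS = (0x000A, 0x000D, 0x2028, 0x2029)


def terminator_votes(data: bytes) -> Dict[str, int]:
    """Count aligned 16-bit units that are terminators under each byte order."""
    votes = {"utf-16-be": 0, "utf-16-le": 0}
    # Step over complete 2-byte units; a trailing odd byte is ignored.
    for i in range(0, len(data) - 1, 2):
        be = (data[i] << 8) | data[i + 1]
        le = (data[i + 1] << 8) | data[i]
        if be in TERMINATORS:
            votes["utf-16-be"] += 1
        if le in TERMINATORS:
            votes["utf-16-le"] += 1
    return votes
```

Counting U+2028 and U+2029 alongside LF and CR costs nothing, but as noted above it only helps if some texts actually use them.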

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Tue Jan 13 2004 - 08:23:11 EST