From: Behnam (behnam.rassi@gmail.com)
Date: Thu May 29 2008 - 13:07:02 CDT
On 29-May-08, at 11:32 AM, Phillips, Addison wrote:
> Language identification can be applied at many levels to a
> document. It can certainly be applied to a string of characters. It
> can also be usefully applied to sentences, paragraphs, chapters,
> sections, entire documents, and even collections of documents. (And
> a document need not be written--sound recordings, for example,
> often use language).
>
> There are at least two types of language identification (see [1]).
> For the kind you mean here, language identification can work at any
> appropriate level of granularity. This email, for example, is
> entirely in English. There is no point in marking up every single
> sentence, line, word, or character with a language tag when the
> Content-Language header for the whole thing does the job nicely.
> Certainly a span of text can be in another language and should be
> appropriately tagged. But over-tagging increases complexity and
> burns bandwidth/storage to no good effect. Or, as we say in
> language tagging land, "Tag Content Wisely".
>
> Best Regards,
>
> Addison
>
> [1] http://www.w3.org/TR/i18n-html-tech-lang/
Yes, Addison, thanks. This is what Ken Whistler initially pointed out.
Exporting the document to HTML seems to be the best available option.
I'll work on it.
Behnam
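
The tagging advice quoted above could be sketched roughly as follows for an HTML export: declare the language once on the root element and tag only the spans that differ from it. This is a minimal illustration, not anyone's actual export tool. The function name to_html, the (text, lang) run format, and the sample sentence are all hypothetical.

```python
from html import escape

def to_html(runs, doc_lang):
    """Build a tiny HTML document from (text, lang) runs.

    doc_lang is declared once on <html>; a run gets its own
    lang attribute only when its language differs from doc_lang.
    Both names are assumptions made for this sketch.
    """
    parts = []
    for text, lang in runs:
        if lang == doc_lang:
            # Same language as the document: no extra markup needed.
            parts.append(escape(text))
        else:
            # Different language: tag just this span.
            parts.append(f'<span lang="{escape(lang)}">{escape(text)}</span>')
    body = "".join(parts)
    return f'<html lang="{escape(doc_lang)}"><body><p>{body}</p></body></html>'

# Hypothetical sample: English text with one French word.
runs = [("The French word for hello is ", "en"), ("bonjour", "fr"), (".", "en")]
html_out = to_html(runs, "en")
print(html_out)
```

Here only the French word carries its own tag; everything else inherits the language declared on the root element.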