From: Mark Davis (mark.edward.davis@gmail.com)
Date: Tue Apr 14 2009 - 20:07:25 CDT
What I'm saying is that, given the actual state of affairs in the world
(not what we wish it were, but what it is), you can't depend on content
being tagged, or tagged correctly, for language. Thus if you are a
consuming program for public web pages, and you care about languages X, Y,
and Z (meaning that you process them differently), then you should be
prepared to heuristically detect X, Y, and Z.
- If you don't process language W (differently), you don't need to detect
W, or
- If you are working with a closed known set of pages where the language
is known to be correctly tagged, you could depend on the tag instead of
using detection.
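
To make the decision rule above concrete, here is a minimal sketch in Python.
Everything in it is hypothetical and for illustration only: the function name
resolve_language, its parameters, and the detect callback are assumptions, not
part of any real library or product.

```python
from typing import Callable, Optional, Set


def resolve_language(
    declared_tag: Optional[str],
    text: str,
    handled: Set[str],
    tags_trusted: bool,
    detect: Callable[[str, Set[str]], Optional[str]],
) -> Optional[str]:
    """Pick the language to process `text` as, or None for default handling.

    `handled` is the set of languages the program treats differently
    (X, Y, and Z in the message above). `detect` is a hypothetical
    heuristic detector supplied by the caller.
    """
    if tags_trusted and declared_tag:
        # Closed, known set of pages with correct tagging: use the tag.
        primary = declared_tag.split("-")[0].lower()
        # A language we do not process differently needs no special handling.
        return primary if primary in handled else None
    # Public web pages: the tag cannot be relied on, so detect, and only
    # among the languages we actually care about.
    return detect(text, handled)
```

A caller crawling the public web would pass tags_trusted=False, so the
declared tag is never used as the final answer.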
In my experience, for most programs that deeply care about the language of
web pages (like text-to-speech processing, or Braille devices), heuristic
language detection is a rather small amount of work compared to their main
processing function. If I were buying such a product, and I expected it to
work for my language, I'd certainly be disappointed if it didn't detect the
language heuristically, since it wouldn't work on lots of pages.
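
As a rough illustration of how small a basic heuristic can be, here is a toy
stopword-counting detector in Python. It is only a sketch under loud
assumptions: the word lists, the function name detect_language, and the
min_hits threshold are all made up for this example, and a real detector would
use much richer data (for example character n-gram statistics).

```python
import re
from typing import Dict, Optional, Set

# Tiny, illustrative stopword lists (assumed for this example only).
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "de", "des", "est", "un", "une"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"},
}


def detect_language(
    text: str, candidates: Set[str], min_hits: int = 2
) -> Optional[str]:
    """Guess which candidate language `text` is in, or None if unsure."""
    # Split into lowercase alphabetic words (Unicode letters, no digits).
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Score only the languages the caller cares about and we have data for.
    scores = {
        lang: sum(1 for w in words if w in STOPWORDS[lang])
        for lang in candidates
        if lang in STOPWORDS
    }
    if not scores:
        return None
    best = max(scores, key=lambda lang: scores[lang])
    # Too few hits means there is not enough evidence to decide.
    return best if scores[best] >= min_hits else None
```

Note that it only scores the languages passed in as candidates, which matches
the point that there is no need to detect a language you do not process
differently.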
Mark
On Tue, Apr 14, 2009 at 17:36, Andrew Cunningham <andrewc@vicnet.net.au> wrote:
> Although WCAG 1.0 and WCAG 2.0 require language tagging.
>
> On Wed, April 15, 2009 5:39 am, Mark Davis wrote:
> > It is a chicken & egg problem. Web page creators will only bother to set
> > the language (or set it different than the default) if the language
> > setting makes a difference. Because so much content is badly tagged, all
> > of the interpreters of the pages end up having to disregard that
> > information, and compute the language heuristically ("language
> > detection"). Because of that the language setting doesn't make a
> > difference, so the creators don't bother setting it.
>
>
> Although the question becomes how many languages can you identify
> heuristically?
>
>
>
> --
> Andrew Cunningham
> Research and Development Coordinator
> Vicnet
> State Library of Victoria
> Australia
>
> andrewc@vicnet.net.au
>
>