VOA- utf-8, lang="en" (Re: BBC.co.uk languages ...)

From: Donald Z. Osborn (dzo@bisharat.net)
Date: Tue Apr 14 2009 - 09:23:42 CDT

    Thanks to all for the feedback on this topic. It sounds like the
    choice of utf-8 or not is mainly one of policy (or lack of same) and
    not of technical constraints?

    Interesting on this point to contrast with VOA,* which has all of its
    language pages in utf-8.

    On the other hand, while BBC uses the lang attribute in page coding
    to indicate the main language of each page, VOA pages are apparently
    all lang="en".

    Like BBC, VOA ASCIIfies Hausa Boko orthography. It also has no text in
    Amharic or Tigrinya (among non-Latin scripts), only audio from an
    English language "Horn" page.

    Like BBC, it groups the similar languages Kinyarwanda and Kirundi on a
    single page (with text in one, the other, both, or something in
    between). It would be interesting to know exactly what the language
    of the text content of that page is. BBC codes its page "rw"
    (for Kinyarwanda), not "rn" (for Kirundi), even though both languages
    share it. But as already noted, VOA incorrectly uses lang="en"
    everywhere.
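
    For anyone who wants to repeat this kind of check, here is a rough
    sketch of how the declared language and charset of a page could be
    read out of its markup. The class and function names and the sample
    markup are made up for illustration; the sample is not copied from
    any VOA or BBC page. It only reports what the page declares, not
    what language or encoding the text is actually in.

```python
from html.parser import HTMLParser


class LangCharsetParser(HTMLParser):
    """Collects the lang attribute of <html> and any charset declared in <meta>."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html" and self.lang is None:
            self.lang = attrs.get("lang")
        elif tag == "meta" and self.charset is None:
            if attrs.get("charset"):
                # Short form: <meta charset="utf-8">
                self.charset = attrs["charset"].lower()
            elif (attrs.get("http-equiv") or "").lower() == "content-type":
                # Long form: <meta http-equiv="Content-Type"
                #                  content="text/html; charset=utf-8">
                content = attrs.get("content") or ""
                for part in content.split(";"):
                    key, _, value = part.strip().partition("=")
                    if key.lower() == "charset" and value:
                        self.charset = value.strip().lower()


def declared_lang_and_charset(html_text):
    """Return (lang, charset) as declared in the markup; None where absent."""
    parser = LangCharsetParser()
    parser.feed(html_text)
    return parser.lang, parser.charset


# Hypothetical sample markup, for illustration only.
SAMPLE = (
    '<html lang="en"><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "<title>Hausa</title></head><body></body></html>"
)

print(declared_lang_and_charset(SAMPLE))
```

    A page in Hausa that came back with lang "en", as in the sample,
    would be the kind of mismatch described above.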

    * http://www.voa.gov (click on Languages) or
    http://www.voanews.com/english/screen_map.cfm



    This archive was generated by hypermail 2.1.5 : Tue Apr 14 2009 - 10:13:30 CDT